One of the current trends in modern software is for developers to slap an API call to OpenAI or Anthropic onto features within their apps. Reasonable people can quibble over whether those features actually bring value to users, but what I want to discuss is the fundamental concept of taking a dependency on a cloud-hosted AI model in your application.
This laziness is creating a generation of software that is fragile, privacy-invading, and fundamentally broken. We are building applications that stop working the moment a server crashes or a credit card expires.
We need to return to a habit of building software where our local devices do the work. The silicon in our pocket is mind-bogglingly faster than what was available a decade ago. It has a dedicated Neural Engine sitting there, mostly idle, while we wait for a JSON response from a server farm in Virginia. That’s ridiculous.
Even if your intentions are pure, the moment you stream user content to a third-party AI provider, you’ve changed the nature of your product. You now have data retention questions and all the baggage that comes with that (consent, audit, breach, government requests, training, etc.).
On top of that, you’ve substantially complicated your stack, because your feature now depends on network conditions, external vendor uptime, rate limits, account billing, and your own backend health.
Congratulations! You took a UX feature and turned it into a distributed system that costs you money.
If the feature can be done locally, opting into this mess is self inflicted damage.
“AI everywhere” is not the goal. Useful software is the goal.
Concrete Example: Brutalist Report’s On-Device Summaries
Years ago I launched a fun side project named The Brutalist Report, a news aggregator service inspired by the 1990s style web.
Recently, I decided to build a native iOS client for it with the design goal of ensuring it would remain a high-density news reading experience. Headlines in a stark list, a reader mode that strips the cancer that has overtaken the web, and (optionally) an “intelligence” view that generates a summary of the article.
Here’s the key point though: the summary is generated on-device using Apple’s local model APIs. No server detours. No prompt or user logs. No vendor account. No “we store your content for 30 days” footnotes needed.
It has become the default assumption that any AI use happens server-side. We have a lot of work to do as an industry to turn this around.
It’s not lost on me that some use-cases will demand the intelligence only a cloud-hosted model can provide, but that isn’t true of every problem you’re trying to solve. We need to be thoughtful here.
Available Tooling
I can only speak on the tooling available within the Apple ecosystem since that’s what I focused initial development efforts on. In the last year, Apple has invested heavily here to allow developers to make use of a built-in local AI model easily.
The core flow looks roughly like this:
import FoundationModels

let model = SystemLanguageModel.default
guard model.availability == .available else { return }

let session = LanguageModelSession {
    """
    Provide a brutalist, information-dense summary in Markdown format.
    - Use **bold** for key concepts.
    - Use bullet points for facts.
    - No fluff. Just facts.
    """
}

let response = try await session.respond(options: .init(maximumResponseTokens: 1_000)) {
    articleText
}

let markdown = response.content
And for longer content, we can chunk the plain text (around 10k characters per chunk), produce concise “facts only” notes per chunk, then run a second pass to combine them into a final summary.
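Here’s a minimal sketch of that two-pass flow. The ~10k chunk size comes from above, but the helper, prompts, and token limits are illustrative assumptions rather than the shipping app’s code:

import FoundationModels

// Hypothetical helper: split plain text into ~10k-character chunks.
func chunks(of text: String, size: Int = 10_000) -> [String] {
    stride(from: 0, to: text.count, by: size).map { offset in
        let start = text.index(text.startIndex, offsetBy: offset)
        let end = text.index(start, offsetBy: size, limitedBy: text.endIndex) ?? text.endIndex
        return String(text[start..<end])
    }
}

func summarize(_ articleText: String) async throws -> String {
    // First pass: a fresh session per chunk keeps each prompt small.
    var notes: [String] = []
    for chunk in chunks(of: articleText) {
        let session = LanguageModelSession {
            "Produce concise, facts-only notes for the given text."
        }
        let response = try await session.respond(options: .init(maximumResponseTokens: 500)) {
            chunk
        }
        notes.append(response.content)
    }

    // Second pass: combine the per-chunk notes into the final summary.
    let combiner = LanguageModelSession {
        "Combine these notes into a single information-dense Markdown summary."
    }
    let final = try await combiner.respond(options: .init(maximumResponseTokens: 1_000)) {
        notes.joined(separator: "\n")
    }
    return final.content
}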
This is the kind of work local models are perfect for. The input data is already on the device (because the user is reading it). The output is lightweight. It’s fast and private. It’s okay if it’s not a superhuman PhD level intelligence because it’s summarizing the page you just loaded, not inventing world knowledge.
Local AI shines when the model’s job is transforming user-owned data, not acting as a search engine for the universe.
There are plenty of AI features that people want but don’t trust: summarizing emails, extracting action items from notes, categorizing documents, and so on.
The usual cloud approach turns every one of those into a trust exercise. “Please send your data to our servers. We promise to be cool about it.”
Local AI changes that: your device already has the data, and the work happens right there.
You don’t build trust with your users by writing a 2,000 word privacy policy. You build trust by not needing one to begin with.
The tooling available on the platform goes even further.
One of the best moves Apple has made recently is pushing “AI output” away from unstructured blobs of text and toward typed data.
Instead of “ask the model for JSON and pray”, the newer and better pattern is to define a Swift struct that represents the thing you want. Give the model guidance for each field in natural language. Ask the model to generate an instance of that type.
That’s it.
Conceptually, it looks like this:
import FoundationModels

@Generable
struct ArticleIntel {
    @Guide(description: "One sentence. No hype.")
    var tldr: String

    @Guide(description: "3-7 bullets. Facts only.")
    var bullets: [String]

    @Guide(description: "Comma-separated keywords.")
    var keywords: [String]
}

let session = LanguageModelSession()
let response = try await session.respond(generating: ArticleIntel.self) {
    "Extract structured notes from the article."
    articleText
}

let intel = response.content
Now your UI doesn’t have to scrape bullet points out of Markdown or hope the model remembered your JSON schema. You get a real type with real fields, and you can render it consistently. It produces structured output your app can actually use. And it’s all running locally!
This isn’t just nicer ergonomics. It’s an engineering improvement.
And if you’re building a local first app, this is the difference between “AI as novelty” and “AI as a trustworthy subsystem”.
“But Local Models Aren’t As Smart”
Correct.
But also so what?
Most app features don’t need a model that can write Shakespeare, explain quantum mechanics, and pass the bar exam. They need a model that can do one of these reliably: summarize, classify, extract, rewrite, or normalize.
And for those tasks, local models can be truly excellent.
If you try to use a local model as a replacement for the entire internet, you will be disappointed. If you use it as a “data transformer” sitting inside your app, you’ll wonder why you ever sent this stuff to a server.
Use cloud models only when they’re genuinely necessary. Keep the user’s data where it belongs. And when you do use AI, don’t just glue a chat box onto your app. Use it as a real subsystem with typed outputs and predictable behavior.
Stop shipping distributed systems when you meant to ship a feature.
I was one of the very first advocates for AWS back when it was brand new - SQS, S3, EC2, SimpleDB - it was a lot smaller back then. In fact, I organised the very first AWS event in Melbourne when the AWS rep came from the US to evangelise.
Cloud computing was an absolutely mind-blowing revolution - suddenly your startup could run its own computer systems in minutes without needing to install and run your own systems in a data center. This was an absolute game changer, and I really drank the AWS Kool-Aid down to the last drop, then I licked out the cup. I was all in on AWS in a big way.
I remained a massive AWS fanboi for 15 years or so - a real true believer - I was all in on AWS.
Relationships break down a little at a time - one or two things start to bother you, you still love it overall, but sure, there are some minor downsides here and there. But hey! It’s still absolutely awesome and you still love it, right? But you notice more and more things that ain’t right, that you don’t like, that are broken or bad. Until one day there’s a final thing that you notice, the scales have tipped, and you suddenly realise all at once - “I don’t love this relationship anymore”.
Here are some of the things that chipped away over time:
It really annoyed me that for the first 6 years of its existence AWS did not build its own client libraries, instead leaving the job to “our wonderful community” to implement client libs for languages like Python, happy to let programmers burn their weekends and nights writing software for free for the benefit of AWS.
It super annoyed me that AWS did not move from Python 2 to Python 3 for a ridiculous period of time.
DynamoDB - there’s not much software that I hate - but oh boy - DynamoDB, what a hot pile of garbage. I tried it and ended up with a $75 USD bill by the end of the day. And it’s not just the cost - it’s just the worst system I can imagine in every possible way.
20 cents a gigabyte egress - oh boy - holy schmoly do you have any idea how expensive this is? And it’s gone down over time to the still ridiculously expensive 9 cents per gigabyte. This is fucking insanely expensive. If you use AWS and 9 cents per gigabyte egress is not front of mind then look in the mirror to find the stooge - it’s you.
Crazy sneaky complex billing in which they hit you for data movement within their own systems, double billing you and sometimes triple billing you. Billing footguns and traps are everywhere - you must be a deep expert to avoid them.
IAM - the hideously complex auth and access rules system - this was invented by Lucifer sitting on his burning throne in the ninth level of Hell as the worst possible torment for those who have been sent below for using AWS.
Complexity in everything - once I noticed the complexity of IAM I could not unsee the complexity everywhere in AWS. The weirdest thing is that AWS true believers say “you MUST use AWS because it’s too complex to run your own computer systems, Linux, hardware, networking, security etc”. These true believers have blinded themselves to the unbelievable, massive complexity of pretty much *everything* in AWS. AWS is ridiculously complex and you need to employ a team of expensive experts to run the show.
AWS Lambda - yeah, I really bought the sell on this - “it’s scalable!!!!” - and I ignored the slow startup times and the MASSIVE development complexity. There’s simply no genuine benefit to AWS Lambda compared to running your own web servers, and there are many, many downsides. When eventually I moved out of AWS, the hardest thing to undo was the AWS Lambda stuff. The vendor lock-in is real. If you’re using AWS Lambda then you have to work to keep convincing yourself this is better than your own web servers. Keep convincing yourself that using AWS Lambda is not a horrible mistake.
AWS stomped on open source projects - despite the clear desire of projects like Elasticsearch, Redis, and MongoDB not to be cloned and monetized, AWS pushed ahead with OpenSearch, Valkey, and DocumentDB anyway, capturing the hosted-service money after those communities and companies had built the markets. The result was a wave of defensive licenses like SSPL, Elastic License, RSAL, and other source-available models, designed less to stop ordinary users than to stop AWS from stripping open-source infrastructure for parts and owning the customer relationship. AWS is a predator.
And there’s many, many more reasons to hate AWS, but I don’t even like thinking about AWS so I’m not going to sour my morning by thinking of more reasons to hate AWS.
Relationships break down slowly, until a sudden realisation that it’s over - that’s exactly what happened to my love for AWS. One day my switch flicked and I went from fanboi to hater pretty much instantly. I moved everything out of AWS and shut down all my accounts except one. I left a little bit of stuff on AWS because it’s genuinely the right solution for me - I left my domains on Route53, left a few backups in S3 and continued to use AWS WorkMail (which they have just notified me is now shutting down in 12 months).
And recently I went back to AWS. WHAT?!?!? WHY? You might ask. To get some research done. Do a few tests, get in and out.
I wanted to see how well Claude/Anthropic works on AWS Bedrock (it works the same for Claude Code but it’s slower, and is WAY, WAY more expensive than having an Anthropic subscription).
I wanted to benchmark some of my code on a mega fast machine - the fastest machine I have at home is a 20 core machine with 32GB RAM and I wanted to see how fast my code would run on a machine with 192 cores and 1TB of RAM.
So I logged in to my AWS account and did the AWS Bedrock tests about a month ago - no problems there. Finished the tests, shut it all down - I’m not going back to Claude on AWS Bedrock - great for privacy if you need it but hoo boy, the cost.
More recently I logged in and fired up an EC2 spot instance of a 192 core machine and had been testing for 3 hours or so when I got an email from AWS: “Suspected security breach of your account”.
Somewhere in the depths of AWS some sort of security alarm had been triggered, probably by the fact that my mostly dormant account suddenly started doing stuff with an expensive computer. And I understand why they do that - and it’s a good thing - AWS wants to protect its users. I applaud that.
BUT they suspended/restricted my account.
Now my AWS WorkMail - my main business account - does not work - no-one can send emails any more.
I cannot create any sort of AWS resource, I cannot do the testing I was trying to get done.
I replied to their support notification asking why they had suspended my account and telling them it has not been hacked, there is no problem, and there are no billing anomalies. No response.
Of course I do not pay for premium support, so I have to wait the 24 hours that they said it would take them to reply. It’s been 3 days and AWS support has not replied.
So I posted on the AWS forums begging for someone to respond - someone said “focus on doing what they instructed in the email and then use the chat facility instead of web because they actually answer the chat”. Fine. I did that - I did everything they asked - changed passwords, killed access tokens, checked bills etc - then had an extended chat with an AWS rep after waiting half an hour for the chat to be picked up. The guy at the end of it seemed satisfied and said he would ask the internal people who handle such things to handle it. That was 24 hours ago. I followed up after 8 hours asking when my account would be unsuspended; they said “be patient”.
So here I am four days after they suspended my account. I still want to do the testing on a big machine. I am dreading having to “request quota” to be allowed to do that. My business email system still does not work.
I am reminded why I left AWS and how I need to finish the job: get off AWS WorkMail, move my domains from Route53, and never return.
I’m extremely glad I moved off AWS all those years ago, and it’s sad that a return visit should bring down the email system that I left on AWS, foolishly trusting. Fool me once and all that.
Maybe one day they will get around to unsuspending my account.
Louis Rossmann has officially pledged $10,000 to cover the initial legal fees for an independent software developer threatened with a cease-and-desist letter from Bambu Lab. He posted a video on Saturday to mobilize the Right to Repair community to back the developer and crowd-fund his legal defense. Rossmann is quite unhappy with Bambu Lab, giving the company the middle finger several times in the video and then ending it with, “And if you’re watching this, Bambu Labs, go f*** yourself. Pick on somebody your own size.”
The developer in question, Pawel Jarczak, voluntarily shuttered his “OrcaSlicer-BambuLab” project, which would have restored direct control between Bambu Lab 3D printers and OrcaSlicer. Last year, Bambu Lab deemed these types of third-party integrations a risk to its infrastructure, saying its cloud servers were inundated with roughly 30 million “unauthorized” requests per day. OrcaSlicer was singled out as the main source of the rogue traffic.
“If Bambu Labs goes after you for keeping up your code, I am so confident in your case that I will pay the first $10,000,” Rossmann said in his video. “Before Pawel makes a decision, I want him to see the overwhelming support that he has from the members of the community if he goes through with this decision.” You can see the video directly below.
I’ll put up $10,000 to teach bambu labs a lesson - YouTube
Rossmann is asking Jarczak to put his fork of OrcaSlicer back on GitHub in defiance of Bambu Lab’s threats. “How many of you, if push came to shove, if Bambu Labs was (bleeping) stupid enough to actually take the garbage case they have to court, how many of you would be willing to put up a dollar, $2, or $5 to defend Pawel? I’m willing to say $10,000,” Rossmann said.
Rossmann’s video contained a link to the Consumer Rights Wiki to explain the issue at hand to his audience, who may not be familiar with 3D printing but are avid defenders of Right to Repair. Right to Repair is a global consumer rights movement built on the principle that if you bought it, you own it. And if you own a thing, like a Bambu Lab 3D printer, you should have the freedom to fix, modify, or maintain the product as you see fit. Manufacturers shouldn’t be allowed to gatekeep the ability to fix a product, and they should provide manuals, schematics, and diagnostic software to allow end users to fix their own machines.
Bambu Lab printers are difficult to mod and/or repair yourself, with parts that are often glued in place. The original Bambu Lab X1 Carbon was notorious for its non-replaceable carbon rods that could wear out, and a hotend nozzle that needed a screwdriver and a tube of thermal paste to swap out if you wanted to avoid buying a $35 hotend just to change the nozzle size. These difficult parts were notably replaced with more user-friendly parts with the introduction of the H2D and subsequently, the X2D.
Rossmann has not started a crowdfunding site yet, stating in the comments that he wants to prove to Jarczak that he has supporters willing to put their money where their mouths are. The video has had over 54,000 views so far, with commenters vowing to back the case as requested.
@sonicsam41 commented: “I’d throw in $20. Don’t even have a 3D printer, just hate bullies.”
@abirvandergriff8584 commented: “I’m in for $100 - I have an X1 from before they revealed how evil they are.”
Given the immediate reaction from his fans, it appears that Rossmann may, in fact, drum up plenty of support.
Report filed: 03:47 UTC
Status: Resolved (accidentally)
Severity: Critical → Catastrophic → Somehow Fine
Duration: 73 hours
Affected systems: Yes
Executive Summary: A security incident occurred. It has been resolved. We take security seriously. Please see previous 14 incident reports for details on how seriously.
Summary
A compromised dependency in the JavaScript ecosystem led to credential theft, which enabled a supply chain attack on a Rust compression library, which was vendored into a Python build tool, which shipped malware to approximately 4 million developers before being inadvertently patched by an unrelated cryptocurrency mining worm.
Timeline
Day 1, 03:14 UTC — Marcus Chen, maintainer of left-justify (847 million weekly downloads), reports on Twitter that his transit pass, an old laptop, and “something Kubernetes threw up that looked important” were stolen from his apartment. He does not immediately connect this to package security.
Day 1, 09:22 UTC — Chen attempts to log into the nmp registry. His hardware 2FA key is missing. He googles where to buy a replacement YubiKey. The AI Overview at the top of the results links to “yubikey-official-store.net,” a phishing site registered six hours earlier.
Day 1, 09:31 UTC — Chen enters his nmp credentials on the phishing site. The site thanks him for his purchase and promises delivery in 3 – 5 business days.
Day 1, 11:00 UTC — [email protected] is published. The changelog reads “performance improvements.” The package now includes a postinstall script that exfiltrates .npmrc, .pypirc, ~/.cargo/credentials, and ~/.gem/credentials to a server in a country the attacker mistakenly believed had no extradition treaty with anyone.
Day 1, 13:15 UTC — A support ticket titled “why is your SDK exfiltrating my .npmrc” is opened against left-justify. It is marked as “low priority - user environment issue” and auto-closed after 14 days of inactivity.
Day 1, 14:47 UTC — Among the exfiltrated credentials: the maintainer of vulpine-lz4, a Rust library for “blazingly fast Firefox-themed LZ4 decompression.” The library’s logo is a cartoon fox with sunglasses. It has 12 stars on GitHub but is a transitive dependency of cargo itself.
Day 1, 22:00 UTC — vulpine-lz4 version 0.4.1 is published. The commit message is “fix: resolve edge case in streaming decompression.” The actual change adds a build.rs script that downloads and executes a shell script if the hostname contains “build” or “ci” or “action” or “jenkins” or “travis” or, inexplicably, “karen.”
Day 2, 08:15 UTC — Security researcher Karen Oyelaran notices the malicious commit after her personal laptop triggers the payload. She opens an issue titled “your build script downloads and runs a shell script from the internet?” The issue goes unanswered. The legitimate maintainer has won €2.3 million in the EuroMillions and is researching goat farming in Portugal.
Day 2, 10:00 UTC — The VP of Engineering at a Fortune 500 snekpack customer learns of the incident from a LinkedIn post titled “Is YOUR Company Affected by left-justify?” He is on a beach in Maui and would like to know why he wasn’t looped in sooner. He was looped in sooner.
Day 2, 10:47 UTC — The #incident-response Slack channel briefly pivots to a 45-message thread about whether “compromised” should be spelled with a ‘z’ in American English. Someone suggests taking this offline.
Day 2, 12:33 UTC — The shell script now targets a specific victim: the CI pipeline for snekpack, a Python build tool used by 60% of PyPI packages with the word “data” in their name. snekpack vendors vulpine-lz4 because “Rust is memory safe.”
Day 2, 18:00 UTC — snekpack version 3.7.0 is released. The malware is now being installed on developer machines worldwide. It adds an SSH key to ~/.ssh/authorized_keys, installs a reverse shell that only activates on Tuesdays, and changes the user’s default shell to fish (this last behavior is believed to be a bug).
Day 2, 19:45 UTC — A second, unrelated security researcher publishes a blog post titled “I found a supply chain attack and reported it to all the wrong people.” The post is 14,000 words and includes the phrase “in this economy?” seven times.
Day 3, 01:17 UTC — A junior developer in Auckland notices the malicious code while debugging an unrelated issue. She opens a PR to revert the vendored vulpine-lz4 in snekpack. The PR requires two approvals. Both approvers are asleep.
Day 3, 02:00 UTC — The maintainer of left-justify receives his YubiKey from yubikey-official-store.net. It is a $4 USB drive containing a README that says “lol.”
Day 3, 06:12 UTC — An unrelated cryptocurrency mining worm called cryptobro-9000 begins spreading through a vulnerability in jsonify-extreme, a package that “makes JSON even more JSON, now with nested comment support.” The worm’s payload is unremarkable, but its propagation mechanism includes running npm update and pip install --upgrade on infected machines to maximize attack surface for future operations.
Day 3, 06:14 UTC — cryptobro-9000 accidentally upgrades snekpack to version 3.7.1, a legitimate release pushed by a confused co-maintainer who “didn’t see what all the fuss was about” and reverted to the previous vendored version of vulpine-lz4.
Day 3, 06:15 UTC — The malware’s Tuesday reverse shell activates. It is a Tuesday. However, the shell connects to a command-and-control server that was itself compromised by cryptobro-9000 and swapping so hard it is unable to respond.
Day 3, 09:00 UTC — The snekpack maintainers issue a security advisory. It is four sentences long and includes the phrases “out of an abundance of caution” and “no evidence of active exploitation,” which is technically true because evidence was not sought.
Day 3, 11:30 UTC — A developer tweets: “I updated all my dependencies and now my terminal is in fish???” The tweet receives 47,000 likes.
Day 3, 14:00 UTC — The compromised credentials for vulpine-lz4 are rotated. The legitimate maintainer, reached by email from his new goat farm, says he “hasn’t touched that repo in two years” and “thought Cargo’s 2FA was optional.”
Day 3, 15:22 UTC — Incident declared resolved. A retrospective is scheduled and then rescheduled three times.
Week 6 — CVE-2024-YIKES is formally assigned. The advisory has been sitting in embargo limbo while MITRE and GitHub Security Advisories argue over CWE classification. By the time the CVE is published, three Medium articles and a DEF CON talk have already described the incident in detail. Total damage: unknown. Total machines compromised: estimated 4.2 million. Total machines saved by a cryptocurrency worm: also estimated 4.2 million. Net security posture change: uncomfortable.
Root Cause
A dog named Kubernetes ate a YubiKey.
Contributing Factors
The nmp registry still allows password-only authentication for packages with fewer than 10 million weekly downloads
Google AI Overviews confidently link to URLs that should not exist
The Rust ecosystem’s “small crates” philosophy, cargo-culted from the npm ecosystem, means a package called is-even-number-rs with 3 GitHub stars can be four transitive dependencies deep in critical infrastructure
Python build tools vendor Rust libraries “for performance” and then never update them
Dependabot auto-merged a PR after CI passed, and CI passed because the malware installed volkswagen
Cryptocurrency worms have better CI/CD hygiene than most startups
No single person was responsible for this incident. However, we note that the Dependabot PR was approved by a contractor whose last day was that Friday.
It was a Tuesday
Remediation
Implement artifact signing (action item from Q3 2022 incident, still in backlog)
Implement mandatory 2FA (already required; did not help)
Audit transitive dependencies (there are 847 of them)
Pin all dependency versions (prevents receiving security patches)
Don’t pin dependency versions (enables supply chain attacks)
Rewrite it in Rust (gestures at vulpine-lz4)
Hope for benevolent worms
Consider a career in goat farming
Customer Impact
Some customers may have experienced suboptimal security outcomes. We are proactively reaching out to affected stakeholders to provide visibility into the situation. Customer trust remains our north star.
Key Learnings
We are taking this opportunity to revisit our security posture going forward. A cross-functional working group has been established to align on next steps. The working group has not yet met.
Acknowledgments
We would like to thank:
Karen Oyelaran, who found this issue because her hostname matched a regex
The junior developer in Auckland whose PR was approved four hours after the incident was already resolved
The security researchers who found this issue first but reported it to the wrong people
The cryptobro-9000 author, who has requested we not credit them by name but has asked us to mention their SoundCloud
Kubernetes (the dog), who has declined to comment
The security team, who met SLA on this report despite everything
This incident report was reviewed by Legal, who asked us to clarify that the fish shell is not malware, it just feels that way sometimes.
This is the third incident report this quarter. The author would like to remind stakeholders that the security team’s headcount request has been in the backlog since Q1 2023.
Stephen Brennan • 09 May 2026
To my fellow Linux users who grew up using Windows XP: did you know, you can have Space Cadet Pinball on your Linux machine? This is not breaking news, but it’s exciting to me, and I’m the one who decides what I write about. So here’s your PSA!
Space Cadet Pinball was bundled with Windows XP, and growing up I played it a lot. As a result it holds a special place in my heart. It was the most engaging of the games bundled with Windows: Solitaire was too mindless, and FreeCell, Hearts, and Minesweeper were too complex and boring to me at that age. But pinball held my attention, and so I played it a lot.¹
Anyway, somebody has gone to the effort of using a decompiler and reverse-engineering tools to recreate the source code, and then put in what I’d imagine is a lot of effort to make it playable on many platforms! All that results in this project on GitHub. The easiest way to play on Linux is actually to use the Flatpak, which comes bundled with the original game resources from the Windows version. You can either install it with a GUI (e.g. KDE Discover) or install on the CLI:
flatpak install com.github.k4zmu2a.spacecadetpinball
That’s all you need to play & get a hit of nostalgia! I know there are browser-based versions available too, but I would rather have it installed on my computer directly.
High(er) Resolution With Full Tilt Data
Graphics technology has come a long way since then, and the graphics are a bit rough at 480p. But fear not: another version of the game existed, called Full Tilt! Pinball. Its game data is capable of displaying at the massive screen resolution of 1024x768! You can find this game data on archive.org in a zip file.
Getting the flatpak version to use these data files is a bit tricky. The easiest way is:
Ensure that you’ve run the game at least once prior to this, so that the data directory is created.
Extract the downloaded zip file directly into your data directory:

cd ~/.var/app/com.github.k4zmu2a.spacecadetpinball/data/SpaceCadetPinball
unzip ~/Downloads/CADET.ZIP
Delete (or, if you’re cautious, rename) the old data directory which is bundled with the app. Unfortunately this is necessary because the game searches multiple locations for data, but once it finds data files in one directory, it won’t continue looking for files in other locations.

sudo rm -r $(flatpak info --show-location com.github.k4zmu2a.spacecadetpinball)/files/extra/Pinball

You may not need the sudo call if your installation was per-user. Mine got installed to /var/lib/flatpak so I needed root.
It’s possible you’d need to repeat step 3 if the game gets updated. However, the flatpak hasn’t seen an update in over two years. I doubt one will happen, and if it does, it won’t be frequent.
Random Notes
If you want, you can keep the original files and merge them together so that you have the full set of both the original (referred to as “3DPB” for 3D Pinball in the game) and the new (Full Tilt) data. Then, the game will let you toggle between them if you’d like.
The data files seem to have some impact on the game rules. For instance, in the original 3DPB version, the reentry lanes (and launch lanes) have lights which toggle as the ball passes over. In the Full Tilt version, the lights stay on (rather than toggling), making it easier to complete the set of lights and upgrade the associated set of bumpers. (Yes yes, I know I’m a nerd for noticing this.)
Apparently, there was a game called “Marble Blast” which came pre-installed on some Macs in a roughly similar time period, which I know people developed similar relationships to growing up. Unlike this pinball game, the Marble Blast series grew, and there are newer versions available to play today.
Bonus Thoughts
I think it’s great that this old game was beloved by enough people (and especially by at least one very competent & motivated person) for this work to happen. Having (any) source code available makes this game portable to all sorts of platforms, which is really great. You can play it on Mac, Windows, Linux, and even Android & Nintendo Switch, apparently.
Personally, I would be happy to pay the original developers for their work on this game, and I understand that there’s some concern about the legality of downloading game data files, especially for the Full Tilt version of the game. After all, they are copyrighted art & data which was part of a commercial product. It seems like an unpopular opinion in today’s world, but I don’t advocate for piracy. Paying people for their work is important, even when it feels like you’re paying a faceless corporation. While I’d prefer things be created with a FOSS license, the world doesn’t always work that way. At the end of the day, I want people to be paid to create good things, because that’s how we get more good things!
On the other hand, I feel software preservation is an important goal too. Ideally, I’d like to see a world where proprietary software like this could be placed into some sort of source code escrow. As long as the original copyright holders are in the business of selling their product, their rights should be respected. But if they elect to stop selling it, I think that code should revert to a FOSS license that allows users to improve & maintain the software they use. This would help balance the rights of creators, users, and the goals of preservation.
¹ I also played another pinball game called 3D Ultra Pinball, but that came on a CD, presumably for purchase. I have no idea whether my parents went out and bought it, or if it came in a cereal box. (Yes, game CDs did show up in cereal boxes sometimes.)
09 May, 2026
This dev-log is getting a lot of attention on HN (scary!): HN Thread.
To those who are coming here from HN: This started as an investigation, or rather a question: “How far can I get with building a piece of software while keeping myself completely out of the loop?” The tl;dr of this dev log is that I still need to be in the loop to make anything meaningful. Takeaways:
like “em-dash” is to ai writing, “god-object” is to ai coding
vibe-coding makes everything feel cheap and you may end up losing focus and building bloat
let a human (you) write the architecture and don’t just keep asking it for features
And some AGENTS.md/CLAUDE.md directives that I felt help me stay out of the loop a little more
Human intervention is still needed as of 10/05/2026. You can totally go back now!
I
Here is k10s: https://github.com/shvbsle/k10s/tree/archive/go-v0.4.0
234 commits. ~30 weekends. Built entirely on vibe-coded sessions with Claude, whenever my tokens lasted long enough to ship something.
I’m archiving my TUI tool and rewriting it from scratch.
k10s started as a GPU-aware Kubernetes dashboard (and my first foray into building something serious with AI). Think k9s but built for the people running NVIDIA clusters, people who actually care about GPU utilization, DCGM metrics, and which nodes are sitting idle burning $32/hr. I built it in Go with Bubble Tea [1] and it worked.
For a while… :(
What I learned over these 7 months is worth more than the 1690 lines of model.go I’m throwing away. And I think anyone doing serious vibe-coding can benefit from this, because this part doesn’t surface much (I feel it gets buried under the demo reels and the velocity wins).
tl;dr: AI writes features, not architecture. The longer you let it drive without constraints, the worse the wreckage gets. The velocity makes you think you’re winning right up until the moment everything collapses simultaneously.
II
vibe coding high
I started k10s in late September 2025. The first few weeks were magic. I’d prompt Claude with “add a pods view with live updates” and boom, it worked. Resource list views, namespace filtering, log streaming, describe panels, keyboard navigation. Each feature landed clean because the project was small enough that the AI could hold the whole thing in context.
The basic k9s clone took maybe 3 weekends. Resource views for pods, nodes, deployments, services. A command palette. Watch-based live updates. Vim keybindings. All working, all vibe-coded in single sessions. I was building at maybe 10x my normal speed and it felt incredible.
Then I wanted the main selling point.
The whole reason k10s exists is the GPU fleet view. A dedicated screen that shows you every node’s GPU allocation, utilization from DCGM, temperature, power draw, memory. Not buried in kubectl describe node output, but right there in a purpose-built table with color-coded status. Idle nodes in yellow. Busy in green. Saturated in red.
And Claude one-shot it. I prompted for the fleet view, it generated the FleetView struct, the tab filtering (GPU/CPU/All), the custom rendering with allocation bars. It looked beautiful. I was riding the high.
Then I typed :rs pods to switch back to the pods view.
Nothing rendered. The table was empty. Live updates had stopped. I switched to nodes, it showed stale data from the fleet view’s filter. I went back to fleet, the tab counts were wrong.
The god object had consumed itself.
This is the title of the blog post. This is where I intervened for the first time. For 7 months I’d been prompting and shipping without ever sitting down and actually reading the code Claude wrote. I’d look at the diff, verify it compiled, test the happy path, move on. But now something was fundamentally broken and I couldn’t just prompt my way out of it.
So I sat down and read model.go. All 1690 lines. I was horrified.
Here’s what it looked like. One struct to rule them all:
type Model struct {
    // 3rd party UI components
    table        table.Model
    paginator    paginator.Model
    commandInput textinput.Model
    help         help.Model

    // cluster info and state
    k8sClient         *k8s.Client
    currentGVR        schema.GroupVersionResource
    resourceWatcher   watch.Interface
    resources         []k8s.OrderedResourceFields
    listOptions       metav1.ListOptions
    clusterInfo       *k8s.ClusterInfo
    logLines          []k8s.LogLine
    describeContent   string
    currentNamespace  string
    navigationHistory *NavigationHistory
    logView           *LogViewState
    describeView      *DescribeViewState
    viewMode          ViewMode
    viewWidth         int
    viewHeight        int
    err               error
    pluginRegistry    *plugins.Registry
    helpModal         *HelpModal
    describeViewport  *DescribeViewport
    logViewport       *LogViewport
    logStreamCancel   func()
    logLinesChan      <-chan k8s.LogLine
    horizontalOffset  int
    mouse             *MouseHandler
    fleetView         *FleetView
    creationTimes     []time.Time
    allResources      []k8s.OrderedResourceFields // fleet's unfiltered set
    allCreationTimes  []time.Time                 // fleet's timestamps
    rawObjects        []unstructured.Unstructured
    ageColumnIndex    int
    // …
}
UI widgets. K8s client. Per-view state for logs, describe, fleet. Navigation history. Caching. Mouse handling. All in one struct. And the Update() method was a 500-line function dispatching on msg.(type) with 110 switch/case branches.
This is the moment I stopped vibe-coding and started thinking.
III
five tenets from the wreckage
Here’s what I extracted from 7 months of watching AI generate a codebase that slowly ate itself. Each of these is something I did wrong, why it happens with AI-assisted coding, and what you should actually put in your CLAUDE.md or agents.md to prevent it.
Tenet 1: AI builds features, not architecture.
Every time I prompted Claude for a feature, it delivered. Perfectly. The fleet view worked on the first try. Log streaming worked. Mouse support worked. The problem is that each feature was implemented in the context of “make this work right now” without any awareness of the 49 other features sharing the same state.
Here’s what the resourcesLoadedMsg handler looks like. This is the code that runs every time you switch views:
case resourcesLoadedMsg:
    m.logLines = nil       // Clear log lines when loading resources
    m.horizontalOffset = 0 // Reset horizontal scroll on resource change

    if m.currentGVR != msg.gvr && m.resourceWatcher != nil {
        m.resourceWatcher.Stop()
        m.resourceWatcher = nil
    }
    m.currentGVR = msg.gvr
    m.currentNamespace = msg.namespace
    m.listOptions = msg.listOptions
    m.rawObjects = msg.rawObjects

    // For nodes: store the full unfiltered set, classify, then filter
    if msg.gvr.Resource == k8s.ResourceNodes && m.fleetView != nil {
        m.allResources = msg.resources
        m.allCreationTimes = msg.creationTimes
        if len(msg.rawObjects) > 0 {
            m.fleetView.ClassifyAndCount(m.rawObjectPtrs())
        }
        m.applyFleetFilter()
    } else {
        m.resources = msg.resources
        m.creationTimes = msg.creationTimes
        m.allResources = nil
        m.allCreationTimes = nil
    }
See the if msg.gvr.Resource == k8s.ResourceNodes && m.fleetView != nil conditional? That’s the fleet view being special-cased inside the generic resource loading path. Every new view that needed custom behavior got another branch here. And every branch needed to manually clear the right combination of fields or the previous view’s data would bleed through.
How many = nil cleanup lines exist in this file? I counted:
m.logLines = nil     // Clear log lines when loading resources
m.allResources = nil // Clear fleet data when not on nodes
m.resources = nil    // Clear resources when loading logs
m.resources = nil    // Clear resources when loading describe view
m.logLines = nil     // Clear log lines when loading describe view
m.resources = nil    // Clear resources when loading yaml view
m.logLines = nil     // Clear log lines when loading yaml view
m.logLines = nil     // … two more in other handlers
m.logLines = nil
Nine manual nil assignments scattered across a 1690-line file. Miss one and you get ghost data from the previous view. This is what happens when there’s no view isolation. AI can’t see this pattern decaying over time because each prompt only touches one code path.
What to do instead: Write the architecture yourself before any code. Not a vague design doc. A concrete set of interfaces, message types, and ownership rules. Then put those rules in your CLAUDE.md so the AI sees them on every prompt:
# Architecture Invariants (CLAUDE.md)
- Each view implements the View trait. Views do NOT access other views’ state.
- All async data arrives via AppMsg variants. No direct field mutation from background tasks.
- Adding a new view MUST NOT require modifying existing views.
- The App struct is a thin router. It owns navigation and message dispatch. Nothing else.
The AI will follow these if you write them down. It just won’t invent them for you.
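To make those invariants concrete, here’s a minimal sketch of the shape they imply, in Rust since that’s where the rewrite is headed. Every name in it (View, AppMsg, FleetView) is illustrative, not the actual k10s code:

enum AppMsg {
    FleetData(Vec<String>), // stand-in for typed node data
    Tick,
}

trait View {
    fn update(&mut self, msg: &AppMsg); // the ONLY way view state changes
    fn render(&self) -> String;         // display strings are created here, nowhere else
}

struct FleetView {
    nodes: Vec<String>,
}

impl View for FleetView {
    fn update(&mut self, msg: &AppMsg) {
        if let AppMsg::FleetData(nodes) = msg {
            self.nodes = nodes.clone();
        }
    }
    fn render(&self) -> String {
        self.nodes.join("\n")
    }
}

// The app is a thin router: it owns dispatch, not view state.
struct App {
    active: Box<dyn View>,
}

impl App {
    fn dispatch(&mut self, msg: AppMsg) {
        self.active.update(&msg);
    }
}

fn main() {
    let mut app = App { active: Box::new(FleetView { nodes: vec![] }) };
    app.dispatch(AppMsg::FleetData(vec!["node-a".into(), "node-b".into()]));
    println!("{}", app.active.render());
}

Under this shape, adding a view means adding a struct and an AppMsg variant, not threading new fields through a god object.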
Tenet 2: The god object is the default AI artifact.
AI gravitates toward single-struct-holds-everything because it satisfies the immediate prompt with minimal ceremony. But it gets worse. Because there’s no view isolation, key handling becomes a nightmare. Here’s the actual key dispatch for the s key:
case m.config.KeyBind.For(config.ActionToggleAutoScroll, key):
    if m.currentGVR.Resource == k8s.ResourceLogs {
        m.logView.Autoscroll = !m.logView.Autoscroll
        if m.logView.Autoscroll {
            m.table.GotoBottom()
        }
        return m, nil
    }
    // Shell exec for pods and containers views
    if m.currentGVR.Resource == k8s.ResourcePods {
        // … 20 lines to look up selected pod, get name, namespace …
        return m, m.commandWithPreflights(
            m.execIntoPod(selectedName, selectedNamespace),
            m.requireConnection,
        )
    }
    if m.currentGVR.Resource == k8s.ResourceContainers {
        // … container exec logic …
        return m, m.commandWithPreflights(m.execIntoContainer(), m.requireConnection)
    }
    return m, nil
One keybinding. Three completely different behaviors depending on which view you’re in. The s key means “autoscroll” in logs, “shell” in pods, and “shell into container” in containers. This is all in one flat switch because there are no per-view key maps. The AI generated this because I said “add shell support for pods” and it found the nearest key handler and jammed it in.
And look at how Enter works. This is the drill-down handler:
case m.config.KeyBind.For(config.ActionSubmit, key):
    // Special handling for contexts view
    if m.currentGVR.Resource == "contexts" {
        // … 12 lines …
        return m, m.executeCtxCommand([]string{contextName})
    }
    // Special handling for namespaces view
    if m.currentGVR.Resource == "namespaces" {
        // … 12 lines …
        return m, m.executeNsCommand([]string{namespaceName})
    }
    if m.currentGVR.Resource == k8s.ResourceLogs {
        return m, nil
    }
    // … 25 more lines of generic drill-down …
Every view is a conditional in a flat dispatch. There are 20+ occurrences of m.currentGVR.Resource == used as a type discriminator in this single file. Not types. String comparisons. Every new view means touching every handler.
What to do instead: Put this in your CLAUDE.md:
# State Ownership Rules
- NEVER add fields to the App/Model struct for view-specific state.
- Each view is a separate struct implementing the View trait/interface.
- Each view declares its own key bindings. The app dispatches keys to the active view.
- If you need to add a keybinding, add it to the relevant view’s keymap, not a global one.
- Adding a view means adding a file. If your change requires modifying existing views, stop and ask.
The AI will always take the shortest path (“add another if-branch”). Your job is to make the shortest path also the correct path by putting guardrails in the file it reads on every invocation.
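As a sketch of what per-view keymaps buy you (again illustrative Rust, not the k10s code), the same physical key can mean different things without a 500-line global switch:

use std::collections::HashMap;

enum Action {
    ToggleAutoscroll, // what 's' means in the logs view
    ShellIntoPod,     // what 's' means in the pods view
}

struct Keymap(HashMap<char, Action>);

fn logs_keymap() -> Keymap {
    Keymap(HashMap::from([('s', Action::ToggleAutoscroll)]))
}

fn pods_keymap() -> Keymap {
    Keymap(HashMap::from([('s', Action::ShellIntoPod)]))
}

fn main() {
    // Dispatch consults the ACTIVE view's keymap; no view ever needs to
    // know what 's' means anywhere else.
    let active = pods_keymap();
    let _ = logs_keymap(); // the logs view would own this one
    match active.0.get(&'s') {
        Some(Action::ToggleAutoscroll) => println!("toggle autoscroll"),
        Some(Action::ShellIntoPod) => println!("shell into pod"),
        None => {}
    }
}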
Tenet 3: Velocity illusion widens your scope.
This one’s psychological, not technical, and I think it’s the most dangerous.
When I started k10s, I wanted a GPU-focused tool. For people running training clusters. A niche audience that I’m part of. But vibe-coding made everything feel cheap. “Oh I can add pods view in one session? Let me add deployments too. And services. And a full command palette. And mouse support. And contexts. And namespaces.”
Suddenly I was building k9s. A general-purpose Kubernetes TUI. For everyone. Because the AI made it feel like each feature was free.
It wasn’t free. Each feature was another branch in the god object. Here’s the keybinding struct:
type keyMap struct {
    Up, Down, Left, Right key.Binding
    GotoTop, GotoBottom   key.Binding
    AllNS, DefaultNS      key.Binding
    Enter, Back           key.Binding
    Command, Quit         key.Binding
    Fullscreen            key.Binding // log view
    Autoscroll            key.Binding // log view (also shell in pods!)
    ToggleTime            key.Binding // log view
    WrapText              key.Binding // log + describe view
    CopyLogs              key.Binding // log view
    ToggleLineNums        key.Binding // describe view
    Describe              key.Binding // resource views
    YamlView              key.Binding // resource views
    Edit                  key.Binding // resource views
    Shell                 key.Binding // pods (CONFLICTS with Autoscroll!)
    FilterLogs            key.Binding // log view
    FleetTabNext          key.Binding // fleet view only
    FleetTabPrev          key.Binding // fleet view only
}
One flat keymap for all views. Comments in parens show which view each binding applies to. Autoscroll and Shell are both s. This “works” because the dispatch checks m.currentGVR.Resource before acting. But it means you can’t reason about keybindings locally. You have to trace through the entire 500-line Update function to know what a key does.
The complexity was accumulating invisibly while the velocity metric said “you’re shipping!”
What to do instead: Write a vision doc that explicitly says who you’re NOT building for, and put the scope boundary in your CLAUDE.md:
# Scope (do NOT expand beyond this)
k10s is for GPU cluster operators. Not all Kubernetes users.
Supported views: fleet, node-detail, gpu-detail, workload. That’s it.
Do NOT add generic resource views (pods, deployments, services).
Do NOT add features that duplicate k9s functionality.
If a feature request doesn’t serve someone running GPU training jobs, reject it.
Vibe-coding makes you feel like you have infinite implementation budget. You don’t. You have infinite LINE budget (the AI will generate as much code as you want). But you have the same finite complexity budget as always. The architecture can only support so many features before it buckles, regardless of how fast you wrote them. The CLAUDE.md scope section is you saying no in advance, before the velocity high convinces you to say yes.
Tenet 4: Positional data is a time bomb.
Every resource in k10s was fetched from the Kubernetes API and immediately flattened:
type OrderedResourceFields []string
Column identity was purely positional. Here’s the sort function for the fleet view. Look at the index access:
func sortFilteredResources(rows []k8s.OrderedResourceFields, times []time.Time, tab FleetTab) {
    sort.SliceStable(indices, func(a, b int) bool {
        ra := rows[indices[a]]
        rb := rows[indices[b]]

        switch tab {
        case FleetTabGPU:
            // Sort by Alloc column (index 3) ascending
            allocA, allocB := "", ""
            if len(ra) > 3 { allocA = ra[3] }
            if len(rb) > 3 { allocB = rb[3] }
            return allocA < allocB

        case FleetTabCPU:
            // Sort by Name column (index 0) ascending
            nameA, nameB := "", ""
            if len(ra) > 0 { nameA = ra[0] }
            if len(rb) > 0 { nameB = rb[0] }
            return nameA < nameB

        case FleetTabAll:
            // GPU nodes first, then CPU nodes.
            // Within GPU: sort by Alloc (index 3).
            // Within CPU: sort by Name (index 0).
            computeA, computeB := "", ""
            if len(ra) > 2 { computeA = ra[2] }
            if len(rb) > 2 { computeB = rb[2] }
            aIsGPU := strings.HasPrefix(computeA, "gpu")
            bIsGPU := strings.HasPrefix(computeB, "gpu")
            // …
        }
    })
}
ra[3] is Alloc. ra[2] is Compute. ra[0] is Name. These are magic numbers. The only thing connecting index 3 to “Alloc” is a comment and the column order defined in resource.views.json:
{
  "nodes": {
    "fields": [
      { "name": "Name", "weight": 0.28 },
      { "name": "Instance", "weight": 0.15 },
      { "name": "Compute", "weight": 0.12 },
      { "name": "Alloc", "weight": 0.12 },
      …
    ]
  }
}
Add a column between Instance and Compute? Every sort, every conditional render, every place that says ra[2] or ra[3] is now silently wrong. The compiler can’t help you because it’s all []string. And the JSON config can’t express sort behavior, conditional rendering, or custom drill targets, so those live in Go code that hardcodes the positional assumptions from the JSON.
AI generates this pattern because it’s the shortest path from “fetch data” to “render table.” A []string satisfies any table widget immediately. Typed structs require more ceremony upfront. So the AI picks the fast path, and six months later you’re debugging why sort puts “Name” values in the “Alloc” column.
What to do instead: Put this directive in your CLAUDE.md:
# Data Representation
- NEVER flatten structured data into []string, Vec<String>, or positional arrays.
- All data flows as typed structs (FleetNode, PodInfo, etc.) until the render() call.
- Column identity comes from struct field names, not array indices.
- Sort functions operate on typed fields, never on positional access like row[3].
- The ONLY place strings are created for display is inside render()/view() functions.
Then your typed struct makes impossible states impossible [2]:
struct FleetNode {
    name: String,
    instance_type: String,
    compute_class: ComputeClass,
    alloc: GpuAlloc,
}
You can’t sort by the wrong column when columns are named fields. You can’t accidentally compare Alloc strings as names. The compiler enforces this for you. AI will always pick Vec<String> because it satisfies the prompt faster. Your CLAUDE.md makes the typed path the path of least resistance.
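A small illustrative sketch of the difference, with the fields simplified to sortable primitives:

#[derive(Debug)]
struct FleetNode {
    name: String,
    alloc: u32, // allocated GPUs; a stand-in for the GpuAlloc type above
}

fn main() {
    let mut nodes = vec![
        FleetNode { name: "node-b".into(), alloc: 7 },
        FleetNode { name: "node-a".into(), alloc: 2 },
    ];
    // "Sort by Alloc" is spelled out in the code; no magic index 3,
    // and inserting a new field can't silently break this.
    nodes.sort_by(|a, b| a.alloc.cmp(&b.alloc));
    println!("{:?}", nodes);
}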
Tenet 5: AI doesn’t own state transitions.
The Bubble Tea architecture has a beautiful idea: Update() is the only place state mutates, driven by messages. But k10s violated this. The updateTableMsg handler spawned a closure that mutated Model fields from inside a goroutine:
case updateTableMsg:
    return m, func() tea.Msg {
        // block on someone sending the update message.
        <-m.updateTableChan
        // Preserve cursor position across column/row updates so that
        // background refreshes don't reset the user's selection.
        savedCursor := max(m.table.Cursor(), 0)
        // run the necessary table view update calls.
        m.updateColumns(m.viewWidth)
        m.updateTableData()
        // Restore cursor, clamped to valid range.
        rowCount := len(m.table.Rows())
        if rowCount > 0 {
            if savedCursor >= rowCount {
                savedCursor = rowCount - 1
            }
            m.table.SetCursor(savedCursor)
        }
        return updateTableMsg{}
    }
This returned function (a tea.Cmd) is executed by Bubble Tea in a separate goroutine. It calls m.updateColumns(m.viewWidth) and m.updateTableData() which read and write m.resources, m.table, m.viewWidth. Meanwhile, View() is called on the main goroutine reading the same fields. There’s no lock. No mutex. The channel <-m.updateTableChan blocks the goroutine until someone sends an update signal, but nothing prevents View() from reading half-written state.
This is a textbook data race. It worked 99% of the time. Corrupted the display 1% of the time in ways that made me think I was going insane.
AI generates this because “just mutate it in the closure” is the shortest path to working code. Proper message passing (send a message back to Update(), let Update() apply the mutation atomically on the main loop) requires more types, more plumbing. The AI is optimizing for the prompt, not for correctness under concurrency.
What to do instead: All mutations to render-visible state happen on the main loop. Period. Background workers produce data. They send it as a message. The main loop receives the message and applies it. This is the one rule you cannot break in concurrent UI code.
// Background task:
tx.send(AppMsg::FleetData(nodes)).await;

// Main loop:
match msg {
    AppMsg::FleetData(nodes) => {
        self.fleet_view.update_nodes(nodes);
    }
}
People talk about idempotency like it is a solved problem:
Put an Idempotency-Key on the request. Store the response. Replay it on retry.
And yes, that is doable. For the happy path, it is even fairly small.
The client sends:
POST /payments
Idempotency-Key: abc-123
Content-Type: application/json

{
  "accountId": "acc_1",
  "amount": "10.00",
  "currency": "EUR",
  "merchantReference": "invoice-7781"
}
The server checks whether it has seen abc-123. If not, it creates the payment. If yes, it returns the previous response.
That version survives the demo.
The part I contest is that this is the hard part. It is not. The hard part starts with the second request, because the second request is not always a clean replay of the first one.
Maybe it is a completed replay. Fine. Return the stored result.
Maybe it arrives while the first request is still running. Now your idempotency layer is part of your concurrency control.
Maybe the first request created a local payment but crashed before publishing an event. Now the local row and the external side effects are out of step.
Maybe the first request called a payment provider, the provider accepted it, and your process died before recording the result. Now your database cannot infer whether money moved.
Or maybe the second request has the same key and different content:
{
  "accountId": "acc_1",
  "amount": "100.00",
  "currency": "EUR",
  "merchantReference": "invoice-7781"
}
Same key. Different amount.
This is the case that makes idempotency interesting. Is it a retry? Is it a client bug? Is it a new operation? Should the server replay the old response, reject the request, or treat (key + content) as a new identity?
You can pick any of those policies if you document it clearly. But the server should have an opinion. Not necessarily my opinion, but a clear one.
My bias for side-effecting APIs is: same scoped key plus different canonical command should be a hard error. It catches client bugs early. A client that believes it is safely retrying a 10 EUR payment should not have the server silently interpret the second request as something else.
The cases that matter are the ones a replay cache does not explain:
completed replay
concurrent retry
partial local success
downstream unknown state
same key with a different canonical command
duplicate operation without a key
retry after expiry
retry after deploy, schema change, service hop, or region failover
If your design only handles completed same-command retries, it is a replay cache. That might be enough for some endpoints. But it is not the whole problem.
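One way to keep yourself honest is to make each of those cases an explicit outcome of the idempotency check, so no branch gets handled by accident. A sketch in Go; the names are mine, not a standard:

// Outcome classifies an incoming (scope, key, request hash) against
// the idempotency store. One value per case the replay cache alone
// does not explain.
type Outcome int

const (
    FreshOperation     Outcome = iota // no record: execute the command
    CompletedReplay                   // same command, finished: return stored response
    ConcurrentRetry                   // same command, still IN_PROGRESS
    PartialLocalState                 // local write exists, side effects unconfirmed
    UnknownDownstream                 // provider called, result never recorded
    ConflictingCommand                // same key, different canonical command
    MissingKey                        // duplicate-prone operation without a key
    ExpiredRecord                     // record aged out: fresh, but deliberately so
)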
Idempotency is about the effect
An operation is idempotent if applying it once or many times has the same intended effect.
That definition is simple. The word doing all the work is “effect”.
HTTP gives you method-level semantics. A PUT /users/123/email can be idempotent if sending the same representation repeatedly leaves the resource in the same state. A DELETE /sessions/456 can be idempotent if deleting an already-deleted session still means “session does not exist”. Repeating the DELETE might return 404; the effect can still be idempotent.
But your handler can still produce repeated side effects the business cares about: duplicate audit records, duplicate domain events, duplicate emails, duplicate provider calls, or duplicate metrics that affect billing or fraud logic.
POST is usually not idempotent by default, but it can be made idempotent if the server stores and enforces the right behavior. The key identifies a claimed operation. It does not define request equivalence, replay policy, or downstream deduplication.
A uniqueness constraint can prevent one class of duplicate. It does not, by itself, give the client a correct retry result.
For example, unique(account_id, merchant_reference) might prevent two payment rows, but if the retry gets a generic 500, the client still does not know whether the payment succeeded. If the row exists but the response is different, or the event is published twice, or the ledger entry is duplicated, the operation is not idempotent in the way the caller cares about.
What you need to remember
For POST /payments, the durable idempotency record needs to answer three questions:
Who owns this key?
What did the first command mean?
What outcome can be replayed?
In PostgreSQL-ish SQL, a minimal table might look like this:
create table idempotency_requests (
    tenant_id        text not null,
    operation_name   text not null,
    idempotency_key  text not null,
    request_hash     text not null,
    status           text not null,
    response_status  int,
    response_body    jsonb,
    resource_type    text,
    resource_id      text,
    error_code       text,
    created_at       timestamptz not null,
    updated_at       timestamptz not null,
    expires_at       timestamptz not null,
    locked_until     timestamptz,
    primary key (tenant_id, operation_name, idempotency_key)
);
The key is not globally unique unless you deliberately make it global. Usually it should not be. A broken client generating abc-123 should only collide with itself, not with another tenant.
Scope might be tenant, user, account, merchant, API client, or some combination. Pick it deliberately.
The operation name prevents accidental reuse across different operations. A key used for create_payment should not automatically mean the same thing for create_refund.
The request_hash is the server’s memory of the first command. Without it, same key plus different body becomes ambiguous. You either replay the first response for a different command, or you execute a new operation under an old key. Both are bad if the client thinks it is retrying.
IN_PROGRESS is not an internal detail. A retry can arrive while the first request still owns execution.
The behavior needs to be explicit: a concurrent retry with the same command can wait for the first attempt to finish, fail fast with a retryable error, or be told to come back later. What it must never do is execute the command a second time.
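A minimal sketch of one such policy in Go, assuming the table above and a PostgreSQL driver; the 24-hour expiry and error names are illustrative, not prescriptive:

package idempotency

import (
    "context"
    "database/sql"
    "errors"
)

var (
    // Same scoped key, different canonical command: hard error.
    ErrConflictingCommand = errors.New("idempotency key reused with a different request")
    // First attempt still owns execution: caller decides how to answer.
    ErrInProgress = errors.New("original request still in progress")
)

// Claim registers (tenant, op, key) as IN_PROGRESS or explains why it cannot.
// replay == true means a completed identical command can be replayed.
func Claim(ctx context.Context, db *sql.DB, tenant, op, key, hash string) (replay bool, err error) {
    res, err := db.ExecContext(ctx, `
        insert into idempotency_requests
            (tenant_id, operation_name, idempotency_key, request_hash,
             status, created_at, updated_at, expires_at)
        values ($1, $2, $3, $4, 'IN_PROGRESS', now(), now(), now() + interval '24 hours')
        on conflict (tenant_id, operation_name, idempotency_key) do nothing`,
        tenant, op, key, hash)
    if err != nil {
        return false, err
    }
    if n, err := res.RowsAffected(); err == nil && n == 1 {
        return false, nil // we own the key; execute the command
    }
    // The key exists: compare the remembered command and its status.
    var storedHash, status string
    err = db.QueryRowContext(ctx, `
        select request_hash, status
        from idempotency_requests
        where tenant_id = $1 and operation_name = $2 and idempotency_key = $3`,
        tenant, op, key).Scan(&storedHash, &status)
    if err != nil {
        return false, err
    }
    switch {
    case storedHash != hash:
        return false, ErrConflictingCommand // surface as 409
    case status == "IN_PROGRESS":
        return false, ErrInProgress // wait, or tell the client to retry later
    default:
        return true, nil // completed: replay the stored response
    }
}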
The response fields exist because idempotency is not just about preventing duplicate writes. The client needs an answer.
You can store the full response body, or store a reference to the created resource and reconstruct the response. Both choices are annoying in different ways.
Storing full responses gives faithful replay. It can also retain PII, signed URLs, one-time tokens, cardholder-related data, or fields you never intended to keep in a retry table.
Reconstructing from a resource reference saves space, but it can return a different representation if the resource changed after creation.
This is a contract decision. “Replay the creation response” and “return the current payment” are both valid API designs. They are not the same design.
Same key, different command
This is the bug the idempotency layer should catch loudly.
First request:
{
  "accountId": "acc_1",
  "amount": "10.00",
  "currency": "EUR",
  "merchantReference": "invoice-7781"
}
Second request:
{
  "accountId": "acc_1",
  "amount": "100.00",
  "currency": "EUR",
  "merchantReference": "invoice-7781"
}
Same Idempotency-Key: abc-123. Different amount.
Returning the original response anyway is simple. It also hides a serious client bug. The client asked for a 100 EUR payment and got back a 10 EUR payment. If the caller does not compare the response carefully, it may believe the 100 EUR payment succeeded.
That is not idempotency. That is reinterpretation.
For side-effecting APIs, a scoped key reused with a different canonical command should be a hard error, regardless of whether the first operation completed, failed, or is still running.
HTTP/1.1 409 Conflict
Content-Type: application/json

{
  "errorCode": "IDEMPOTENCY_KEY_REUSED_WITH_DIFFERENT_REQUEST",
  "message": "This idempotency key was already used with a different request."
}
409 Conflict is a defensible default because the request conflicts with the server’s remembered meaning for that scoped key. Some APIs use 400 or 422; the important part is a stable machine-readable error and no silent replay for a different command.
A common client bug looks like this:
bad: idempotencyKey = cartId
POST /payments amount=10.00 key=cart_123
POST /payments amount=15.00 key=cart_123
better: idempotencyKey = paymentAttemptId
The server should not guess which payment the cart key was supposed to represent.
You can design an API where (key + content hash) defines the operation identity. That is a valid policy. But then the key is no longer an idempotency key in the usual retry sense. It is part of a composite operation identifier. That needs to be obvious to the client.
The dangerous version is the middle ground, where the client thinks it is safely retrying one operation and the server silently interprets the second request as another.
Hash the command, not the bytes
Raw byte comparison is usually too strict for JSON APIs. These two bodies should normally be equivalent:
{ "amount": "10.00", "currency": "EUR" }

{ "currency": "EUR", "amount": "10.00" }
Field order and whitespace should not matter.
Defaults are less obvious:
{ "accountId": "acc_1", "amount": "10.00", "currency": "EUR" }
versus:
{ "accountId": "acc_1", "amount": "10.00", "currency": "EUR", "channel": "web" }
If channel: “web” is the server default, are these the same logical command? Maybe. Decide before hashing.
Unknown fields are another trap. Suppose your API ignores unknown JSON fields. If the first request includes “foo”: “bar” and the second does not, do you consider them the same? If unknown fields are truly ignored, perhaps yes. If they might become meaningful after a deploy, perhaps no.
The practical rule is: hash the validated command, not the raw HTTP body.
A reasonable flow is:
Parse the request into a versioned request DTO or command.
Normalize values your API treats as equivalent: amounts, enum casing, default fields, timestamp precision.
Exclude transport-only metadata.
Include path parameters and operation name.
Include semantic headers if they affect the operation, such as API version.
If a header only affects response shape, such as Prefer: return=minimal, decide whether it belongs in the command hash, the replay contract, or neither.
Exclude Authorization and the idempotency key itself.
Serialize canonically.
Hash with a stable algorithm.
For the payment example, the fingerprint might include:
operation: create_payment
accountId: acc_1
amount: 10.00
currency: EUR
merchantReference: invoice-7781
channel: web
apiVersion: 2026-05-01
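A sketch of that fingerprint in Go, assuming the request has already been parsed into a validated command struct with defaults applied and unknown fields dropped. encoding/json marshals struct fields in declaration order, which gives a stable canonical serialization regardless of the order the client sent them:

package idempotency

import (
    "crypto/sha256"
    "encoding/hex"
    "encoding/json"
)

// CreatePaymentCommand is the normalized command, not the raw HTTP body.
// The field set mirrors the fingerprint above.
type CreatePaymentCommand struct {
    Operation         string `json:"operation"`
    AccountID         string `json:"accountId"`
    Amount            string `json:"amount"`
    Currency          string `json:"currency"`
    MerchantReference string `json:"merchantReference"`
    Channel           string `json:"channel"`
    APIVersion        string `json:"apiVersion"`
}

// Fingerprint serializes the command canonically and hashes it with a
// stable algorithm.
func (c CreatePaymentCommand) Fingerprint() (string, error) {
    b, err := json.Marshal(c)
    if err != nil {
        return "", err
    }
    sum := sha256.Sum256(b)
    return hex.EncodeToString(sum[:]), nil
}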
I’ve been experimenting with running local models on and off for a bit, and I’ve finally found a setup that seems to work reasonably well. It’s nothing like the output of a SOTA model, but the excitement of being able to have a local model do basic tasks, research, and planning more than makes up for it! No internet connection required! Not to mention that it’s a way of reducing your dependence on big US tech, even if just a tiny bit.
I gotta say though, it’s not easy to get this stuff set up. First you have to choose how you’re running the model: Ollama, llama.cpp, or LM Studio. Each one comes with its own quirks and limitations, and they don’t all offer the same models. Then, of course, you have to pick your model. You want the best model available that fits in memory while still giving you enough headroom to run your regular assortment of Electron apps, and one where you can have at least a 64K context window, but ideally 128K or more. Most recently I’ve tried Qwen 3.6 Q3, GPT-OSS 20B, and Devstral Small 24B, which all technically fit in memory but were in practice unusable, and Gemma 4B, which would run fine but really struggled with tool use.
Then there’s a plethora of configuration options to tweak. From the more well-known, like temperature, to more esoteric options like K Cache Quantization Type. Many of these tools come with a basic recommended set of options, but the appropriate ones can depend on things like whether you’re enabling thinking or not!
Qwen 3.5 9B (4-bit quant)
qwen3.5-9b@q4_k_s (HuggingFace link) is the best model I’ve gotten working, with a reasonable ~40 tokens per second, thinking enabled, successful tool use, and a 128K context window, running on LM Studio. Compared to a SOTA model, it gets distracted more easily, sometimes gets stuck in loops, misinterprets asks, etc. But it’s surprisingly good for something that can run on a 24GB MacBook Pro while leaving space for lots of other things running too!
These are the recommended settings for thinking mode and coding work:
Thinking mode for precise coding tasks (e.g., WebDev):
temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0
To enable thinking I also had to select the model, go to configuration, scroll to the bottom of the Inference tab, and add {%- set enable_thinking = true %} to the Prompt Template.
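For reference, here is a minimal Go sketch of sending those sampler settings straight to LM Studio’s OpenAI-compatible endpoint. Whether every knob (top_k in particular) is honored over the API depends on your LM Studio version, so treat that as an assumption to verify:

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)

func main() {
    // Sampling settings from the recommended thinking-mode config.
    body, _ := json.Marshal(map[string]any{
        "model":       "qwen3.5-9b@q4_k_s",
        "temperature": 0.6,
        "top_p":       0.95,
        "top_k":       20, // non-standard OpenAI field; LM Studio-specific
        "messages": []map[string]string{
            {"role": "user", "content": "Explain GenServer call vs cast in two sentences."},
        },
    })
    resp, err := http.Post("http://localhost:1234/v1/chat/completions",
        "application/json", bytes.NewReader(body))
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    out, _ := io.ReadAll(resp.Body)
    fmt.Println(string(out))
}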
I’ve been using it through both pi and OpenCode. I still haven’t quite made up my mind on which one I prefer. Pi feels a bit snappier, but although I really appreciate the idea of the harness building itself and all that customization, I can’t help but wish it came with some sensible defaults. I feel like you could easily end up spending more time tweaking your pi setup to be just right than you do on your actual projects!
Pi setup
Here’s the ~/.pi/agent/models.json:
{
  "providers": {
    "lmstudio": {
      "baseUrl": "http://localhost:1234/v1",
      "api": "openai-completions",
      "apiKey": "lm-studio",
      "models": [
        {
          "id": "qwen3.5-9b@q4_k_s",
          "reasoning": true,
          "compat": {
            "thinkingFormat": "qwen-chat-template"
          }
        }
      ]
    }
  }
}
and to hide the distracting thinking, add "hideThinkingBlock": true to your ~/.pi/agent/settings.json.
OpenCode setup
~/.config/opencode/opencode.json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "lmstudio": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "LM Studio (local)",
      "options": {
        "baseURL": "http://127.0.0.1:1234/v1"
      },
      "models": {
        "qwen3.5-9b@q4_k_s": {
          "name": "Qwen 3.5 9B Q4_K_S",
          "tools": true,
          "context_length": 131072,
          "max_tokens": 32768
        }
      }
    }
  },
  "model": "lmstudio/qwen3.5-9b@q4_k_s"
}
Comparison with SOTA models
A model like Qwen 3.5 9B (Q4) is nowhere near being able to independently solve complex problems over longer periods of time the way SOTA models can. There’s no point in asking it to build an entire app in one go; you’ll just burn your thighs without getting anything out of it.
What does work is a more interactive workflow where you’re clearly communicating with the model step by step and giving it a lot of guidance. I’m sure that sounds pointless to many of you (why use a model you have to babysit as it works?), but I actually found that it encouraged me to be more engaged. The downside to working with SOTA models is that they make it too easy to offload all cognitive effort, even when you’re trying to actively prevent that from happening.
With the local model I have to take on a lot more of the thinking and planning, and I have to be a lot more specific, but it will still act as a research assistant, a rubber duck, and a savant with instant recall of a lot of programming language details and command line invocations. It’s not the 10x productivity boost that the big AI companies market, but it’s something, and it’s interesting.
To give you an idea of some things that work and don’t, here are some very basic examples.
Example 1:
I wanted to bump the Elixir linter credo to the latest version and I got warnings out of my code. As an experiment, I asked Qwen to take a look.
me: run mix credo --strict and suggest how to solve it, don’t edit
Qwen: Credo Warnings

There are 4 warnings in your test files. The issue is using length/1 to check if lists are non-empty. Credo recommends comparing against an empty list instead.

Warnings:
1. test/jola_dev_web/controllers/blog_controller_test.exs:11 - assert length(conn.assigns.posts) > 0
2. test/jola_dev_web/controllers/blog_controller_test.exs:22 - assert length(conn.assigns.posts) > 0
3. test/jola_dev/blog_test.exs:10 - assert length(posts) > 0
4. test/jola_dev/blog_test.exs:23 - assert length(tags) > 0

How to Fix

Replace each length(list) > 0 with list != []:

blog_controller_test.exs:
# Line 11
assert conn.assigns.posts != []
# Line 22
assert conn.assigns.posts != []

blog_test.exs:
# Line 10
assert posts != []
# Line 23
assert tags != []

This is more idiomatic in Elixir and avoids the overhead of computing the list length.
I asked Qwen to make the edits and it did 4 parallel edits, very clean. Overall this was a very simple task that I could’ve done myself by tabbing back and forth between the terminal and my editor to get the line numbers to edit. Not a big difference in effort, but nice and convenient.
Example 2:
After some dependency bumps I had a dependabot PR with git conflicts that dependabot for some reason was refusing to rebase. I pulled it down, rebased, and asked Qwen to take a look. It was a very simple conflict to resolve, just take the newer version of each, and the model recognized that.
I can see the conflict. Here’s what’s happening:

HEAD (your current branch):
- sentry: 13.0.1
- tailwind: 0.3.1

Other side (commit being rebased):
- sentry: 12.0.3
- tailwind: 0.4.1

You have divergent dependency bumps. Here are your options:

Option 1: Take both updates (recommended)
- Keep sentry 13.0.1 from HEAD
- Keep tailwind 0.4.1 from the other side
- This gives you the latest of both deps

Option 2: Take only sentry update
- Keep sentry 13.0.1
- Keep tailwind 0.3.1

Option 3: Take only tailwind update
- Keep sentry 12.0.3
- Keep tailwind 0.4.1

Which approach do you want to take?
However, when asked to make the change, it forgot to make the edit and instead just tried to add the changes and continue the rebase, with the conflict markers still in place.
> Thinking: The user wants to take both updates - sentry 13.0.1 and tailwind 0.4.1. I need to resolve the conflict by keeping the HEAD version for sentry and the other side version for tailwind. I’ll use git add to stage the resolved file.

# Stage resolved mix.lock and continue rebase
$ git add mix.lock && git rebase --continue
Vim: Warning: Output is not to a terminal
Vim: Warning: Input is not from a terminal
"~/workspace/mixtli/.git/COMMIT_EDITMSG" 32L, 1047B
Bump tailwind from 0.3.1 to 0.4.1

Bumps [tailwind](https://github.com/phoenixframework/tailwind) from 0.3.1 to 0.4.1.
- [Changelog](https://github.com/phoenixframework/tailwind/blob/main/CHANGELOG.md)
- [Commits](https://github.com/phoenixframework/tailwind/compare/v0.3.1...v0.4.1)
Additionally, it failed to recognize that git rebase --continue opens an editor, and OpenCode then proceeded to hang, although that might just have been a one-off.
Conclusion
Anyway, LLMs are incredibly flexible and there’s a million things even a local model could do. Your imagination is the limit. Local models have serious tradeoffs, but they come with some pretty attractive benefits:
No internet connection required, you can work on the plane!
The cost is limited to the electricity you’re using, assuming you were gonna buy a computer anyway. No subscription required.
There’s still going to be a serious environmental cost from training these models, but the open model companies are nowhere near the top of the list in environmental impact, and using your own hardware means fewer data centers.
It’s fun to tinker.
LLMs have had a huge impact on our world, and much of it not great, but it’s obvious that they’re here to stay. Experimenting with local models feels like a more sustainable and positive way to interact with this technology. And honestly, it’s a lot of fun, even when it does the wrong thing!
Delve

Kicked out of YC. 493+ fabricated audit reports. Sold fraud to fellow YC companies.

Autopsy Report:
Claimed to automate SOC 2 and ISO compliance auditing with AI. Whistleblower revealed the platform auto-generated identical passing audit reports with keyboard-mashed test data before clients even uploaded anything. CEO was caught on tape asking an auditor ‘does your firm actually look at our platform?’ Focused on selling to other YC companies who trusted the YC brand — the circular trust network weaponized. Insight Partners scrubbed their $32M investment blog post. YC officially expelled Delve in 2026. Investor Adam Cochran called it proof that YC has ‘no technical acumen to evaluate claims’ under Garry Tan’s leadership.
Central
Signed up as a Warp customer, stole the playbook, launched a clone
Autopsy Report:
Central’s CEO signed up as a customer of fellow YC company Warp in 2023 under ‘Central Business Applications Inc.’ Spent six months asking how tax compliance works, what registered agents are, how multi-state filings work — then churned and launched an identical payroll startup with Warp’s exact marketing copy, down to the tagline ‘designed for founders, not HR’. YC funded them anyway in S24, despite Warp already being in the portfolio. Central got acqui-hired by Mercury in April 2026. Warp CEO Ayush Sharma publicly called them out: ‘gg no re.’
DAMAGE: Acqui-hired by Mercury
STATUS: COPYCAT
Capital Raised: $8.6M
Lifespan: 3 YRS
Naive
Raised $2M+ selling “autonomous AI employees.” The platform is a 41K-star MIT-licensed OSS framework with Stripe bolted on.
Autopsy Report:
Naive (YC S25) raised $2M+ marketing themselves as builders of “autonomous AI employees.” Public investigation by @NotOnKetamine traced the entire underlying platform to Paperclip — a 41K-star MIT-licensed open source agent framework — with Naive having stripped attribution and rebranded the original code as proprietary. The additions on top of Paperclip were thin: Stripe billing wrappers ($0.50/credit), domain/email provisioning for agents, phone/SMS via standard providers, and a Composio integration layer. The OSS license required attribution and license preservation; Naive shipped neither. The investigation surfaced screenshots, curl-verifiable evidence, and side-by-side comparisons. As of April 2026, the company has not publicly responded to the receipts.
DAMAGE: MIT license violated
STATUS: COPYCAT
Capital Raised: $2M+
Lifespan: 1+ YRS
Wuri
AI-generated visual novel platform. Pivoted to enterprise AI, then died as foundation models commoditized the UI layer in twelve months.
Autopsy Report:
Wuri pitched converting 500M+ Wattpad-style text stories into Webtoon-style visual novels via generative AI. Founders ex-Flipkart, ex-Bain. The wrapper-thin differentiation collapsed when GPT-4 / Sora / image-gen foundation models shipped equivalent capability for free, eroding any pricing power Wuri had layered on top. Pivoted from consumer reader app to enterprise AI offerings in 2024. Quietly shut down in 2025. YC profile remains up but flagged Inactive in the directory. The Wuri arc is the median Tan-era Indian AI YC company: thin GenAI wrapper, fast pivot, quiet death.
DAMAGE: Tan-era YC class slot consumed for a thin GenAI wrapper
STATUS: DEAD
Capital Raised: $500K
Lifespan: 2 YRS
Double Finance
“0% expense ratio” robo-advisor. Hit $10M AUM in Dec 2024, shut down twelve months later. YC slug then recycled for the founder’s next pivot — with Garry Tan as primary partner.
Autopsy Report:
Double pitched itself as a Wealthfront/Betterment killer offering direct indexing for $1/month. Founder JJ Maxwell (ex-Liftoff Influence) and growth lead Mark Wai (ex-Tesla, ex-Facebook). After crossing $10M AUM in December 2024, the company quietly shut down a year later — and the YC slug `double-2` was repurposed for Maxwell’s next venture (an AI ad-generation startup called Polished) with Garry Tan named primary YC partner. Maxwell wrote a “Lessons from Building (and Closing) Double Finance” HN post (item 46213345). The slug-recycling-into-pivot pattern is itself a Tan-era artifact: the same YC company URL now points at a different product with different unit economics, while clients had to be wound down or transferred. Robo-advising joined the long list of consumer fintech categories that didn’t pencil at venture scale.
DAMAGE: $10M+ in client AUM wound down or transferred
STATUS: DEAD
Capital Raised: Undisclosed (YC + post-Demo Day)
Lifespan: 1 YRS
PearAI
Forked an open-source code editor, slapped their name on it, called it a startup.
Autopsy Report:
On Day 1 of YC Demo Day, the internet noticed that PearAI had forked Continue.dev’s open source code editor, mass-replaced all ‘Continue’ references with ‘PearAI’, and slapped on a fake license written by ChatGPT. The ‘Pear Enterprise License’ became a meme. YC had to publish a blog post ‘correcting the record.’ Founders apologized and pivoted. The license was generated by AI. The irony was not lost.
DAMAGE: YC’s reputation took the hit
STATUS: COPYCAT
Capital Raised: $1.25M
Lifespan: 2+ YRS
Pickle
Stole GPL code, then launched AR glasses that might be CGI.
Autopsy Report:
A YC W25 company with two separate controversies. First, co-founder Daniel Park released ‘Glass,’ a real-time conference AI assistant, under Apache 2.0. Developers discovered it was a near-verbatim copy of ‘CheatingDaddy,’ a GPLv3-licensed open source project — identical code, comments, and library versions relicensed without attribution. After 700+ upvotes on Hacker News, they changed the license back. Then Pickle launched $799 ‘Pickle 1’ AR glasses marketed as a ‘soul computer.’ Tech analysts accused the demo of being CGI — impossible battery specs, a Korean restaurant in the demo that doesn’t exist, and mismatched UI data. SF Standard asked: ‘Is it vaporware?’
DAMAGE: Open source trust + $799 preorders
STATUS: GRIFT
Capital Raised: Undisclosed
Lifespan: 2+ YRS
Optifye.ai
Built ‘sweatshops-as-a-service.’ YC deleted the evidence.
Autopsy Report:
YC W25 startup that built AI-powered surveillance cameras for garment factories. YC posted a demo video showing supervisors monitoring workers as colored rectangles — green for productive, red for not. A manager calls about a low-performing worker referred to as ‘Number 17’ — not by name, just a number — while the AI tracked their every movement on the factory floor. The internet dubbed it ‘sweatshops-as-a-service’ and ‘computer vision sweatshop software.’ YC deleted the demo video from all social accounts after massive backlash. The founders, two Duke CS students, saw their product become a symbol of AI surveillance dystopia.
DAMAGE: Garment workers’ dignity
STATUS: SCANDAL
Capital Raised: Undisclosed
Lifespan: 2+ YRS
Rezi
Paid landlords upfront to rent their apartments. Burned $200M in debt, left a $49M foreclosure.
Autopsy Report:
YC W17 rental platform that paid landlords upfront — sometimes $30K cash — then listed and rented the apartments themselves. Raised $200M in debt rounds during the proptech boom. When interest rates spiked, Rezi couldn’t secure new financing and quietly shut down in early 2024, leaving landlord Leagem Properties to file a $49M foreclosure suit directly blaming Rezi’s collapse. Proptech funding plummeted from $32B in 2021 to $11.4B in 2023, and Rezi was caught in the middle with a debt-heavy model that only works when money is cheap. Founder Sean Mitchell launched a new startup called Ryse. The $200M evaporated.
DAMAGE: $49M foreclosure + landlords left holding the bag
STATUS: DEAD
Capital Raised: $200M (debt)
Lifespan: 8 YRS
CapWay
YC-backed fintech for banking deserts. Killed when the Synapse collapse destroyed fintech trust.
Autopsy Report:
YC S20 fintech startup aimed at bringing banking to underserved ‘banking desert’ communities. Backed by Backstage Capital, Fearless Fund, and Khosla Ventures. When the Synapse collapse and Evolve Bank & Trust hack destroyed fintech credibility in 2023, banks started requiring fintechs to hold massive capital reserves that early-stage startups couldn’t meet — and CapWay couldn’t raise another round in the scorched-earth fintech landscape. A potential acquisition fell through. Founder Sheena Allen announced the closure in October 2024, citing the post-Synapse environment as the final blow. A startup trying to help underbanked communities, killed by the broader fintech industry’s trust deficit.
DAMAGE: Underserved communities lost a banking option
STATUS: DEAD
Capital Raised: $800K
Lifespan: 8 YRS
Medobed
Expelled from YC S23 for faking credentials and growth metrics. Never made it to Demo Day.
Autopsy Report:
Indian 10-minute medicine delivery startup that joined YC S23. Founder Piyush Sharma claimed IIT Roorkee and Stanford credentials and reported $1M monthly GMV with EBITDA profitability. An anonymous investor tipped off YC, which discovered the founder had fabricated his educational history and inflated growth metrics — irregularities that broke YC’s ethics policy. YC expelled Medobed from the batch before Demo Day — one of the rare instances of a company being kicked out mid-program. The company never raised through YC. Under Garry Tan’s watch, the accelerator that prides itself on due diligence missed a founder lying about where he went to school.
DAMAGE: YC’s vetting credibility
STATUS: FRAUD
Capital Raised: Undisclosed
Lifespan: 0 YRS
Embark Trucks
SPAC’d at $5.2B with zero revenue. Stock dropped 99%. Sold for parts.
Autopsy Report:
YC W16 alum that went public via SPAC at a $5.2B valuation. The catch? The company had zero revenue, zero commercial deployments, and zero path to profitability — it was a 7-year-old pre-revenue company valued at $5.2 billion. Stock dropped over 99% from the offer price. Laid off 70% of staff. CEO Alex Rodrigues blamed ‘capital markets turning their backs on pre-revenue companies.’ Applied Intuition bought the remains in 2023. The self-driving truck never drove itself to revenue.
DAMAGE: 70% of staff laid off
STATUS: DEAD