10 interesting stories served every morning and every evening.
Another brand backflips and admits that touch-sensitive buttons for frequently used controls were a mistake, but only after the nudge from customers.
Mercedes-Benz joins the growing list of manufacturers listening to customers and admitting that touch-sensitive controls and burying controls in menus were mistakes.
The German brand remains committed to offering large screens in its models, but has listened to its customers and will offer physical buttons for key functions in future.
This partly contrasts with Audi and Volkswagen, which have chosen to reduce the size of their infotainment screens to make room for the returning physical controls.
The upcoming GLC and C-Class will be offered with the 39.1-inch MBUX ‘Hyperscreen’ that covers almost the entire width of the dashboard, but with physical buttons in front of the dual wireless chargers, along with physical buttons and switches returning to the steering wheel.
Mercedes-Benz Sales boss Mathias Geisen, when speaking to Autocar, said the brand has changed its course: “Customers told us two years ago, ‘guys, nice idea, but it just doesn’t work for us’, so we changed that and made it more analogue.”
Physical buttons, switches, and dials will continue to be incorporated into upcoming models, as the brand plans to blend its screen with the required physical controls.
He also explained: “I’m a big believer in screens, because I really believe if you want to connect, you have to make the magic work behind the screen.”
“But in our future products, you will see more hard keys for specific functions that customers want to have direct access for with hard keys.
“When we do car research clinics, customers are very clear: ‘We love the big screens, but we want to have [hard controls for] specific functionalities.’”
The brand will also offer a customisable wallpaper element for the near metre-wide seamless touchscreen, a choice its sales boss admits was made because phones are such a huge part of people’s lives and customers are used to that level of technology.
“If you want to connect to the customer, you’ve got to find a way to translate this digital experience from your phone to the customer.”
The new-generation GLC SUV will showcase the brand’s new MB.EA electric vehicle platform when it arrives in the fourth quarter of 2026 (October to December). The platform will be shared with the upcoming C-Class, due early the following year.
By Rohana Rezel
I’m running the ongoing AI Coding Contest where I pit major language models against each other in real-time programming tasks with objective scoring. Day 12 was the Word Gem Puzzle. Ten models entered. The results were not what most people would have predicted.
Kimi K2.6, an open-weights model from Chinese startup Moonshot AI, won the challenge outright: 22 match points and a 7-1-0 win-draw-loss record. MiMo V2-Pro from Xiaomi came second. GPT-5.5 was third. Claude Opus 4.7 finished fifth. Every model from the Western frontier labs landed below the top two.
The challenge
The Word Gem Puzzle is a sliding-tile letter puzzle. The board is a rectangular grid (10×10, 15×15, 20×20, 25×25, or 30×30) filled with letter tiles and one blank space. Bots can slide any adjacent tile into the blank and at any point claim valid English words formed in straight horizontal or vertical lines. Diagonals don’t count. Backwards doesn’t count.
The scoring rewards longer words and punishes short ones. Words under seven letters cost points: a five-letter word loses you one point, a three-letter word costs three. Seven letters or more score their length minus six, so an eight-letter word is worth two points. The same word can only be claimed once; if another bot gets there first, you get nothing. Each pair of models played five rounds, one per grid size, with a ten-second wall-clock limit per round.
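As I read the rules, the whole scoring table collapses to a single formula, length minus six. A minimal sketch (my own, not the contest's actual scoring code):

```javascript
// Every word scores (length - 6): "the" costs 3 points, a five-letter
// word costs 1, a seven-letter word earns 1, an eight-letter word earns 2.
const wordScore = (word) => word.length - 6;

// The filter the serious competitors applied: keep only words that can
// score positive points, i.e. seven letters or more.
const keepers = (dictionary) => dictionary.filter((w) => wordScore(w) > 0);
```

Under that reading a six-letter word is worth exactly zero, which is why seven letters and up is the safe cutoff.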
The grids are seeded with real dictionary words in a crossword-style layout, then the remaining cells are filled with letters weighted by Scrabble tile frequencies, and finally the blank is scrambled, more aggressively on larger boards. On a 10×10, many seed words survive intact. On a 30×30, almost none do. That turns out to matter a lot.
The code produced by Nvidia’s Nemotron Super 3 contained a syntax error, so it never connected to the game server. Nine models actually competed.
Kimi K2.6 is open-weights, publicly available from Moonshot AI, a Chinese startup founded in 2023. MiMo V2-Pro is currently API-only; the tweet linked here (https://x.com/XiaomiMiMo/status/2047840164777726076) is Xiaomi confirming that weights for their newer V2.5 Pro model are dropping soon. The models from Anthropic, OpenAI, Google, and xAI placed third through seventh. GLM 5.1, from Chinese lab Zhipu AI, placed fourth. DeepSeek finished eighth. This isn’t a clean China-beats-West story; it’s two specific models that won.
What I saw
The move logs tell the story. Kimi won by sliding aggressively. Its approach was greedy: score each possible move by what new positive-value words it unlocks, execute the best one, repeat. When no move unlocked a positive word, it fell back to the first legal direction alphabetically. This caused some inefficient edge-oscillation, a 2-cycle pattern where the bot bounced the blank back and forth without progress. On smaller grids where seed words were still largely intact, that hurt. On the 30×30 grids, where the scramble had broken up nearly everything and reconstruction was the only path to points, the sheer slide volume eventually paid off. Kimi’s cumulative score of 77 was the highest in the tournament.
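From the move logs, Kimi's policy reduces to something like the sketch below. The names and shapes are my own illustration of the described behaviour, not Kimi's actual code; `valueOf(dir)` stands in for whatever computed the points of newly unlocked positive words.

```javascript
// Greedy move selection: score each legal slide by the positive-value
// words it unlocks, take the best, and fall back to the first legal
// direction alphabetically when nothing scores.
function greedyPick(legalMoves, valueOf) {
  let best = null;
  let bestValue = 0;
  for (const dir of legalMoves) {
    const v = valueOf(dir); // points from new positive-value words
    if (v > bestValue) {
      bestValue = v;
      best = dir;
    }
  }
  // The alphabetical fallback is where the 2-cycle oscillation comes from:
  // with no scoring move in sight, two opposite slides undo each other forever.
  return best ?? [...legalMoves].sort()[0] ?? null;
}
```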
MiMo’s sliding code exists in the repo, but its “best value greater than zero” threshold never triggered, so in practice it never slid once. It went straight to scanning the initial grid for words of seven letters or more and blasted all its claims in a single TCP packet. Brittle strategy: entirely dependent on the scramble leaving intact seed words. On grids where words survived, MiMo cleaned up fast. On grids where they didn’t, it scored nothing. Final tally: 43 cumulative points, second place.
Claude also didn’t slide. The move logs show it holding up well on 25×25 boards where scramble density was still manageable, then falling apart on 30×30 where actual tile movement was needed. Not sliding is a real limitation in a puzzle built around sliding.
GPT-5.5 was more conservative, roughly 120 slides per round with a cap to avoid thrashing, and showed the strongest numbers on 15×15 and 30×30 grids. Grok never slid either, yet scored reasonably on the larger boards. GLM was the most aggressive slider in the whole tournament, over 800,000 total slides, but stalled badly whenever it ran out of positive moves.
DeepSeek sent malformed data every round. Zero useful output. At least it didn’t make things worse by playing.
Muse made things worse by playing.
The scoring penalizes short words: three-letter words cost three points, four-letter words cost two, five-letter words cost one. The intent is to stop bots from carpet-bombing the board with “the” and “and” and “it.” Every serious competitor filtered their dictionary to words of seven letters or more. Muse claimed everything. Every word it could find, regardless of length, fired off as a claim. On a 30×30 grid with hundreds of short valid words visible at any moment, Muse found them all and claimed every one.
Its cumulative score was −15,309. It lost all eight matches and won zero rounds. There is a version of Muse that simply connected to the server and did nothing, and that version would have scored zero, a 15,309-point improvement. The gap between Muse and eighth place was larger than the gap between eighth and first.
DeepSeek’s malformed output tells you something about how it handles novel protocol specs under time pressure. Muse’s spiral tells you something different: it saw valid words and claimed them, with no apparent model of what “valid” meant given the scoring rules. It read the task partially and executed that partial reading in full. Worth noting for anyone deploying these models on structured tasks with penalties.
What surprised me
I design these challenges, so I have a reasonable sense of what they test. What I didn’t fully anticipate was how starkly the 30×30 grids would separate the field. On smaller boards, the difference between a static scanner and an active slider was modest. At full scale, models that could only find what was already there ran out of road. Kimi’s greedy loop, flawed as it was, kept producing output when the static scanners had nothing left to claim.
The other thing worth noting: MiMo and Kimi finished two points apart despite doing almost opposite things. Two different theories of the same puzzle, nearly identical results. That means the gap between first and second was partly seed variance, not just capability difference.
The bigger picture
One fair counterargument: this scoring system rewards aggressive word claiming, and heavily safety-tuned models may be more conservative about that kind of carpet-bombing. If so, the results reflect a mismatch between task design and aligned model behaviour, not raw capability. It’s a reasonable objection. It doesn’t change the outcome.
One challenge doesn’t overturn general benchmarks. This puzzle tests real-time decision-making and whether a model can write clean functional code that connects to a TCP server and plays a novel game correctly. It doesn’t test long-context reasoning or code generation from a spec.
But I’ve been running these challenges long enough to notice what’s changing. A year ago, the assumption was that the Western frontier labs had a capability lead open-weights couldn’t close. Kimi K2.6 now scores 54 on the Artificial Analysis Intelligence Index. GPT-5.5 scores 60, Claude 57. That’s not parity, but it’s close, and it’s coming from a model anyone can download.
When models within a few index points of the frontier are also freely available to run locally, that’s a different competitive situation than the one that existed a year ago. This challenge is one data point in that shift. The gap is small enough now that it shows up in results like this one.
Rohana Rezel runs the AI Coding Contest and is a technologist, researcher, and community leader based in Vancouver, BC.
Does this look familiar?
Wow. Claude. Mind-blowing. The whole feature works great. But I forgot to mention one very important edge case.
You’re absolutely right! Let me fix that.
Ah, and I just noticed. You used offset pagination for the table UI. Obviously cursor pagination is a better fit here?
You’re absolutely right! Let me fix that.
Also, is that an N+1 query? Fetching for every row in the table? Why not do a single round-trip?
You’re absolutely right! Let me fix that.
This is why I still have a job, right?
…
Peak Slop
I’ve watched this scene play out many times, but the frequency is decreasing. Both my tools, and my methods for using them, continue to improve. I think Peak Slop has already come and gone.
We are entering the post-slop era. My software is more robust, better tested, better integrated, and more observable than ever before. And my velocity keeps increasing!
Some days it feels like the sky is the limit. Other days, I am painfully reminded, the sky is not the limit. The context window is the limit. And what happens when I fill the context window? Or kill a session? Switch machines? Hand off the project to someone else?
We already know what happens. The agent goes off the rails, or requirements get lost, and critically important detail gets squashed. So we adapt and mitigate. We document. We list requirements.
Yes, millions of us are coming to the same realization: we should put more requirements in writing. We should update those requirements when they change. Look! I wrote a spec! Am I doing spec-driven development?
Perhaps, but it is nothing new. Our mentors tried to teach us these habits decades ago.
Specifying the plane while we fly it
What’s your favorite flavor of spec?
A README.md and AGENTS.md is a good start. Don’t forget a testing-guide.md. Maybe an architecture.md, a PRD.md, and a design doc too. Have you considered md.md (to teach your agents how to write .md)? The more .md the better, right?
Unironically, yes. Docs and unstructured specs can get you very, very far. Much farther than prompts alone. If you aren’t writing any docs yet, you should just stop reading this and start there.
And remember, slop in, slop out. Nothing beats an organic, pasture-raised, hand-written spec. Spec-writing is where the act of software engineering really happens.
So a few weeks ago, I started asking myself, how far can I take this? How far should I take this?
Dreaming in markdown
As the story goes, I fell into an AI psychosis, I became a “spec maxxi”, and I spent hours and hours writing the most beautiful PRDs and TRDs you’ve ever seen.
I drafted templates and skills and roles, thinking that maybe my agents can write specs too! I assembled an army, working together like a mini dark factory, to turn my specs into reality. My tasks grew more ambitious, and at one point I broke the vibe-coding sound barrier: an agent that ran for 1.5 hours unsupervised!
Exciting. But what did that army ship for me? Well, it wasn’t slop, in fact it worked, which is more than I can say about the garbage that other companies force me to use every day.
But it was still a bit sloppy. I’m far from a perfectionist and I love cutting corners more than most, but this somehow wasn’t good enough.
One hallmark symptom of AI psychosis is using AI to build AI harnesses for building products, rather than just using AI to build the damn product. I embraced my illness, threw out the branch, scrapped all my markdown, and started all over again.
Acceptance Criteria for AI (ACAI)
A few days later, I noticed an ambitious little sub-agent doing something unexpected.
# Requirements
AUTH-1: Accepts `Authorization: Bearer <token>` header
AUTH-2: Tokens are user-scoped, providing access to any of the user’s resources
AUTH-3: Rejects with 401 Unauthorized
// AUTH-1
const authHeader = req.headers["authorization"];
// AUTH-2
const isAuthorized = verifyBearerToken(authHeader);
// AUTH-3
if (!isAuthorized) return res.status(401).json({ error: "Unauthorized" });
The little guy just went and numbered my requirements and then referenced them all over my codebase.
Why? I did not ask for this! I was disgusted. This is a tight coupling of code to spec, and spec to code, which is bad, right?
You really expect me to refactor all my code every time I change my spec?
Oh. I suppose that’s a good thing? Interesting. I wonder…
Perhaps these tags can help me navigate these massive PRs?
Perhaps they can point me to where, exactly, a requirement is satisfied or tested!
Perhaps I can annotate them with notes and states (todo, assigned, completed)!
Perhaps I can start tracking acceptance coverage instead of test coverage!
I leaned in. I named these tags ACIDs (Acceptance Criteria IDs).
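To make the coverage idea concrete, here is a hypothetical sketch (mine, not what Acai eventually shipped) of checking which requirement IDs appear as tags in source text:

```javascript
// Hypothetical acceptance-coverage check: given the spec's requirement IDs
// and some source text, report which IDs are referenced (e.g. "// AUTH-1")
// and which are missing. The tag format is assumed from the example above.
function acceptanceCoverage(specIds, sourceText) {
  const covered = specIds.filter((id) => new RegExp(`\\b${id}\\b`).test(sourceText));
  const missing = specIds.filter((id) => !covered.includes(id));
  return { covered, missing };
}
```

Point it at a diff instead of the whole tree and you get per-PR acceptance coverage.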
But a few questions remained.
Can my ACIDs number and label themselves?
Is it cumbersome to keep them aligned?
How do I share specs and progress across sandboxes, branches, features and implementations?
Acai.sh - an open-source toolkit
I built Acai.sh to solve some of these newly invented problems. And I’m very excited about the results.
A simple and flexible template for feature specs, called feature.yaml. Feature.yaml makes it possible to reference each requirement by ACID.
Tiny CLI to power your CI and your agent (available on npm or via github release).
Webapp that serves a dashboard, and a JSON REST API (Elixir, Phoenix, Postgres).
I will keep the hosted version free for a while, or maybe forever depending on how popular or expensive this gets. The source code is on GitHub under an Apache 2.0 license.
How it works
Step 1 - Specify
Start by writing a spec for a feature.
Be ambitious: pick something that adds real value. Don’t put nitpicky UI and nail-polish stuff in your specs. Keep the requirements concrete, testable, and focused on what really matters (functional behavior plus critical constraints).
Rather than markdown, use Acai’s feature.yaml format. A spec in Acai is just a numbered list of requirements.
feature.yaml

feature:
  name: imaginary-api-endpoint
  product: api
  description: This is an example feature spec for an imaginary REST API endpoint, using the feature.yaml format
  components:
    AUTH:
      name: Authn and Authz
      requirements:
        1: Accepts Authorization header with `Bearer <token>`
        1-1: Token must be non-expired, non-revoked
        2: Respects the scopes configured for the owner
        2-note: See `access-tokens.SCOPES.1` for complete list of supported scopes
  constraints:
    ENG:
      description: Constraints are for cross-cutting or under-the-hood requirements. Here are some example engineering constraints.
      requirements:
        1: All actions are idempotent
        2: All HTTP 2xx JSON responses wrap their payload in a root `data` key
Of course you could also have LLMs assist you with spec writing, but I enjoy the process of writing them myself, because I like to maintain some illusion of self-worth as a software developer.
The key benefit of this YAML format, aside from parsing support, is that each requirement can still be referenced by its unique and stable ID, e.g. my-feature.ENG.2.
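For illustration, composing such an ID from the parts shown in the example spec (my own sketch of the convention, not Acai's CLI):

```javascript
// A stable ACID is just feature name + component key + requirement number.
const acid = (feature, component, number) => `${feature}.${component}.${number}`;

// acid("my-feature", "ENG", 2) -> "my-feature.ENG.2"
```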
Step 2 - Ship
Copy and paste the prompt below.
Note: In addition to the npm package, there are Linux and MacOS releases for the CLI available on GitHub.
If all goes well, your agent will embrace ACIDs, referencing them in code and tests, so you can make sure each individual requirement is implemented and tested.
Step 3 - Review
No more file-by-file GitHub PR reviews. Use the Acai.sh dashboard to review requirements instead.
Ideally, you just add acai push to a GitHub action (example CI/CD workflows coming soon).
Create a free Team and Access Token at https://app.acai.sh
Expose the environment variable
# .env
ACAI_API_TOKEN=<secret_access_token>
From George Clooney in ER to Noah Wyle in The Pitt, emergency department doctors have long been popular heroes. But will it soon be time to hang up the scrubs?
A groundbreaking Harvard study has found that AI systems outperformed human doctors in high-pressure emergency medicine triage, diagnosing more accurately in the potentially life and death moments when people are first rushed to hospital.
The results were described by independent experts as showing “a genuine step forward” in the clinical reasoning of AIs and came as part of trials that tested the responses of hundreds of doctors against an AI.
The authors said the results, published in the journal Science, showed large language models (LLMs) “have eclipsed most benchmarks of clinical reasoning”.
One experiment focused on 76 patients who arrived at the emergency room of a Boston hospital. An AI and a pair of human doctors were each given the same standard electronic health record to read — typically including vital sign data, demographic information and a few sentences from a nurse about why the patient was there. The AI identified the exact or very close diagnosis in 67% of cases, beating the human doctors, who were right only 50%-55% of the time.
It showed the AIs’ advantage was particularly pronounced in triage circumstances requiring rapid decisions with minimal information. The diagnosis accuracy of the AI — OpenAI’s o1 reasoning model — rose to 82% when more detail was available, compared with the 70-79% accuracy achieved by the expert humans, though this difference was not statistically significant.
It also outperformed a larger cohort of human doctors when asked to provide longer-term treatment plans, such as antibiotic regimens or end-of-life planning. The AI and 46 doctors were asked to examine five clinical case studies, and the computer made significantly better plans, scoring 89% compared with 34% for humans using conventional resources such as search engines.
But it is not curtains for emergency doctors yet, the researchers said. The study only tested humans against AIs looking at patient data that can be communicated via text. The AI’s reading of signals, such as the patient’s level of distress and their visual appearance, was not tested. That means the AI was performing more like a clinician producing a second opinion based on paperwork.
“I don’t think our findings mean that AI replaces doctors,” said Arjun Manrai, one of the lead authors of the study who heads an AI lab at Harvard Medical School. “I think it does mean that we’re witnessing a really profound change in technology that will reshape medicine.”
Dr Adam Rodman, another lead author and a doctor at Boston’s Beth Israel Deaconess medical centre where the study took place, said AI LLMs were among “the most impactful technologies in decades”. Over the next decade, he said, AI would not replace physicians but join them in a new “triadic care model … the doctor, the patient, and an artificial intelligence system”.
In one case in the Harvard study, a patient presented with a blood clot to the lungs and worsening symptoms. Human doctors thought the anti-coagulants were failing, but the AI noticed something the humans did not: the patient’s history of lupus meant this might be causing the inflammation of the lungs. The AI was proved correct.
Nearly one in five US physicians are already using AI to assist diagnosis, according to research published last month. In the UK, 16% of doctors are using the tech daily and a further 15% weekly, with “clinical decision-making” being one of the most common uses, according to a recent Royal College of Physicians survey.
The UK doctors’ biggest concerns were AI error and liability risks. Billions are being invested in AI healthcare companies, but questions remain about the consequences of AI error.
“There is not a formal framework right now for accountability,” said Rodman, who also stressed patients ultimately “want humans to guide them through life or death decisions [and] to guide them through challenging treatment decisions”.
Prof Ewen Harrison, co-director of the University of Edinburgh’s centre for medical informatics, said the study was important and showed that “these systems are no longer just passing medical exams or solving artificial test cases. They are starting to look like useful second-opinion tools for clinicians, particularly when it is important to consider a wider range of possible diagnoses and avoid missing something important.”
Dr Wei Xing, an assistant professor at the University of Sheffield’s school of mathematical and physical sciences, said some of the other findings suggested doctors may unconsciously defer to the AI’s answer rather than thinking independently.
“This tendency could grow more significant as AI becomes more routinely used in clinical settings,” he said. He also highlighted the lack of information about which patients the AI was worse at diagnosing and whether it struggled more with elderly patients or non-English speakers.
He said: “It does not demonstrate that AI is safe for routine clinical use, nor that the public should turn to freely available AI tools as a substitute for medical advice.”
Terminal user interfaces (TUIs) are making a comeback. DHH’s Omarchy is made up of three kinds of user interface: TUIs, for immediate feedback and bonus geek points; webapps, because 37signals (his company) sells SaaS web applications; and the unavoidable GNOME-style native applications, which really do not fit the style of the distro.
The same pattern occurred around 10 years ago in code editors. We moved from native editors like BBEdit, TextMate (also promoted by DHH), Notepad++ and Sublime to Electron-powered apps like Atom, VS Code and all its forks. The hardcore moved to vim or emacs, trading immediate feedback and usability for the steepest learning curve I’ve seen.
Windows
The lesson is clear: native applications are losing. Windows has become the running joke of GUI libraries: because one API fails to catch on, Microsoft invents another, only for that one to drown in the sea of existing alternatives.
MFC (1992) wrapped Win32 in C++. If Win32 was inelegant, MFC was Win32 wearing a tuxedo made of other tuxedos. Then came OLE. COM. ActiveX. None of these were really GUI frameworks — they were component architectures — but they infected every corner of Windows development and introduced a level of cognitive complexity that makes Kierkegaard read like Hemingway.
— Jeffrey Snover, in Microsoft hasn’t had a coherent GUI strategy since Petzold
Since then, Microsoft has gone through WinForms, WPF, Silverlight, WinUI and MAUI without success. Many enterprise and personal desktop applications still rely on Electron, and the last memory I have of coherent visual integration across the whole OS is of Windows 98 or 2000.
It turns out that it’s a lot of work to recreate one’s OS and UI APIs every few years. Coupled with the intermittent attempts at sandboxing and deprecating “too powerful” functionality, the result is that each new layer has gaps, where you can’t do certain things which were possible in the previous framework.
— Domenic Denicola, in Windows Native App Development Is a Mess
Linux
The UI inconsistency in Linux was created by design. Different teams wanted different outcomes, and they had the freedom to pursue them. GTK and Qt became the two reigning frameworks. While Qt is best known for it, both aimed to support cross-platform native development (once upon a time, I successfully compiled gedit on Windows, learning a lot about C compilation, makefiles and environment variables in the process), but both are only widely used in Linux land. Luckily, applications built with the different toolkits can look okay-ish next to each other, something the different frameworks on Windows fail to achieve. How many engineer-hours does it take to redo the Windows Control Panel?
Given the difficulty of testing the million different combinations of distros, desktop environments and hardware, most companies do not bother with a native Linux application: they either address it with Electron (cementing the lock-in), or they let the open-source community solve it themselves (when there are open APIs).
macOS
Apple used to be a one-book religion. Apple’s Human Interface Guidelines used to be cited by every user interface course around the world. Xerox PARC and Apple were the two institutions that studied what it means to have a good human interface. Fast forward a few decades, and Apple is doing its worst to break all the guidelines and consistency it was known for.
Now, Apple has been ignoring Fitts’ law, making resizing windows near-impossible (even after trying to fix it) and adding icons to every single menu. macOS is no longer the safe haven where designers can work peacefully.
Electron
Everyone knows that the user experience of Electron apps sucks. The most common complaint is memory consumption, which to be fair has been decreasing over the last decade, but my main complaint (as I usually drive a 64 GB RAM MacBook Pro) is the lack of visual consistency and of keyboard-driven workflows. Looking at my dock, I count 8 native apps (TextMate and macOS system utilities) and 6 Electron apps (Slack, Discord, Mattermost, VS Code, Cursor, Plexamp). And that’s from someone who really wishes he could avoid having any Electron app at all.
Let us take the example of Cursor (the same would be true of VS Code). If you are in the agent panel, requesting your next feature, can you move to the agent list in the side panel with just the keyboard? Can you archive it? These are actions that should work the same across every macOS application, and even where shortcuts exist, they are not announced in the menus. Over the last decade, developers have been forgetting to add menu items for the same actions available inside their applications (mostly because the application is HTML within its sandbox). For the record, Slack does this better than the others, but it’s not perfect.
Restarting from scratch
Together with Dart, Google wanted to design a new operating system for new devices, without all the legacy of Android. It got a fresh UI toolkit (Flutter), but Google gave up on the project before a real product was launched. It’s one of those situations where having a monopoly (or a large enough slice of the market) is required to succeed.
Meanwhile, Zed did the same thing in Rust: they designed their own cross-platform GPU-rendering library (GPUI). Despite the speed, it lacks integration with the host OS by itself, requiring the developers to add the right bindings. Personally, I would rather have a slow renderer that integrates with my OS than the extra speed.
TUIs
TUIs are fast, easy to automate (RIP Automator) and work reasonably well in different operating systems. You can even run them remotely without any headache-inducing X forwarding. When the native UI toolkits fail, we go back to basics. Claude and Codex have been very successful on the command-line: you focus on the interaction and forget about the operating system around you. You can even drive code and apps on cloud machines, or remote into your GPU-powered machine from your iPad. TUIs are filling the void left by Apple and Microsoft in the post-apocalyptic world where every application looks different. Which is good if you are doing art (including computer games), but not if your goal is to get out of the way of letting the user do their job.
What’s next
A checkbox is also part of an interface. You’re using it to interact with a system by inputting data. Interfaces are better the less thinking they require: whether the interface is a steering wheel or an online form, if you have to spend any amount of time figuring out how to use it, that’s bad. As you interact with many things, you want homogeneous interfaces that give you consistent experiences. If you learn that Command + C is the keyboard shortcut for copy, you want that to work everywhere. You don’t want to have to remember to use CTRL + Shift + C in certain circumstances or right-click → copy in others, that’d be annoying.
— John Loeber in Bring Back Idiomatic Design
We need to go back to basics. Every developer should learn the theory of what makes a good user interface (software or not!), from the likes of Nielsen, Norman or Johnson, and stop treating UI design as a soft skill that does not matter in the software engineering curriculum. In any course, if the UI does not make sense, the project should be failed. And in the HCI course, we should aim for perfect UIs. It takes work, but that work is mostly about understanding what we need. The programming is already being automated.
Operating system and toolkit authors should drive this investment. They should focus on making accessible toolkits that developers want to use, lowering the barrier to entry and making those platforms last as long as possible. I am not necessarily arguing for cross-platform support, but having one such solution would help reduce the dependency on Electron and TUIs.
For the first time in twenty-five years I’m sitting in front of a computer where almost every program I touch was designed by me. One tool at a time, the off-the-shelf option got swapped out for something a little closer to how my hands wanted to work. (I wrote about the start of this a couple of weeks ago — that post laid out the early swaps; this one is the view from the other side of the journey.)
It’s been a crazy few weeks guiding Claude Code in between all the other stuff I’m doing in life. I direct CC, and it works while I do other things. I get a second or two between tasks, and I respond. Then off it goes, adding features or hunting bugs.
Two suites in a happy marriage: CHasm, the bedrock — pure x86_64 assembly, no libc, the layer that paints pixels and reads keys. Fe₂O₃, the application layer in Rust, sitting on a small shared TUI library called crust.
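The post doesn’t show crust’s actual API, but the layering it describes — a low-level backend that paints pixels and reads keys, with applications written only against a small shared library — can be sketched in Rust. Every name below (`Backend`, `MemBackend`, `draw_status`) is hypothetical, standing in for the CHasm/crust split rather than reproducing it:

```rust
// Hypothetical sketch of the layering: applications talk to a tiny trait,
// and the bedrock layer (CHasm, in the real stack) implements it.

trait Backend {
    fn put(&mut self, x: usize, y: usize, ch: char); // "paints pixels"
    fn read_key(&mut self) -> Option<char>;          // "reads keys"
}

/// In-memory backend standing in for the real screen/keyboard layer,
/// so the application layer can be developed and tested on its own.
struct MemBackend {
    cells: Vec<Vec<char>>,
    pending: Vec<char>,
}

impl MemBackend {
    fn new(w: usize, h: usize) -> Self {
        Self { cells: vec![vec![' '; w]; h], pending: Vec::new() }
    }
    fn row(&self, y: usize) -> String {
        self.cells[y].iter().collect()
    }
}

impl Backend for MemBackend {
    fn put(&mut self, x: usize, y: usize, ch: char) {
        self.cells[y][x] = ch;
    }
    fn read_key(&mut self) -> Option<char> {
        self.pending.pop()
    }
}

/// Application layer: draws a status line without knowing how pixels get painted.
fn draw_status(b: &mut dyn Backend, msg: &str) {
    for (i, ch) in msg.chars().enumerate() {
        b.put(i, 0, ch);
    }
}

fn main() {
    let mut term = MemBackend::new(20, 4);
    draw_status(&mut term, "hello");
    println!("{}", term.row(0).trim_end()); // prints "hello"
}
```

The point of the split is the same one the post makes: the assembly bedrock can be rewritten or swapped without touching the Rust application layer, because everything above it only sees the trait.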
The CHasm layer (assembly)
The Fe₂O₃ layer (Rust on crust)
What’s left? WeeChat for IRC and other chats. Firefox — the only GUI program I still use regularly. That’s it. Everything else is mine.
The vim line
Let me get a bit sentimental about vim, because vim was the one I thought I’d never replace.
I started using it in 2001. For twenty-five years, every email I wrote went through vim. Every article. Every blog post. Every line of code, every HyperList, and every book. It was the one tool I would have called part of how I think. The muscle memory was so deep that I’d open random text fields in browsers and end up typing :w.
Then in three days I had scribe and stopped using vim.
The first commit landed at 00:09 on May 1st. By afternoon today (May 3rd) vim was replaced. Twenty-five years of muscle memory rerouted in seventy-two hours.
Vim is wonderful, but scribe is mine. It’s modal like vim, but missing the ninety percent of features I never used, and carrying the handful of writer-shaped tweaks I always wished vim had. Soft-wrap by default. Reading mode with Limelight-style focus. AI in the prompt without leaving the buffer. HyperList editing with full syntax highlighting and the encryption format the Ruby HyperList app uses. Persistent registers shared across concurrent sessions are a cool feature. None of it revolutionary, but all of it shaped to my exact workflow. And whenever I think of an enhancement I want, it’s just minutes away. It used to mean waiting months, years or forever for some developer to get the same idea as mine and introduce it into the tool I use.
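None of scribe’s code appears in the post, but the “modal like vim, minus what I never used” core really can be tiny. As a toy illustration — the keybindings, the single register, and the whole-buffer yank below are invented for this sketch, not taken from scribe:

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Debug)]
enum Mode { Normal, Insert }

/// Toy modal buffer: 'i' enters insert mode, Esc ('\x1b') leaves it,
/// and 'y' in normal mode yanks the whole buffer into register 'a'.
struct Editor {
    mode: Mode,
    text: String,
    registers: HashMap<char, String>, // scribe persists these across sessions
}

impl Editor {
    fn new() -> Self {
        Self { mode: Mode::Normal, text: String::new(), registers: HashMap::new() }
    }

    fn key(&mut self, ch: char) {
        match (self.mode, ch) {
            (Mode::Normal, 'i') => self.mode = Mode::Insert,
            (Mode::Insert, '\x1b') => self.mode = Mode::Normal,
            (Mode::Insert, c) => self.text.push(c),
            (Mode::Normal, 'y') => {
                self.registers.insert('a', self.text.clone());
            }
            _ => {} // every unbound key: deliberately ignored
        }
    }
}

fn main() {
    let mut ed = Editor::new();
    for ch in "ihi\x1by".chars() {
        ed.key(ch);
    }
    // After: insert "hi", back to normal mode, yank into register 'a'.
    println!("text={} reg_a={}", ed.text, ed.registers[&'a']);
}
```

A dispatch table like this is the whole trick of modal editing; everything else in an editor is features layered on top, which is exactly the part you get to leave out when the audience is one person.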
Why this is possible now
It used to be that writing your own editor, your own file manager, your own window manager, was a project of years. I know, it took me a few years to get RTFM right. A serious undertaking with a serious cost. The economics of it didn’t work for most people, even programmers. You’d touch a piece of it, get most of the way, run out of weekend, and go back to the off-the-shelf tool.
That barrier is much lower now. With Rust, CC as the workhorse, and the fact that the hard problems of TUI programming have been documented to death… the cost of “build the tool you actually want” has fallen by orders of magnitude.
I don’t think this is a story about AI or about Rust specifically. Both helped. But the deeper point is that the gap between “I wish my editor did X” and “okay, here’s an editor that does X” is now small enough to fit inside a few evenings of focused work.
I’m not selling anything
I should say what this post is not.
It’s not an invitation to use my software. Honestly, please don’t. None of it is built for you. It’s built for me — for the way I hold my hands, the way I think about email, the way I want my calendar to render. I’m sure other people would find a hundred sharp edges I’ve never noticed, because the tools align perfectly with what I do.
It’s also not a request for kudos. The code isn’t novel, nor are the ideas. There’s nothing here that hasn’t been done before by someone with more taste, discipline or talent.
What I want to do is show one specific thing: it is now genuinely feasible to make a desktop computing environment that fits one person, instead of a configuration of someone else’s tools. This is no longer a heroic decade-long undertaking. This is an actual, weekend-by-weekend, “this thing in my life now does exactly what I want” replacement.
The joy of an audience of one
The best part of building for myself: the relief of not having to care.
I don’t have to think about configurability for someone with different preferences. And I don’t have to support corner cases I’d never personally hit. Nor do I have to write documentation for users who don’t exist. No more arguing on issue trackers about whether a default is the right default — of course it’s the right default, it’s the one I want.
The editor’s \? cheatsheet shows the keys I memorised, in the order I prefer, with the bindings I think are sensible. Arrogance? Nope, it’s design without committee. The audience is one person. Decisions take seconds.
It turns out an enormous amount of software complexity comes from accommodating users who aren’t you. Strip that out and what’s left is small, fast, exactly-shaped, and a quiet pleasure to use.
So
If you’ve ever caught yourself thinking “I wish my editor / file manager / status bar / shell just did this one thing differently” and you’ve been told the answer is to write a plugin, learn an obscure config language, or accept the way it is, then consider that the third option is more available than it used to be: Build Your Own Software (BYOS).
You probably won’t replace your whole desktop. I didn’t plan to either. But the satisfaction of having even one tool in your daily workflow that fits you exactly is worth a weekend.
I’m a rabbit in spring :)
Metal Gear Solid 2’s HD Port Just Had Its Entire Source Code Leaked
Published May 1, 2026, 6:00 PM EDT
Quinton is a Staff Writer from the United States. In his youth, Quinton was ridiculed for making video game ranking lists instead of paying attention in math class. In adulthood, people sometimes pay him for it. Life’s a trip.
It’s always an exciting day when a beloved video game’s source code leaks onto the internet. We’ve seen such tremendous good times with projects like Ship of Harkinian, for The Legend of Zelda: Ocarina of Time, bringing fresh coats of paint and incredible mods galore to timeless works of art.
If you just so happen to have dreamed big for Metal Gear Solid 2 modding, the world may now be your oyster, as the full source code just hit the net. It’s not the original PlayStation 2 version, but rather, the 2011 HD remaster, so hey. It even comes in (relatively!) crisp 720p. These are, I’ve seen it said, uncompressed assets, including a whopping 30 gigs’ worth of unused material.
If Only It Happened A Single Day Sooner
May 1, 2026. That is, as of this writing, today. Would that this had occurred on April 30. MGS2 fans know exactly what I’m on about. But I digress; let’s dig in.
This is actually the PlayStation Vita and Xbox 360 ports’ code, from what I’m reading right now, specifically from work done by support studio Armature. For most of us, Metal Gear Solid: Master Collection Volume 1 is still the way to go for MGS2, but once the ball gets rolling on decompiling the code, who knows what the future might bring?
I’m still in the process of verifying certain details here. Kotaku, for instance, is reporting that this is actually devoid of assets, which runs quite contrary to the above tweet. Even if it’s “just” the code, however, this remains a tremendous milestone for games preservation and for the modding scene for years to come. As for where this leak took place, the answer’s 4chan; I won’t link directly to either of the pertinent threads on there, but the Kotaku article has been gracious enough to do so for us.
As there is conflicting information right now concerning the exact details of the leak’s contents, this article may be updated further before day’s end. Regardless, it’s a pretty good time to be a Metal Gear fan.