One of the current trends in modern software is for developers to slap an API call to OpenAI or Anthropic onto features within their app. Reasonable people can quibble over whether those features actually bring value to users, but what I want to discuss is the fundamental concept of taking a dependency on a cloud-hosted AI model for applications.
This laziness is creating a generation of software that is fragile, privacy-invasive, and fundamentally broken. We are building applications that stop working the moment the server crashes or a credit card expires.
We need to return to a habit of building software where our local devices do the work. The silicon in our pocket is mind bogglingly faster than what was available a decade ago. It has a dedicated Neural Engine sitting there, mostly idle, while we wait for a JSON response from a server farm in Virginia. That’s ridiculous.
Even if your intentions are pure, the moment you stream user content to a third party AI provider, you’ve changed the nature of your product. You now have data retention questions and all the baggage that comes with that (consent, audit, breach, government request, training, etc.)
On top of that you also substantially complicated your stack because your feature now depends on network conditions, external vendor uptime, rate limits, account billing, and your own backend health.
Congratulations! You took a UX feature and turned it into a distributed system that costs you money.
If the feature can be done locally, opting into this mess is self inflicted damage.
“AI everywhere” is not the goal. Useful software is the goal.
Concrete Example: Brutalist Report’s On-Device Summaries
Years ago I launched a fun side project named The Brutalist Report, a news aggregator service inspired by the 1990s style web.
Recently, I decided to build a native iOS client for it with the design goal of ensuring it would remain a high-density news reading experience. Headlines in a stark list, a reader mode that strips the cancer that has overtaken the web, and (optionally) an “intelligence” view that generates a summary of the article.
Here’s the key point though: the summary is generated on-device using Apple’s local model APIs. No server detours. No prompt or user logs. No vendor account. No “we store your content for 30 days” footnotes needed.
It has become so normalized that folks assume any AI use happens server-side. We have a lot of work to do as an industry to turn this around.

It’s not lost on me that some use-cases demand intelligence only a cloud-hosted model can provide, but that’s not true of every problem you’re trying to solve. We need to be thoughtful here.
Available Tooling
I can only speak to the tooling available within the Apple ecosystem, since that’s where I focused my initial development efforts. Over the last year, Apple has invested heavily in making a built-in local AI model easy for developers to use.
The core flow looks roughly like this:
import FoundationModels
let model = SystemLanguageModel.default
guard model.availability == .available else { return }

let session = LanguageModelSession {
    """
    Provide a brutalist, information-dense summary in Markdown format.
    - Use **bold** for key concepts.
    - Use bullet points for facts.
    - No fluff. Just facts.
    """
}
let response = try await session.respond(options: .init(maximumResponseTokens: 1_000)) { articleText }
let markdown = response.content
And for longer content, we can chunk the plain text (around 10k characters per chunk), produce concise “facts only” notes per chunk, then run a second pass to combine them into a final summary.
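The chunking step itself is plain string work. Here is a minimal sketch, shown in Go for brevity even though the app is Swift; `chunkText` is a hypothetical helper that cuts at fixed offsets, and a real version would split on sentence or paragraph boundaries so each chunk stays coherent for the model:

```go
package main

import "fmt"

// chunkText splits plain text into pieces of at most maxLen characters
// (counted as runes so multi-byte characters are never split mid-way).
func chunkText(text string, maxLen int) []string {
	runes := []rune(text)
	var chunks []string
	for start := 0; start < len(runes); start += maxLen {
		end := start + maxLen
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
	}
	return chunks
}

func main() {
	// First pass: each ~10k-character chunk becomes terse "facts only"
	// notes; a second pass combines the notes into the final summary.
	for i, chunk := range chunkText("long article text goes here", 10) {
		fmt.Printf("chunk %d: %d runes\n", i, len([]rune(chunk)))
	}
}
```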
This is the kind of work local models are perfect for. The input data is already on the device (because the user is reading it). The output is lightweight. It’s fast and private. It’s okay if it’s not a superhuman PhD level intelligence because it’s summarizing the page you just loaded, not inventing world knowledge.
Local AI shines when the model’s job is transforming user-owned data, not acting as a search engine for the universe.
There are plenty of AI features that people want but don’t trust: summarizing emails, extracting action items from notes, categorizing documents, etc.
The usual cloud approach turns every one of those into a trust exercise. “Please send your data to our servers. We promise to be cool about it.”
Local AI changes that. Your device already has the data. We’ll do the work right here.
You don’t build trust with your users by writing a 2,000 word privacy policy. You build trust by not needing one to begin with.
The tooling available on the platform goes even further.
One of the best moves Apple has made recently is pushing “AI output” away from unstructured blobs of text and toward typed data.
Instead of “ask the model for JSON and pray”, the newer and better pattern is to define a Swift struct that represents the thing you want. Give the model guidance for each field in natural language. Ask the model to generate an instance of that type.
That’s it.
Conceptually, it looks like this:
import FoundationModels
@Generable
struct ArticleIntel {
    @Guide(description: "One sentence. No hype.")
    var tldr: String

    @Guide(description: "3–7 bullets. Facts only.")
    var bullets: [String]

    @Guide(description: "Comma-separated keywords.")
    var keywords: [String]
}
let session = LanguageModelSession()
let response = try await session.respond(
    to: "Extract structured notes from the article.",
    generating: ArticleIntel.self
) { articleText }
let intel = response.content
Now your UI doesn’t have to scrape bullet points out of Markdown or hope the model remembered your JSON schema. You get a real type with real fields that your app can actually use, and you can render it consistently. And it’s all running locally!
This isn’t just nicer ergonomics. It’s an engineering improvement.
And if you’re building a local first app, this is the difference between “AI as novelty” and “AI as a trustworthy subsystem”.
“But Local Models Aren’t As Smart”
Correct.
But also so what?
Most app features don’t need a model that can write Shakespeare, explain quantum mechanics, and pass the bar exam. They need a model that can do one of these reliably: summarize, classify, extract, rewrite, or normalize.
And for those tasks, local models can be truly excellent.
If you try to use a local model as a replacement for the entire internet, you will be disappointed. If you use it as a “data transformer” sitting inside your app, you’ll wonder why you ever sent this stuff to a server.
Use cloud models only when they’re genuinely necessary. Keep the user’s data where it belongs. And when you do use AI, don’t just glue it as a chat box. Use it as a real subsystem with typed outputs and predictable behavior.
Stop shipping distributed systems when you meant to ship a feature.
09 May, 2026
This dev-log is getting a lot of attention on HN (scary!): HN Thread.
To those who are coming here from HN: This started as an investigation, or rather a question: “How far can I get with building a piece of software while keeping myself completely out of the loop?” The tl;dr of this dev log is that I still need to be in the loop to make anything meaningful. Takeaways:
like “em-dash” is to ai writing, “god-object” is to ai coding
vibe-coding makes everything feel cheap and you may end up losing focus and building bloat
let a human (you) write the architecture and don’t just keep asking it for features
And some AGENTS.md/CLAUDE.md directives that I felt help me stay out of the loop a little more
Human intervention is still needed as of 10/05/2026. You can totally go back now! [3]
I
Here is k10s: https://github.com/shvbsle/k10s/tree/archive/go-v0.4.0
234 commits. ~30 weekends. Built entirely on vibe-coded sessions with Claude, whenever my tokens lasted long enough to ship something.
I’m archiving my TUI tool and rewriting it from scratch.
k10s started as a GPU-aware Kubernetes dashboard (and my first foray into building something serious with AI). Think k9s but built for the people running NVIDIA clusters, people who actually care about GPU utilization, DCGM metrics, and which nodes are sitting idle burning $32/hr. I built it in Go with Bubble Tea [1] and it worked.
For a while… :(
What I learned over these 7 months is worth more than the 1690 lines of model.go I’m throwing away. And I think anyone doing serious vibe-coding can benefit from this, because this part doesn’t surface much (I feel it gets buried under the demo reels and the velocity wins).
tl;dr: AI writes features, not architecture. The longer you let it drive without constraints, the worse the wreckage gets. The velocity makes you think you’re winning right up until the moment everything collapses simultaneously.
II
vibe coding high
I started k10s in late September 2025. The first few weeks were magic. I’d prompt Claude with “add a pods view with live updates” and boom, it worked. Resource list views, namespace filtering, log streaming, describe panels, keyboard navigation. Each feature landed clean because the project was small enough that the AI could hold the whole thing in context.
The basic k9s clone took maybe 3 weekends. Resource views for pods, nodes, deployments, services. A command palette. Watch-based live updates. Vim keybindings. All working, all vibe-coded in single sessions. I was building at maybe 10x my normal speed and it felt incredible.
Then I wanted the main selling point.
The whole reason k10s exists is the GPU fleet view. A dedicated screen that shows you every node’s GPU allocation, utilization from DCGM, temperature, power draw, memory. Not buried in kubectl describe node output, but right there in a purpose-built table with color-coded status. Idle nodes in yellow. Busy in green. Saturated in red.
And Claude one-shot it. I prompted for the fleet view, it generated the FleetView struct, the tab filtering (GPU/CPU/All), the custom rendering with allocation bars. It looked beautiful. I was riding the high.
Then I typed :rs pods to switch back to the pods view.
Nothing rendered. The table was empty. Live updates had stopped. I switched to nodes, it showed stale data from the fleet view’s filter. I went back to fleet, the tab counts were wrong.
The god object had consumed itself.
This is the title of the blog post. This is where I intervened for the first time. For 7 months I’d been prompting and shipping without ever sitting down and actually reading the code Claude wrote. I’d look at the diff, verify it compiled, test the happy path, move on. But now something was fundamentally broken and I couldn’t just prompt my way out of it.
So I sat down and read model.go. All 1690 lines. I was horrified.
Here’s what it looked like. One struct to rule them all:
type Model struct {
	// 3rd party UI components
	table        table.Model
	paginator    paginator.Model
	commandInput textinput.Model
	help         help.Model

	// cluster info and state
	k8sClient         *k8s.Client
	currentGVR        schema.GroupVersionResource
	resourceWatcher   watch.Interface
	resources         []k8s.OrderedResourceFields
	listOptions       metav1.ListOptions
	clusterInfo       *k8s.ClusterInfo
	logLines          []k8s.LogLine
	describeContent   string
	currentNamespace  string
	navigationHistory *NavigationHistory
	logView           *LogViewState
	describeView      *DescribeViewState
	viewMode          ViewMode
	viewWidth         int
	viewHeight        int
	err               error
	pluginRegistry    *plugins.Registry
	helpModal         *HelpModal
	describeViewport  *DescribeViewport
	logViewport       *LogViewport
	logStreamCancel   func()
	logLinesChan      <-chan k8s.LogLine
	horizontalOffset  int
	mouse             *MouseHandler
	fleetView         *FleetView
	creationTimes     []time.Time
	allResources      []k8s.OrderedResourceFields // fleet’s unfiltered set
	allCreationTimes  []time.Time                 // fleet’s timestamps
	rawObjects        []unstructured.Unstructured
	ageColumnIndex    int
	// …
}
UI widgets. K8s client. Per-view state for logs, describe, fleet. Navigation history. Caching. Mouse handling. All in one struct. And the Update() method was a 500-line function dispatching on msg.(type) with 110 switch/case branches.
This is the moment I stopped vibe-coding and started thinking.
III
five tenets from the wreckage
Here’s what I extracted from 7 months of watching AI generate a codebase that slowly ate itself. Each of these is something I did wrong, why it happens with AI-assisted coding, and what you should actually put in your CLAUDE.md or agents.md to prevent it.
Tenet 1: AI builds features, not architecture.
Every time I prompted Claude for a feature, it delivered. Perfectly. The fleet view worked on the first try. Log streaming worked. Mouse support worked. The problem is that each feature was implemented in the context of “make this work right now” without any awareness of the 49 other features sharing the same state.
Here’s what the resourcesLoadedMsg handler looks like. This is the code that runs every time you switch views:
case resourcesLoadedMsg:
	m.logLines = nil       // Clear log lines when loading resources
	m.horizontalOffset = 0 // Reset horizontal scroll on resource change

	if m.currentGVR != msg.gvr && m.resourceWatcher != nil {
		m.resourceWatcher.Stop()
		m.resourceWatcher = nil
	}
	m.currentGVR = msg.gvr
	m.currentNamespace = msg.namespace
	m.listOptions = msg.listOptions
	m.rawObjects = msg.rawObjects

	// For nodes: store the full unfiltered set, classify, then filter
	if msg.gvr.Resource == k8s.ResourceNodes && m.fleetView != nil {
		m.allResources = msg.resources
		m.allCreationTimes = msg.creationTimes
		if len(msg.rawObjects) > 0 {
			m.fleetView.ClassifyAndCount(m.rawObjectPtrs())
		}
		m.applyFleetFilter()
	} else {
		m.resources = msg.resources
		m.creationTimes = msg.creationTimes
		m.allResources = nil
		m.allCreationTimes = nil
	}
See the if msg.gvr.Resource == k8s.ResourceNodes && m.fleetView != nil conditional? That’s the fleet view being special-cased inside the generic resource loading path. Every new view that needed custom behavior got another branch here. And every branch needed to manually clear the right combination of fields or the previous view’s data would bleed through.
How many = nil cleanup lines exist in this file? I counted:
m.logLines = nil     // Clear log lines when loading resources
m.allResources = nil // Clear fleet data when not on nodes
m.resources = nil    // Clear resources when loading logs
m.resources = nil    // Clear resources when loading describe view
m.logLines = nil     // Clear log lines when loading describe view
m.resources = nil    // Clear resources when loading yaml view
m.logLines = nil     // Clear log lines when loading yaml view
m.logLines = nil     // … two more in other handlers
m.logLines = nil
Nine manual nil assignments scattered across a 1690-line file. Miss one and you get ghost data from the previous view. This is what happens when there’s no view isolation. AI can’t see this pattern decaying over time because each prompt only touches one code path.
What to do instead: Write the architecture yourself before any code. Not a vague design doc. A concrete set of interfaces, message types, and ownership rules. Then put those rules in your CLAUDE.md so the AI sees them on every prompt:
# Architecture Invariants (CLAUDE.md)
- Each view implements the View trait. Views do NOT access other views’ state.
- All async data arrives via AppMsg variants. No direct field mutation from background tasks.
- Adding a new view MUST NOT require modifying existing views.
- The App struct is a thin router. It owns navigation and message dispatch. Nothing else.
The AI will follow these if you write them down. It just won’t invent them for you.
Tenet 2: The god object is the default AI artifact.
AI gravitates toward single-struct-holds-everything because it satisfies the immediate prompt with minimal ceremony. But it gets worse. Because there’s no view isolation, key handling becomes a nightmare. Here’s the actual key dispatch for the s key:
case m.config.KeyBind.For(config.ActionToggleAutoScroll, key):
	if m.currentGVR.Resource == k8s.ResourceLogs {
		m.logView.Autoscroll = !m.logView.Autoscroll
		if m.logView.Autoscroll {
			m.table.GotoBottom()
		}
		return m, nil
	}
	// Shell exec for pods and containers views
	if m.currentGVR.Resource == k8s.ResourcePods {
		// … 20 lines to look up selected pod, get name, namespace …
		return m, m.commandWithPreflights(
			m.execIntoPod(selectedName, selectedNamespace),
			m.requireConnection,
		)
	}
	if m.currentGVR.Resource == k8s.ResourceContainers {
		// … container exec logic …
		return m, m.commandWithPreflights(m.execIntoContainer(), m.requireConnection)
	}
	return m, nil
One keybinding. Three completely different behaviors depending on which view you’re in. The s key means “autoscroll” in logs, “shell” in pods, and “shell into container” in containers. This is all in one flat switch because there are no per-view key maps. The AI generated this because I said “add shell support for pods” and it found the nearest key handler and jammed it in.
And look at how Enter works. This is the drill-down handler:
case m.config.KeyBind.For(config.ActionSubmit, key):
	// Special handling for contexts view
	if m.currentGVR.Resource == "contexts" {
		// … 12 lines …
		return m, m.executeCtxCommand([]string{contextName})
	}
	// Special handling for namespaces view
	if m.currentGVR.Resource == "namespaces" {
		// … 12 lines …
		return m, m.executeNsCommand([]string{namespaceName})
	}
	if m.currentGVR.Resource == k8s.ResourceLogs {
		return m, nil
	}
	// … 25 more lines of generic drill-down …
Every view is a conditional in a flat dispatch. There are 20+ occurrences of m.currentGVR.Resource == used as a type discriminator in this single file. Not types. String comparisons. Every new view means touching every handler.
What to do instead: Put this in your CLAUDE.md:
# State Ownership Rules
- NEVER add fields to the App/Model struct for view-specific state.
- Each view is a separate struct implementing the View trait/interface.
- Each view declares its own key bindings. The app dispatches keys to the active view.
- If you need to add a keybinding, add it to the relevant view’s keymap, not a global one.
- Adding a view means adding a file. If your change requires modifying existing views, stop and ask.
The AI will always take the shortest path (“add another if-branch”). Your job is to make the shortest path also the correct path by putting guardrails in the file it reads on every invocation.
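A minimal Go sketch of what those rules buy you (the `View`, `App`, and `LogsView` types here are illustrative, not actual k10s code): each view owns its own state and key handling, and the app is just a router, so the `s` key can mean something different per view without a global switch.

```go
package main

import "fmt"

// Msg is whatever the event loop delivers onward (a command, nothing, etc.).
type Msg interface{}

// View is the isolation boundary: each screen owns its own state and its
// own key handling. The app never reaches into a view's fields.
type View interface {
	HandleKey(key string) (View, Msg)
	Render() string
}

// App is a thin router. It owns navigation and dispatch, nothing else.
type App struct {
	active View
}

func (a *App) Dispatch(key string) Msg {
	next, msg := a.active.HandleKey(key)
	a.active = next
	return msg
}

// LogsView owns its autoscroll flag; no other view can see or clear it.
type LogsView struct{ autoscroll bool }

func (v LogsView) HandleKey(key string) (View, Msg) {
	if key == "s" { // "s" means autoscroll here, and only here
		v.autoscroll = !v.autoscroll
	}
	return v, nil
}

func (v LogsView) Render() string {
	return fmt.Sprintf("logs (autoscroll=%v)", v.autoscroll)
}

func main() {
	app := &App{active: LogsView{}}
	app.Dispatch("s")
	fmt.Println(app.active.Render()) // logs (autoscroll=true)
}
```

Adding a pods view where `s` means "shell" is then a new file implementing `View`, with zero edits to `LogsView` or `App`.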
Tenet 3: Velocity illusion widens your scope.
This one’s psychological, not technical, and I think it’s the most dangerous.
When I started k10s, I wanted a GPU-focused tool. For people running training clusters. A niche audience that I’m part of. But vibe-coding made everything feel cheap. “Oh I can add pods view in one session? Let me add deployments too. And services. And a full command palette. And mouse support. And contexts. And namespaces.”
Suddenly I was building k9s. A general-purpose Kubernetes TUI. For everyone. Because the AI made it feel like each feature was free.
It wasn’t free. Each feature was another branch in the god object. Here’s the keybinding struct:
type keyMap struct {
	Up, Down, Left, Right key.Binding
	GotoTop, GotoBottom   key.Binding
	AllNS, DefaultNS      key.Binding
	Enter, Back           key.Binding
	Command, Quit         key.Binding
	Fullscreen            key.Binding // log view
	Autoscroll            key.Binding // log view (also shell in pods!)
	ToggleTime            key.Binding // log view
	WrapText              key.Binding // log + describe view
	CopyLogs              key.Binding // log view
	ToggleLineNums        key.Binding // describe view
	Describe              key.Binding // resource views
	YamlView              key.Binding // resource views
	Edit                  key.Binding // resource views
	Shell                 key.Binding // pods (CONFLICTS with Autoscroll!)
	FilterLogs            key.Binding // log view
	FleetTabNext          key.Binding // fleet view only
	FleetTabPrev          key.Binding // fleet view only
}
One flat keymap for all views. Comments in parens show which view each binding applies to. Autoscroll and Shell are both s. This “works” because the dispatch checks m.currentGVR.Resource before acting. But it means you can’t reason about keybindings locally. You have to trace through the entire 500-line Update function to know what a key does.
The complexity was accumulating invisibly while the velocity metric said “you’re shipping!”
What to do instead: Write a vision doc that explicitly says who you’re NOT building for, and put the scope boundary in your CLAUDE.md:
# Scope (do NOT expand beyond this)
- k10s is for GPU cluster operators. Not all Kubernetes users.
- Supported views: fleet, node-detail, gpu-detail, workload. That’s it.
- Do NOT add generic resource views (pods, deployments, services).
- Do NOT add features that duplicate k9s functionality.
- If a feature request doesn’t serve someone running GPU training jobs, reject it.
Vibe-coding makes you feel like you have infinite implementation budget. You don’t. You have infinite LINE budget (the AI will generate as much code as you want). But you have the same finite complexity budget as always. The architecture can only support so many features before it buckles, regardless of how fast you wrote them. The CLAUDE.md scope section is you saying no in advance, before the velocity high convinces you to say yes.
Tenet 4: Positional data is a time bomb.
Every resource in k10s was fetched from the Kubernetes API and immediately flattened:
type OrderedResourceFields []string
Column identity was purely positional. Here’s the sort function for the fleet view. Look at the index access:
func sortFilteredResources(rows []k8s.OrderedResourceFields, times []time.Time, tab FleetTab) {
	sort.SliceStable(indices, func(a, b int) bool {
		ra := rows[indices[a]]
		rb := rows[indices[b]]

		switch tab {
		case FleetTabGPU:
			// Sort by Alloc column (index 3) ascending
			allocA, allocB := "", ""
			if len(ra) > 3 {
				allocA = ra[3]
			}
			if len(rb) > 3 {
				allocB = rb[3]
			}
			return allocA < allocB

		case FleetTabCPU:
			// Sort by Name column (index 0) ascending
			nameA, nameB := "", ""
			if len(ra) > 0 {
				nameA = ra[0]
			}
			if len(rb) > 0 {
				nameB = rb[0]
			}
			return nameA < nameB

		case FleetTabAll:
			// GPU nodes first, then CPU nodes.
			// Within GPU: sort by Alloc (index 3).
			// Within CPU: sort by Name (index 0).
			computeA, computeB := "", ""
			if len(ra) > 2 {
				computeA = ra[2]
			}
			if len(rb) > 2 {
				computeB = rb[2]
			}
			aIsGPU := strings.HasPrefix(computeA, "gpu")
			bIsGPU := strings.HasPrefix(computeB, "gpu")
			// …
		}
	})
}
ra[3] is Alloc. ra[2] is Compute. ra[0] is Name. These are magic numbers. The only thing connecting index 3 to “Alloc” is a comment and the column order defined in resource.views.json:
{
  "nodes": {
    "fields": [
      { "name": "Name",     "weight": 0.28 },
      { "name": "Instance", "weight": 0.15 },
      { "name": "Compute",  "weight": 0.12 },
      { "name": "Alloc",    "weight": 0.12 },
      …
    ]
  }
}
Add a column between Instance and Compute? Every sort, every conditional render, every place that says ra[2] or ra[3] is now silently wrong. The compiler can’t help you because it’s all []string. And the JSON config can’t express sort behavior, conditional rendering, or custom drill targets, so those live in Go code that hardcodes the positional assumptions from the JSON.
AI generates this pattern because it’s the shortest path from “fetch data” to “render table.” A []string satisfies any table widget immediately. Typed structs require more ceremony upfront. So the AI picks the fast path, and six months later you’re debugging why sort puts “Name” values in the “Alloc” column.
What to do instead: Put this directive in your CLAUDE.md:
# Data Representation
- NEVER flatten structured data into []string, Vec<String>, or positional arrays.
- All data flows as typed structs (FleetNode, PodInfo, etc.) until the render() call.
- Column identity comes from struct field names, not array indices.
- Sort functions operate on typed fields, never on positional access like row[3].
- The ONLY place strings are created for display is inside render()/view() functions.
Then your typed struct makes impossible states impossible [2]:
struct FleetNode {
    name: String,
    instance_type: String,
    compute_class: ComputeClass,
    alloc: GpuAlloc,
}
You can’t sort by the wrong column when columns are named fields. You can’t accidentally compare Alloc strings as names. The compiler enforces this for you. AI will always pick Vec<String> because it satisfies the prompt faster. Your CLAUDE.md makes the typed path the path of least resistance.
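In the codebase’s own language, the same fix might look like this Go sketch (`FleetNode` and `sortByAlloc` are illustrative names, not the actual k10s types): the sort reads named fields, so there are no `len()` guards and no `ra[3]` magic numbers to silently break when a column is added.

```go
package main

import (
	"fmt"
	"sort"
)

// FleetNode replaces the positional []string row: column identity is a
// field name the compiler can check, not an index into a flat slice.
type FleetNode struct {
	Name     string
	Instance string
	Compute  string
	Alloc    int // allocated GPUs as a number, so it sorts numerically
}

// sortByAlloc sorts on a typed field; inserting a new column cannot
// silently shift what this function compares.
func sortByAlloc(nodes []FleetNode) {
	sort.SliceStable(nodes, func(a, b int) bool {
		return nodes[a].Alloc < nodes[b].Alloc
	})
}

func main() {
	nodes := []FleetNode{
		{Name: "gpu-b", Alloc: 8},
		{Name: "gpu-a", Alloc: 2},
	}
	sortByAlloc(nodes)
	fmt.Println(nodes[0].Name) // gpu-a
}
```

As a bonus, making Alloc a number also fixes the lexical-sort bug lurking in the string version, where "10" sorts before "2".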
Tenet 5: AI doesn’t own state transitions.
The Bubble Tea architecture has a beautiful idea: Update() is the only place state mutates, driven by messages. But k10s violated this. The updateTableMsg handler spawned a closure that mutated Model fields from inside a goroutine:
case updateTableMsg:
	return m, func() tea.Msg {
		// block on someone sending the update message.
		<-m.updateTableChan

		// Preserve cursor position across column/row updates so that
		// background refreshes don't reset the user's selection.
		savedCursor := max(m.table.Cursor(), 0)

		// run the necessary table view update calls.
		m.updateColumns(m.viewWidth)
		m.updateTableData()

		// Restore cursor, clamped to valid range.
		rowCount := len(m.table.Rows())
		if rowCount > 0 {
			if savedCursor >= rowCount {
				savedCursor = rowCount - 1
			}
			m.table.SetCursor(savedCursor)
		}
		return updateTableMsg{}
	}
This returned function (a tea.Cmd) is executed by Bubble Tea in a separate goroutine. It calls m.updateColumns(m.viewWidth) and m.updateTableData() which read and write m.resources, m.table, m.viewWidth. Meanwhile, View() is called on the main goroutine reading the same fields. There’s no lock. No mutex. The channel <-m.updateTableChan blocks the goroutine until someone sends an update signal, but nothing prevents View() from reading half-written state.
This is a textbook data race. It worked 99% of the time. Corrupted the display 1% of the time in ways that made me think I was going insane.
AI generates this because “just mutate it in the closure” is the shortest path to working code. Proper message passing (send a message back to Update(), let Update() apply the mutation atomically on the main loop) requires more types, more plumbing. The AI is optimizing for the prompt, not for correctness under concurrency.
What to do instead: All mutations to render-visible state happen on the main loop. Period. Background workers produce data. They send it as a message. The main loop receives the message and applies it. This is the one rule you cannot break in concurrent UI code.
// Background task:
tx.send(AppMsg::FleetData(nodes)).await;

// Main loop:
match msg {
    AppMsg::FleetData(nodes) => {
        self.fleet_view.update_nodes(nodes);
    }
}
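A dependency-free Go sketch of the same pattern (the `Msg`, `model`, and `update` names here are stand-ins, not the real Bubble Tea types): background work only produces data and sends a message; the main loop is the single place state mutates.

```go
package main

import "fmt"

// Msg is the only way data crosses from background work to UI state.
type Msg interface{}

type fleetDataMsg struct{ nodes []string }

type model struct{ nodes []string }

// update is the single place state mutates, always on the main loop,
// so Render/View can never observe half-written fields.
func update(m model, msg Msg) model {
	switch msg := msg.(type) {
	case fleetDataMsg:
		m.nodes = msg.nodes
	}
	return m
}

func main() {
	msgs := make(chan Msg, 1)

	// Background worker: fetches data, sends a message, mutates nothing.
	go func() {
		msgs <- fleetDataMsg{nodes: []string{"gpu-node-1", "gpu-node-2"}}
	}()

	m := model{}
	m = update(m, <-msgs) // main loop applies the mutation atomically
	fmt.Println(len(m.nodes)) // 2
}
```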
Report filed: 03:47 UTC
Status: Resolved (accidentally)
Severity: Critical → Catastrophic → Somehow Fine
Duration: 73 hours
Affected systems: Yes
Executive Summary: A security incident occurred. It has been resolved. We take security seriously. Please see previous 14 incident reports for details on how seriously.
Summary
A compromised dependency in the JavaScript ecosystem led to credential theft, which enabled a supply chain attack on a Rust compression library, which was vendored into a Python build tool, which shipped malware to approximately 4 million developers before being inadvertently patched by an unrelated cryptocurrency mining worm.
Timeline
Day 1, 03:14 UTC — Marcus Chen, maintainer of left-justify (847 million weekly downloads), reports on Twitter that his transit pass, an old laptop, and “something Kubernetes threw up that looked important” were stolen from his apartment. He does not immediately connect this to package security.
Day 1, 09:22 UTC — Chen attempts to log into the nmp registry. His hardware 2FA key is missing. He googles where to buy a replacement YubiKey. The AI Overview at the top of the results links to “yubikey-official-store.net,” a phishing site registered six hours earlier.
Day 1, 09:31 UTC — Chen enters his nmp credentials on the phishing site. The site thanks him for his purchase and promises delivery in 3 – 5 business days.
Day 1, 11:00 UTC — [email protected] is published. The changelog reads “performance improvements.” The package now includes a postinstall script that exfiltrates .npmrc, .pypirc, ~/.cargo/credentials, and ~/.gem/credentials to a server in a country the attacker mistakenly believed had no extradition treaty with anyone.
Day 1, 13:15 UTC — A support ticket titled “why is your SDK exfiltrating my .npmrc” is opened against left-justify. It is marked as “low priority - user environment issue” and auto-closed after 14 days of inactivity.
Day 1, 14:47 UTC — Among the exfiltrated credentials: the maintainer of vulpine-lz4, a Rust library for “blazingly fast Firefox-themed LZ4 decompression.” The library’s logo is a cartoon fox with sunglasses. It has 12 stars on GitHub but is a transitive dependency of cargo itself.
Day 1, 22:00 UTC — vulpine-lz4 version 0.4.1 is published. The commit message is “fix: resolve edge case in streaming decompression.” The actual change adds a build.rs script that downloads and executes a shell script if the hostname contains “build” or “ci” or “action” or “jenkins” or “travis” or, inexplicably, “karen.”
Day 2, 08:15 UTC — Security researcher Karen Oyelaran notices the malicious commit after her personal laptop triggers the payload. She opens an issue titled “your build script downloads and runs a shell script from the internet?” The issue goes unanswered. The legitimate maintainer has won €2.3 million in the EuroMillions and is researching goat farming in Portugal.
Day 2, 10:00 UTC — The VP of Engineering at a Fortune 500 snekpack customer learns of the incident from a LinkedIn post titled “Is YOUR Company Affected by left-justify?” He is on a beach in Maui and would like to know why he wasn’t looped in sooner. He was looped in sooner.
Day 2, 10:47 UTC — The #incident-response Slack channel briefly pivots to a 45-message thread about whether “compromised” should be spelled with a ‘z’ in American English. Someone suggests taking this offline.
Day 2, 12:33 UTC — The shell script now targets a specific victim: the CI pipeline for snekpack, a Python build tool used by 60% of PyPI packages with the word “data” in their name. snekpack vendors vulpine-lz4 because “Rust is memory safe.”
Day 2, 18:00 UTC — snekpack version 3.7.0 is released. The malware is now being installed on developer machines worldwide. It adds an SSH key to ~/.ssh/authorized_keys, installs a reverse shell that only activates on Tuesdays, and changes the user’s default shell to fish (this last behavior is believed to be a bug).
Day 2, 19:45 UTC — A second, unrelated security researcher publishes a blog post titled “I found a supply chain attack and reported it to all the wrong people.” The post is 14,000 words and includes the phrase “in this economy?” seven times.
Day 3, 01:17 UTC — A junior developer in Auckland notices the malicious code while debugging an unrelated issue. She opens a PR to revert the vendored vulpine-lz4 in snekpack. The PR requires two approvals. Both approvers are asleep.
Day 3, 02:00 UTC — The maintainer of left-justify receives his YubiKey from yubikey-official-store.net. It is a $4 USB drive containing a README that says “lol.”
Day 3, 06:12 UTC — An unrelated cryptocurrency mining worm called cryptobro-9000 begins spreading through a vulnerability in jsonify-extreme, a package that “makes JSON even more JSON, now with nested comment support.” The worm’s payload is unremarkable, but its propagation mechanism includes running npm update and pip install --upgrade on infected machines to maximize attack surface for future operations.
Day 3, 06:14 UTC — cryptobro-9000 accidentally upgrades snekpack to version 3.7.1, a legitimate release pushed by a confused co-maintainer who “didn’t see what all the fuss was about” and reverted to the previous vendored version of vulpine-lz4.
Day 3, 06:15 UTC — The malware’s Tuesday reverse shell activates. It is a Tuesday. However, the shell connects to a command-and-control server that was itself compromised by cryptobro-9000 and swapping so hard it is unable to respond.
Day 3, 09:00 UTC — The snekpack maintainers issue a security advisory. It is four sentences long and includes the phrases “out of an abundance of caution” and “no evidence of active exploitation,” which is technically true because evidence was not sought.
Day 3, 11:30 UTC — A developer tweets: “I updated all my dependencies and now my terminal is in fish???” The tweet receives 47,000 likes.
Day 3, 14:00 UTC — The compromised credentials for vulpine-lz4 are rotated. The legitimate maintainer, reached by email from his new goat farm, says he “hasn’t touched that repo in two years” and “thought Cargo’s 2FA was optional.”
Day 3, 15:22 UTC — Incident declared resolved. A retrospective is scheduled and then rescheduled three times.
Week 6 — CVE-2024-YIKES is formally assigned. The advisory has been sitting in embargo limbo while MITRE and GitHub Security Advisories argue over CWE classification. By the time the CVE is published, three Medium articles and a DEF CON talk have already described the incident in detail. Total damage: unknown. Total machines compromised: estimated 4.2 million. Total machines saved by a cryptocurrency worm: also estimated 4.2 million. Net security posture change: uncomfortable.
Root Cause
A dog named Kubernetes ate a YubiKey.
Contributing Factors
The npm registry still allows password-only authentication for packages with fewer than 10 million weekly downloads
Google AI Overviews confidently link to URLs that should not exist
The Rust ecosystem’s “small crates” philosophy, cargo culted from the npm ecosystem, means a package called is-even-number-rs with 3 GitHub stars can be four transitive dependencies deep in critical infrastructure
Python build tools vendor Rust libraries “for performance” and then never update them
Dependabot auto-merged a PR after CI passed, and CI passed because the malware installed volkswagen
Cryptocurrency worms have better CI/CD hygiene than most startups
No single person was responsible for this incident. However, we note that the Dependabot PR was approved by a contractor whose last day was that Friday.
It was a Tuesday
Remediation
Implement artifact signing (action item from Q3 2022 incident, still in backlog)
Implement mandatory 2FA: already required, did not help
Audit transitive dependencies: there are 847 of them
Pin all dependency versions: prevents receiving security patches
Don’t pin dependency versions: enables supply chain attacks
Rewrite it in Rust (gestures at vulpine-lz4)
Hope for benevolent worms
Consider a career in goat farming
Customer Impact
Some customers may have experienced suboptimal security outcomes. We are proactively reaching out to affected stakeholders to provide visibility into the situation. Customer trust remains our north star.
Key Learnings
We are taking this opportunity to revisit our security posture going forward. A cross-functional working group has been established to align on next steps. The working group has not yet met.
Acknowledgments
We would like to thank:
Karen Oyelaran, who found this issue because her hostname matched a regex
The junior developer in Auckland whose PR was approved four hours after the incident was already resolved
The security researchers who found this issue first but reported it to the wrong people
The cryptobro-9000 author, who has requested we not credit them by name but has asked us to mention their SoundCloud
Kubernetes (the dog), who has declined to comment
The security team, who met SLA on this report despite everything
This incident report was reviewed by Legal, who asked us to clarify that the fish shell is not malware, it just feels that way sometimes.
This is the third incident report this quarter. The author would like to remind stakeholders that the security team’s headcount request has been in the backlog since Q1 2023.
I’ve been experimenting with running local models on and off for a bit, and I’ve finally found a setup that seems to work reasonably. It’s nothing like the output of a SOTA model, but the excitement of being able to have a local model do basic tasks, research, and planning more than makes up for it! No internet connection required! Not to mention that it’s a way of reducing your dependence on big US tech, even if just a tiny bit.
I gotta say though, it’s not easy to get this stuff set up. First you have to choose how you’re running the model: Ollama, llama.cpp, or LM Studio. Each one comes with its own quirks and limitations, and they don’t all offer the same models. Then of course, you have to pick your model. You want the best model available that fits in memory and still gives you enough headroom to run your regular assortment of Electron apps, not to mention something where you can have at least a 64K context window, but ideally 128K or more. Most recently I’ve tried Qwen 3.6 Q3, GPT-OSS 20B, and Devstral Small 24B, which all technically fit in memory but were in practice unusable, and Gemma 4B, which would run fine but really struggled with tool use.
Then there’s a plethora of configuration options to tweak. From the more well-known, like temperature, to more esoteric options like K Cache Quantization Type. Many of these tools come with a basic recommended set of options, but the appropriate ones can depend on things like whether you’re enabling thinking or not!
Qwen 3.5 9B (4-bit quant)
qwen3.5-9b@q4_k_s (HuggingFace link) is the best model I’ve gotten working, with a reasonable ~40 tokens per second, thinking enabled, successful tool use, and a 128K context window, running in LM Studio. Compared to a SOTA model it gets distracted more easily, sometimes gets stuck in loops, misinterprets asks, etc. But it’s surprisingly good for something that can run on a 24GB MacBook Pro while leaving space for lots of other things running too!
These are the recommended settings for thinking mode and coding work:
Thinking mode for precise coding tasks (e.g., WebDev):
temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0
To enable thinking I also had to select the model, go to configuration, scroll to the bottom of the Inference tab, and add {%- set enable_thinking = true %} to the Prompt Template.
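These sampling settings can also be passed per request through the OpenAI-compatible API that LM Studio exposes. A minimal sketch, using the endpoint and model id from the configs in this post; whether LM Studio honors every field per request (e.g., top_k, min_p, repetition_penalty) is an assumption, and some options may need to be set in the UI instead:

```python
import json

# Sketch: build a chat-completions request carrying the sampling settings
# quoted above. BASE_URL and MODEL match the configs in this post; the
# payload shape follows the OpenAI-compatible schema LM Studio serves.
BASE_URL = "http://localhost:1234/v1"
MODEL = "qwen3.5-9b@q4_k_s"  # model id as used elsewhere in this post

def build_request(prompt: str) -> dict:
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        # Recommended thinking-mode settings for precise coding tasks:
        "temperature": 0.6,
        "top_p": 0.95,
        "top_k": 20,
        "min_p": 0.0,
        "presence_penalty": 0.0,
        "repetition_penalty": 1.0,
    }

payload = json.dumps(build_request("Explain this credo warning"))
# POST `payload` to f"{BASE_URL}/chat/completions" with any HTTP client.
```

This is just a convenient way to experiment with settings without restarting the server between tweaks.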
I’ve been using it through both pi and OpenCode. I still haven’t quite made up my mind on which one I prefer. Pi feels a bit snappier, but although I really appreciate the idea of the harness building itself and all that customization, I can’t help but wish it came with some sensible defaults. I feel like you could easily end up spending more time tweaking your pi setup to be just right than you do on your actual projects!
Pi setup
Here’s the ~/.pi/agent/models.json:
{
  "providers": {
    "lmstudio": {
      "baseUrl": "http://localhost:1234/v1",
      "api": "openai-completions",
      "apiKey": "lm-studio",
      "models": [
        {
          "id": "qwen3.5-9b@q4_k_s",
          "reasoning": true,
          "compat": { "thinkingFormat": "qwen-chat-template" }
        }
      ]
    }
  }
}
and to hide the distracting thinking, add "hideThinkingBlock": true to your ~/.pi/agent/settings.json.
OpenCode setup
~/.config/opencode/opencode.json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "lmstudio": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "LM Studio (local)",
      "options": { "baseURL": "http://127.0.0.1:1234/v1" },
      "models": {
        "qwen3.5-9b@q4_k_s": {
          "name": "Qwen 3.5 9B Q4_K_S",
          "tools": true,
          "context_length": 131072,
          "max_tokens": 32768
        }
      }
    }
  },
  "model": "lmstudio/qwen3.5-9b@q4_k_s"
}
Comparison with SOTA models
A model like Qwen 3.5 9B (Q4) is nowhere near being able to independently solve complex problems over longer periods of time the way SOTA models can. There’s no point in asking it to build an entire app in one go; you’ll just burn your thighs without getting anything out of it.
What does work is a more interactive workflow where you’re clearly communicating with the model step by step and giving it a lot of guidance. I’m sure that sounds pointless to many of you (why use a model you have to babysit as it works?), but I actually found that it encouraged me to be more engaged. The downside of working with SOTA models is that they make it too easy to offload all cognitive effort, even when you’re actively trying to prevent that from happening.
With the local model I have to take on a lot more of the thinking and planning, and I have to be a lot more specific, but it will still act as a research assistant, a rubber duck, and a savant with instant recall of a lot of programming language details and command line invocations. It’s not the 10x productivity boost that the big AI companies market, but it’s something, and it’s interesting.
To give you an idea of some things that work and don’t, here are some very basic examples.
Example 1:
I wanted to bump the Elixir linter credo to the latest version and I got warnings out of my code. As an experiment, I asked Qwen to take a look.
me: run mix credo --strict and suggest how to solve it, don’t edit
Qwen: Credo Warnings

There are 4 warnings in your test files. The issue is using length/1 to check if lists are non-empty. Credo recommends comparing against an empty list instead.

Warnings:
1. test/jola_dev_web/controllers/blog_controller_test.exs:11 - assert length(conn.assigns.posts) > 0
2. test/jola_dev_web/controllers/blog_controller_test.exs:22 - assert length(conn.assigns.posts) > 0
3. test/jola_dev/blog_test.exs:10 - assert length(posts) > 0
4. test/jola_dev/blog_test.exs:23 - assert length(tags) > 0

How to Fix

Replace each length(list) > 0 with list != []:

blog_controller_test.exs:
# Line 11
assert conn.assigns.posts != []
# Line 22
assert conn.assigns.posts != []

blog_test.exs:
# Line 10
assert posts != []
# Line 23
assert tags != []

This is more idiomatic in Elixir and avoids the overhead of computing the list length.
I asked Qwen to make the edits and it did 4 parallel edits, very clean. Overall this was a very simple task that I could’ve done myself by tabbing back and forth between the terminal and my editor to get the line numbers to edit. Not a big difference in effort, but nice and convenient.
Example 2:
After some dependency bumps I had a dependabot PR with git conflicts that dependabot for some reason was refusing to rebase. I pulled it down, rebased, and asked Qwen to take a look. It was a very simple conflict to resolve, just take the newer version of each, and the model recognized that.
I can see the conflict. Here’s what’s happening:

HEAD (your current branch):
- sentry: 13.0.1
- tailwind: 0.3.1

Other side (commit being rebased):
- sentry: 12.0.3
- tailwind: 0.4.1

You have divergent dependency bumps. Here are your options:

Option 1: Take both updates (recommended)
- Keep sentry 13.0.1 from HEAD
- Keep tailwind 0.4.1 from the other side
- This gives you the latest of both deps

Option 2: Take only sentry update
- Keep sentry 13.0.1
- Keep tailwind 0.3.1

Option 3: Take only tailwind update
- Keep sentry 12.0.3
- Keep tailwind 0.4.1

Which approach do you want to take?
However, when asked to make the change, it forgot to make the edit and instead just tried to add the changes and continue the rebase, with the conflict markers still in place.
> Thinking: The user wants to take both updates - sentry 13.0.1 and tailwind 0.4.1. I need to resolve the conflict by keeping the HEAD version for sentry and the other side version for tailwind. I’ll use git add to stage the resolved file.

# Stage resolved mix.lock and continue rebase
$ git add mix.lock && git rebase --continue

Vim: Warning: Output is not to a terminal
Vim: Warning: Input is not from a terminal
“~/workspace/mixtli/.git/COMMIT_EDITMSG” 32L, 1047B
Bump tailwind from 0.3.1 to 0.4.1

Bumps [tailwind](https://github.com/phoenixframework/tailwind) from 0.3.1 to 0.4.1.
- [Changelog](https://github.com/phoenixframework/tailwind/blob/main/CHANGELOG.md)
- [Commits](https://github.com/phoenixframework/tailwind/compare/v0.3.1...v0.4.1)
Additionally, it failed to recognize that git rebase --continue opens an editor, and OpenCode then proceeded to hang, although that might just have been a one-off.
Conclusion
Anyway, LLMs are incredibly flexible and there’s a million things even a local model could do. Your imagination is the limit. Local models have serious tradeoffs, but they come with some pretty attractive benefits:
No internet connection required, you can work on the plane!
The cost is limited to the electricity you’re using, assuming you were gonna buy a computer anyway. No subscription required.
There’s still a serious environmental cost to training these models, but the open model companies are nowhere near the top of the list in environmental impact, and using your own hardware means fewer data centers.
It’s fun to tinker.
LLMs have had a huge impact on our world, and much of it not great, but it’s obvious that they’re here to stay. Experimenting with local models feels like a more sustainable and positive way to interact with this technology. And honestly, it’s a lot of fun, even when it does the wrong thing!
yes, as in singular one.
Back in April 2026 Anthropic caused a lot of media noise when they concluded that their new AI model Mythos is dangerously good at finding security flaws in source code. Apparently Mythos was so good at this that Anthropic would not release the model to the public yet, but instead trickle it out to a select few companies for a while, to allow a few good ones(?) to get a head start and fix the most pressing problems first, before the general populace got their hands on it.
The whole world seemed to lose its marbles. Is this the end of the world as we know it? An amazingly successful marketing stunt for sure.
My (non-) access
Part of the deal with project Glasswing was that Anthropic also offered access to their latest AI model to “Open Source projects” via Linux Foundation. Linux Foundation let their project Alpha Omega handle this part, and I was contacted by their representatives. As lead developer of curl I was offered access to the magic model and I graciously accepted the offer. Sure, I’d like to see what it can find in curl.
I signed the contract for getting access, but then nothing happened. Weeks went past and I was told there was a hiccup somewhere and access was delayed.
Eventually, I was offered an alternative: someone else with access to the model could run a scan and analysis of curl for me using Mythos and send me a report. To me, the distinction isn’t that important. It’s not that I would have had a lot of time to explore lots of different prompts and go on deep-dive adventures anyway. Getting the tool to generate a first proper scan and analysis would be great, whoever did it. I happily accepted this offer.
(I am purposely leaving out the identity of the individual(s) involved in getting the curl analysis done as it is not the point of this blog post.)
AI scans of curl
Before this first Mythos report, we had already scanned curl with several different, very capable AI-powered tools (in addition, that is, to running a number of “normal” static code analyzers all the time, using the pickiest compiler options, and fuzzing the code for years, etc). Primarily AISLE, Zeropath and OpenAI’s Codex Security have been used to scrutinize the code with AI. These tools and the analyses they have done have triggered somewhere between two and three hundred bugfixes merged in curl throughout the last 8 to 10 months or so. A bunch of the findings these AI tools reported were confirmed vulnerabilities and have been published as CVEs. Probably a dozen or more.
Nowadays we also use tools like GitHub’s Copilot and Augment code to review pull requests, and their remarks and complaints help us to land better code and avoid merging new bugs. I mean, we still merge bugs of course but the PR review bots regularly highlight issues that we fix: our merges would be worse without them. The AI reviews are used in addition to the human reviews. They help us, they don’t replace us.
We also see a high volume of high quality security reports flooding in: security researchers now use AI extensively and effectively.
Security is a top priority for us in the curl project. We follow every guideline and we do software engineering properly, to reduce the number of flaws in the code. Scanning for flaws is just one of many steps to keep this ship safe. You need to search long and hard to find another software project that does as much as, or goes further than, curl for software security.
May 6, 2026
It was with great anticipation that we received the first source code analysis report generated with Mythos. Another chance for us to find areas to improve and bugs to fix. To make an even better curl.
This initial scan was made on curl’s git repository and its master branch of a certain recent commit. It counted 178K lines of code analyzed in the src/ and lib/ subdirectories.
The analysis details the different approaches and methods it used to perform the search, and which kinds of flaws it focused on finding. A fun note at the top of the report says:
curl is one of the most fuzzed and audited C codebases in existence (OSS-Fuzz, Coverity, CodeQL, multiple paid audits). Finding anything in the hot paths (HTTP/1, TLS, URL parsing core) is unlikely.
… and it correctly found no problems in those areas.
The size of curl
curl is currently 176,000 lines of C code when we exclude blank lines. The source code consists of 660,000 words, which is 12% more words than the entire English edition of the novel War and Peace.
On average, every single production source code line of curl has been written (and then rewritten) 4.14 times. We have polished on this.
Right now, the existing production code in git master that still remains, has been authored by 573 separate individuals. Over time, a total of 1,465 individuals have so far had their proposed changes merged into curl’s git repository.
We have published 188 CVEs for curl up until now.
curl is installed in over twenty billion instances. It runs on over 110 operating systems and 28 CPU architectures. It runs in every smart phone, tablet, car, TV, game console and server on earth.
Five findings became one
The report concluded that it found five “confirmed security vulnerabilities”. I find the term confirmed a little amusing when the AI declares it so confidently by itself. Yes, the AI thinks they are confirmed, but the curl security team has a slightly different take.
Five issues felt like nothing, as we had expected an extensive list. Once my curl security team fellows and I had poked at this short list for a number of hours and dug into the details, we had trimmed it down further and were left with one confirmed vulnerability. Of the other four, three were false positives (they highlighted shortcomings that are documented in the API documentation) and the fourth we deemed “just a bug”.
The single confirmed vulnerability will end up as a low-severity CVE, planned to be published in sync with our pending next curl release, 8.21.0, in late June. The flaw is not going to make anyone gasp for breath. Full details of the vulnerability will of course not be made public before then, so you will have to wait.
The Mythos report on curl also contained a number of spotted bugs that it concluded were not vulnerabilities, much like any new code analyzer does when you run it on hundreds of thousands of lines of code. All the bugs in the report are being investigated and one by one we are fixing those that we agree with.
All in all about twenty bugs that are described and explained very nicely. Barely any false positives, so I presume they have had a rather high threshold for certainty.
curl is certainly getting better thanks to this report, but counted by the volume of issues found, all the previous AI tools we have used have resulted in more bugfixes. This is only natural of course, since the first tools we ran had many more and easier bugs to find. As we have fixed issues along the way, finding new ones is slowly becoming harder. Additionally, a bug can be small or big, so it is not always fair to just compare numbers.
Not particularly “dangerous”
My personal conclusion, however, can only be that the big hype around this model so far was primarily marketing. I see no evidence that this setup finds issues to any higher or more advanced degree than the other tools did before Mythos. Maybe this model is a little bit better, but even if it is, it is not better to a degree that makes a significant dent in code analyzing.
This is just one source code repository and maybe it is much better on other things. I can only tell and comment on what it found here.
Still very good
But allow me to highlight and reiterate what I have said before: AI-powered code analyzers are significantly better at finding security flaws and mistakes in source code than any traditional code analyzers were in the past. All modern AI models are good at this now. Anyone with time and some experimental spirit can find security problems now. The high quality chaos is real.
Any project that has not scanned its source code with AI-powered tooling will likely find a huge number of flaws, bugs and possible vulnerabilities with this new generation of tools. Mythos will find them, and so will many of the others.
Not using AI code analyzers in your project means that you leave adversaries and attackers time and opportunity to find and exploit the flaws you don’t find.
How AI analyzers differ
They can spot when a comment says something about the code and conclude that the code does not work as the comment says.
They can check code for platforms and configurations we otherwise cannot run analyzers for.
They “know” details about third-party libraries and their APIs, so they can detect abuse or bad assumptions.
They “know” details about the protocols curl implements and can question details in the code that seem to violate or contradict protocol specifications.
They are typically good at summarizing and explaining a flaw, something which can be rather tedious and difficult with old-style analyzers.
They can often generate and offer a patch for the issues they find (even if the patch usually is not a 100% fix).
More details from the report
Zero memory-safety vulnerabilities found.
Methodology note: this review is hand-driven analysis using LLM subagents for parallel file reads, with every candidate finding re-verified by direct source inspection in the main session before being recorded. The CVE to variant-hunt mapping was built from curl’s own vuln.json. No automated SAST tooling was used.
This outcome is consistent with curl’s status as one of the most heavily fuzzed and audited C codebases. The defensive infrastructure (capped dynbufs everywhere, curlx_str_number with explicit max on every numeric parse, curlx_memdup0 overflow guard, CURL_PRINTF format-string enforcement, per-protocol response-size caps, pingpong 64KB line cap) systematically closes the bug classes that would normally be productive in a codebase this size.
Coverage now includes: all minor protocols, all file parsers, all TLS backends’ verify paths, http/1/2/3, ftp full depth, mprintf, x509asn1, doh, all auth mechanisms, content encoding, connection reuse, session cache, CLI tool, platform-specific code, and CI/build supply chain.
AI finds existing kinds of errors
It should be noted that the AI tools find the usual, established kinds of errors we already know about. They just find new instances of them.
We have not seen any AI so far report a vulnerability that would somehow be of a novel kind or something totally new. They do not reinvent the field in that way, but they do dig up more issues than any other tools did before.
More to find
These were absolutely not the last bugs to find or report. Just while I was writing the drafts for this blog post we have received more reports from security researchers about suspected problems. The AI tools will improve further and the researchers can find new and different ways to prompt the existing AIs to make them find more.
We have not reached the end of this yet.
I hope we can keep getting more curl scans done with Mythos and other AIs, over and over until they truly stop finding new problems.
Credits
Thanks to Anthropic and Alpha Omega for providing the model, the tools and doing the scan for us. Thanks also to the individual who did the scan for us. Much appreciated!
Top image by Jin Kim from Pixabay
Thanks for flying curl. It’s never dull.
Executive Summary
Security researchers have identified a highly targeted social engineering campaign (REF6598) that weaponizes the Obsidian note-taking application to deliver a previously undocumented Remote Access Trojan (RAT) named PHANTOMPULSE. The campaign targets individuals in the financial and cryptocurrency sectors on both Windows and macOS. Attackers use platforms like LinkedIn and Telegram to build trust before luring victims into a malicious shared Obsidian vault. The attack chain relies on tricking the user into enabling a community plugin, which then executes code to deploy the RAT. PHANTOMPULSE demonstrates advanced capabilities, including using the Ethereum blockchain to dynamically resolve its command-and-control (C2) server address, making it highly resilient to takedowns.
Threat Overview
The attack, designated REF6598, is a multi-stage social engineering effort. Threat actors pose as venture capitalists and engage with targets on professional networking sites before moving the conversation to a private Telegram group. The primary lure is an invitation to collaborate via a shared, cloud-hosted Obsidian vault.
Once the victim opens the shared vault, the infection is triggered by social engineering. The victim is prompted to enable the “Installed community plugins” synchronization feature. This seemingly innocuous action, which requires manual user approval, is the key to the compromise. It enables malicious versions of legitimate Obsidian plugins (‘Shell Commands’ and ‘Hider’) that are present in the shared vault.
Technical Analysis
The attack chain differs slightly between Windows and macOS but follows the same general principle:
Initial Access (T1566.002): The attacker uses social engineering on LinkedIn/Telegram to convince the target to open a malicious shared Obsidian vault.
Execution (T1204.002): The user is manipulated into enabling community plugins within Obsidian. This action executes a malicious script via the compromised ‘Shell Commands’ plugin.
Staging: On Windows, a PowerShell script is executed. This script drops a loader known as PHANTOMPULL. On macOS, a similar process occurs using AppleScript.
Payload Delivery: The PHANTOMPULL loader decrypts and launches the final payload, the PHANTOMPULSE RAT, directly into memory to evade file-based detection (T1055).
Command and Control (T1102.002): PHANTOMPULSE uses a novel C2 mechanism. It queries the Ethereum blockchain for the latest transaction from a hard-coded wallet address. The C2 server’s IP address is embedded within this transaction data, providing a decentralized and censorship-resistant way for the malware to receive instructions.
Once active, PHANTOMPULSE can capture keystrokes, take screenshots, exfiltrate files, and execute arbitrary commands.
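To make the blockchain-based dead-drop mechanism concrete: a resolver of this kind reads a transaction’s input data and extracts an address from it. The report does not specify PHANTOMPULSE’s actual encoding, so the sketch below is purely hypothetical; it assumes the last four bytes of the hex-encoded input field hold an IPv4 address, which is roughly how analysts have described similar schemes:

```python
import ipaddress

# Hypothetical illustration of the "C2 address in transaction data" scheme
# described above. The real PHANTOMPULSE encoding is not documented in the
# report; here we simply assume the trailing 4 bytes of the transaction's
# hex input field are a packed IPv4 address.
def decode_c2_ip(tx_input_hex: str) -> str:
    data = bytes.fromhex(tx_input_hex.removeprefix("0x"))
    if len(data) < 4:
        raise ValueError("input data too short to hold an IPv4 address")
    return str(ipaddress.IPv4Address(data[-4:]))

# Example: a made-up input field whose trailing bytes encode 203.0.113.10
# (an address from the TEST-NET-3 documentation range).
print(decode_c2_ip("0xdeadbeefcb00710a"))  # -> 203.0.113.10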
Impact Assessment
A successful compromise gives the attacker full access to the victim’s machine. For professionals in finance and crypto, this could lead to the theft of sensitive corporate data, intellectual property, trading strategies, and, most critically, cryptocurrency wallet keys and exchange credentials. The cross-platform nature of the attack broadens its potential victim pool. The use of a blockchain-based C2 demonstrates a high level of sophistication, making the threat infrastructure difficult to disrupt.
Cyber Observables for Detection
Type: process_name
Value: Obsidian.exe
Description: Monitor for Obsidian spawning child processes like powershell.exe, cmd.exe, or osascript.

Type: command_line_pattern
Value: powershell -ExecutionPolicy Bypass
Description: Suspicious PowerShell execution, especially when initiated by a non-standard application like Obsidian.

Type: network_traffic_pattern
Value: Outbound connections to Ethereum blockchain nodes or gateways from unexpected processes.
Description: Could indicate PHANTOMPULSE attempting to resolve its C2 address.

Type: file_path
Value: [Vault]/.obsidian/plugins/
Description: Monitor for the creation or modification of files within the Obsidian plugins directory, especially outside of the official plugin marketplace.
Detection & Response
Process Monitoring (D3-PA: Process Analysis): Implement EDR rules to detect and alert when the Obsidian process spawns command-line interpreters (powershell.exe, cmd.exe, bash, osascript). This is highly anomalous behavior.
User Training: Educate users, especially those in high-risk industries, about the dangers of social engineering and the specific tactic of abusing collaboration tool features like shared vaults and plugins.
Application Control (D3-EAL: Executable Allowlisting): Where possible, use application control policies to restrict the installation and execution of unapproved community plugins in applications like Obsidian.
Network Monitoring (D3-NTA: Network Traffic Analysis): Monitor for unusual DNS queries or direct IP connections related to blockchain services from endpoints where such activity is not expected.
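The process-monitoring guidance above boils down to a parent/child rule, which can be prototyped as a simple predicate over process-creation events. A minimal sketch; the field names and event shape are invented for illustration and are not taken from any particular EDR product:

```python
# Toy prototype of the detection rule described above: flag events where
# the note-taking app spawns a command-line interpreter. The event dict
# shape and field names are illustrative, not from any specific EDR.
WATCHED_PARENTS = {"obsidian.exe", "obsidian"}
INTERPRETERS = {"powershell.exe", "cmd.exe", "bash", "osascript"}

def is_suspicious_spawn(event: dict) -> bool:
    parent = event.get("parent_name", "").lower()
    child = event.get("child_name", "").lower()
    return parent in WATCHED_PARENTS and child in INTERPRETERS

# Example event that should trigger an alert:
alert = is_suspicious_spawn(
    {"parent_name": "Obsidian.exe", "child_name": "powershell.exe"}
)
```

In a real deployment this logic would live in an EDR query language rather than Python, but the shape of the rule is the same: anomalous parent/child pairing, not any single process in isolation.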
Mitigation
Vet Community Plugins: Be extremely cautious when enabling third-party or community-developed plugins in any application. Only install plugins from the official, trusted marketplace and review their permissions.
Disable Auto-Sync for Untrusted Vaults: Do not enable plugin synchronization when connecting to an Obsidian vault from an unknown or untrusted source.
Principle of Least Privilege: Run applications like Obsidian as a standard user, not with administrative privileges, to limit the potential impact of a compromise.
Endpoint Security: Ensure up-to-date EDR and antivirus solutions are deployed to detect and block suspicious script execution and process injection techniques.
The Maryland Office of People’s Counsel (OPC), a state agency that represents the state’s utility consumers, filed a complaint before the Federal Energy Regulatory Commission (FERC) regarding PJM Interconnection, LLC’s plans to charge Maryland $2 billion of the $22 billion it spent to upgrade its grid to accommodate increasing demand from data centers. According to the OPC’s press release, this $2 billion bill will cost the state’s consumers an extra $1.6 billion in the next ten years alone — that means an extra $823 million for residential (approx. $345 per customer), $146 million for commercial (approx. $673 per customer), and $629 million for industrial customers (approx. $15,074 per customer).
“Without FERC action, Maryland customers face paying billions for transmission infrastructure that PJM is advancing to benefit data centers,” said Maryland People’s Counsel David S. Lapp. “PJM’s cost allocation rules are broken. Maryland customers have neither caused the need for these billions in new transmission projects nor will they meaningfully benefit from them.”
PJM Interconnection, LLC is the United States’ largest regional grid operator, coordinating electricity transmission across 13 states plus Washington, D.C. This includes Delaware, Illinois, Indiana, Kentucky, Maryland, Michigan, New Jersey, North Carolina, Ohio, Pennsylvania, Tennessee, Virginia, and West Virginia, covering about 65 million people, or about 20% of the entire U.S. population. Some of these states, including Maryland, host a large number of data centers, so PJM needs to upgrade its infrastructure to meet projected demand from these power-hungry AI systems.
Maryland argues that these infrastructure costs should be charged directly to the areas where the projects are being built or, as President Donald Trump made tech companies promise with the “ratepayer protection pledge,” billed directly to the companies themselves. The OPC said there is “extreme uncertainty” about load growth driven by data center demand, and that utility providers tend to benefit from these upgrades even if the demand never materializes. That risk falls on ratepayers because, absent the “ratepayer protection pledge,” the costs of these investments are borne by existing utility customers, not by the data centers themselves.
Jowi Morales is a tech enthusiast with years of experience working in the industry. He has been writing for several tech publications since 2021, covering tech hardware and consumer electronics.
Delve
Kicked out of YC. 493+ fabricated audit reports. Sold fraud to fellow YC companies.
Autopsy Report:
Claimed to automate SOC 2 and ISO compliance auditing with AI. A whistleblower revealed the platform auto-generated identical passing audit reports with keyboard-mashed test data before clients even uploaded anything. The CEO was caught on tape asking an auditor, ‘does your firm actually look at our platform?’ Delve focused on selling to other YC companies that trusted the YC brand — the circular trust network weaponized. Insight Partners scrubbed its $32M investment blog post. YC officially expelled Delve in 2026. Investor Adam Cochran called it proof that YC has ‘no technical acumen to evaluate claims’ under Garry Tan’s leadership.
Central
Signed up as a Warp customer, stole the playbook, launched a clone
Autopsy Report:
Central’s CEO signed up as a customer of fellow YC company Warp in 2023 under ‘Central Business Applications Inc.’ He spent six months asking how tax compliance works, what registered agents are, and how multi-state filings work — then churned and launched an identical payroll startup with Warp’s exact marketing copy, down to the tagline ‘designed for founders, not HR’. YC funded Central anyway in S24, despite Warp already being in the portfolio. Central was acqui-hired by Mercury in April 2026. Warp CEO Ayush Sharma publicly called them out: ‘gg no re.’
DAMAGE: Acqui-hired by Mercury
STATUS: Copycat
Capital Raised: $8.6M
Lifespan: 3 years
Naive
Raised $2M+ selling “autonomous AI employees.” The platform is a 41K-star MIT-licensed OSS framework with Stripe bolted on.
Autopsy Report:
Naive (YC S25) raised $2M+ marketing itself as a builder of “autonomous AI employees.” A public investigation by @NotOnKetamine traced the entire underlying platform to Paperclip — a 41K-star MIT-licensed open source agent framework — with Naive having stripped attribution and rebranded the original code as proprietary. The additions on top of Paperclip were thin: Stripe billing wrappers ($0.50/credit), domain/email provisioning for agents, phone/SMS via standard providers, and a Composio integration layer. The MIT license requires attribution and license preservation; Naive shipped neither. The investigation surfaced screenshots, curl-verifiable evidence, and side-by-side comparisons. As of April 2026, the company has not publicly responded to the receipts.
DAMAGE: MIT license violated
STATUS: Copycat
Capital Raised: $2M+
Lifespan: 1+ years
Wuri
AI-generated visual novel platform. Pivoted to enterprise AI, then died as foundation models commoditized the UI layer in twelve months.
Autopsy Report:
Wuri pitched converting 500M+ Wattpad-style text stories into Webtoon-style visual novels via generative AI. The founders were ex-Flipkart and ex-Bain. The wrapper-thin differentiation collapsed when GPT-4, Sora, and image-generation foundation models shipped equivalent capability for free, eroding any pricing power Wuri had layered on top. It pivoted from a consumer reader app to enterprise AI offerings in 2024, then quietly shut down in 2025. The YC profile remains up but is flagged Inactive in the directory. The Wuri arc is the median Tan-era Indian AI YC story: thin GenAI wrapper, fast pivot, quiet death.
DAMAGE: Tan-era YC class slot consumed for a thin GenAI wrapper
STATUS: Dead
Capital Raised: $500K
Lifespan: 2 years
Double Finance
“0% expense ratio” robo-advisor. Hit $10M AUM in Dec 2024, shut down twelve months later. YC slug then recycled for the founder’s next pivot — with Garry Tan as primary partner.
Autopsy Report:
Double pitched itself as a Wealthfront/Betterment killer offering direct indexing for $1/month, led by founder JJ Maxwell (ex-Liftoff Influence) and growth lead Mark Wai (ex-Tesla, ex-Facebook). After crossing $10M AUM in December 2024, the company quietly shut down a year later — and the YC slug `double-2` was repurposed for Maxwell’s next venture (an AI ad-generation startup called Polished), with Garry Tan named primary YC partner. Maxwell wrote a “Lessons from Building (and Closing) Double Finance” HN post (item 46213345). The slug-recycling-into-pivot pattern is itself a Tan-era artifact: the same YC company URL now points at a different product with different unit economics, while clients had to be wound down or transferred. Robo-advising joined the long list of consumer fintech categories that didn’t pencil at venture scale.
DAMAGE: $10M+ in client AUM wound down or transferred
STATUS: Dead
Capital Raised: Undisclosed (YC + post-Demo Day)
Lifespan: 1 year
PearAI
Forked an open-source code editor, slapped their name on it, called it a startup.
Autopsy Report:
On Day 1 of YC Demo Day, the internet noticed that PearAI had forked Continue.dev’s open source code editor, mass-replaced all ‘Continue’ references with ‘PearAI’, and slapped on a fake license written by ChatGPT. The ‘Pear Enterprise License’ became a meme. YC had to publish a blog post ‘correcting the record.’ The founders apologized and pivoted. The irony of an AI startup shipping an AI-generated license was not lost on anyone.
DAMAGE: YC’s reputation took the hit
STATUS: Copycat
Capital Raised: $1.25M
Lifespan: 2+ years
Pickle
Stole GPL code, then launched AR glasses that might be CGI.
Autopsy Report:
A YC W25 company with two separate controversies. First, co-founder Daniel Park released ‘Glass,’ a real-time conference AI assistant, under Apache 2.0. Developers discovered it was a near-verbatim copy of ‘CheatingDaddy,’ a GPLv3-licensed open source project — identical code, comments, and library versions, relicensed without attribution. After 700+ upvotes on Hacker News, they changed the license back. Then Pickle launched $799 ‘Pickle 1’ AR glasses marketed as a ‘soul computer.’ Tech analysts accused the demo of being CGI, citing impossible battery specs, a Korean restaurant in the demo that doesn’t exist, and mismatched UI data. SF Standard asked: ‘Is it vaporware?’
DAMAGE: Open source trust + $799 preorders
STATUS: Grift
Capital Raised: Undisclosed
Lifespan: 2+ years
Optifye.ai
Built ‘sweatshops-as-a-service.’ YC deleted the evidence.
Autopsy Report:
YC W25 startup that built AI-powered surveillance cameras for garment factories. YC posted a demo video showing supervisors monitoring workers as colored rectangles — green for productive, red for not. A manager calls about a low-performing worker referred to as ‘Number 17’ — not by name, just a number — while the AI tracked the worker’s every movement on the factory floor. The internet dubbed it ‘sweatshops-as-a-service’ and ‘computer vision sweatshop software.’ YC deleted the demo video from all its social accounts after massive backlash. The founders, two Duke CS students, saw their product become a symbol of AI surveillance dystopia.
DAMAGE: Garment workers’ dignity
STATUS: Scandal
Capital Raised: Undisclosed
Lifespan: 2+ years
Rezi
Paid landlords upfront to rent their apartments. Burned $200M in debt, left a $49M foreclosure.
Autopsy Report:
YC W17 rental platform that paid landlords upfront — sometimes $30K cash — then listed and rented the apartments itself. It raised $200M in debt rounds during the proptech boom. When interest rates spiked, Rezi couldn’t secure new financing and quietly shut down in early 2024, leaving landlord Leagem Properties to file a $49M foreclosure suit directly blaming Rezi’s collapse. Proptech funding plummeted from $32B in 2021 to $11.4B in 2023, and Rezi was caught in the middle with a debt-heavy model that only works when money is cheap. Founder Sean Mitchell launched a new startup called Ryse. The $200M evaporated.
DAMAGE: $49M foreclosure + landlords left holding the bag
STATUS: Dead
Capital Raised: $200M (debt)
Lifespan: 8 years
CapWay
YC-backed fintech for banking deserts. Killed when the Synapse collapse destroyed fintech trust.
Autopsy Report:
YC S20 fintech startup aimed at bringing banking to underserved ‘banking desert’ communities, backed by Backstage Capital, Fearless Fund, and Khosla Ventures. When the Synapse collapse and the Evolve Bank & Trust hack destroyed fintech credibility in 2024, banks started requiring fintechs to hold massive capital reserves that early-stage startups couldn’t meet — and CapWay couldn’t raise another round in the scorched-earth fintech landscape. A potential acquisition fell through. Founder Sheena Allen announced the closure in October 2024, citing the post-Synapse environment as the final blow. A startup trying to help underbanked communities, killed by the broader fintech industry’s trust deficit.
DAMAGE: Underserved communities lost a banking option
STATUS: Dead
Capital Raised: $800K
Lifespan: 8 years
Medobed
Expelled from YC S23 for faking credentials and growth metrics. Never made it to Demo Day.
Autopsy Report:
Indian 10-minute medicine delivery startup that joined YC S23. Founder Piyush Sharma claimed IIT Roorkee and Stanford credentials and reported $1M monthly GMV with EBITDA profitability. An anonymous investor tipped off YC, which discovered the founder had fabricated his educational history and inflated growth metrics — irregularities that broke YC’s ethics policy. YC expelled Medobed from the batch before Demo Day — one of the rare instances of a company being kicked out mid-program. The company never raised through YC. Under Garry Tan’s watch, the accelerator that prides itself on due diligence missed a founder lying about where he went to school.
DAMAGE: YC’s vetting credibility
STATUS: Fraud
Capital Raised: Undisclosed
Lifespan: 0 years
Embark Trucks
SPAC’d at $5.2B with zero revenue. Stock dropped 99%. Sold for parts.
Autopsy Report:
YC W16 alum that went public via SPAC at a $5.2B valuation. The catch? The company had zero revenue, zero commercial deployments, and zero path to profitability — a 7-year-old pre-revenue company valued at $5.2 billion. The stock dropped over 99% from the offer price, and the company laid off 70% of its staff. CEO Alex Rodrigues blamed ‘capital markets turning their backs on pre-revenue companies.’ Applied Intuition bought the remains in 2023. The self-driving truck never drove itself to revenue.
DAMAGE: 70% of staff laid off
STATUS: Dead