10 interesting stories served every morning and every evening.
...
Read the original on arxiv.org »
Last week, I wrote about catching a supply chain attack on a WordPress plugin called Widget Logic. A trusted name, acquired by a new owner, turned into something malicious. It happened again. This time at a much larger scale.
Ricky from Improve & Grow emailed us about an alert he saw in the WordPress dashboard for a client site. The notice was from the WordPress.org Plugins Team, warning that a plugin called Countdown Timer Ultimate contained code that could allow unauthorized third-party access.
I ran a full security audit on the site. The plugin itself had already been force-updated by WordPress.org to version 2.6.9.1, which was supposed to clean things up. But the damage was already done.
The plugin’s wpos-analytics module had phoned home to analytics.essentialplugin.com, downloaded a backdoor file called wp-comments-posts.php (designed to look like the core file wp-comments-post.php), and used it to inject a massive block of PHP into wp-config.php.
The injected code was sophisticated. It fetched spam links, redirects, and fake pages from a command-and-control server. It only showed the spam to Googlebot, making it invisible to site owners. And here is the wildest part. It resolved its C2 domain through an Ethereum smart contract, querying public blockchain RPC endpoints. Traditional domain takedowns would not work because the attacker could update the smart contract to point to a new domain at any time.
CaptainCore keeps daily restic backups. I extracted wp-config.php from 8 different backup dates and compared file sizes. Binary search style.
The injection happened on April 6, 2026, between 04:22 and 11:06 UTC. A 6-hour 44-minute window.
I traced the plugin’s history through 939 quicksave snapshots. The plugin had been on the site since January 2019. The wpos-analytics module was always there, functioning as a legitimate analytics opt-in system for years.
Then came version 2.6.7, released August 8, 2025. The changelog said, “Check compatibility with WordPress version 6.8.2.” What it actually did was add 191 lines of code, including a PHP deserialization backdoor. The class-anylc-admin.php file grew from 473 to 664 lines.
The new code introduced three things:
A fetch_ver_info() method that calls file_get_contents() on the attacker’s server and passes the response to @unserialize()
A version_info_clean() method that executes @$clean($this->version_cache, $this->changelog) where all three values come from the unserialized remote data
That is a textbook arbitrary function call. The remote server controls the function name, the arguments, everything. It sat dormant for 8 months before being activated on April 5-6, 2026.
This is where it gets interesting. The original plugin was built by Minesh Shah, Anoop Ranawat, and Pratik Jain. An India-based team that operated under “WP Online Support” starting around 2015. They later rebranded to “Essential Plugin” and grew the portfolio to 30+ free plugins with premium versions.
By late 2024, revenue had declined 35-45%. Minesh listed the entire business on Flippa. A buyer identified only as “Kris,” with a background in SEO, crypto, and online gambling marketing, purchased everything for six figures. Flippa even published a case study about the sale in July 2025.
The buyer’s very first SVN commit was the backdoor.
On April 7, 2026, the WordPress.org Plugins Team permanently closed every plugin from the Essential Plugin author. At least 30 plugins, all on the same day. Here are the ones I confirmed:
* SlidersPack — All in One Image Sliders — sliderspack-all-in-one-image-sliders
All permanently closed. The author search on WordPress.org returns zero results. The analytics.essentialplugin.com endpoint now returns {“message”:“closed”}.
In 2017, a buyer using the alias “Daley Tias” purchased the Display Widgets plugin (200,000 installs) for $15,000 and injected payday loan spam. That buyer went on to compromise at least 9 plugins the same way.
The Essential Plugin case is the same playbook at a larger scale. 30+ plugins. Hundreds of thousands of active installations. A legitimate 8-year-old business acquired through a public marketplace and weaponized within months.
WordPress.org’s forced update added return; statements to disable the phone-home functions. That is a band-aid. The wpos-analytics module is still there with all its code. I built patched versions with the entire backdoor module stripped out.
I scanned my entire fleet and found 12 of the 26 Essential Plugin plugins installed across 22 customer sites. I patched 10 of them (one had no backdoor module, one was a different “pro” fork by the original authors). Here are the patched versions, hosted permanently on B2:
# Countdown Timer Ultimate
wp plugin install https://plugins.captaincore.io/countdown-timer-ultimate-2.6.9.1-patched.zip –force
# Popup Anything on Click
wp plugin install https://plugins.captaincore.io/popup-anything-on-click-2.9.1.1-patched.zip –force
# WP Testimonial with Widget
wp plugin install https://plugins.captaincore.io/wp-testimonial-with-widget-3.5.1-patched.zip –force
# WP Team Showcase and Slider
wp plugin install https://plugins.captaincore.io/wp-team-showcase-and-slider-2.8.6.1-patched.zip –force
# WP FAQ (sp-faq)
wp plugin install https://plugins.captaincore.io/sp-faq-3.9.5.1-patched.zip –force
# Timeline and History Slider
wp plugin install https://plugins.captaincore.io/timeline-and-history-slider-2.4.5.1-patched.zip –force
# Album and Image Gallery plus Lightbox
wp plugin install https://plugins.captaincore.io/album-and-image-gallery-plus-lightbox-2.1.8.1-patched.zip –force
# SP News and Widget
wp plugin install https://plugins.captaincore.io/sp-news-and-widget-5.0.6-patched.zip –force
# WP Blog and Widgets
wp plugin install https://plugins.captaincore.io/wp-blog-and-widgets-2.6.6.1-patched.zip –force
# Featured Post Creative
wp plugin install https://plugins.captaincore.io/featured-post-creative-1.5.7-patched.zip –force
# Post Grid and Filter Ultimate
wp plugin install https://plugins.captaincore.io/post-grid-and-filter-ultimate-1.7.4-patched.zip –force
Each patched version removes the entire wpos-analytics directory, deletes the loader function from the main plugin file, and bumps the version to -patched. The plugin itself continues to work normally.
The process is straightforward with Claude Code. Point it at this article for context, tell it which plugin you need patched, and it can strip the wpos-analytics module the same way I did. The pattern is identical across all of the Essential Plugin plugins:
Delete the wpos-analytics/ directory from the plugin
Remove the loader function block in the main plugin PHP file (search for “Plugin Wpos Analytics Data Starts” or wpos_analytics_anl)
Two supply chain attacks in two weeks. Both followed the same pattern. Buy a trusted plugin with an established install base, inherit the WordPress.org commit access, and inject malicious code. The Flippa listing for Essential Plugin was public. The buyer’s background in SEO and gambling marketing was public. And yet the acquisition sailed through without any review from WordPress.org.
WordPress.org has no mechanism to flag or review plugin ownership transfers. There is no “change of control” notification to users. No additional code review triggered by a new committer. The Plugins Team responded quickly once the attack was discovered. But 8 months passed between the backdoor being planted and being caught.
If you manage WordPress sites, search your fleet for any of the 26 plugin slugs listed above. If you find one, patch it or remove it. And check wp-config.php.
...
Read the original on anchor.host »
A few weeks ago I wrote about how I thought intelligence is becoming a commodity. The idea is quite straightforward, and widespread now: when everyone races to build the best model, the models get better, but so does every other model eventually. Every dollar spent on a bigger training run makes the previous one cheaper. The distance between frontier, second-best, and open-source alternatives is collapsing fast (actually Gemma4, Kimi K2.5 and GLM 5.1 are becoming my bedside models these days). Even more, as models become better, the unit of intelligence that can be deployed in local hardware with lower hardware capabilities increases significantly.
The irony of this situation is that this commoditisation of intelligence is benefiting the company that everyone was framing as the “AI loser”: Apple
There’s a version of the last three years where Apple genuinely failed at AI. They had Siri before anyone had a serious voice assistant, and then watched how ChatGPT ate their lunch already since their first release (even before they had introduced their native voice interaction). Apple didn’t have a flagship frontier (or even a vanity open-source) model, no $500B compute commitment with the usual suspects. Meanwhile, the rest of the AI labs and big tech companies were racing to win the next state-of-the-art benchmark by burning bags of cash.
What this also meant is that while these companies were burning money at a rate that would make a sovereign wealth fund uncomfortable, Apple was (and still is) sitting in a pile of undeployed cash (to the point of even increasing their stock buybacks) giving them optionality.
To me, OpenAI is the most paradigmatic example of this “infinite money burning machine”. OpenAI raised at a $300B valuation and then shut down Sora, the video product they’d been positioning as a creative industry flagship, because it was running at roughly $15M a day in costs against $2.1M in daily revenue. Disney had already signed a three-year licensing deal for Sora to generate content from Marvel, Pixar, and Star Wars characters. They were finalising a $1B equity stake in OpenAI. When Sora died, so did the billion. A $1B investment evaporated, because the product it was staked on couldn’t pay for itself (reducing their buffer that accommodates their daily burn).
On the infrastructure side: OpenAI signed non-binding letters of intent with Samsung and SK Hynix for up to 900,000 DRAM wafers per month, roughly 40% of global output. These were of course non-binding. Micron, reading the demand signal, shut down its 29-year-old Crucial consumer memory brand to redirect all capacity toward AI customers. Then Stargate Texas was cancelled, OpenAI and Oracle couldn’t agree terms, and the demand that had justified Micron’s entire strategic pivot simply vanished. Micron’s stock crashed.
I don’t know about you, but I don’t see these behaviours as those of someone that is winning the AI race, independently of how good their models do in benchmarks, and how much they are burning in infrastructure. A small miscalculation in the expected revenue, and you are out of the game (I am actually of the opinion that without some kind of bailout, OpenAI could be bankrupt in the next 18-24 months, but I am horrible at predictions).
My sense is that the labs’ bet was always that raw model capability, i.e. intelligence, along with the infrastructure required to run them would stay scarce. Those who manage to secure the best model and the infrastructure to run it at scale would get the best moat. But I am afraid that having the best model in itself may not be enough moving forward. Less capable models are becoming as capable as previous versions of the frontier models.
The best recent example I can think of is Gemma 4, Google’s open-weight model. It was built to run on a phone, scores 85.2% on MMLU Pro and matches Claude Sonnet 4.5 Thinking on the Arena leaderboard. 2 million downloads in its first week. Models that would have been state-of-the-art eighteen months ago now run on a laptop, and they get better every quarter.
If you haven’t tried Gemma4 yourself I highly recommend it. I am running it on my AMD Ryzen AI Max+, and its performance in terms of tokens per second and intelligence are so good that I have already migrated some of my personal tools to use this model as the backend without visibly impacting their output. This trend can really change in the next few months way we access intelligence.
I feel that some of the labs see this coming. Anthropic has been particularly aggressive about it and they are releasing new (actually useful) tools every day that work like a charm with their models in order to lock users into their ecosystem. Claude Code for developers, Claude Cowork for teams, the recent Claude Managed Sessions to orchestrate agents, all designed to put Claude inside workflows people are already in.
The logic behind it: if the model itself won’t hold the moat, capture the usage layer and make switching painful. I think this is brilliant, and seeing how much Anthropic is growing in number of users and revenue, it seems to be paying off. The economics of their plans are still rough, though. One analysis found a max-plan subscriber consuming $27,000 worth of compute with their 200$ Max subscription. The labs are subsidising the demand they’re chasing, which justifies their level of burn (let’s see for how long they can afford these subsidies).
Apple, by contrast, has spent almost nothing on AI infrastructure and subsidising users’ token burn. And this may be giving them more optionality and leverage than any of the other companies that jumped heads first into the AI race.
In that earlier post, I argued that if intelligence becomes abundant, context becomes the scarce resource. A model that can reason about anything but knows nothing about you or the environment it operates in is a generic tool. What makes AI genuinely useful day-to-day is reasoning plus personal context: your messages, your calendar, your code, your tools, your health data, your photos, your habits. I think here is where Anthropic is making an amazing job with their “Claude suite”.
But Apple already has all this context and access to your environment through their 2.5 billion active devices. Each one is a context mine that users have been filling for years. Health data from Apple Watch. Every photo taken on an iPhone. Notes, messages, location history, app behaviour, emails, and awareness of your environment through the pool of sensors of your device. Why build a commodity when they already have the context that can become their moat?
And they even have the ability to keep all this data on-device, which is where the “Privacy. That’s iPhone” positioning becomes something more than a PR strategy, and which could actually make a comeback to become one of their core value propositions. Apple spent years using privacy as a differentiator against the ad-driven models of Google and Meta. It worked, but it always felt a bit abstract and, honestly, fake. Now it could become really concrete. Would you hand OpenAI your medical records and fifteen years of photos to get better AI answers? Probably not. Some are, but I personally wouldn’t like Sam to have that personal data from me. Would you let a model running entirely on your device (no network request, no data leaving your phone) access all of that? That’s a different question. The on-device model gets full context because it never leaves the hardware. Apple built the reputation and the architecture for this when no one else thought it mattered.
Of course, there are still technological barriers to make this possible, but I feel like we may be getting there.
In this context, the Gemini deal, where Apple signed a $1B to license Google’s frontier model for the queries that need cloud-scale reasoning, makes total sense. Apple didn’t build a frontier model. They bought access to one, at a price that’s rounding error against OpenAI’s weekly compute bill. What they kept in-house: the context layer, the on-device stack, and the operating system that mediates everything.
Turns out Apple had another unexpected lever for AI as shown with the Mac Mini craze after OpenClaw’s release. Apple Silicon wasn’t built specifically for AI, it was built for efficiency, for battery life, for thermal performance, for the hardware/software co-design that Apple had been running for fifteen years. But it turned out to be the perfect architecture to run local models efficiently.
The key decision is unified memory. On a conventional architecture (that of most laptops, and even traditional data center-grade GPUs) the CPU and GPU are separate chips with separate memory pools. Moving data between them is slow and power-hungry. Nvidia’s GPUs are extremely fast at matrix operations, but they sit on the other side of a PCIe bus from the CPU, and feeding them is a constant bottleneck (as discussed when presenting the difference between DRAM and HBM in this post from a few weeks ago).
Apple’s M-series and A-series chips put the CPU, GPU, and Neural Engine (their proprietary accelerator) on the same die, sharing one high-bandwidth memory pool. No bus crossing, no transfer overhead, no latency switching between CPU and GPU work. For video editing or compiling Xcode, this is a nice efficiency win. For LLM inference, this has been key.
As described also in my post about RAM memory and TurboQuant, LLM inference is currently memory-bandwidth bound, not compute bound. The bottleneck isn’t so much how fast you can multiply matrices, it’s how fast you can stream model weights from memory into the compute units, and how big of a KV cache you can store to avoid having to re-compute it. Apple’s unified pool gives every compute unit direct, high-bandwidth access to the same memory simultaneously. That’s exactly the operation inference needs.
This is what makes the LLM in a Flash technique work so well on Apple hardware. Someone recently ran Qwen 397B, a 209GB model, on an M3 Max Mac at ~5.7 tokens per second, using only 5.5GB of active RAM. The weights live on the SSD and stream in at ~17.5 GB/s as needed. This works because Qwen is a mixture-of-experts architecture: each token only activates a small subset of expert layers, so you only ever need a fraction of the 209GB resident in memory. The SSD throughput Apple achieves (faster than their own figures from the original LLM in a Flash paper) comes from storage architecture they built for iPhone responsiveness, not AI. Claude wrote the ~5,000 lines of Objective-C and Metal shaders to make it all work. A 400-billion-parameter model, on a consumer laptop, from 5.5GB of RAM (another win of the autoresearch flow discussed in this newsletter).
What I find more interesting about all of this is the platform dynamic that this can result in. Think about the App Store. Apple didn’t build the apps, they built the platform where apps ran best, and the ecosystem followed. Developers didn’t target iOS because Apple asked, they targeted it because the users were there, the tooling was good, the hardware was consistent. My feeling is that the same thing could happen now with local inference. MLX is already a de facto framework for on-device AI. Gemma, Qwen, Mistral, the most relevant model architectures have MLX support. Apple doesn’t need to win the model race if they manage to become the de-facto platform where the models (or the agents that use them) run. Again, a great example of this is the Mac Mini craze after OpenClaw went viral.
I keep going back and forth on this, honestly, and I still don’t know if this was Apple’s strategy all along, or they didn’t feel in the position to make a bet and are just flowing as the events unfold maximising their optionality.
The hardware/software co-design strategy has been a key focus for years, and one that I’ve always agreed on myself (as an electrical engineering by training, I’ve always been into hardware/software co-design). If you can afford it, I think that’s the right approach. The privacy positioning, the on-device processing focus, the decision to build their own silicon when the rest of the industry was happy buying Nvidia and Intel, all of those were choices Apple made when they were commercially risky and the direction wasn’t obvious. Is it true that they were made with cost and governance in mind, not AI, but it turned out well for them.
What Apple couldn’t have planned (or could they?) is that their unified memory architecture would be a perfect fit for LLMs, and that open-weight models would get this capable, this fast, removing the need for huge hardware investment for AI infrastructure from their side. That the model race would commoditise intelligence as quickly as it did. Or that someone would stream a 400B parameter model from an SSD and it would actually work.
So some of this is luck. But it’s the kind of luck that finds you when you built the right foundation, even if you built it for completely different reasons. They were definitely well-positioned.
The rest of the industry spent three years racing to see who could build the best model with Apple looking from the sidelines, waiting to understand how their devices and own ecosystem could fit in this future. I don’t know if this is exactly the case, but I feel this was smart. Risky but smart.
I genuinely don’t know how this plays out over the next few years. The labs are not standing still, and Apple’s AI track record (looking at you, Siri, you already suck a bit) is not exactly flawless. But it’s hard to imagine a world where 2.5 billion devices, carrying your entire personal context, running capable models locally on purpose-built silicon, with Gemini on-call for the hard stuff, incurring in variable cost for inference instead of expensive CAPEX investment could be a bad position to be in a future where AI is everywhere.
Whether that was strategy or fortune, I’ll leave for you to decide. And if you do, please let me know what you think about it. My TL;DR is that, to my surprise, I am still bullish about Apple and their relevance in an AI-centric future.
Disclaimer: To frame the opinion of this post, I just want to be clear about the fact that I am not one of those Apple fan boys. Proof of this is that this post was written from a Linux machine and that I don’t even own a Mac :)
...
Read the original on adlrocha.substack.com »
This post works through the financial logic of software teams, from what a team of eight engineers actually costs per month to what it needs to generate to be economically viable. It also examines why most teams have no visibility into either number, how that condition was built over two decades, and what the arrival of LLMs now means for organizations that have been treating large engineering headcount as an asset.
Software development is one of the most capital-intensive activities a modern company undertakes, and it is also one of the least understood from a financial perspective. The people making daily decisions about what to build, what to delay, and what to abandon are rarely given the financial context to understand what those decisions actually cost. This is not a coincidence. It is a structural condition that most organizations have maintained, quietly and consistently, for roughly two decades.
A software engineer in Western Europe costs somewhere between €120,000 and €150,000 per year when you account for salary, social fees, pension contributions, equipment, social activities, management overhead, and office space. Call it €130,000 as a reasonable middle estimate. A team of eight engineers therefore costs approximately €1,040,000 per year, or €87,000 per month, or roughly €4,000 for every working day.
Most engineers do not know this number. Many of their managers do not either. And in the organizations where someone does know it, the number rarely makes its way into the conversations where prioritization decisions are actually made.
This matters because every decision a team makes carries an implicit cost that compounds over time. Choosing to spend three weeks on a feature that serves 2% of users is a €60,000 decision. Delaying an operational improvement for a quarter is a decision with a calculable daily price tag. Rebuilding a platform because the current one feels embarrassing, rather than because customers are leaving, is a capital allocation choice that would look very different if the people making it were spending their own money.
Consider a team of eight engineers whose mission is to build and maintain an internal developer platform serving one hundred other engineers. This is a common organizational structure, and it is one where the financial logic is rarely examined carefully.
The team costs €87,000 per month. To justify that cost, the platform they build needs to generate at least €87,000 per month in value for the engineers who use it. The most direct way to measure that value is through time saved, since the platform’s purpose is to make other engineers more productive.
At a cost of €130,000 per year, one engineer costs approximately €10,800 per month, or around €65 per working hour. For the platform team to break even, their platform needs to save the hundred engineers they serve a combined total of 1,340 hours per month. That is 13.4 hours per engineer per month, or roughly three hours per week per person.
Three hours per week is achievable. A well-built platform that eliminates manual deployment steps, reduces environment setup time, or removes the need for repetitive configuration work can easily clear that bar. Time saved is the most direct measure for a platform team, though value can also come from reducing outages, which carries a direct revenue impact of its own. But the question worth asking is whether anyone on that team knows this number, tracks it, or uses it to decide what to build next. In most organizations, the answer is no. The team has a roadmap driven by engineering preferences, stakeholder requests, and quarterly planning cycles, and the financial logic underlying their existence is left unexamined.
And break-even is not actually the right bar. Leah Tharin has written a sharp breakdown of the mathematics of this: a team with a 50% initiative success rate, which is already optimistic, needs its wins to cover its losses too. Leah’s calculation is growth-oriented, but even for non-growth organizations, the same investment thesis holds. Even a two-times return is not sufficient. Capital sitting in a bank carries no operational risk, no coordination costs, and no ongoing maintenance obligations. The systems a team builds will outlive the team itself, and the cost of owning, maintaining, and eventually replacing those systems is almost always larger than anticipated. The return has to cover not just the team’s current cost, but the long tail of what they leave behind.
That pushes the realistic threshold for financial viability to somewhere between three and five times annual cost. For an €87,000 per month team, that means generating between €260,000 and €435,000 in monthly value. The three hours per week calculation gets you to break-even. To clear the realistic financial bar, the platform needs to be genuinely transformative for the engineers using it, and the team needs to be ruthless about working on the highest-value problems rather than the most interesting ones.
A customer-facing product team of eight carries the same €87,000 monthly cost. The levers available to justify that cost are different, but the underlying logic is identical.
If the product has an average revenue per user of €50 per month, the team needs to generate or protect the equivalent of 1,740 users worth of value every month just to break even, and roughly 5,000 to 8,700 users worth of value to clear the three-to-five times threshold.
Churn is often the most direct lever. Consider a product with 50,000 active users losing 2% monthly to churn. That is 1,000 users per month, representing €50,000 in monthly recurring revenue walking out the door. A team that identifies the primary driver of that churn and eliminates it is generating nearly €50,000 per month in protected revenue, covering most of its break-even cost from a single initiative. But that calculation requires knowing the churn rate, understanding its causes, and connecting those causes to the team’s work, and most teams are not operating with that level of financial clarity.
Activation is another lever that is frequently underestimated. If 10,000 users sign up each month but only 30% complete the activation steps that lead to long-term retention, there are 7,000 users each month who paid acquisition costs but never converted to retained revenue. Improving the activation rate by five percentage points, from 30% to 35%, converts an additional 500 users per month. At €50 average revenue per user, that is €25,000 in additional monthly recurring revenue, representing roughly 29% of the team’s break-even threshold from one metric moving in the right direction.
Sales conversion follows the same logic. If the product has a free-to-paid conversion funnel processing 20,000 trials per month at a 4% conversion rate, that produces 800 paying customers monthly. Moving conversion from 4% to 4.5% produces 900 customers, an additional 100 paying users, and €5,000 in additional monthly revenue. Small improvements across multiple levers compound quickly, but only if the team understands which levers connect to which financial outcomes and by how much.
Given that software teams are expensive and that their value is, at least in principle, calculable, it is worth examining why most teams do not measure anything financially meaningful. Some measure activity proxies such as velocity, tickets closed, or features shipped. Others measure sentiment proxies such as NPS, CSAT, or engagement scores. These are not degraded versions of financial measurement. They are a different category entirely, one that was designed around the goal of understanding user behavior and team throughput rather than around the goal of understanding economic return.
The problem is that activity and sentiment metrics can trend upward while financial performance deteriorates. A team can ship more features while building the wrong things. Engagement scores can rise while churn accelerates among the users who actually generate revenue. Velocity can increase while the work being completed has no measurable connection to business outcomes. These metrics feel meaningful because they correlate with outcomes in many circumstances, but correlation is not a reliable guide to prioritization when the underlying financial logic is never examined.
This is a structural condition rather than a failure of individual judgment. Organizations chose these metrics because they are easier to instrument, easier to communicate, and easier to look good on than financial metrics. A team that measures its success by features shipped will always have something to show. A team that measures its success by return generated will sometimes have to report that it does not know, or that the return was disappointing, and that kind of transparency requires an organizational culture that most companies have not deliberately built.
The matrix above is drawn from a product management training program I run called Booster, where product leaders map their actual metrics against their investment thesis to surface gaps. The exercise is uncomfortable precisely because most leaders discover mid-mapping that their team’s daily measurements have no direct connection to the financial objective they were given.
Understanding why this condition exists requires looking at roughly two decades of macroeconomic context, because the financial dysfunction in modern software organizations did not emerge from bad intentions or intellectual failure. It emerged from a specific environment that made financial discipline in product teams economically unnecessary.
The picture is not a single clean era but two distinct phases. From roughly 2002 through 2011, capital was periodically cheap but conditions were mixed. Rates fell sharply after the dot-com crash and again after the global financial crisis, but in both cases risk appetite was suppressed. The money was technically inexpensive but investors were cautious, multiples were reasonable, and the growth-at-all-costs logic had not yet taken hold. Product organizations during this period still operated with some residual financial discipline inherited from the dot-com reckoning.
From approximately 2011 through 2022, something different happened. Zero-rate policy became fully normalized, risk appetite recovered and then overcorrected, and the SaaS mental model crystallized into a broadly shared investment thesis. All three conditions arrived simultaneously, and the result was about eleven years during which software companies could grow headcount aggressively, miss on the majority of their roadmap, and still look healthy on paper. Revenue growth forgave an enormous range of prioritization mistakes, and the cost of building the wrong thing was largely invisible.
Eleven years is not a long time, but it is long enough to form the professional instincts of an entire generation of product and engineering leaders. The frameworks they learned, the metrics they adopted, the planning rituals they practice, and the definitions of success they internalized were all formed during a window that was unusually short and unusually distorted. There is no cohort of senior product leaders who developed their judgment in conditions where their teams were expected to demonstrate financial return, because those conditions did not exist during the years when that cohort was learning the craft.
When capital became expensive again in 2022, the behavior did not automatically adjust, because the behavior was never connected to the financial logic in the first place.
There is a deeper consequence of this twenty-year period that is now becoming painfully visible, and it concerns how the industry has thought about large engineering organizations and codebases.
The conventional understanding is that a codebase representing years of engineering investment is a valuable asset. It encodes business logic, captures accumulated decisions, and represents the technical foundation on which future products are built. A large engineering organization is similarly understood as a source of capability, with more engineers meaning more capacity to build, maintain, and improve that foundation.
While some argued that large codebases actually shoulg be considered a liability, the industry as a whole has mostly ignored that. But this understanding is now being more closely examined. A large codebase also carries maintenance costs that grow over time as the system becomes more complex, more interconnected, and more difficult to change safely. Every engineer added to maintain it increases coordination costs, introduces new dependencies, and adds to the organizational weight that slows decision-making. The asset and the liability exist simultaneously, and for most of the past twenty years, the financial environment masked the liability side of that equation.
The arrival of large language models has made the liability visible in a way that is difficult to ignore. Recently, Nathan Cavaglione, a developer, built a functional replica of approximately 95% of Slack’s core product in fourteen days using LLM agents. Slack was built by thousands of engineers over the course of more than a decade, at a cost that represents billions of dollars in cumulative engineering investment. Nathan started without any of that accumulated complexity, without the organizational weight, without the legacy architectural decisions, and without the coordination costs, and arrived at a comparable product in a period that would not constitute a single sprint in most enterprise engineering organizations.
Day 14: A functional replica of Slack’s core product, built by a Nathan using LLM agents.
This does not mean that Slack’s engineering investment was wasted, because Slack also built enterprise sales infrastructure, compliance capabilities, data security practices, and organizational resilience that a fourteen-day prototype does not include. But it does mean that the assumption underlying large engineering organizations, which is that scale and accumulated complexity represent competitive moats, is no longer reliable in the way it once was. When the cost of building a functional approximation of a sophisticated software product can collapse to days of individual effort, the question of what a large engineering team justifies becomes both more urgent and more difficult to answer with the metrics most organizations currently track.
The obvious objection is that code produced at that speed becomes unmanageable, a liability in itself. That is a reasonable concern, but it largely applies when agents produce code that humans then maintain. Agentic platforms are being iterated upon quickly, and for established patterns and non-business-critical code, which is the majority of what most engineering organizations actually maintain, detailed human familiarity with the codebase matters less than it once did. A messy codebase is still cheaper to send ten agents through than to staff a team around. And even if the agents need ten days to reason through an unfamiliar system, that is still faster and cheaper than most development teams operating today. The liability argument holds in a human-to-human or agent-to-human world. In an agent-to-agent world, it largely dissolves.
The competitive advantage available to organizations that take this seriously is not primarily technical. It is analytical. Companies that can clearly articulate what each of their teams costs, what value each team generates, and whether that value clears a financially viable threshold are in a structurally different position than companies that cannot. They can make build versus buy decisions based on actual economics rather than organizational preference. They can identify when a team is working on problems that cannot generate sufficient return at their cost level. They can sequence initiatives based on what value is being lost each day they are delayed, rather than on who argued most persuasively in the last planning meeting.
Most organizations cannot do this today. The measurement infrastructure does not exist, the financial data does not flow to the people making prioritization decisions, and the habit of asking these questions has not been built. Building it is uncomfortable, because the answers are sometimes unflattering. A team that examines its work through this lens will sometimes discover that it has spent a quarter on things that do not connect to financial outcomes in any meaningful way, and that is a difficult finding to sit with.
But the alternative is continuing to run an organization where teams with million-euro annual budgets make daily investment decisions without the financial context to know whether those decisions are generating return. That condition was sustainable when capital was cheap and growth forgave everything. It is increasingly difficult to sustain in an environment where boards expect financial returns, where the cost of building software is collapsing due to AI, and where the question of what a team justifies can no longer be deferred indefinitely.
The organizations that develop the habit of asking these questions clearly, regularly, and without flinching will accumulate an advantage that compounds over time. The question is simply whether they will start asking before or after the pressure forces them to.
...
Read the original on www.viktorcessan.com »
Servo is now available on crates.io
Today the Servo team has released v0.1.0 of the servo crate. This is our first crates.io release of the servo crate that allows Servo to be used as a library.
We currently do not have any plans of publishing our demo browser servoshell to crates.io. In the 5 releases since our initial GitHub release in October 2025, our release process has matured, with the main “bottleneck” now being the human-written monthly blog post. Since we’re quite excited about this release, we decided to not wait for the monthly blog post to be finished, but promise to deliver the monthly update in the coming weeks.
As you can see from the version number, this release is not a 1.0 release. In fact, we still haven’t finished discussing what 1.0 means for Servo. Nevertheless, the increased version number reflects our growing confidence in Servo’s embedding API and its ability to meet some users’ needs.
In the meantime we also decided to offer a long-term support (LTS) version of Servo, since breaking changes in the regular monthly releases are expected and some embedders might prefer doing major upgrades on a scheduled half-yearly basis while still receiving security updates and (hopefully!) some migration guides. For more details on the LTS release, see the respective section in the Servo book.
...
Read the original on servo.org »
Focused async Python bot for Polymarket that buys No on standalone non-sports yes/no markets.
FOR ENTERTAINMENT ONLY. PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK. THE AUTHORS ARE NOT LIABLE FOR ANY CLAIMS, LOSSES, OR DAMAGES.
The bot scans standalone markets, looks for NO entries below a configured price cap, tracks open positions, exposes a dashboard, and persists live recovery state when order transmission is enabled.
If any of those are missing, the bot uses PaperExchangeClient.
pip install -r requirements.txt
cp config.example.json config.json
cp .env.example .env
config.json is intentionally local and ignored by git.
The runtime config lives under strategies.nothing_happens. See config.example.json and .env.example.
You can point the runtime at a different config file with CONFIG_PATH=/path/to/config.json.
python -m bot.main
The dashboard binds $PORT or DASHBOARD_PORT when one is set.
The shell helpers use either an explicit app name argument or HEROKU_APP_NAME.
export HEROKU_APP_NAME=
heroku config:set BOT_MODE=live DRY_RUN=false LIVE_TRADING_ENABLED=true -a “$HEROKU_APP_NAME”
heroku config:set PRIVATE_KEY=
Only run the web dyno. The worker entry exists only to fail fast if it is started accidentally.
python -m pytest -q
Local config, ledgers, exports, reports, and deployment artifacts are ignored by default.
...
Read the original on github.com »
A U. S. appeals court on Friday declared unconstitutional a nearly 158-year-old federal ban on home distilling, calling it an unnecessary and improper means for Congress to exercise its power to tax.
The 5th U. S. Circuit Court of Appeals in New Orleans ruled in favor of the nonprofit Hobby Distillers Association and four of its 1,300 members.
They argued that people should be free to distill spirits at home, whether as a hobby or for personal consumption including, in one instance, to create an apple-pie-vodka recipe.
The ban was part of a law passed during Reconstruction in July 1868, in part to thwart liquor tax evasion, and subjected violators to up to five years in prison and a $10,000 fine.
Writing for a three-judge panel, Circuit Judge Edith Hollan Jones said the ban actually reduced tax revenue by preventing distilling in the first place, unlike laws that regulated the manufacture and labeling of distilled spirits on which the government could collect taxes.
She also said that under the government’s logic, Congress could criminalize virtually any in-home activity that might escape notice from tax collectors, including remote work and home-based businesses.
“Without any limiting principle, the government’s theory would violate this court’s obligation to read the Constitution carefully to avoid creating a general federal authority akin to the police power,” Jones wrote.
The U. S. Department of Justice had no immediate comment.
Another defendant, the Treasury Department’s Alcohol and Tobacco Tax and Trade Bureau, did not immediately respond to a request for comment.
Devin Watkins, a lawyer representing the Hobby Distillers Association, in an interview called the ruling an important decision about the limits of federal power.
Andrew Grossman, who argued the nonprofit’s appeal, called the decision “an important victory for individual liberty” that lets the plaintiffs “pursue their passion to distill fine beverages in their homes.”
“I look forward to sampling their output,” he said.
The decision upheld a July 2024 ruling by U. S. District Judge Mark Pittman in Fort Worth, Texas. He put his ruling on hold so the government could appeal.
...
Read the original on nypost.com »
In my previous blog post I gave a quick and easy introduction to tmux and explained how to use tmux with a basic configuration.
If you’ve followed that guide you might have had a feeling that many people have when working with tmux for the first time: “These key combinations are really awkward!”. Rest assured, you’re not alone. Judging from the copious blog posts and dotfiles repos on GitHub there are many people out there who feel the urge to make tmux behave a little different; to make it more comfortable to use.
And actually it’s quite easy to customize the look and feel of tmux. Let me tell you something about the basics of customizing tmux and share some of the configurations I find most useful.
Customizing tmux is as easy as editing a text file. Tmux uses a file called tmux.conf to store its configuration. If you store that file as ~/.tmux.conf (Note: there’s a period as the first character in the file name. It’s a hidden file) tmux will pick this configuration file for your current user. If you want to share a configuration for multiple users you can also put your tmux.conf into a system-wide directory. The location of this directory will be different across different operating systems. The man page (man tmux) will tell you the exact location, just have a look at documentation for the -f parameter.
Probably the most common change among tmux users is to change the prefix from the rather awkward C-b to something that’s a little more accessible. Personally I’m using C-a instead but note that this might interfere with bash’s “go to beginning of line” command1. On top of the C-a binding I’ve also remapped my Caps Lock key to act as since I’m not using Caps Lock anyways. This allows me to nicely trigger my prefix key combo.
To change your prefix from C-b to C-a, simply add following lines to your tmux.conf:
# remap prefix from ‘C-b’ to ‘C-a’
unbind C-b
set-option -g prefix C-a
bind-key C-a send-prefix
Another thing I personally find quite difficult to remember is the pane splitting commands.” to split vertically and % to split horizontally just doesn’t work for my brain. I find it helpful to have use characters that resemble a visual representation of the split, so I chose | and - for splitting panes horizontally and vertically:
# split panes using | and -
bind | split-window -h
bind - split-window -v
unbind ‘“’
unbind %
Since I’m experimenting quite often with my tmux.conf I want to reload the config easily. This is why I have a command to reload my config on r:
# reload config file (change file location to your the tmux.conf you want to use)
bind r source-file ~/.tmux.conf
Switching between panes is one of the most frequent tasks when using tmux. Therefore it should be as easy as possible. I’m not quite fond of triggering the prefix key all the time. I want to be able to simply say M- to go where I want to go (remember: M is for Meta, which is usually your Alt key). With this modification I can simply press Alt-left to go to the left pane (and other directions respectively):
# switch panes using Alt-arrow without prefix
bind -n M-Left select-pane -L
bind -n M-Right select-pane -R
bind -n M-Up select-pane -U
bind -n M-Down select-pane -D
Although tmux clearly focuses on keyboard-only usage (and this is certainly the most efficient way of interacting with your terminal) it can be helpful to enable mouse interaction with tmux. This is especially helpful if you find yourself in a situation where others have to work with your tmux config and naturally don’t have a clue about your key bindings or tmux in general. Pair Programming might be one of those occasions where this happens quite frequently.
Enabling mouse mode allows you to select windows and different panes by simply clicking and to resize panes by dragging their borders around. I find it pretty convenient and it doesn’t get in my way often, so I usually enable it:
# Enable mouse control (clickable windows, panes, resizable panes)
set -g mouse on
I like to give my tmux windows custom names using the , key. This helps me naming my windows according to the context they’re focusing on. By default tmux will update the window title automatically depending on the last executed command within that window. In order to prevent tmux from overriding my wisely chosen window names I want to suppress this behavior:
# don’t rename windows automatically
set-option -g allow-rename off
Changing the colors and design of tmux is a little more complex than what I’ve presented so far. As tmux allows you to tweak the appearance of a lot of elements (e.g. the borders of panes, your statusbar and individual elements of it, messages), you’ll need to add a few options to get a consistent look and feel. You can make this as simple or as elaborate as you like. Tmux’s man page (specifically the STYLES section) contains more information about what you can tweak and how you can tweak it.
Depending on your color scheme your resulting tmux will look something like this:
# DESIGN TWEAKS
# don’t do anything when a ‘bell’ rings
set -g visual-activity off
set -g visual-bell off
set -g visual-silence off
setw -g monitor-activity off
set -g bell-action none
# clock mode
setw -g clock-mode-colour yellow
# copy mode
setw -g mode-style ‘fg=black bg=red bold’
# panes
set -g pane-border-style ‘fg=red’
set -g pane-active-border-style ‘fg=yellow’
# statusbar
set -g status-position bottom
set -g status-justify left
set -g status-style ‘fg=red’
set -g status-left ‘’
set -g status-left-length 10
set -g status-right-style ‘fg=black bg=yellow’
set -g status-right ‘%Y-%m-%d %H:%M ’
set -g status-right-length 50
setw -g window-status-current-style ‘fg=black bg=red’
setw -g window-status-current-format ′ #I #W #F ′
setw -g window-status-style ‘fg=red bg=black’
setw -g window-status-format ′ #I #[fg=white]#W #[fg=yellow]#F ′
setw -g window-status-bell-style ‘fg=yellow bg=red bold’
# messages
set -g message-style ‘fg=yellow bg=red bold’
In the snippet above, I’m using your terminal’s default colors (by using the named colors, like red, yellow or black). This allows tmux to play nicely with whatever color theme you have set for your terminal. Some prefer to use a broader range of colors for their terminals and tmux color schemes. If you don’t want to use your terminal default colors but instead want to define colors from a 256 colors range, you can use colour0 to colour256 instead of red, cyan, and so on when defining your colors in your tmux.conf.
Looking for a nice color scheme for your terminal?
If you’re looking for a nice color scheme for your terminal I recommend to check out my very own Root Loops. With Root Loops you can easily design a personal, awesome-looking terminal color scheme and stand out from all the other folks using the same boring-ass color schemes everyone else is using.
There are plenty of resources out there where you can find people presenting their tmux configurations. GitHub and other code hosting services tend to be a great source. Simply search for “tmux.conf” or repos called “dotfiles” to find a vast amount of configurations that are out there. Some people share their configuration on their blog. Reddit might have a few subreddits that could have useful inspiration, too (there’s /r/dotfiles and /r/unixporn, for example).
You can find my complete tmux.conf (along with other configuration files I’m using on my systems) on my personal dotfiles repo on GitHub.
If you want to dive deeper into how you can customize tmux, the canonical source of truth is tmux’s man page (simply type man tmux to get there). You should also take a look at the elaborate tmux wiki and see their Configuring tmux section if this blog post was too shallow for your needs. Both will contain up-to-date information about each and every tiny thing you can tweak to make your tmux experience truly yours. Have fun!
...
Read the original on hamvocke.com »
Add AP News as your preferred source to see more of our stories on Google.
Add AP News as your preferred source to see more of our stories on Google.
BUDAPEST, Hungary (AP) — Hungarian voters on Sunday ousted long-serving Prime Minister Viktor Orbán after 16 years in power, rejecting the authoritarian policies and global far-right movement that he embodied in favor of a pro-European challenger in a bombshell election result with global repercussions.
It was a stunning blow for Orbán — a close ally of both U. S. President Donald Trump and Russian President Vladimir Putin — who quickly conceded defeat after what he called a ″painful″ election result. U.S. Vice President JD Vance had made a visit to Hungary just days earlier, meant to help push Orbán over the finish line.
Election victor Péter Magyar, a former Orbán loyalist who campaigned against corruption and on everyday issues like health care and public transport, has pledged to rebuild Hungary’s relationships with the European Union and NATO — ties that frayed under Orbán. European leaders quickly congratulated Magyar.
His victory was expected to transform political dynamics within the EU, where Orbán had upended the bloc by frequently vetoing key decisions, prompting concerns he sought to break it up from the inside.
It will also reverberate among far-right movements around the world, which have viewed Orbán as a beacon for how nationalist populism can be used to wage culture wars and leverage state power to undermine opponents.
It’s not yet clear whether Magyar’s Tisza party will have the two-thirds majority in parliament, which would give it the numbers needed for major changes in legislation. With 93% of the vote counted, it had more than 53% support to 37% for Orbán’s governing Fidesz party and looked set to win 94 of Hungary’s 106 voting districts.
“I congratulated the victorious party,″ Orban told followers. “We are going to serve the Hungarian nation and our homeland from opposition.″
In a speech to tens of thousands of jubilant supporters at a victory party along the Danube River, Magyar said his voters had rewritten Hungarian history.
“Tonight, truth prevailed over lies. Today, we won because Hungarians didn’t ask what their homeland could do for them — they asked what they could do for their homeland. You found the answer. And you followed through,” he said.
On the streets of Budapest, drivers blared car horns and cranked up anti-government songs while people marching in the streets chanted and screamed.
Many revelers chanted “Ruszkik haza!” or “Russians go home!” — a phrase used widely during Hungary’s 1956 anti-Soviet revolution, and which had gained increasing currency amid Orbán’s drift toward Moscow.
Turnout in the election was nearly 80%, according to the National Election Office, a record number in any vote in Hungary’s post-Communist history.
Orbán, the EU’s longest-serving leader and one of its biggest antagonists, traveled a long road from his early days as a liberal, anti-Soviet firebrand to the Russia-friendly nationalist admired today by the global far-right.
The EU will be waiting to see how Magyar changes Hungary’s approach to Ukraine. Orbán repeatedly frustrated EU efforts to support the neighboring country in its war against Russia’s full-scale invasion, while cultivating close ties to Putin and refusing to end Hungary’s dependence on Russian energy imports.
Recent revelations have shown a top member of Orbán’s government frequently shared the contents of EU discussions with Moscow, raising accusations that Hungary was acting on Russia’s behalf within the bloc.
Members of Trump’s “Make America Great Again” movement are among those who see Orbán’s government and his Fidesz political party as shining examples of conservative, anti-globalist politics in action, while he is reviled by advocates of liberal democracy and the rule of law.
In Budapest, Marcell Mehringer, 21, said he was voting “primarily so that Hungary will finally be a so-called European country, and so that young people, and really everyone, will do their fundamental civic duty to unite this nation a bit and to breakdown these boundaries borne of hatred.”
During his 16 years as prime minister, Orbán launched harsh crackdowns on minority rights and media freedoms, subverted many of Hungary’s institutions and been accused of siphoning large sums of money into the coffers of his allied business elite, an allegation he denies.
He also heavily strained Hungary’s relationship with the EU. Although Hungary is one of the smaller EU countries, with a population of 9.5 million, Orbán has repeatedly used his veto to block decisions that require unanimity.
Most recently, he blocked a 90-billion euro ($104 billion) EU loan to Ukraine, prompting his partners to accuse him of hijacking the critical aid.
Magyar, 45, rapidly rose to become Orbán’s most serious challenger.
A former insider within Orbán’s Fidesz, Magyar broke with the party in 2024 and quickly formed Tisza. Since then, he has toured Hungary relentlessly, holding rallies in settlements big and small in a campaign blitz that recently had him visiting up to six towns daily.
In an interview with The Associated Press earlier this month, Magyar said the election will be a “referendum” on whether Hungary continues on its drift toward Russia under Orbán, or can retake its place among the democratic societies of Europe.
Tisza is a member of the European People’s Party, the mainstream, center-right political family with leaders governing 12 of the EU’s 27 nations.
Magyar faced a tough fight. Orbán’s control of Hungary’s public media, which he has transformed into a mouthpiece for his party, and vast swaths of the private media market give him an advantage in spreading his message.
The unilateral transformation of Hungary’s electoral system and gerrymandering of its 106 voting districts by Fidesz also required Tisza to gain an estimated 5% more votes than Orbán’s party to achieve a simple majority.
Additionally, hundreds of thousands of ethnic Hungarians in neighboring countries had the right to vote in Hungarian elections and traditionally have voted overwhelmingly for Orbán’s party.
Russian secret services have plotted to interfere and tip the election in Orbán’s favor, according to numerous media reports including by The Washington Post. The prime minister, however, accused neighboring Ukraine, as well as Hungary’s allies in the EU, of seeking to interfere in the vote to install a “pro-Ukraine” government.
Associated Press journalists Béla Szandelszky, Marko Drobnjakovic, Ivan L. Nagy, Florent Bajrami in Budapest, Hungary, and Angela Charlton in Paris contributed to this report.
...
Read the original on apnews.com »
New machine learning systems endanger our psychological and physical safety. The idea that ML companies will ensure “AI” is broadly aligned with human interests is naïve: allowing the production of “friendly” models has necessarily enabled the production of “evil” ones. Even “friendly” LLMs are security nightmares. The “lethal trifecta” is in fact a unifecta: LLMs cannot safely be given the power to fuck things up. LLMs change the cost balance for malicious attackers, enabling new scales of sophisticated, targeted security attacks, fraud, and harassment. Models can produce text and imagery that is difficult for humans to bear; I expect an increased burden to fall on moderators. Semi-autonomous weapons are already here, and their capabilities will only expand.
Well-meaning people are trying very hard to ensure LLMs are friendly to humans. This undertaking is called alignment. I don’t think it’s going to work.
First, ML models are a giant pile of linear algebra. Unlike human brains, which are biologically predisposed to acquire prosocial behavior, there is nothing intrinsic in the mathematics or hardware that ensures models are nice. Instead, alignment is purely a product of the corpus and training process: OpenAI has enormous teams of people who spend time talking to LLMs, evaluating what they say, and adjusting weights to make them nice. They also build secondary LLMs which double-check that the core LLM is not telling people how to build pipe bombs. Both of these things are optional and expensive. All it takes to get an unaligned model is for an unscrupulous entity to train one and not
do that work—or to do it poorly.
I see four moats that could prevent this from happening.
First, training and inference hardware could be difficult to access. This clearly won’t last. The entire tech industry is gearing up to produce ML hardware and building datacenters at an incredible clip. Microsoft, Oracle, and Amazon are tripping over themselves to rent training clusters to anyone who asks, and economies of scale are rapidly lowering costs.
Second, the mathematics and software that go into the training and inference process could be kept secret. The math is all published, so that’s not going to stop anyone. The software generally remains secret sauce, but I don’t think that will hold for long. There are a
lot of people working at frontier labs; those people will move to other jobs and their expertise will gradually become common knowledge. I would be shocked if state actors were not trying to exfiltrate data from OpenAI et al. like
Saudi Arabia did to
Twitter, or China has been doing to a good chunk of the US tech
industry
for the last twenty years.
Third, training corpuses could be difficult to acquire. This cat has never seen the inside of a bag. Meta trained their LLM by torrenting pirated
books
and scraping the Internet. Both of these things are easy to do. There are
whole companies which offer web scraping as a service; they spread requests across vast arrays of residential proxies to make it difficult to identify and block.
Fourth, there’s the small armies of
contractors
who do the work of judging LLM responses during the reinforcement learning
process; as the quip goes, “AI” stands for African Intelligence. This takes money to do yourself, but it is possible to piggyback off the work of others by training your model off another model’s outputs. OpenAI thinks Deepseek did exactly
that.
In short, the ML industry is creating the conditions under which anyone with sufficient funds can train an unaligned model. Rather than raise the bar against malicious AI, ML companies have lowered it.
To make matters worse, the current efforts at alignment don’t seem to be working all that well. LLMs are complex chaotic systems, and we don’t really understand how they work or how to make them safe. Even after shoveling piles of money and gobstoppingly smart engineers at the problem for years, supposedly aligned LLMs keep sexting
kids, obliteration attacks can convince models to generate images of
violence, and anyone can go and download “uncensored” versions of
models. Of course alignment prevents many terrible things from happening, but models are run many times, so there are many chances for the safeguards to fail. Alignment which prevents 99% of hate speech still generates an awful lot of hate speech. The LLM only has to give usable instructions for making a bioweapon once.
We should assume that any “friendly” model built will have an equivalently powerful “evil” version in a few years. If you do not want the evil version to exist, you should not build the friendly one! You should definitely not
reorient a good chunk of the US
economy toward making evil models easier to train.
LLMs are chaotic systems which take unstructured input and produce unstructured output. I thought this would be obvious, but you should not connect them to safety-critical systems, especially with untrusted input. You must assume that at some point the LLM is going to do something bonkers, like interpreting a request to book a restaurant as permission to delete your entire inbox. Unfortunately people—including software engineers, who really should know better!—are hell-bent on giving LLMs incredible power, and then connecting those LLMs to the Internet at large. This is going to get a lot of people hurt.
First, LLMs cannot distinguish between trustworthy instructions from operators and untrustworthy instructions from third parties. When you ask a model to summarize a web page or examine an image, the contents of that web page or image are passed to the model in the same way your instructions are. The web page could tell the model to share your private SSH key, and there’s a chance the model might do it. These are called prompt injection attacks, and they
keep happening. There was one against Claude Cowork just two months
ago.
Simon Willison has outlined what he calls the lethal
trifecta: LLMs cannot be given untrusted content, access to private data, and the ability to externally communicate; doing so allows attackers to exfiltrate your private data. Even without external communication, giving an LLM destructive capabilities, like being able to delete emails or run shell commands, is unsafe in the presence of untrusted input. Unfortunately untrusted input is everywhere. People want to feed their emails to LLMs. They run LLMs
on third-party
code, user chat sessions, and random web pages. All these are sources of malicious input!
This year Peter Steinberger et al. launched
OpenClaw, which is where you hook up an LLM to your inbox, browser, files, etc., and run it over and over again in a loop (this is what AI people call an agent). You can give OpenClaw your credit card so it can buy things from random web pages. OpenClaw acquires “skills” by downloading
vague, human-language Markdown files from the
web, and hoping that the LLM interprets those instructions correctly.
Not to be outdone, Matt Schlicht launched
Moltbook, which is a social network for agents (or humans!) to post and receive untrusted content automatically. If someone asked you if you’d like to run a program that executed any commands it saw on Twitter, you’d laugh and say “of course not”. But when that program is called an “AI agent”, it’s different! I assume there are already Moltbook worms spreading in the wild.
So: it is dangerous to give LLMs both destructive power and untrusted input. The thing is that even trusted input can be dangerous. LLMs are, as previously established, idiots—they will take perfectly straightforward
instructions and do the exact
opposite, or delete files and lie about what they’ve
done. This implies that the lethal trifecta is actually a unifecta: one cannot give LLMs dangerous power, period. Ask Summer Yue, director of AI Alignment at Meta Superintelligence Labs. She gave OpenClaw access to her personal
inbox, and it proceeded to delete her email while she pleaded for it to stop. Claude routinely deletes entire
directories
when asked to perform innocuous tasks. This is a big enough problem that people are building sandboxes specifically to limit the damage LLMs can do.
LLMs may someday be predictable enough that the risk of them doing Bad Things™ is acceptably low, but that day is clearly not today. In the meantime, LLMs must be supervised, and must not be given the power to take actions that cannot be accepted or undone.
One thing you can do with a Large Language Model is point it at an existing software systems and say “find a security vulnerability”. In the last few months this has become a viable
strategy for finding serious exploits. Anthropic has built a new model,
Mythos, which seems to be even better at finding security bugs, and believes “the fallout—for economies, public safety, and national security—could be severe”. I am not sure how seriously to take this: some of my peers think this is exaggerated marketing, but others are seriously concerned.
I suspect that as with spam, LLMs will shift the cost balance of security. Most software contains some vulnerabilities, but finding them has traditionally required skill, time, and motivation. In the current equilibrium, big targets like operating systems and browsers get a lot of attention and are relatively hardened, while a long tail of less-popular targets goes mostly unexploited because nobody cares enough to attack them. With ML assistance, finding vulnerabilities could become faster and easier. We might see some high-profile exploits of, say, a major browser or TLS library, but I’m actually more worried about the long tail, where fewer skilled maintainers exist to find and fix vulnerabilities. That tail seems likely to broaden as LLMs extrude more software
for uncritical operators. I believe pilots might call this a “target-rich environment”.
This might stabilize with time: models that can find exploits can tell people they need to fix them. That still requires engineers (or models) capable of fixing those problems, and an organizational process which prioritizes security work. Even if bugs are fixed, it can take time to get new releases validated and deployed, especially for things like aircraft and power plants. I get the sense we’re headed for a rough time.
General-purpose models promise to be many things. If Anthropic is to be believed, they are on the cusp of being weapons. I have the horrible sense that having come far enough to see how ML systems could be used to effect serious harm, many of us have decided that those harmful capabilities are inevitable, and the only thing to be done is to build our weapons before someone else builds theirs. We now have a venture-capital Manhattan project in which half a dozen private companies are trying to build software analogues to nuclear weapons, and in the process have made it significantly easier for everyone else to do the same. I hate everything about this, and I don’t know how to fix it.
I think people fail to realize how much of modern society is built on trust in audio and visual evidence, and how ML will undermine that trust.
For example, today one can file an insurance claim based on e-mailing digital photographs before and after the damages, and receive a check without an adjuster visiting in person. Image synthesis makes it easier to defraud this system; one could generate images of damage to furniture which never happened, make already-damaged items appear pristine in “before” images, or alter who appears to be at fault in footage of an auto collision. Insurers will need to compensate. Perhaps images must be taken using an official phone app, or adjusters must evaluate claims in person.
The opportunities for fraud are endless. You could use ML-generated footage of a porch pirate stealing your package to extract money from a credit-card purchase protection plan. Contest a traffic ticket with fake video of your vehicle stopping correctly at the stop sign. Borrow a famous face for a
pig-butchering
scam. Use ML agents to make it look like you’re busy at work, so you can collect four
salaries at once. Interview for a job using a fake identity, use ML to change your voice and face in the interviews, and funnel your salary to North
Korea. Impersonate someone in a phone call to their banker, and authorize fraudulent transfers. Use ML to automate your roofing
scam
and extract money from homeowners and insurance companies. Use LLMs to skip the reading and write your college
essays. Generate fake evidence to write a fraudulent paper on how LLMs are making
advances in materials
science. Start a paper
mill
for LLM-generated “research”. Start a company to sell LLM-generated snake-oil software. Go wild.
As with spam, ML lowers the unit cost of targeted, high-touch attacks. You can envision a scammer taking a healthcare data
breach
and having a model telephone each person in it, purporting to be their doctor’s office trying to settle a bill for a real healthcare visit. Or you could use social media posts to clone the voices of loved ones and impersonate them to family members. “My phone was stolen,” one might begin. “And I need help getting home.”
You can buy the President’s phone
number, by the way.
I think it’s likely (at least in the short term) that we all pay the burden of increased fraud: higher credit card fees, higher insurance premiums, a less accurate court system, more dangerous roads, lower wages, and so on. One of these costs is a general culture of suspicion: we are all going to trust each other less. I already decline real calls from my doctor’s office and bank because I can’t authenticate them. Presumably that behavior will become widespread.
In the longer term, I imagine we’ll have to develop more sophisticated anti-fraud measures. Marking ML-generated content will not stop fraud: fraudsters will simply use models which do not emit watermarks. The converse may work however: we could cryptographically attest to the provenance of “real” images. Your phone could sign the videos it takes, and every piece of software along the chain to the viewer could attest to their modifications: this video was stabilized, color-corrected, audio normalized, clipped to 15 seconds, recompressed for social media, and so on.
The leading effort here is C2PA, which so far does not seem to be working. A few phones and cameras support it—it requires a secure enclave to store the signing key. People can steal the keys or convince
cameras to sign AI-generated
images, so we’re going to have all the fun of hardware key rotation & revocation. I suspect it will be challenging or impossible to make broadly-used software, like Photoshop, which makes trustworthy C2PA signatures—presumably one could either extract the key from the application, or patch the binary to feed it false image data or metadata. Publishers might be able to maintain reasonable secrecy for their own keys, and establish discipline around how they’re used, which would let us verify things like “NPR thinks this photo is authentic”. On the platform side, a lot of messaging apps and social media platforms strip or improperly display C2PA metadata, but you can imagine that might change going forward.
A friend of mine suggests that we’ll spend more time sending trusted human investigators to find out what’s going on. Insurance adjusters might go back to physically visiting houses. Pollsters have to knock on doors. Job interviews and work might be done more in-person. Maybe we start going to bank branches and notaries again.
Another option is giving up privacy: we can still do things remotely, but it requires strong attestation. Only State Farm’s dashcam can be used in a claim. Academic watchdog models record students reading books and typing essays. Bossware and test-proctoring setups become even more invasive.
As with fraud, ML makes it easier to harass people, both at scale and with sophistication.
On social media, dogpiling normally requires a group of humans to care enough to spend time swamping a victim with abusive replies, sending vitriolic emails, or reporting the victim to get their account suspended. These tasks can be automated by programs that call (e.g.) Bluesky’s APIs, but social media platforms are good at detecting coordinated inauthentic behavior. I expect LLMs will make dogpiling easier and harder to detect, both by generating plausibly-human accounts and harassing posts, and by making it easier for harassers to write software to execute scalable, randomized attacks.
Harassers could use LLMs to assemble KiwiFarms-style dossiers on targets. Even if the LLM confabulates the names of their children, or occasionally gets a home address wrong, it can be right often enough to be damaging. Models are also good at guessing where a photograph was
taken, which intimidates targets and enables real-world harassment.
Generative AI is already broadly
used to harass people—often women—via images, audio, and video of violent or sexually explicit scenes. This year, Elon Musk’s Grok was broadly
criticized
for “digitally undressing” people upon request. Cheap generation of photorealistic images opens up all kinds of horrifying possibilities. A harasser could send synthetic images of the victim’s pets or family being mutilated. An abuser could construct video of events that never happened, and use it to gaslight their partner. These kinds of harassment were previously possible, but as with spam, required skill and time to execute. As the technology to fabricate high-quality images and audio becomes cheaper and broadly accessible, I expect targeted harassment will become more frequent and severe. Alignment efforts may forestall some of these risks, but sophisticated unaligned models seem likely to emerge.
Xe Iaso jokes
that with LLM agents burning out open-source
maintainers
and writing salty callout posts, we may need to build the equivalent of
Cyperpunk 2077’s Blackwall: not because AIs will electrocute us, but because they’re just obnoxious.
One of the primary ways CSAM (Child Sexual Assault Material) is identified and removed from platforms is via large perceptual hash databases like
PhotoDNA. These databases can flag known images, but do nothing for novel ones. Unfortunately, “generative AI” is very good at generating novel images of six year olds being
raped.
...
Read the original on aphyr.com »
To add this web app to your iOS home screen tap the share button and select "Add to the Home Screen".
10HN is also available as an iOS App
If you visit 10HN only rarely, check out the the best articles from the past week.
If you like 10HN please leave feedback and share
Visit pancik.com for more.