10 interesting stories served every morning and every evening.
Probably you already saw how it all turned out. On the very same day that Altman offered public support to Amodei, he signed a deal to take away Amodei’s business, with a deal that wasn’t all that different. You can’t get more Altman than that.
But here’s the kicker: Per The New York Times,
Let that sink in. Altman had secretly been working on the deal since Wednesday.
- before he announced his support for Dario
- but after Brockman had donated 25M to Trump’s PAC
It was all theatre. Dario never had a chance.
It’s one thing for the government to reject Anthropic’s terms, and entirely another to banish them permanently and, absurdly and punitively, declare them a supply chain risk. Worse, they did it in favor of someone else who took pretty similar terms and happened to have given more campaign contributions.
Anthropic deserves a chance at EXACTLY the same terms; anything else reeks of corruption.
I am no fan of Amodei. I think he often overhypes things, many of which I have publicly challenged. The company ripped off a lot of writers’ work (per the $1.5B settlement), and recently walked back its core safety pledge.
But I believe in fair play. This wasn’t that.
It sure looks like the US is transitioning from the former to the latter.
...
Read the original on garymarcus.substack.com »
The engineer shipped seven features in a single sprint. DORA metrics looked immaculate. The promotion packet practically wrote itself.
Six months later, an architectural change required modifying those features. No one on the team could explain why certain components existed or how they interacted. The engineer who built them stared at her own code like a stranger’s.
Code has become cheaper to produce than to perceive.
When an engineer writes code manually, two parallel processes occur. The first is production: characters appear in files, tests get written, systems change. The second is absorption: mental models form, edge cases become intuitive, architectural relationships solidify into understanding. These processes are coupled. The act of typing forces engagement. The friction of implementation creates space for reasoning.
AI-assisted development decouples these processes. A prompt generates hundreds of lines in seconds. The engineer reviews, adjusts, iterates. Output accelerates. But absorption cannot accelerate proportionally. The cognitive work of truly understanding what was built, why it was built that way, and how it relates to everything else remains bounded by human processing speed.
This gap between output velocity and comprehension velocity is cognitive debt.
Unlike technical debt, which surfaces through system failures or maintenance costs, cognitive debt remains invisible to velocity metrics. The code works. The tests pass. The features ship. The deficit exists only in the minds of the engineers who built the system, manifesting as uncertainty about their own work.
The debt is not truly invisible. It eventually appears in reliability metrics: Mean Time to Recovery stretches longer, Change Failure Rate creeps upward. But these are lagging indicators, separated by months from the velocity metrics that drive quarterly decisions. By the time MTTR signals a problem, the comprehension deficit has already compounded.
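As a rough illustration of the two lagging indicators named above, the sketch below computes Mean Time to Recovery and Change Failure Rate from a handful of made-up records. The record shapes, variable names, and numbers are all assumptions for illustration, not data from any real team.

```python
from datetime import datetime, timedelta

# Hypothetical incident records: (detected_at, resolved_at)
incidents = [
    (datetime(2025, 1, 3, 3, 0), datetime(2025, 1, 3, 3, 10)),  # ten-minute fix
    (datetime(2025, 4, 9, 3, 0), datetime(2025, 4, 9, 7, 0)),   # four-hour investigation
]

# Hypothetical deployment outcomes: True means the change caused a production failure
deployments = [False, False, True, False, True, False, False, False]

def mean_time_to_recovery(records):
    """Average time between detection and resolution."""
    total = sum((resolved - detected for detected, resolved in records), timedelta())
    return total / len(records)

def change_failure_rate(outcomes):
    """Fraction of deployments that caused a failure."""
    return sum(outcomes) / len(outcomes)

mttr = mean_time_to_recovery(incidents)
cfr = change_failure_rate(deployments)
print(mttr)  # 2:05:00
print(cfr)   # 0.25
```

Both numbers can only be computed after incidents have happened, which is why they trail the velocity metrics.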
Engineering performance systems evolved to measure observable outputs. Story points completed. Features shipped. Commits merged. Review turnaround time. These metrics emerged from an era when output and comprehension were tightly coupled, when shipping something implied understanding something.
The metrics never measured comprehension directly because comprehension was assumed. An engineer who shipped a feature was presumed to understand that feature. The presumption held because the production process itself forced understanding.
That presumption no longer holds. An engineer can now ship features while maintaining only surface familiarity with their implementation. The features work. The metrics register success. The organizational knowledge that would traditionally accumulate alongside those features simply does not form at the same rate.
Performance calibration committees see velocity improvements. They do not see comprehension deficits. They cannot, because no artifact of the organizational measurement system captures that dimension.
The discussion of cognitive debt typically focuses on the engineer who generates code. The more acute problem sits with the engineer who reviews it.
Code review evolved as a quality gate. A senior engineer examines a junior engineer’s work, catching errors, suggesting improvements, transferring knowledge. The rate-limiting factor was always the junior engineer’s output speed. Senior engineers could review faster than juniors could produce.
AI-assisted development inverts this relationship. A junior engineer can now generate code faster than a senior engineer can critically audit it. The volume of generated code exceeds the bandwidth available for deep review. Something has to give, and typically it is review depth.
The reviewer faces an impossible choice. Maintain previous review standards and become a bottleneck that negates the velocity gains AI provides. Or approve code at the rate it arrives and hope the tests catch what the review missed. Most choose the latter, often unconsciously, because organizational pressure favors throughput.
This is where cognitive debt compounds fastest. The author’s comprehension deficit might be recoverable through later engagement with the code. The reviewer’s comprehension deficit propagates: they approved code they do not fully understand, which now carries implicit endorsement. The organizational assumption that reviewed code is understood code no longer holds.
Engineers working extensively with AI tools report a specific form of exhaustion that differs from traditional burnout. Traditional burnout emerges from sustained cognitive load, from having too much to hold in mind while solving complex problems. The new pattern emerges from something closer to cognitive disconnection.
The work happens quickly. Progress is visible. But the engineer experiences a persistent sense of not quite grasping their own output. They can execute, but explanation requires reconstruction. They can modify, but prediction becomes unreliable. The system they built feels slightly foreign even as it functions correctly.
This creates a distinctive psychological state: high output combined with low confidence. Engineers produce more while feeling less certain about what they have produced. In organizations that stack-rank based on visible output, this creates pressure to continue generating despite the growing uncertainty.
The engineer who pauses to deeply understand what they built falls behind in velocity metrics. The engineer who prioritizes throughput over comprehension meets their quarterly objectives. The incentive structure selects for the behavior that accelerates cognitive debt accumulation.
Knowledge in engineering organizations exists in two forms. The first is explicit: documentation, design documents, recorded decisions. The second is tacit: understanding held in the minds of people who built and maintained systems over time. Tacit knowledge cannot be fully externalized because much of it exists as intuition, pattern recognition, and contextual judgment that formed through direct engagement with the work.
When the people who built a system leave or rotate to new projects, tacit knowledge walks out with them. Organizations traditionally replenished this knowledge through the normal process of engineering work. New engineers building on existing systems developed their own tacit understanding through the friction of implementation.
AI-assisted development potentially short-circuits this replenishment mechanism. If new engineers can generate working modifications without developing deep comprehension, they never form the tacit knowledge that would traditionally accumulate. The organization loses knowledge not just through attrition but through insufficient formation.
This creates a delayed failure mode. The system continues to function. New features continue to ship. But the reservoir of people who truly understand the system gradually depletes. When circumstances eventually require that understanding, when something breaks in an unexpected way or requirements change in a way that demands architectural reasoning, the organization discovers the deficit.
The first involves the reversal of a normally reliable heuristic. Engineers typically trust code that has been in production for years. If it survived that long, it probably works. The longer code exists without causing problems, the more confidence it earns. AI-generated code inverts this pattern. The longer it remains untouched, the more dangerous it becomes, because the context window of the humans around it has closed completely. Code that was barely understood when written becomes entirely opaque after the people who wrote it have moved on.
The second failure mode surfaces during incidents. An alert fires at 3:00 AM. The on-call engineer opens a system they did not build, generated by tools they did not supervise, documented in ways that assume familiarity they do not possess. They are debugging a black box written by a black box. What would have been a ten-minute fix when someone understood the system becomes a four-hour forensic investigation when no one does. Multiply this across enough incidents and the aggregate cost exceeds whatever velocity gains the AI-assisted development provided.
The third failure mode operates on a longer timescale. Junior engineers who rely primarily on AI-assisted development never develop the intuition that comes from manual implementation. They ship features without forming the scar tissue that informs architectural judgment. The organization is effectively trading its pipeline of future Staff Engineers for this quarter’s feature delivery. The cost does not appear in current headcount models because the people who would have become senior architects five years from now are not yet absent. They simply never form.
From the perspective of engineering leadership, AI-assisted development presents as productivity gain. Teams ship faster. Roadmaps compress. Headcount discussions become more favorable. These are the observable signals that propagate upward through organizational reporting structures.
The cognitive debt accumulating in those teams does not present as a signal. There is no metric for “engineers who can explain their own code without re-reading it.” There is no dashboard for “organizational comprehension depth.” The concept does not fit into quarterly business review formats or headcount justification narratives.
Directors make decisions based on observable signals. When those signals uniformly indicate success, the decision to double down on the approach that produced those signals is rational within the information environment available to leadership. The decision is not wrong given the data. The data is incomplete.
The cognitive debt framing does not apply uniformly across all engineering work. Some tasks genuinely are mechanical. Some codebases genuinely benefit from rapid iteration without deep architectural understanding. Some features genuinely do not require the level of comprehension that would traditionally form through manual implementation.
The model also assumes that comprehension was previously forming at adequate rates. This assumption may be generous. Engineers have always varied in how deeply they understood their own work. The distribution may simply be shifting rather than a new phenomenon emerging.
Additionally, tooling and documentation practices may evolve to partially close the comprehension gap. If organizations develop methods for capturing and transmitting the understanding that AI-assisted development fails to form organically, the debt may prove manageable rather than accumulative.
The fundamental challenge is that organizations cannot optimize for what they cannot measure. Velocity is measurable. Comprehension is not, or at least not through any mechanism that currently feeds into performance evaluation, promotion decisions, or headcount planning.
Until comprehension becomes legible to organizational decision-making systems, the incentive structure will continue to favor velocity. Engineers who prioritize understanding over output will appear less productive than peers who prioritize output over understanding. Performance calibration will reward the behavior that accumulates debt faster.
This is not a failure of individual managers or engineers. It is a measurement system designed for an era when production and comprehension were coupled, operating in an era when that coupling no longer holds. The system is optimizing correctly for what it measures. What it measures no longer captures what matters.
The gap will eventually manifest, whether through maintenance costs that exceed projections, through incidents that require understanding no one possesses, or through new requirements that expose the brittleness of systems built without deep comprehension. The timing and form of manifestation remain uncertain. The underlying dynamic does not.
...
Read the original on www.rockoder.com »
This is a brief guide to my new art project microgpt, a single file of 200 lines of pure Python with no dependencies that trains and inferences a GPT. This file contains the full algorithmic content of what is needed: dataset of documents, tokenizer, autograd engine, a GPT-2-like neural network architecture, the Adam optimizer, training loop, and inference loop. Everything else is just efficiency. I cannot simplify this any further. This script is the culmination of multiple projects (micrograd, makemore, nanogpt, etc.) and a decade-long obsession to simplify LLMs to their bare essentials, and I think it is beautiful 🥹. It even breaks perfectly across 3 columns:
Where to find it:
This GitHub gist has the full source code: microgpt.py
It’s also available on this web page: https://karpathy.ai/microgpt.html
Also available as a Google Colab notebook
The following is my guide on stepping an interested reader through the code.
The fuel of large language models is a stream of text data, optionally separated into a set of documents. In production-grade applications, each document would be an internet web page but for microgpt we use a simpler example of 32,000 names, one per line:
import os
import random

# Let there be an input dataset `docs`: list[str] of documents (e.g. a dataset of names)
if not os.path.exists('input.txt'):
    import urllib.request
    names_url = 'https://raw.githubusercontent.com/karpathy/makemore/refs/heads/master/names.txt'
    urllib.request.urlretrieve(names_url, 'input.txt')
docs = [l.strip() for l in open('input.txt').read().strip().split('\n') if l.strip()] # list[str] of documents
random.shuffle(docs)
print(f"num docs: {len(docs)}")
The dataset looks like this. Each name is a document:
The goal of the model is to learn the patterns in the data and then generate similar new documents that share the statistical patterns within. As a preview, by the end of the script our model will generate (“hallucinate”!) new, plausible-sounding names. Skipping ahead, we’ll get:
It doesn’t look like much, but from the perspective of a model like ChatGPT, your conversation with it is just a funny looking “document”. When you initialize the document with your prompt, the model’s response from its perspective is just a statistical document completion.
Under the hood, neural networks work with numbers, not characters, so we need a way to convert text into a sequence of integer token ids and back. Production tokenizers like tiktoken (used by GPT-4) operate on chunks of characters for efficiency, but the simplest possible tokenizer just assigns one integer to each unique character in the dataset:
# Let there be a Tokenizer to translate strings to discrete symbols and back
uchars = sorted(set(''.join(docs))) # unique characters in the dataset become token ids 0..n-1
BOS = len(uchars) # token id for the special Beginning of Sequence (BOS) token
vocab_size = len(uchars) + 1 # total number of unique tokens, +1 is for BOS
print(f"vocab size: {vocab_size}")
In the code above, we collect all unique characters across the dataset (which are just all the lowercase letters a-z), sort them, and each letter gets an id by its index. Note that the integer values themselves have no meaning at all; each token is just a separate discrete symbol. Instead of 0, 1, 2 they might as well be different emoji. In addition, we create one more special token called BOS (Beginning of Sequence), which acts as a delimiter: it tells the model “a new document starts/ends here”. Later during training, each document gets wrapped with BOS on both sides: [BOS, e, m, m, a, BOS]. The model learns that BOS initiates a new name, and that another BOS ends it. Therefore, we have a final vocabulary of 27 (26 possible lowercase characters a-z and +1 for the BOS token).
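To make the BOS wrapping concrete, here is a minimal sketch on a three-name toy dataset. The `encode` and `decode` helpers are hypothetical names introduced for illustration (the script itself may do this inline), and the toy `docs` list stands in for the real names file, so the vocabulary here is 8 rather than 27.

```python
# Toy stand-in for the names dataset (assumption: the real script loads the full file)
docs = ["emma", "olivia", "ava"]

uchars = sorted(set(''.join(docs)))  # unique characters become token ids 0..n-1
BOS = len(uchars)                    # one extra id for the BOS delimiter
vocab_size = len(uchars) + 1

def encode(doc):
    """Hypothetical helper: map characters to ids and wrap the document in BOS."""
    return [BOS] + [uchars.index(ch) for ch in doc] + [BOS]

def decode(tokens):
    """Hypothetical helper: map ids back to characters, dropping BOS."""
    return ''.join(uchars[t] for t in tokens if t != BOS)

tokens = encode("emma")
print(uchars)          # ['a', 'e', 'i', 'l', 'm', 'o', 'v']
print(tokens)          # [7, 1, 4, 4, 0, 7]
print(decode(tokens))  # emma
```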
Training a neural network requires gradients: for each parameter in the model, we need to know “if I nudge this number up a little, does the loss go up or down, and by how much?”. The computation graph has many inputs (the model parameters and the input tokens) but funnels down to a single scalar output: the loss (we’ll define exactly what the loss is below). Backpropagation starts at that single output and works backwards through the graph, computing the gradient of the loss with respect to every input. It relies on the chain rule from calculus. In production, libraries like PyTorch handle this automatically. Here, we implement it from scratch in a single class called Value:
import math

class Value:
    __slots__ = ('data', 'grad', '_children', '_local_grads')

    def __init__(self, data, children=(), local_grads=()):
        self.data = data                 # scalar value of this node calculated during forward pass
        self.grad = 0                    # derivative of the loss w.r.t. this node, calculated in backward pass
        self._children = children        # children of this node in the computation graph
        self._local_grads = local_grads  # local derivative of this node w.r.t. its children

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1, 1))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def __pow__(self, other): return Value(self.data**other, (self,), (other * self.data**(other-1),))
    def log(self): return Value(math.log(self.data), (self,), (1/self.data,))
    def exp(self): return Value(math.exp(self.data), (self,), (math.exp(self.data),))
    def relu(self): return Value(max(0, self.data), (self,), (float(self.data > 0),))
    def __neg__(self): return self * -1
    def __radd__(self, other): return self + other
    def __sub__(self, other): return self + (-other)
    def __rsub__(self, other): return other + (-self)
    def __rmul__(self, other): return self * other
    def __truediv__(self, other): return self * other**-1
    def __rtruediv__(self, other): return other * self**-1

    def backward(self):
        # Build a topological ordering of the graph, then apply the chain rule in reverse
        topo = []
        visited = set()
        def build_topo(v):
            if v not in visited:
                visited.add(v)
                for child in v._children:
                    build_topo(child)
                topo.append(v)
        build_topo(self)
        self.grad = 1
        for v in reversed(topo):
            for child, local_grad in zip(v._children, v._local_grads):
                child.grad += local_grad * v.grad
I realize that this is the most mathematically and algorithmically intense part and I have a 2.5 hour video on it: micrograd video. Briefly, a Value wraps a single scalar number (.data) and tracks how it was computed. Think of each operation as a little lego block: it takes some inputs, produces an output (the forward pass), and it knows how its output would change with respect to each of its inputs (the local gradient). That’s all the information autograd needs from each block. Everything else is just the chain rule, stringing the blocks together.
Every time you do math with Value objects (add, multiply, etc.), the result is a new Value that remembers its inputs (_children) and the local derivative of that operation (_local_grads). For example, __mul__ records that \(\frac{\partial(a \cdot b)}{\partial a} = b\) and \(\frac{\partial(a \cdot b)}{\partial b} = a\). The full set of lego blocks:
The backward() method walks this graph in reverse topological order (starting from the loss, ending at the parameters), applying the chain rule at each step. If the loss is \(L\) and a node \(v\) has a child \(c\) with local gradient \(\frac{\partial v}{\partial c}\), then:
\[\frac{\partial L}{\partial c} \mathrel{+}= \frac{\partial v}{\partial c} \cdot \frac{\partial L}{\partial v}\]
This looks a bit scary if you’re not comfortable with your calculus, but this is literally just multiplying two numbers in an intuitive way. One way to see it looks as follows: “If a car travels twice as fast as a bicycle and the bicycle is four times as fast as a walking man, then the car travels 2 x 4 = 8 times as fast as the man.” The chain rule is the same idea: you multiply the rates of change along the path.
We kick things off by setting self.grad = 1 at the loss node, because \(\frac{\partial L}{\partial L} = 1\): the loss’s rate of change with respect to itself is trivially 1. From there, the chain rule just multiplies local gradients along every path back to the parameters.
Note the += (accumulation, not assignment). When a value is used in multiple places in the graph (i.e. the graph branches), gradients flow back along each branch independently and must be summed. This is a consequence of the multivariable chain rule: if \(c\) contributes to \(L\) through multiple paths, the total derivative is the sum of contributions from each path.
After backward() completes, every Value in the graph has a .grad containing \(\frac{\partial L}{\partial v}\), which tells us how the final loss would change if we nudged that value.
Here’s a concrete example. Note that a is used twice (the graph branches), so its gradient is the sum of both paths:
a = Value(2.0)
b = Value(3.0)
c = a * b # c = 6.0
L = c + a # L = 8.0
L.backward()
print(a.grad) # 4.0 (dL/da = b + 1 = 3 + 1, via both paths)
print(b.grad) # 2.0 (dL/db = a = 2)
This is exactly what PyTorch’s .backward() gives you:
This is the same algorithm that PyTorch’s loss.backward() runs, just on scalars instead of tensors (arrays of scalars) - algorithmically identical, significantly smaller and simpler, but of course a lot less efficient.
Let’s spell out what .backward() gives us above. Autograd calculated that if L = a*b + a, and a=2 and b=3, then a.grad = 4.0 is telling us about the local influence of a on L. If you wiggle the input a, in what direction is L changing? Here, the derivative of L w.r.t. a is 4.0, meaning that if we increase a by a tiny amount (say 0.001), L would increase by about 4x that (0.004). Similarly, b.grad = 2.0 means the same nudge to b would increase L by about 2x that (0.002). In other words, these gradients tell us the direction (positive or negative depending on the sign), and the steepness (the magnitude) of the influence of each individual input on the final output (the loss). This then allows us to iteratively nudge the parameters of our neural network to lower the loss, and hence improve its predictions.
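One way to sanity-check the "nudge" interpretation is a finite-difference estimate on plain Python floats, with no autograd involved. This is a verification sketch, not part of microgpt; `f` and `numerical_grad` are names made up for this illustration.

```python
def f(a, b):
    # Same expression as the example above: L = a*b + a
    return a * b + a

def numerical_grad(fn, args, i, h=1e-6):
    """Central finite difference of fn with respect to args[i]."""
    up = list(args)
    up[i] += h
    down = list(args)
    down[i] -= h
    return (fn(*up) - fn(*down)) / (2 * h)

a, b = 2.0, 3.0
dL_da = numerical_grad(f, (a, b), 0)
dL_db = numerical_grad(f, (a, b), 1)
print(round(dL_da, 4), round(dL_db, 4))  # 4.0 2.0

# Nudging a by 0.001 moves L by roughly 4x that amount
nudged = f(a + 0.001, b) - f(a, b)
print(round(nudged, 6))  # 0.004
```

The estimates agree with a.grad and b.grad from backward(), which is what we would expect if the chain-rule bookkeeping is correct.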
The parameters are the knowledge of the model. They are a large collection of floating point numbers (wrapped in Value for autograd) that start out random and are iteratively optimized during training. The exact role of each parameter will make more sense once we define the model architecture below, but for now we just need to initialize them:
n_embd = 16 # embedding dimension
n_head = 4 # number of attention heads
n_layer = 1 # number of layers
block_size = 16 # maximum sequence length
head_dim = n_embd // n_head # dimension of each head
matrix = lambda nout, nin, std=0.08: [[Value(random.gauss(0, std)) for _ in range(nin)] for _ in range(nout)]
state_dict = {'wte': matrix(vocab_size, n_embd), 'wpe': matrix(block_size, n_embd), 'lm_head': matrix(vocab_size, n_embd)}
for i in range(n_layer):
    state_dict[f'layer{i}.attn_wq'] = matrix(n_embd, n_embd)
    state_dict[f'layer{i}.attn_wk'] = matrix(n_embd, n_embd)
    state_dict[f'layer{i}.attn_wv'] = matrix(n_embd, n_embd)
    state_dict[f'layer{i}.attn_wo'] = matrix(n_embd, n_embd)
    state_dict[f'layer{i}.mlp_fc1'] = matrix(4 * n_embd, n_embd)
    state_dict[f'layer{i}.mlp_fc2'] = matrix(n_embd, 4 * n_embd)
params = [p for mat in state_dict.values() for row in mat for p in row]
print(f"num params: {len(params)}")
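For a sense of scale, the parameter count can be worked out from the matrix shapes alone, without allocating any Value objects. The sketch below mirrors the shapes in the code above and assumes vocab_size = 27, as stated in the tokenizer section; the `shapes` dictionary is a name introduced here for illustration.

```python
n_embd, n_layer, block_size = 16, 1, 16
vocab_size = 27  # 26 lowercase letters + BOS (assumed from the tokenizer section)

# (nout, nin) for every weight matrix, mirroring state_dict above
shapes = {
    'wte': (vocab_size, n_embd),
    'wpe': (block_size, n_embd),
    'lm_head': (vocab_size, n_embd),
}
for i in range(n_layer):
    for name in ('attn_wq', 'attn_wk', 'attn_wv', 'attn_wo'):
        shapes[f'layer{i}.{name}'] = (n_embd, n_embd)
    shapes[f'layer{i}.mlp_fc1'] = (4 * n_embd, n_embd)
    shapes[f'layer{i}.mlp_fc2'] = (n_embd, 4 * n_embd)

num_params = sum(nout * nin for nout, nin in shapes.values())
print(num_params)  # 4192
```

The embeddings and output head contribute 432 + 256 + 432 = 1,120 parameters; the single transformer layer contributes 1,024 for attention and 2,048 for the MLP.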
...
Read the original on karpathy.github.io »
It was around January 2020. I became the head coach of a youth basketball team.
I was a few months into my first job out of college, and I was feeling… empty. I couldn’t explain why, so I set out to fill the void. I built side projects, went drinking with coworkers, and got really into the upcoming election. These all felt like things I should be doing as the yuppie I had just become, but the emptiness remained.
Indiana loves its basketball, so it was easy for me to find a local gym to play pickup. I became friendly with the regulars and the staff. One day, the athletic director told me they were looking for a volunteer assistant basketball coach for the middle school league. Being a former camp counselor, the idea intrigued me. Unexpectedly, the “assistant” position became the “head” position, and I was quickly thrown into a clinic, where I had to draft all my players by the end of the session.
Team drafted. 6 kids. 1 game per week. 2 practices per week. 14 parent emails (somehow). Practice starts tomorrow.
You know those bullshit leadership positions we all had in high school and college? Like how you were “VP of Operations” for some club, and all you did was order pizza? Yeah, this was not that. Getting thrown into an empty gym with six kids and two basketballs is a thrilling experience! I’m so grateful my buddy Clayton joined me as co-coach. I spent the whole day preparing for that 2-hour practice, and I think it showed. We learned each other’s names, had a solid skills assessment, set some ground rules, and had some fun with a little knockout.
Here’s the headline: I fucking loved being a coach. And, I don’t want to brag, but I was really good at it. We lost one game, our first game, and went undefeated after that. But improving each kid’s skill and confidence was the real mission. Instead of my desk job, I’d be asking Clayton how we could make Corey¹ use his body for rebounding. Or how Monte’s soccer skills could be best leveraged. Or how Evan, our best player, could become an on-court leader.
David, our self-deprecating goofball who insisted he’d be benched in the 4th quarter, made two game-changing dives for the ball in our last game. At the end of every game, I’d ask if anyone had any shoutouts to give about their teammates. Evan called David “a beast” for his dives, followed by rousing applause from the whole team. David’s smile is likely something I’ll be taking to my grave.
As the kids’ confidence grew, mine did too. I walked around the gym with a strut. I greeted their families with confident eye contact, remembering every word they said. I felt myself performing better in all parts of life: work, community, relationships… I became “that guy” who made shit happen in our friend group.
Heading into March, I was planning a surprise for the team. I had a contact at the Indiana Pacers, and I was scheming to get them to play on the court during a break in play. But to much ruin, Covid became a pandemic, leading to an abrupt end to the season, and an immediate global quarantine…
But that’s another story for another time. This is a story about happiness, and finding it at any point in your life.
I was happy when I was a youth basketball coach. And I find it notable to recall why I was happy:
* I love helping kids. I can’t exactly explain why… perhaps it’s biological. But I’m pretty darn good at it. I find it much easier to talk to kids than adults.
* I love being in the real world. If I taught those kids on Zoom… it wouldn’t have been the same. To travel somewhere, break a sweat, and high-five my team was such a gift.
* I love being in control. Coordinating practice, calling the plays, making substitutions: coaching let me steer my own ship. “Loving control” has its drawbacks. But succeeding within that control gave me real confidence and belief in myself.
* I love basketball. I can’t think of a better activity to learn about your mind, body, and role within a system. I’m a junkyard dog. When I get mad, I work extra hard doing the stuff nobody else wants to do. But I lose energy faster. I learned this from the game.
If I could give any advice to someone who needs it, I’d tell them to write down the things that have made them happy, and then explore why.
Why am I writing this now? I have a sense that many people in tech are feeling what I’m feeling. For years, you’ve sat in front of a rectangle, moving tinier rectangles, only to learn that AI can now move those rectangles 10x better. As someone outside the equity class, you begin to wonder what your role is in this new paradigm. And whether rectangles were ever your ticket to happiness in the first place.
When I was the age my basketball kids were, The Social Network came out. Like so many, I am a product of that generation. I accepted the propaganda that my value to this world only went as far as my product could scale. At 28, I’m finally beginning to challenge that.
Don’t get me wrong, I love tech. I think it’s magical. But I really hope to live in a world where my future kids find sitting in front of a rectangle all day to be dystopian and cringe. I really really really hope the invisible hand finds its way to getting me back into something I love. Maybe this “death of SaaS” talk, regardless of truth, is the wake-up call so many of us need.
¹ Names have been changed to protect their privacy.
...
Read the original on ben-mini.com »
When you’re building with AI agents, they should be treated as untrusted and potentially malicious. Whether you’re worried about prompt injection, a model trying to escape its sandbox, or something nobody’s thought of yet, regardless of what your threat model is, you shouldn’t be trusting the agent. The right approach isn’t better permission checks or smarter allowlists. It’s architecture that assumes agents will misbehave and contains the damage when they do.
OpenClaw runs directly on the host machine by default. It has an opt-in Docker sandbox mode, but it’s turned off out of the box, and most users never turn it on. Without it, security relies entirely on application-level checks: allowlists, confirmation prompts, a set of “safe” commands. These checks come from a place of implicit trust that the agent isn’t going to try to do something wrong. Once you adopt the mindset that an agent is potentially malicious, it’s obvious that application-level blocks aren’t enough. They don’t provide hermetic security. A determined or compromised agent can find ways around them.
In NanoClaw, container isolation is a core part of the architecture. Each agent runs in its own container, on Docker or an Apple Container on macOS. Containers are ephemeral, created fresh per invocation and destroyed afterward. The agent runs as an unprivileged user and can only see directories that have been explicitly mounted in. A container boundary is enforced by the OS.
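The properties described above (ephemeral, unprivileged, explicit mounts only) map onto standard `docker run` flags. The sketch below only builds the command line and does not execute it. It is an illustration under stated assumptions, not NanoClaw's actual invocation: the function name, image name, entrypoint, and paths are all hypothetical, and it covers Docker only, not Apple Container.

```python
def build_run_command(image, agent_cmd, mounts, uid=1000, gid=1000):
    """Build (but do not execute) a `docker run` argv for one agent invocation.

    mounts: list of (host_path, container_path, read_only) tuples.
    """
    argv = [
        "docker", "run",
        "--rm",                     # remove the container when it exits (ephemeral)
        "--user", f"{uid}:{gid}",   # run as an unprivileged user
    ]
    for host_path, container_path, read_only in mounts:
        spec = f"{host_path}:{container_path}"
        if read_only:
            spec += ":ro"
        argv += ["-v", spec]        # only explicitly listed directories are visible
    return argv + [image] + list(agent_cmd)

cmd = build_run_command(
    image="agent-image:latest",     # hypothetical image name
    agent_cmd=["run-agent"],        # hypothetical entrypoint
    mounts=[("/srv/groups/family", "/workspace", False)],
)
print(" ".join(cmd))
# docker run --rm --user 1000:1000 -v /srv/groups/family:/workspace agent-image:latest run-agent
```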
Even when OpenClaw’s sandbox is enabled, all agents share the same container. You might have one agent as a personal assistant and another for work, in different WhatsApp groups or Telegram channels. They’re all in the same environment, which means information can leak between agents that are supposed to be accessing different data.
Agents shouldn’t trust each other any more than you trust them. In NanoClaw, each agent gets its own container, filesystem, and Claude session history. Your personal assistant can’t see your work agent’s data because they run in completely separate sandboxes.
[Figure: OpenClaw’s shared container, where all agents see everything, versus NanoClaw, where agents are isolated from each other.]
The container boundary is the hard security layer — the agent can’t escape it regardless of configuration. On top of that, a mount allowlist at ~/.config/nanoclaw/mount-allowlist.json acts as an additional layer of defense-in-depth: it exists to prevent the user from accidentally mounting something that shouldn’t be exposed, not to prevent the agent from breaking out. Sensitive paths (.ssh, .gnupg, .aws, .env, private_key, credentials) are blocked by default. The allowlist lives outside the project directory, so a compromised agent can’t modify its own permissions. The host application code is mounted read-only, so nothing an agent does can persist after the container is destroyed.
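To make the allowlist idea concrete, here is a minimal Python sketch of a mount check. The blocked patterns are the ones listed above; everything else is an assumption for illustration, including the function name `is_mount_allowed`, the idea of a list of allowed root directories, and the substring matching rule. The real format of `mount-allowlist.json` is not described here, so the sketch does not try to parse it.

```python
import posixpath
from pathlib import PurePosixPath
from typing import Sequence

# Sensitive patterns named in the text; matching by substring is an assumption.
BLOCKED_PATTERNS = (".ssh", ".gnupg", ".aws", ".env", "private_key", "credentials")


def is_mount_allowed(host_path: str, allowed_roots: Sequence[str]) -> bool:
    """Return True if host_path may be mounted into an agent container.

    Hypothetical rule: the path must sit inside one of the allowed roots
    and no path component may contain a blocked pattern.
    """
    # Collapse ".." segments so "/ok/../.ssh" cannot sneak past the check.
    # A real implementation should also resolve symlinks on the host.
    normalized = PurePosixPath(posixpath.normpath(host_path))

    for part in normalized.parts:
        if any(pattern in part for pattern in BLOCKED_PATTERNS):
            return False

    for root in allowed_roots:
        root_path = PurePosixPath(posixpath.normpath(root))
        if normalized == root_path or root_path in normalized.parents:
            return True
    return False


if __name__ == "__main__":
    roots = ["/home/me/projects"]
    for candidate in ("/home/me/projects/site", "/home/me/projects/../.ssh"):
        print(candidate, "->", is_mount_allowed(candidate, roots))
```

Because a check like this runs on the host, before the container starts, it protects the user from their own configuration mistakes rather than acting as the barrier against the agent.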
People in your groups shouldn’t be trusted either. Non-main groups are untrusted by default. Other groups, and the people in them, can’t message other chats, schedule tasks for other groups, or view other groups’ data. Anyone in a group could send a prompt injection, and the security model accounts for that.
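The group rule can be sketched as a single authorization check. This is an illustration only: the `Group` type and `may_act_on` function are hypothetical names, and the assumption that the main group may act on any group is mine, inferred from the statement that non-main groups are the untrusted ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    """A chat group that can talk to an agent (hypothetical type)."""
    group_id: str
    is_main: bool = False


def may_act_on(source: Group, target_group_id: str) -> bool:
    """Decide whether a request from `source` may touch `target_group_id`.

    Covers the three actions named in the text: messaging another chat,
    scheduling a task for another group, and viewing another group's data.
    """
    if source.is_main:
        # Assumption: the owner's main group is trusted to manage all groups.
        return True
    # Untrusted groups may only ever act on themselves.
    return source.group_id == target_group_id


if __name__ == "__main__":
    family = Group("family-chat")
    print(may_act_on(family, "family-chat"))  # own data: allowed
    print(may_act_on(family, "work-chat"))    # someone else's data: denied
```

The check sits in host code, outside the agent, so a prompt injection sent into one group cannot talk the agent into reaching another group.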
OpenClaw has nearly half a million lines of code, 53 config files, and over 70 dependencies. This breaks the basic premise of open source security. Chromium has 35+ million lines, but you trust Google’s review processes. Most open source projects work the other way: they stay small enough that many eyes can actually review them. Nobody has reviewed OpenClaw’s 400,000 lines. It was written in weeks with no proper review process. Complexity is where vulnerabilities hide, and Microsoft’s analysis confirmed this: OpenClaw’s risks could emerge through normal API calls, because no one person could see the full picture.
NanoClaw is one process and a handful of files. We rely heavily on Anthropic’s Agent SDK, the wrapper around Claude Code, for session management, memory compaction, and a lot more, instead of reinventing the wheel. A competent developer can review the entire codebase in an afternoon. This is a deliberate constraint, not a limitation. Our contribution guidelines accept bug fixes, security fixes, and simplifications only.
New functionality comes through skills: instructions with a full working reference implementation that a coding agent merges into your codebase. You review exactly what code will be added before it lands. And you only add the integrations you actually need. Every installation ends up as a few thousand lines of code tailored to the owner’s exact requirements.
This is the real difference. With a monolithic codebase of 400,000 lines, even if you only enable two integrations, the rest of the code is still there. It’s still loaded, still part of your attack surface, still reachable by prompt injections and rogue agents. You can’t disentangle what’s active from what’s dormant. You can’t audit it because you can’t even define the boundary of what “your code” is. With skills, the boundary is obvious: it’s a few thousand lines, it’s all code you chose to add, and you can read every line of it. The core is actually getting smaller over time: WhatsApp support, for example, is being pulled out and packaged as a skill.
If a hallucination or a misbehaving agent can cause a security issue, then the security model is broken. Security has to be enforced outside the agentic surface, not depend on the agent behaving correctly. Containers, mount restrictions, and filesystem isolation all exist so that even when an agent does something unexpected, the blast radius is contained.
None of this eliminates risk. An AI agent with access to your data is inherently a high-risk arrangement. But the right response is to make that trust as narrow and as verifiable as possible. Don’t trust the agent. Build walls around it.
You can read NanoClaw’s source code and full security model; they’re short enough to read in an afternoon.
...
Read the original on nanoclaw.dev »
Every MCP tool call in Claude Code dumps raw data into your 200K context window. A Playwright snapshot costs 56 KB. Twenty GitHub issues cost 59 KB. One access log — 45 KB. After 30 minutes, 40% of your context is gone.
Context Mode is an MCP server that sits between Claude Code and these outputs. 315 KB becomes 5.4 KB. 98% reduction.
MCP became the standard way for AI agents to use external tools. But there’s a tension at its core: every tool interaction fills the context window from both sides — definitions on the way in, raw output on the way out.
With 81+ tools active, 143K tokens (72%) get consumed before your first message. Then the tools start returning data. A single Playwright snapshot burns 56 KB. A gh issue list dumps 59 KB. Run a test suite, read a log file, fetch documentation — each response eats into what remains.
Cloudflare showed that tool definitions can be compressed by 99.9% with Code Mode. We asked: what about the other direction?
Each execute call spawns an isolated subprocess with its own process boundary. Scripts can’t access each other’s memory or state. The subprocess runs your code, captures stdout, and only that stdout enters the conversation context. The raw data — log files, API responses, snapshots — never leaves the sandbox.
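The pattern can be sketched in a few lines of Python. This is not Context Mode’s implementation; the function name `execute`, the timeout, and the output cap are assumptions chosen for illustration. The sketch runs a script in a child process and hands back only what the script printed.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def execute(code: str, timeout_s: float = 30.0, max_output_chars: int = 4096) -> str:
    """Run `code` in a separate Python process and return only its stdout.

    Whatever the script reads or computes stays inside the child process;
    the caller (and therefore the conversation context) sees only the
    printed result. subprocess.TimeoutExpired propagates if the script hangs.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "script.py"
        script.write_text(code, encoding="utf-8")
        result = subprocess.run(
            [sys.executable, str(script)],
            cwd=workdir,          # scratch directory, removed afterwards
            capture_output=True,  # collect stdout and stderr separately
            text=True,
            timeout=timeout_s,
        )
    if result.returncode != 0:
        # Report a short tail of stderr rather than the full trace.
        return f"error (exit {result.returncode}): {result.stderr[-500:]}"
    return result.stdout[:max_output_chars]


if __name__ == "__main__":
    # The "raw data" lives only in the child process; one number comes back.
    script = (
        "lines = ['GET /a 200', 'GET /b 500', 'GET /c 500']\n"
        "print(sum(1 for line in lines if line.endswith('500')))\n"
    )
    print(execute(script))
```

The saving comes from moving the filtering step into the script: the model asks a question of the data and receives the answer, not the data.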
Ten language runtimes are available: JavaScript, TypeScript, Python, Shell, Ruby, Go, Rust, PHP, Perl, R. Bun is auto-detected for 3-5x faster JS/TS execution.
Authenticated CLIs (gh, aws, gcloud, kubectl, docker) work through credential passthrough — the subprocess inherits environment variables and config paths without exposing them to the conversation.
The index tool chunks markdown content by headings while keeping code blocks intact, then stores the chunks in a SQLite FTS5 (Full-Text Search 5) virtual table. Search uses BM25 ranking, a probabilistic relevance algorithm that scores documents based on term frequency, inverse document frequency, and document length normalization. Porter stemming is applied at index time so “running” and “runs” match the same stem (a suffix-stripping stemmer does not conflate irregular forms such as “ran”).
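Here is a minimal Python sketch of that pipeline using the standard `sqlite3` module. It assumes the local SQLite build includes FTS5, and the function names (`chunk_markdown`, `build_index`, `search`) and the two-column schema are illustrative choices, not Context Mode’s actual code.

```python
import sqlite3
from typing import List, Tuple

FENCE = "`" * 3  # markdown code-fence marker


def chunk_markdown(text: str) -> List[Tuple[str, str]]:
    """Split markdown into (heading, body) chunks, never inside a code fence."""
    chunks: List[Tuple[str, str]] = []
    heading, body, in_fence = "", [], False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        if line.startswith("#") and not in_fence:
            if heading or body:
                chunks.append((heading, "\n".join(body).strip()))
            heading, body = line.lstrip("#").strip(), []
        else:
            body.append(line)  # includes '#' comment lines inside code blocks
    if heading or body:
        chunks.append((heading, "\n".join(body).strip()))
    return chunks


def build_index(chunks: List[Tuple[str, str]]) -> sqlite3.Connection:
    """Store chunks in an in-memory FTS5 table with Porter stemming."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE docs USING fts5("
        "heading, body, tokenize='porter unicode61')"
    )
    conn.executemany("INSERT INTO docs (heading, body) VALUES (?, ?)", chunks)
    return conn


def search(conn: sqlite3.Connection, query: str, limit: int = 3):
    """Return the best-matching chunks; bm25() is lower for better matches."""
    return conn.execute(
        "SELECT heading, body FROM docs WHERE docs MATCH ? "
        "ORDER BY bm25(docs) LIMIT ?",
        (query, limit),
    ).fetchall()


if __name__ == "__main__":
    doc = "\n".join([
        "# Setup", "Install it.", FENCE, "# not a heading", "make install", FENCE,
        "# Testing", "The suite runs on every commit.",
    ])
    index = build_index(chunk_markdown(doc))
    for found_heading, found_body in search(index, "running"):
        print(found_heading, "|", found_body)
```

Searching for “running” finds the chunk that contains “runs” because both are reduced to the same stem when indexed and when queried.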
When you call search, it returns exact code blocks with their heading hierarchy — not summaries, not approximations, the actual indexed content. fetch_and_index extends this to URLs: fetch, convert HTML to markdown, chunk, index. The raw page never enters context.
Validated across 11 real-world scenarios — test triage, TypeScript error diagnosis, git diff review, dependency audit, API response processing, CSV analytics. All under 1 KB output each.
Over a full session: 315 KB of raw output becomes 5.4 KB. Session time before slowdown goes from ~30 minutes to ~3 hours. Context remaining after 45 minutes: 99% instead of 60%.
Two ways. Plugin Marketplace gives you auto-routing hooks and slash commands:
Or MCP-only if you just want the tools:
You don’t change how you work. Context Mode includes a PreToolUse hook that automatically routes tool outputs through the sandbox. Subagents learn to use batch_execute as their primary tool. Bash subagents get upgraded to general-purpose so they can access MCP tools.
The practical difference: your context window stops filling up. Sessions that used to hit the wall at 30 minutes now run for 3 hours. The same 200K tokens, used more carefully.
I run the MCP Directory & Hub. 100K+ daily requests. See every MCP server that ships. The pattern was clear: everyone builds tools that dump raw data into context. Nobody was solving the output side.
Cloudflare’s Code Mode blog post crystallized it. They compressed tool definitions. We compress tool outputs. Same principle, other direction.
Built it for my own Claude Code sessions first. Noticed I could work 6x longer before context degradation. Open-sourced it.
...
Read the original on mksg.lu »
Every developer I know uses AI for coding now. The productivity gains are real, but there are costs that don’t show up on any dashboard.
Imagine a spectrum. On the far left are humans typing on the keyboard, seeing the code in the IDE. On the far right: AGI. It implements everything on its own. Cheaply, flawlessly, better than any human, and no human overseer is required. Somewhere between those two extremes there’s you, using AI, today. That threshold moves to the right every week as models improve, tools mature, and workflows get refined.
Recently I stumbled upon this awesome daxfohl comment on HN:
Which is higher risk, using AI too much, or using AI too little?
and it made me think about LLMs for coding differently, especially after reading what other devs share on AI adoption in different workplaces. You can be wrong in both directions, but is the desired amount of AI usage at work changing as the models improve?
Not long ago, the first AI coding tools like Cursor (2023) or Copilot (2022) emerged. They were able to quickly index the codebase using RAG, so they had the local context. They had all the knowledge of the models powering them, so they had external knowledge of the Internet as well. Googling and browsing StackOverflow wasn’t needed anymore. Cursor gave users a custom IDE with built-in AI-powered autocomplete and other baked-in AI tools, like chat, to make the experience coherent.
Then came the agent promise. MCPs, autonomous workflows, and articles about agents running overnight started to pop up left and right. It was a different use of AI than Cursor. It was no longer AI-assisted human coding, but human-assisted AI coding.
Many devs tried it and got burned. Agents made tons of small mistakes. To achieve great results, the AI-first process required a complete paradigm shift in how devs think about coding. Agents also often got stuck in loops, hallucinated dependencies, and produced code that looked almost right but wasn’t. You needed to learn a completely new technology, driven by FOMO. And this shiny new tool never got it 100% right on the first try.
Software used to be deterministic. You controlled it with if/else branches, explicit state machines, clear logic. The new reality is controlling the development process with prompts, system instructions, and CLAUDE.md files, and hoping the model produces the output you expect.
Then Opus 4.5 came out.
The workflows everyone was talking about just worked, out of the box (not always, obviously, but more often). Engineers transitioned to Forward Deployed Engineers, becoming responsible for many things other than coding. Sometimes not even coding by hand at all. Recently Spotify’s co-CEO Gustav Söderström said:
“An engineer at Spotify on their morning commute from Slack on their cell phone can tell Claude to fix a bug or add a new feature to the iOS app. And once Claude finishes that work, the engineer then gets a new version of the app, pushed to them on Slack on their phone, so that he can then merge it to production, all before they even arrive at the office.”
I hope they at least review the code before merging.
The next stage is (almost) full automation. That’s what many execs want and are trying to achieve. It’s a capitalistic wet dream: a worker that never sleeps, never gets tired, always wants to work, and is infinitely productive. But Geoffrey Hinton predicted in 2016 that deep learning would outperform radiologists at image analysis within five years. Anthropic’s CEO predicted AI would write 90% of code within three to six months of March 2025. None of this happened as predicted. The trajectory is real, but the timeline keeps slipping.
In 2012, neuroscientist Manfred Spitzer published Digital Dementia, arguing that when we outsource mental tasks to digital devices, the brain pathways responsible for those tasks atrophy. Use it or lose it. Not all of this is proven scientifically, but neuroplasticity research shows the brain strengthens pathways that get used and weakens ones that don’t. The core principle of the book is that the cognitive skills that you stop practicing will decline.
Margaret-Anne Storey, a software engineering researcher, recently gave this a more precise name: cognitive debt. Technical debt lives in the code. Cognitive debt lives in developers’ heads. It’s the accumulated loss of understanding that happens when you build fast without comprehending what you built. She grounds it in Peter Naur’s 1985 theory that a program is a theory existing in developers’ minds, capturing what it does, how intentions map to implementation, and how it can evolve. When that theory fragments, the system becomes a black box.
Apply this directly to fully agentic coding. If you stop writing code and only review AI output, your ability to reason about code atrophies. Slowly, invisibly, but inevitably. You can’t deeply review what you can no longer deeply understand.
This isn’t just theory. A 2026 randomized study by Shen and Tamkin tested this directly: 52 professional developers learning a new async library were split into AI-assisted and unassisted groups. The AI group scored 17% lower on conceptual understanding, debugging, and code reading. The largest gap was in debugging, the exact skill you need to catch what AI gets wrong. One hour of passive AI-assisted work produced measurable skill erosion.
The insidious part is that you don’t notice the decline because the tool compensates for it. You feel productive. The PRs are shipping. Mihaly Csikszentmihalyi’s research on flow showed that the state of flow depends on a balance between challenge and skill. Your mind needs to be stretched just enough. Real flow produces growth. Rachel Thomas called what AI-assisted work produces “dark flow”, a term borrowed from gambling research, describing the trance-like state slot machines are designed to induce. You feel absorbed, but the challenge-skill balance is gone because the AI handles the challenge. It feels like the flow state of deep work, but the feedback loop is broken. You’re not getting better, you’re getting dependent.
There’s this observation that keeps coming up in HN comments: if the AI writes all the code and you only review it, where does the skill to review come from? You can’t have one without the other. You don’t learn to recognize good code by reading about it in a textbook, or a PR. You learn by writing bad code, getting it torn apart, and building intuition through years of practice.
This creates what I’d call the review paradox: the more AI writes, the less qualified humans become to review what it wrote. The Shen-Tamkin study puts numbers on this. Developers who fully delegated to AI finished tasks fastest but scored worst on evaluations. The novices who benefit most from AI productivity are exactly the ones who need debugging skills to supervise it, and AI erodes those skills first.
Storey’s proposed fix is simple: “require humans to understand each AI-generated change before deployment.” That’s the right answer. It’s also the one that gets skipped first when velocity is the metric.
This goes deeper than individual skill decay. We used to have juniors, mids, seniors, staff engineers, architects. It was a pipeline where each level built on years of hands-on struggle. A junior spends years writing code that gets rejected in code review, not because they weren’t careful, but because they didn’t know better. It’s how you build the judgment that separates someone who can write a function from someone who can architect a system. You can’t become a senior overnight.
Unless you use AI, of course. Now, a junior with Claude Code (Opus 4.5+) delivers PRs that look like senior engineer work. And overall that’s a good thing, I think. But does it mean that the senior hat fits everyone now? From day one? The head underneath hasn’t changed. That junior doesn’t know why that architecture was chosen. In my experience, CC sometimes misses a new DB transaction where one is needed. Sometimes it creates a lock on a resource that shouldn’t be locked, for any number of reasons. I can defend my decisions, and I enjoy it when my code is challenged, when reviewers disagree, and we have a discussion. What will a junior do? Ask Claude.
It’s a two-sided collapse. Seniors who stop writing code and only review AI output lose their own depth. Juniors who skip the struggle never build it. Organizations are spending senior time every day on reviews while simultaneously breaking the mechanisms that create it. The pipeline that produced senior engineers, writing bad code, getting bad code reviewed, building intuition through failure, is being bypassed entirely. Nobody’s talking about what happens when that pipeline runs dry.
Look at what lands on C-level desks every week. Microsoft’s AI chief Mustafa Suleyman says all white-collar work will be automated within 18 months. Anthropic’s CEO Dario Amodei predicts AI will replace software engineers in 6-12 months and quoted his engineers saying they don’t write any code anymore, just let the model write it and edit the output. Sundar Pichai (CEO, Google) reported 25% of Google’s new code was AI-generated in late 2024. Months later, Google Research reported that number had reached 50% of code characters. If you’re a CTO watching that curve, of course you’re going to push your teams.
The problem is that these predictions come from people selling AI or trying to prop up their stock with AI hype. They have every incentive to accelerate adoption and zero accountability when the timelines slip, which, historically, they always do. And “50% of code characters” at Google, a company that has built its own models, tooling, and infrastructure from scratch, says very little about what your team can achieve with off-the-shelf agents next Monday.
AI adoption is not a switch to flip but a skill to calibrate. It’s not as simple as mandating specific tools, setting “AI-first” policies, or measuring developers on how much AI they use (/r/ExperiencedDevs is full of these stories). A lot of good practices, like using design patterns, proper test coverage, and manual testing before merging, are often skipped these days because they slow the pace. AI broke it? AI will fix it. You need a review? AI will do it. Not even Greptile or CodeRabbit. Just delegate the PR to a Claude Code reviewer agent. Or Gemini. Or Codex. Pick your poison.
And here’s what actually happens when you force AI usage. One developer on r/ExperiencedDevs described their company tracking AI usage per engineer: “I just started asking my bots to do random things I don’t even care about. The other day I told Claude to examine random directories to ‘find bugs’ or answer questions I already knew the answer to.” This thread is full of engineers reporting that AI has made code reviews “infinitely harder due to the AI slop produced by tech leads who have been off the tools long enough to be dangerous.”
This is sad, because being able to work with AI tools is a perk for developers, and since it improves pace, it’s something management wants as well. It’s obvious that the people gaming the metrics (not really using AI the way they should) would be fired on the spot if management learned how they were gaming them (and that would be fair), but they are gaming the metrics because they don’t want to be fired…
Who should be responsible for setting the threshold of AI usage at the company? What if your top performing engineer just refuses to use AI? What if the newly hired junior uses AI all the time? These are the new questions and management is trying to find an answer to them, but it’s not as simple as measuring the AI usage.
This is Goodhart’s law in action: “When a measure becomes a target, it ceases to be a good measure.” Track AI usage per engineer and you won’t get better engineering, you’ll get compliance theater. Developers game the metrics, resent the tools, and the actual productivity gains that AI could deliver get buried under organizational dysfunction.
The financial cost is obvious. Agent time for non-trivial features is measured in hours, and those hours aren’t free. But the human cost is potentially worse, and it’s barely discussed.
Writing code can put you in the flow state mentioned earlier: that deep, focused, creative problem-solving where hours disappear and you emerge with something you built and understand. And you’re proud of it. Someone wrote “Good job!” under your PR and gave you an approval. Reviewing AI-generated code does not do this. It’s the opposite. It’s a mental drain.
Developers need the dopamine hit of creation. That’s not a perk, it’s what keeps good engineers engaged, learning, retained, and prevents burnout. The joy of coding is probably what allowed them to become experienced devs in the first place. Replace creation with oversight and you get faster burnout, not faster shipping. You’ve turned engineering, the creative work, into the worst form of QA. The AI does all the art, the human folds the laundry.
I use AI every day. I use AI heavily at work, I use AI in my side projects, and I don’t want to go back. I love it! That’s why I’m worried. I’m afraid I’ve become addicted and dependent. I’ve implemented countless custom commands, skills, and agents. I check CC release notes daily. And I know many are in a similar situation right now, and we all wonder what the future brings. Are we going to replace ourselves with AI? Or will we be responsible for cleaning up AI slop? What’s the right amount of AI usage for me?
AI is just a tool. An extraordinarily powerful one, but a tool nonetheless. You wouldn’t mandate that every engineer uses a specific IDE, or measure people on how many lines they write per day (…right?). You’d let them pick the tools that make them most effective and measure what actually matters, the work that ships.
The right amount of AI is not zero. And it’s not maximum.
The Shen-Tamkin study identified six distinct AI interaction patterns among developers. Three led to poor learning: full delegation, progressive reliance, and outsourcing debugging to AI. Three preserved learning even with full AI access: asking for explanations, posing conceptual questions, and writing code independently while using AI for clarification. The differentiator wasn’t whether developers used AI, it was whether they stayed cognitively engaged.
Software engineering was never just about typing code. It’s defining the problem well, understanding the problem, translating the language from business to product to code, clarifying ambiguity, making tradeoffs, understanding what breaks when you change something. Someone has to do that before AGI, and AGI is nowhere close (luckily). You’re on call, the phone rings at 3am: can you triage the issue without an agent? If not, you’ve probably taken AI coding too far. If AI usage becomes a new performance metric for developers, maybe using AI too often, or too much, should be discouraged as well? Not because these tools are bad, but because coding skills are worth maintaining.
If you’re using no AI at all in 2026, you are leaving real gains on the table:
* Search and context. AI is genuinely better than Google for navigating unfamiliar codebases, understanding legacy code, and finding relevant patterns. This alone justifies having it in your workflow (since 2023, Cursor etc)
* Boilerplate and scaffolding. Writing the hundredth CRUD endpoint, config file, or test scaffold by hand when an agent can produce it in seconds isn’t craftsmanship, it’s stubbornness. Just use AI. You’re not a CRUD developer anymore anyway, because we all wear many hats these days (post 2025 Sonnet)
* The workflow itself. The investigate, plan, implement, test, validate cycle that works with customized agents is a real improvement in how features get delivered. Hours instead of days for non-trivial work. It’s not the 10x that was promised, but 2x or 4x on an established codebase is low-hanging fruit. You must understand the output, though, and all the decisions the AI made! (post 2025 Opus 4.5)
* Exploration. “What does this module do? How does this API work? What would break if I changed this?” AI is excellent at these questions. It won’t replace reading the code, but it’ll get you to the right file in the right minute. (since 2023)
Refusing to use AI out of principle is as irrational as adopting it out of hype.
If you go all-in on autonomous AI coding (especially without learning how it all actually works), you risk something worse than slow velocity, you risk invisible degradation:
* Bugs that look like features. AI-generated code passes CI. The types check. The tests are green. And somewhere inside there’s a subtle logic error, a hallucinated edge case, a pattern that’ll collapse under load. In domains like finance or healthcare, a wrong number that doesn’t throw an error is worse than a crash. (less and less relevant, but still relevant)
* A codebase nobody understands. When the agent writes everything and humans only review, six months later nobody on the team can explain why the system is architected the way it is. The AI made choices. Nobody questioned them because the tests passed. Storey describes a student team that hit exactly this wall: they couldn’t make simple changes without breaking things, and the problem wasn’t messy code, it was that no one could explain why certain design decisions had been made. Her conclusion: “velocity without understanding is not sustainable.” (will always be a problem, IMO)
* Cognitive atrophy. Everything in the Digital Dementia section above. Skills you stop practicing will decline. (will always be a problem, IMO)
* The seniority pipeline drying up. Also covered above. This one takes years to manifest, which is exactly why nobody’s planning for it. (It’s a new problem, I have no idea what it looks like in the future)
* Burnout. Reviewing AI output all day without the dopamine of creation is not a sustainable job description. (Old problem, but potentially hits faster?)
Here’s what keeps me up at night. By every metric on every dashboard, AI-assisted human development and human-assisted AI development are improving. More PRs shipped. More features delivered. Faster cycle times. The charts go up and to the right.
But metrics don’t capture what’s happening underneath. The mental fatigue of reviewing code you didn’t write all day. The boredom of babysitting an agent instead of solving problems. The slow, invisible erosion of the hard skills that made you good at this job in the first place. You stop holding the architecture in your head because the agent handles it. You stop thinking through edge cases because the tests pass. You stop wanting to dig deep because it’s easier to prompt and approve. There’s no spark in you anymore.
In this meme the developers are the butter robot. The ones with no mental capacity left to review the plans and PRs from AI will only click Accept instead of doing the creative, challenging work. Oh, the irony.
Simon Willison, one of the most ambitious developers of our time, admitted this is already happening to him. On projects where he prompted entire features without reviewing implementations, he “no longer has a firm mental model of what they can do and how they work.”
And then, one day, the metrics start slipping… Not because the tool got worse, but because you did. Not from lack of effort, but from lack of practice. It’s a feedback loop that looks like progress right up until it doesn’t.
No executive wants to measure this. “What is the effect of AI usage on our engineers’ cognitive abilities over 18 months?” is not an easy KPI. It doesn’t fit in a quarterly review. It doesn’t get tracked, and what doesn’t get tracked doesn’t get managed, until it shows up as a production incident that nobody on the team can debug without an agent, and the agent can’t debug either.
I’m not anti-AI, I like it a lot. I’m addicted to prompting, I get high from it. I’m just worried that this new dependency degrades us over time, quietly, and nobody’s watching for it.
...
Read the original on tomwojcik.com »
Introduction: Fascism at the End of Industrial Civilization
This essay argues that the United States is drifting toward a distinctly twenty‑first‑century form of fascism driven not by mass parties in brownshirts, but by an oligarchic techno‑feudal elite. Neoliberal capitalism has hollowed out democratic institutions and concentrated power in a transnational “authoritarian international” of billionaires, security chiefs, and political fixers who monetize state power while shielding one another from accountability. At the same time, Big Tech platforms have become neo‑feudal estates that extract rent from our data and behavior, weaponize disinformation, and provide the surveillance backbone of an emerging global police state.
Drawing on the work of Robert Reich, William I. Robinson, Yanis Varoufakis, and others, alongside historian Heather Cox Richardson’s detailed account of Trump‑era patronage, whistleblower suppression, and DHS/ICE mega‑detention plans, the essay contends that America is rapidly constructing a system of concentration‑camp infrastructure and paramilitary policing designed to manage “surplus” populations and political dissent. Elite impunity, entrenched through national‑security exceptionalism, legal immunities, and revolving‑door careers, means that those directing lawless violence face virtually no consequences. Elections still happen, courts still sit, newspapers still publish, but substantive power is increasingly exercised by unelected oligarchs, tech lords, and security bureaucracies.
This authoritarian drift cannot be separated from the broader crisis of industrial civilization. Ecological overshoot, climate chaos, resource constraints, and structural economic stagnation have undermined the promise of endless growth on which liberal democracy once rested. Rather than using the remnants of industrial wealth to democratize a just transition, ruling elites are hardening borders, expanding carceral infrastructure, and building a security regime to contain “surplus” humanity in a world of shrinking energy and material throughput. America’s oligarchic techno‑feudal fascism is thus not an anomaly, but one plausible endgame of industrial civilization: a stratified order of gated enclaves above and camps and precarity below, designed to preserve elite power as the old industrial world comes apart.
The American republic was founded on a promise that power would be divided, constrained, and answerable: a written constitution, separated branches, periodic elections, and a Bill of Rights that set bright lines even the sovereign could not cross. That promise was always compromised by slavery, settler colonialism, and gendered exclusion, but it retained real, if uneven, force as a normative horizon. What has shifted over the past half‑century is not simply the familiar gap between creed and practice, but the underlying structure of the system itself: the center of gravity has moved from public institutions toward a private oligarchy whose wealth and leverage allow it to function as a parallel sovereign.
The neoliberal turn of the 1970s and 1980s marked the decisive inflection point. Deregulation, financial liberalization, the crushing of organized labor, and the privatization of public goods redistributed power and income upward on a historic scale. Trade liberalization and capital mobility allowed corporations and investors to pit governments and workers against one another, extracting subsidies and tax concessions under the permanent threat of capital flight. At the same time, Supreme Court decisions eroded limits on political spending, redefining “speech” as something that could be purchased in unlimited quantities by those with the means.
The result, as Robert Reich notes, has been the consolidation of an American oligarchy that “paved the road to fascism” by ensuring that public policy reflects donor preferences far more consistently than popular majorities. In issue after issue, such as taxation, labor law, healthcare, and environmental regulation, there is a clear skew: the wealthy get what they want more often than not, while broadly popular but redistributive policies routinely die in committee or are gutted beyond recognition. This is not a conspiracy in the melodramatic sense; it is how the wiring of the system now works.
William Robinson’s analysis of “twenty‑first‑century fascism” sharpens the point. Global capitalism in its current form generates chronic crises: overproduction, under‑consumption, ecological breakdown, and a growing population that capital cannot profitably employ. Under such conditions, democratic politics becomes dangerous for elites, because electorates might choose structural reforms such as wealth taxes, public ownership, strong unions, and Green New Deal‑style transitions that would curb profits. Faced with this prospect, segments of transnational capital begin to see authoritarian solutions as rational: better to hollow out democracy, harden borders, and construct a global police state than to accept serious redistribution.
American politics in the early twenty‑first century fits this pattern with unsettling precision. A decaying infrastructure, stagnant wages, ballooning personal debt, militarized policing, and permanent war have produced widespread disillusionment. As faith in institutions erodes, public life is flooded with resentment and nihilism that can be redirected against scapegoats (immigrants, racial minorities, feminists, and queer and trans people) rather than against the oligarchic power complex that profits from the decay. It is in this vacuum that a figure like Donald Trump thrives: a billionaire demagogue able to channel anger away from the class that actually governs and toward those even more marginalized.
The decisive shift from plutocratic dysfunction to fascist danger occurs when oligarchs cease to see constitutional democracy as even instrumentally useful and instead invest in movements openly committed to minority rule. Koch‑style networks, Mercer‑funded operations, and Silicon Valley donors willing to underwrite hard‑right projects are not supporting democracy‑enhancing reforms; they are building the infrastructure for authoritarianism, from voter suppression to ideological media to data‑driven propaganda. The system that emerges is hybrid: elections still occur, courts still sit, newspapers still publish, but substantive power is increasingly concentrated in unelected hands.
II. The “authoritarian international” and the shadow world of deals
Historian Heather Cox Richardson’s recent analysis captures a formation that much mainstream commentary still struggles to name: a transnational “authoritarian international” in which oligarchs, political operatives, royal families, security chiefs, and organized criminals cooperate to monetize state power while protecting one another from scrutiny. This is not a formal alliance; it is an overlapping ecology of relationships, exclusive vacations, investment vehicles, shell companies, foundations, and intelligence ties, through which information, favors, and money flow.
The key is that this network is structurally post‑ideological. As Robert Mueller warned in his 2011 description of an emerging “iron triangle” of politicians, businesspeople, and criminals, these actors are not primarily concerned with religion, nationality, or traditional ideology. They will work across confessional and national lines so long as the deals are lucrative and risk is manageably socialized onto others. Saudi royals invest alongside Western hedge funds; Russian oligarchs launder money through London property and American private equity; Israeli and Emirati firms collaborate with U.S. tech companies on surveillance products that are then sold worldwide.
Within this milieu, the formal distinction between public office and private interest blurs. In Richardson’s analysis, Donald Trump’s abrupt reversal on the Gordie Howe International Bridge, after a complaint by a billionaire competitor with ties to Jeffrey Epstein, reads less like the exercise of public policy judgment and more like feudal patronage: the sovereign intervenes to protect a favored lord’s toll road. Tiny shifts in regulatory posture or federal support can move billions of dollars; for those accustomed to having the president’s ear, such interventions are simply part of doing business.
The same logic governs foreign policy. The Trump‑Kushner axis exemplifies this fusion of public and private. When a whistleblower alleges that the Director of National Intelligence suppressed an intercept involving foreign officials discussing Jared Kushner and sensitive topics like Iran, and when the complaint is then choked off with aggressive redaction and executive privilege, we see the machinery of secrecy misused not to protect the national interest but to shield a member of the family‑cum‑business empire at the center of power. It is as if the state has become a family office with nuclear weapons.
Josh Marshall’s phrase “authoritarian international” is apt because it names both the class composition and the political function of this network. The same names recur across far‑right projects: donors and strategists who back nationalist parties in Europe, ultras in Latin America, Modi’s BJP in India, and the MAGA movement in the United States. Their interests are not identical, but they overlap around a shared agenda: weakening labor and environmental protections, undermining independent media and courts, militarizing borders, and securing immunity for themselves and their peers.
This world is lubricated by blackmail and mutually assured destruction. As Richardson notes, players often seem to hold compromising material on one another, whether in the form of documented sexual abuse, financial crime, or war crimes. This shared vulnerability paradoxically stabilizes the network: as long as everyone has something on everyone else, defection is dangerous, and a predatory equilibrium holds. From the standpoint of democratic publics, however, this stability is catastrophic, because it means that scandal—once a mechanism for enforcing norms—loses much of its power. When “everyone is dirty,” no one can be clean enough to prosecute the others without risking exposure.
III. Techno‑feudal aristocracy and the colonization of everyday life
Layered atop this transnational oligarchy is the digital order that Varoufakis and others describe as techno‑feudalism: a regime in which a handful of platforms function like neo‑feudal estates, extracting rent from their “serfs” (users, gig workers, content creators) rather than competing in open markets. This shift is more than metaphor. In classical capitalism, firms profited primarily by producing goods or services and selling them on markets where competitors could, in principle, undercut them. In the platform order, gatekeepers profit by controlling access to the marketplace itself, imposing opaque terms on those who must use their infrastructure to communicate, work, or even find housing.
This can be seen across sectors:
* Social media platforms own the digital public square. They monetize attention by selling advertisers access to finely sliced demographic and psychographic segments, while their recommendation algorithms optimize for engagement, often by privileging outrage and fear.
* Ride‑hailing and delivery apps control the interface between customers and labor, setting prices unilaterally and disciplining workers through ratings, algorithmic management, and the ever‑present threat of “deactivation.”
* Cloud providers and app stores gatekeep access to the basic infrastructure upon which countless smaller firms depend, taking a cut of transactions and reserving the right to change terms or remove competitors from the ecosystem entirely.
In each case, the platform is less a company among companies and more a landlord among tenants, collecting tolls for the right to exist within its domain. Users produce the very capital stock (data, content, behavioral profiles) that platforms own and monetize, yet they have little say over how this material is used or how the digital environment is structured. The asymmetry of power is profound: the lords can alter the code of the world; the serfs can, at best, adjust their behavior to avoid algorithmic invisibility or sanction.
For authoritarian politics, this structure is a gift. First, platforms have become the primary vectors of disinformation and propaganda. Cambridge Analytica’s work for Trump in 2016, funded by billionaires like the Mercers, was an early prototype: harvest data, micro‑target individuals with tailored messaging, and flood their feeds with narratives designed to activate fear and resentment. Since then, the techniques have grown more sophisticated, and far‑right movements worldwide have learned to weaponize meme culture, conspiracy theories, and “shitposting” as recruitment tools.
Second, the same infrastructures that enable targeted advertising enable granular surveillance. Location data, social graphs, search histories, and facial‑recognition databases provide an unprecedented toolkit for monitoring and disciplining populations. In the hands of a regime sliding toward fascism, these tools can be turned against dissidents with terrifying efficiency: geofencing protests to identify attendees, scraping social media to build dossiers, using AI to flag “pre‑criminal” behavior. The emerging “global police state” that Robinson describes depends heavily on such techno‑feudal capacities.
Third, the digital order corrodes the very preconditions for democratic deliberation. Information overload, filter bubbles, and algorithmic amplification of sensational content produce a public sphere saturated with noise. Under these conditions, truth becomes just another aesthetic, and the distinction between fact and fiction collapses into vibes. This is a post‑modern nihilism: a sense that nothing is stable enough to believe in, that everything is spin. Fascist movements do not seek to resolve this condition; they weaponize it, insisting that only the Leader and his trusted media tell the real truth, while everything else is a hostile lie.
Finally, the techno‑feudal aristocracy’s material interests align with authoritarianism. Privacy regulations, antitrust enforcement, data localization rules, and strong labor rights all threaten platform profits. Democratic movements that demand such reforms are therefore adversaries. Conversely, strongman leaders who promise deregulation, tax breaks, and law‑and‑order crackdowns, even if they occasionally threaten specific firms, are often acceptable partners. The result is a convergence: oligarchs of data and oligarchs of oil, real estate, and finance finding common cause in an order that disciplines the many and exempts the few.
IV. Elite impunity and the machinery of lawlessness
Authoritarianism is not only about who holds power; it is about who is answerable for wrongdoing. A system where elites can violate laws with impunity while ordinary people are punished harshly for minor infractions is already halfway to fascism, whatever labels it wears. The United States has, over recent decades, constructed precisely such a system.
The Arab Center’s “Machinery of Impunity” report details how, in areas ranging from mass surveillance to foreign wars to domestic policing, senior officials who authorize illegal acts almost never face criminal consequences. Edward Snowden’s revelations exposed systemic violations of privacy and civil liberties, yet it was the whistleblower who faced prosecution and exile, not the architects of the programs. Torture during the “war on terror” was acknowledged, even documented in official reports, but those who designed and approved the torture regime kept their law licenses, academic posts, and media gigs. Lethal strikes on small boats in the Caribbean and Pacific, justified by secret intelligence and shielded by classified legal opinions, have killed dozens with no public evidence that the targets posed imminent threats.
This pattern is not an aberration but a feature. As a Penn State law review article notes, the U.S. legal system builds in multiple layers of protection for high officials: sovereign immunity, state secrets privilege, narrow standing rules, and prosecutorial discretion all combine to make it extraordinarily difficult to hold the powerful to account. Violations of the Hatch Act, campaign‑finance laws, or ethics rules are often treated as technicalities, and when reports do document unlawful behavior, as in the case of Mike Pompeo’s partisan abuse of his diplomatic office, there are “no consequences” beyond mild censure. Jamelle Bouie’s recent video essay for the New York Times drives the point home: America is “bad at accountability” because institutions have been designed and interpreted to favor elite impunity.
Richardson shows how this culture functions inside the national‑security state. A whistleblower complaint alleging that the Director of National Intelligence suppressed an intelligence intercept involving Jared Kushner and foreign officials was not allowed to run its course. Instead, it was bottled up, then transmitted to congressional overseers in a highly redacted form, with executive privilege invoked to shield the president’s involvement. The same mechanisms that insulate covert operations abroad from democratic oversight are deployed to protect domestic political allies from scrutiny.
Immigration enforcement offers another window. The Arab Center notes that ICE raids, family separation, and other abuses “escalated under the current Trump administration into highly visible kidnappings, abuse, and deportations” with little accountability for senior officials. The National Immigrant Justice Center documents a detention system where 90 percent of detainees are held in for‑profit facilities, where medical neglect, punitive solitary confinement, and preventable deaths are common, yet contracts are renewed and expanded. A culture of impunity allows agents and managers to treat rights violations not as career‑ending scandals but as acceptable collateral damage.
Latin American scholars of impunity warn that such selective enforcement produces a “quiet crisis of accountability” in which the rule of law is hollowed out from within. Laws remain on the books, but their application is skewed: harsh on the poor and marginalized, permissive toward the powerful. Over time, this normalizes the idea that some people are above the law, while others exist primarily as objects of control. When a polity internalizes this hierarchy, fascism no longer needs to arrive in jackboots; it is already present in the daily operations of the justice system.
The danger, as the Arab Center emphasizes, is that the costs of impunity “come home to roost.” Powers originally justified as necessary to fight terrorism or foreign enemies migrate back into domestic politics. Surveillance tools built for foreign intelligence monitoring are turned on activists and journalists; militarized police tactics perfected in occupied territories are imported into American streets. A population taught to accept lawless violence against outsiders (migrants, foreigners, enemy populations) is gradually conditioned to accept similar violence against internal opponents.
In this context of oligarchic capture, techno‑feudal control, and elite impunity, the rapid expansion of detention infrastructure and the deployment of paramilitary “federal agents” across the interior United States are not aberrations; they are central pillars of an emergent fascist order.
Richardson’s insistence on calling these facilities concentration camps is analytically exact. A concentration camp, in the historical sense, is not necessarily a death camp; it is a place where a state concentrates populations it considers threats or burdens, subjecting them to confinement, disease, abuse, and often death through neglect rather than industrialized extermination. By that definition, the sprawling network of ICE and Border Patrol detention centers, where people are warehoused for months to years, often in horrific conditions, qualifies.
New reporting details how this system is poised to scale up dramatically. An internal ICE memo, recently surfaced, outlines a $38 billion plan for a “new detention center model” that would, in one year, create capacity for roughly 92,600 people by purchasing eight “mega centers,” 16 processing centers, and 10 additional facilities. The largest of these warehouses would hold between 7,000 and 10,000 people each for average stays of about 60 days, more than double the size of the largest current federal prison. Separate reporting has mapped at least 23 industrial warehouses being surveyed for conversion into mass detention camps, with leases already secured at several sites.
Investigations by Amnesty International and others into prototype facilities have found detainees shackled in overcrowded cages, underfed, forced to use open‑air toilets that flood, and routinely denied medical care. Sexual assault and extortion by guards, negligent deaths, and at least one homicide have been documented. These are not accidents; they are predictable outcomes of a profit‑driven system where private contractors are paid per bed and oversight is weak, and of a political culture that dehumanizes migrants as “invaders” or “animals.”
Richardson highlights another crucial dimension: the way DHS has been retooled to project this violence into the interior as a form of political terror. Agents from ICE and Border Patrol, subdivisions of a relatively new department lacking the institutional restraints of the military, have been deployed in cities far from any border, often in unmarked vehicles, wearing masks and lacking visible identification. Secret legal memos under Trump gutted the traditional requirement of a judicial warrant for entering homes, replacing it with internal sign‑off by another DHS official, a direct violation of the Fourth Amendment’s protection against unreasonable searches and seizures.
This matters both instrumentally and symbolically. Instrumentally, it enables efficient mass raids and “snatch and grab” operations that bypass local law‑enforcement norms and judicial oversight. Symbolically, it communicates that the state reserves the right to operate as a lawless force, unconstrained by the very constitution it claims to defend. When masked, unidentified agents can seize people off the streets, shove them into unmarked vans, and deposit them in processing centers without due process, the aesthetic of fascism (thugs in the night) becomes reality.
Richardson rightly connects this to the post‑Reconstruction South, where paramilitary groups like the Ku Klux Klan, often tolerated or quietly aided by local officials, used terror to destroy a biracial democracy that had briefly flourished. Today’s difference is that communications technology allows rapid mobilization of witnesses and counter‑protesters: people can rush to the scene when agents arrive, document abuses on smartphones, and coordinate legal support. Yet even this can be folded into the logic of spectacle. The images of militarized agents confronting crowds under the glow of streetlights and police floodlamps serve as warnings: this is what happens when you resist.
The planned network of processing centers and mega‑warehouses adds another layer of menace. As Richardson points out, if the stated goal is deportation, there is no clear need for facilities capable of imprisoning tens of thousands for months. Part of the answer is coercive leverage: detained people are easier to pressure into abandoning asylum claims and accepting removal, especially when they are told, day after day, that they could walk free if they “just sign.” But the architecture also anticipates a future in which new categories of internal enemies (protesters, “Antifa,” “domestic extremists”) can be funneled into the same carceral estate once migrant flows diminish or political needs change.
Economically, the camps generate their own constituency. ICE and DHS tout job creation numbers to local officials, promising hundreds of stable, often union‑free positions in communities hollowed out by deindustrialization. Private prison firms and construction companies see lucrative contracts; investors see secure returns backed by federal guarantees. A web of stakeholders thus becomes materially invested in the continuation and expansion of mass detention. This is techno‑feudalism in concrete and razor wire: a carceral estate in which bodies are the rent‑producing asset.
Once such an estate exists, its logic tends to spread. Border‑style tactics migrate into ordinary policing; surveillance tools trialed on migrants are turned on domestic movements; legal doctrines crafted to justify raids and warrantless searches in the name of immigration control seep into other domains. The fascist gradient steepens: more people find themselves at risk of sudden disappearance into a system where rights are theoretical and violence is routine.
Arab Center Washington DC. “The Machinery of Impunity: How Washington’s Elite Stays Above the Law and How to End It.” December 2, 2025. https://arabcenterdc.org/resource/the-machinery-of-impunity-how-washingtons-elite-stays-above-the-law-and-how-to-end-it/.
Bouie, Jamelle. “Opinion | America Is Bad at Accountability.” New York Times video, January 5, 2026. https://www.nytimes.com/video/opinion/100000010627706/america-is-bad-at-accountability.html.
Courier Newsroom. “MAP: All 23 Industrial Warehouses ICE Wants to Turn into Detention ‘Death Camps’.” February 9, 2026. https://couriernewsroom.com/news/map-ice-detention-warehouse/.
Hampton Institute. “The End of an Empire: Systemic Decay and the Economic Foundation of American Fascism.” June 8, 2025. https://www.hamptonthink.org/read/the-end-of-an-empire-systemic-decay-and-the-economic-foundation-of-american-fascism.
“Impunity by Design: Latin America’s Quiet Crisis of Accountability.” Just Security, November 9, 2025. https://www.justsecurity.org/124089/impunity-by-design-latin-americas-quiet-crisis-of-accountability/.
Penn State Journal of Law & International Affairs. “Caught in the Act but Not Punished: On Elite Rule of Law and Impunity.” 2016. https://insight.dickinsonlaw.psu.edu/cgi/viewcontent.cgi?article=1144&context=jlia.
Reich, Robert. “How America’s Oligarchy Has Paved the Road to Fascism (Why American Democracy Is on the Brink).” Substack, January 4, 2024. https://robertreich.substack.com/p/the-american-oligarchy-why-is-american.
Robinson, William I. “Global Capitalism and Twenty-First Century Fascism: A U.S. Case Study.” Race & Class 48, no. 2 (2006): 13–30. https://robinson.faculty.soc.ucsb.edu/Assets/pdf/raceandclass.pdf.
Transnational Institute. “Follow the Money: The Business Interests Behind the Far Right.” February 2, 2026. https://www.tni.org/en/article/follow-the-money-the-business-interests-behind-the-far-right.
...
Read the original on collapseofindustrialcivilization.com »