10 interesting stories served every morning and every evening.
The new Claude Opus 4.6 improves on its predecessor’s coding skills. It plans more carefully, sustains agentic tasks for longer, can operate more reliably in larger codebases, and has better code review and debugging skills to catch its own mistakes. And, in a first for our Opus-class models, Opus 4.6 features a 1M token context window in beta. Opus 4.6 can also apply its improved abilities to a range of everyday work tasks: running financial analyses, doing research, and using and creating documents, spreadsheets, and presentations. Within Cowork, where Claude can multitask autonomously, Opus 4.6 can put all these skills to work on your behalf.

The model’s performance is state-of-the-art on several evaluations. For example, it achieves the highest score on the agentic coding evaluation Terminal-Bench 2.0 and leads all other frontier models on Humanity’s Last Exam, a complex multidisciplinary reasoning test. On GDPval-AA—an evaluation of performance on economically valuable knowledge work tasks in finance, legal, and other domains—Opus 4.6 outperforms the industry’s next-best model (OpenAI’s GPT-5.2) by around 144 Elo points, and its own predecessor (Claude Opus 4.5) by 190 points. Opus 4.6 also performs better than any other model on BrowseComp, which measures a model’s ability to locate hard-to-find information online.

As we show in our extensive system card, Opus 4.6 also shows an overall safety profile as good as, or better than, any other frontier model in the industry, with low rates of misaligned behavior across safety evaluations.

Opus 4.6 is state-of-the-art on real-world work tasks across several professional domains.

Opus 4.6 gets the highest score in the industry for deep, multi-step agentic search.

In Claude Code, you can now assemble agent teams to work on tasks together. On the API, Claude can use compaction to summarize its own context and perform longer-running tasks without bumping up against limits. We’re also introducing adaptive thinking, where the model can pick up on contextual clues about how much to use its extended thinking, and new effort controls to give developers more control over intelligence, speed, and cost. We’ve made substantial upgrades to Claude in Excel, and we’re releasing Claude in PowerPoint in a research preview. This makes Claude much more capable for everyday work.

Claude Opus 4.6 is available today on claude.ai, our API, and all major cloud platforms. If you’re a developer, use claude-opus-4-6 via the Claude API. Pricing remains the same at $5/$25 per million tokens; for full details, see our pricing page.

We cover the model, our new product updates, our evaluations, and our extensive safety testing in depth below.

We build Claude with Claude. Our engineers write code with Claude Code every day, and every new model first gets tested on our own work. With Opus 4.6, we’ve found that the model brings more focus to the most challenging parts of a task without being told to, moves quickly through the more straightforward parts, handles ambiguous problems with better judgment, and stays productive over longer sessions.

Opus 4.6 often thinks more deeply and more carefully revisits its reasoning before settling on an answer. This produces better results on harder problems, but can add cost and latency on simpler ones. If you’re finding that the model is overthinking on a given task, we recommend dialing effort down from its default setting (high) to medium.
You can control this easily with the /effort parameter.

Here are some of the things our Early Access partners told us about Claude Opus 4.6, including its propensity to work autonomously without hand-holding, its success where previous models failed, and its effect on how teams work:
Claude Opus 4.6 is the strongest model Anthropic has shipped. It takes complicated requests and actually follows through, breaking them into concrete steps, executing, and producing polished work even when the task is ambitious. For Notion users, it feels less like a tool and more like a capable collaborator.

Early testing shows Claude Opus 4.6 delivering on the complex, multi-step coding work developers face every day—especially agentic workflows that demand planning and tool calling. This starts unlocking long-horizon tasks at the frontier.

Claude Opus 4.6 is a huge leap for agentic planning. It breaks complex tasks into independent subtasks, runs tools and subagents in parallel, and identifies blockers with real precision.

Claude Opus 4.6 is the best model we’ve tested yet. Its reasoning and planning capabilities have been exceptional at powering our AI Teammates. It’s also a fantastic coding model — its ability to navigate a large codebase and identify the right changes to make is state of the art.

Claude Opus 4.6 reasons through complex problems at a level we haven’t seen before. It considers edge cases that other models miss and consistently lands on more elegant, well-considered solutions. We’re particularly impressed with Opus 4.6 in Devin Review, where it’s increased our bug catching rates.

Claude Opus 4.6 feels noticeably better than Opus 4.5 in Windsurf, especially on tasks that require careful exploration like debugging and understanding unfamiliar codebases. We’ve noticed Opus 4.6 thinks longer, which pays off when deeper reasoning is needed.

Claude Opus 4.6 represents a meaningful leap in long-context performance. In our testing, we saw it handle much larger bodies of information with a level of consistency that strengthens how we design and deploy complex research workflows. Progress in this area gives us more powerful building blocks to deliver truly expert-grade systems professionals can trust.

Across 40 cybersecurity investigations, Claude Opus 4.6 produced the best results 38 of 40 times in a blind ranking against Claude 4.5 models. Each model ran end to end on the same agentic harness with up to 9 subagents and 100+ tool calls.

Claude Opus 4.6 is the new frontier on long-running tasks from our internal benchmarks and testing. It’s also been highly effective at reviewing code.

Claude Opus 4.6 achieved the highest BigLaw Bench score of any Claude model at 90.2%. With 40% perfect scores and 84% above 0.8, it’s remarkably capable for legal reasoning.

Claude Opus 4.6 autonomously closed 13 issues and assigned 12 issues to the right team members in a single day, managing a ~50-person organization across 6 repositories. It handled both product and organizational decisions while synthesizing context across multiple domains, and it knew when to escalate to a human.

Claude Opus 4.6 is an uplift in design quality. It works beautifully with our design systems and it’s more autonomous, which is core to Lovable’s values. People should be creating things that matter, not micromanaging AI.

Claude Opus 4.6 excels in high-reasoning tasks like multi-source analysis across legal, financial, and technical content. Box’s eval showed a 10% lift in performance, reaching 68% vs. a 58% baseline, and near-perfect scores in technical domains.

Claude Opus 4.6 generates complex, interactive apps and prototypes in Figma Make with an impressive creative range. The model translates detailed designs and multi-layered tasks into code on the first try, making it a powerful starting point for teams to explore and build ideas.

Claude Opus 4.6 is the best Anthropic model we’ve tested. It understands intent with minimal prompting and went above and beyond, exploring and creating details I didn’t even know I wanted until I saw them. It felt like I was working with the model, not waiting on it.

Both hands-on testing and evals show Claude Opus 4.6 is a meaningful improvement for design systems and large codebases, use cases that drive enormous enterprise value. It also one-shotted a fully functional physics engine, handling a large multi-scope task in a single pass.

Claude Opus 4.6 is the biggest leap I’ve seen in months. I’m more comfortable giving it a sequence of tasks across the stack and letting it run. It’s smart enough to use subagents for the individual pieces.

Claude Opus 4.6 handled a multi-million-line codebase migration like a senior engineer. It planned up front, adapted its strategy as it learned, and finished in half the time.

We only ship models in v0 when developers will genuinely feel the difference. Claude Opus 4.6 passed that bar with ease. Its frontier-level reasoning, especially with edge cases, helps v0 to deliver on our number-one aim: to let anyone elevate their ideas from prototype to production.

The performance jump with Claude Opus 4.6 feels almost unbelievable. Real-world tasks that were challenging for Opus [4.5] suddenly became easy. This feels like a watershed moment for spreadsheet agents on Shortcut.

Across agentic coding, computer use, tool use, search, and finance, Opus 4.6 is an industry-leading model, often by a wide margin. The table below shows how Claude Opus 4.6 compares to our previous models and to other industry models on a variety of benchmarks.

Opus 4.6 is much better at retrieving relevant information from large sets of documents. This extends to long-context tasks, where it holds and tracks information over hundreds of thousands of tokens with less drift, and picks up buried details that even Opus 4.5 would miss.

A common complaint about AI models is “context rot,” where performance degrades as conversations exceed a certain number of tokens. Opus 4.6 performs markedly better than its predecessors: on the 8-needle 1M variant of MRCR v2—a needle-in-a-haystack benchmark that tests a model’s ability to retrieve information “hidden” in vast amounts of text—Opus 4.6 scores 76%, whereas Sonnet 4.5 scores just 18.5%. This is a qualitative shift in how much context a model can actually use while maintaining peak performance.

All in all, Opus 4.6 is better at finding information across long contexts, better at reasoning after absorbing that information, and has substantially better expert-level reasoning abilities in general.

Finally, the charts below show how Claude Opus 4.6 performs on a variety of benchmarks that assess its software engineering skills, multilingual coding ability, long-term coherence, cybersecurity capabilities, and its life sciences knowledge.

Opus 4.6 maintains focus over time and earns $3,050.53 more than Opus 4.5 on Vending-Bench 2.

Opus 4.6 finds real vulnerabilities in codebases better than any other model.

Opus 4.6 performs almost 2× better than Opus 4.5 on computational biology, structural biology, organic chemistry, and phylogenetics tests.

These intelligence gains do not come at the cost of safety.
On our automated behavioral audit, Opus 4.6 showed a low rate of misaligned behaviors such as deception, sycophancy, encouragement of user delusions, and cooperation with misuse. Overall, it is just as well-aligned as its predecessor, Claude Opus 4.5, which was our most-aligned frontier model to date. Opus 4.6 also shows the lowest rate of over-refusals—where the model fails to answer benign queries—of any recent Claude model.

The overall misaligned behavior score for each recent Claude model on our automated behavioral audit (described in full in the Claude Opus 4.6 system card).

For Claude Opus 4.6, we ran the most comprehensive set of safety evaluations of any model, applying many different tests for the first time and upgrading several that we’ve used before. We included new evaluations for user wellbeing, more complex tests of the model’s ability to refuse potentially dangerous requests, and updated evaluations of the model’s ability to surreptitiously perform harmful actions. We also experimented with new methods from interpretability, the science of the inner workings of AI models, to begin to understand why the model behaves in certain ways—and, ultimately, to catch problems that standard testing might miss.

A detailed description of all capability and safety evaluations is available in the Claude Opus 4.6 system card.

We’ve also applied new safeguards in areas where Opus 4.6 shows particular strengths that might be put to dangerous as well as beneficial uses. In particular, since the model shows enhanced cybersecurity abilities, we’ve developed six new cybersecurity probes—methods of detecting harmful responses—to help us track different forms of potential misuse.

We’re also accelerating the cyberdefensive uses of the model, using it to help find and patch vulnerabilities in open-source software (as we describe in our new cybersecurity blog post). We think it’s critical that cyberdefenders use AI models like Claude to help level the playing field. Cybersecurity moves fast, and we’ll be adjusting and updating our safeguards as we learn more about potential threats; in the near future, we may institute real-time intervention to block abuse.

We’ve made substantial updates across Claude, Claude Code, and the Claude Developer Platform to let Opus 4.6 perform at its best.

On the API, we’re giving developers better control over model effort and more flexibility for long-running agents. To do so, we’re introducing the following features:

* Adaptive thinking. Previously, developers only had a binary choice between enabling or disabling extended thinking. Now, with adaptive thinking, Claude can decide when deeper reasoning would be helpful. At the default effort level (high), the model uses extended thinking when useful, but developers can adjust the effort level to make it more or less selective.

* Effort. There are now four effort levels to choose from: low, medium, high (default), and max. We encourage developers to experiment with different options to find what works best.

* Context compaction (beta). Long-running conversations and agentic tasks often hit the context window. Context compaction automatically summarizes and replaces older context when the conversation approaches a configurable threshold, letting Claude perform longer tasks without hitting limits.

* 1M token context (beta). Opus 4.6 is our first Opus-class model with 1M token context. Premium pricing applies for prompts exceeding 200k tokens ($10/$37.50 per million input/output tokens).

* 128k output tokens. Opus 4.6 supports outputs of up to 128k tokens, which lets Claude complete larger-output tasks without breaking them into multiple requests.

* US-only inference. For workloads that need to run in the United States, US-only inference is available at 1.1× token pricing.

Across Claude and Claude Code, we’ve added features that allow knowledge workers and developers to tackle harder tasks with more of the tools they use every day.

We’ve introduced agent teams in Claude Code as a research preview. You can now spin up multiple agents that work in parallel as a team and coordinate autonomously—best for tasks that split into independent, read-heavy work like codebase reviews. You can take over any subagent directly using Shift+Up/Down or tmux.

Claude now also works better with the office tools you already use. Claude in Excel handles long-running and harder tasks with improved performance, and can plan before acting, ingest unstructured data and infer the right structure without guidance, and handle multi-step changes in one pass. Pair that with Claude in PowerPoint, and you can first process and structure your data in Excel, then bring it to life visually in PowerPoint. Claude reads your layouts, fonts, and slide masters to stay on brand, whether you’re building from a template or generating a full deck from a description. Claude in PowerPoint is now available in research preview for Max, Team, and Enterprise plans.
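For a sense of what the new controls look like from the developer’s side, here is a minimal sketch of a Messages API call that sets an effort level. The endpoint, headers, and message format are the standard Anthropic API shape; the exact name and placement of the effort field is an assumption based on this announcement rather than confirmed documentation.

    # Minimal sketch: selecting an effort level on a Messages API call.
    # NOTE: the "effort" field below is an assumed name and placement.
    import os
    import requests

    resp = requests.post(
        "https://api.anthropic.com/v1/messages",
        headers={
            "x-api-key": os.environ["ANTHROPIC_API_KEY"],
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        json={
            "model": "claude-opus-4-6",
            "max_tokens": 2048,
            "effort": "medium",  # assumed values: low | medium | high (default) | max
            "messages": [
                {"role": "user", "content": "Summarize this quarter's revenue drivers."}
            ],
        },
        timeout=600,
    )
    print(resp.json()["content"][0]["text"])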
...
Read the original on www.anthropic.com »
Before you read this post, ask yourself a question: When was the last time you truly thought hard?
By “thinking hard,” I mean encountering a specific, difficult problem and spending multiple days just sitting with it to overcome it.
a) All the time. b) Never. c) Somewhere in between.
If your answer is (a) or (b), this post isn’t for you. But if, like me, your response is (c), you might get something out of this, if only the feeling that you aren’t alone.
First, a disclaimer: this post has no answers, not even suggestions. It is simply a way to vent something I’ve been feeling for the last few months.
I believe my personality is built on two primary traits:
The Builder (The desire to create, ship, and be pragmatic).
The Thinker (The need for deep, prolonged mental struggle).
The Builder is pretty self-explanatory: it’s motivated by velocity and utility. It is the part of me that craves the transition from “idea” to “reality.” It loves the dopamine hit of a successful deploy, the satisfaction of building systems to solve real problems, and the knowledge that someone, somewhere, is using my tool.
To explain the Thinker, I need to go back to my university days studying physics. Every now and then, we would get homework problems that were significantly harder than average. Even if you had a decent grasp of the subject, just coming up with an approach was difficult.
I observed that students fell into three categories when facing these problems (well, four, if you count the 1% of geniuses for whom no problem was too hard).
* Type 1: The majority. After a few tries, they gave up and went to the professor or a TA for help.
* Type 2: The Researchers. They went to the library to look for similar problems or insights to make the problem approachable. They usually succeeded.
I fell into the third category, which, in my experience, was almost as rare as the genius 1%. My method was simply to think. To think hard and long. Often for several days or weeks, all my non-I/O brain time was relentlessly chewing on possible ways to solve the problem, even while I was asleep.
This method never failed me. I always felt that deep prolonged thinking was my superpower. I might not be as fast or naturally gifted as the top 1%, but given enough time, I was confident I could solve anything. I felt a deep satisfaction in that process.
That satisfaction is why software engineering was initially so gratifying. It hit the right balance. It satisfied The Builder (feeling productive and pragmatic by creating useful things) and The Thinker (solving really hard problems). Thinking back, the projects where I grew the most as an engineer were always the ones with a good number of really hard problems that needed creative solutions.
But recently, the number of times I truly ponder a problem for more than a couple of hours has decreased tremendously.
Yes, I blame AI for this.
I am currently writing much more, and more complicated software than ever, yet I feel I am not growing as an engineer at all. When I started meditating on why I felt “stuck,” I realized I am starving The Thinker.
“Vibe coding” satisfies the Builder. It feels great to see an idea pass into reality in a fraction of the time it would otherwise take. But it has drastically cut the number of times I need to come up with creative solutions to technical problems. I know many people who are purely Builders; for them, this era is the best thing that ever happened. But for me, something is missing.
I know what you might be thinking: “If you can ‘vibe code’ your way through it, the problem wasn’t actually hard.”
I think that misses the point. It’s not that AI is good for hard problems; it’s not even that good for easy problems. I’m confident that my third manual rewrite of a module would be much better than anything the AI can output. But I am also a pragmatist.
If I can get a solution that is “close enough” in a fraction of the time and effort, it is irrational not to take the AI route. And that is the real problem: I cannot simply turn off my pragmatism.
At the end of the day, I am a Builder. I like building things. The faster I build, the better. Even if I wanted to reject AI and go back to the days where the Thinker’s needs were met by coding, the Builder in me would struggle with the inefficiency.
Even though the AI almost certainly won’t come up with a 100% satisfying solution, the 70% solution it achieves usually hits the “good enough” mark.
To be honest, I don’t know. I am still figuring it out.
I’m not sure if my two halves can be satisfied by coding anymore. You can always aim for harder projects, hoping to find problems where AI fails completely. I still encounter those occasionally, but the number of problems requiring deep creative solutions feels like it is diminishing rapidly.
I have tried to get that feeling of mental growth outside of coding. I tried getting back in touch with physics, reading old textbooks. But that wasn’t successful either. It is hard to justify spending time and mental effort solving physics problems that aren’t relevant or state-of-the-art when I know I could be building things.
My Builder side won’t let me just sit and think about unsolved problems, and my Thinker side is starving while I vibe-code. I am not sure if there will ever be a time again when both needs can be met at once.
“Now we have the right to give this being the well-known name that always designates what no power of imagination, no flight of the boldest fantasy, no intently devout heart, no abstract thinking however profound, no enraptured and transported spirit has ever attained: God. But this basic unity is of the past; it no longer is. It has, by changing its being, totally and completely shattered itself. God has died and his death was the life of the world.”
- Philipp Mainländer
...
Read the original on www.jernesto.com »
These days it seems you need a trillion fake dollars, or lunch with politicians to get your own data center. They may help, but they’re not required. At comma we’ve been running our own data center for years. All of our model training, metrics, and data live in our own data center in our own office. Having your own data center is cool, and in this blog post I will describe how ours works, so you can be inspired to have your own data center too.
If your business relies on compute, and you run that compute in the cloud, you are putting a lot of trust in your cloud provider. Cloud companies generally make onboarding very easy, and offboarding very difficult. If you are not vigilant you will sleepwalk into a situation of high cloud costs and no way out. If you want to control your own destiny, you must run your own compute.
Self-reliance is great, but there are other benefits to running your own compute. It inspires good engineering. Maintaining a data center is much more about solving real-world challenges. The cloud requires expertise in company-specific APIs and billing systems. A data center requires knowledge of Watts, bits, and FLOPs. I know which one I’d rather think about.
Avoiding the cloud for ML also creates better incentives for engineers. Engineers generally want to improve things. In ML many problems go away by just using more compute. In the cloud that means improvements are just a budget increase away. This locks you into inefficient and expensive solutions. Instead, when all you have available is your current compute, the quickest improvements are usually speeding up your code, or fixing fundamental issues.
Finally, there’s cost: owning a data center can be far cheaper than renting in the cloud. Especially if your compute or storage needs are fairly consistent, which tends to be true if you are in the business of training or running models. In comma’s case I estimate we’ve spent ~$5M on our data center, and we would have spent $25M+ had we done the same things in the cloud.
Our data center is pretty simple. It’s maintained and built by only a couple engineers and technicians. Your needs may be slightly different, but our implementation should provide useful context.
To run servers you need power. We currently use about 450kW at max. Operating a data center exposes you to many fun engineering challenges, but procuring power is not one of them. San Diego power cost is over 40c/kWh, ~3x the global average. It’s a ripoff, and overpriced simply due to political dysfunction. We spent $540,112 on power in 2025, a big part of the data center cost. In a future blog post I hope I can tell you about how we produce our own power and you should too.
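As a sanity check on those numbers, the stated spend and rate imply an average draw well below the 450kW peak. A quick back-of-the-envelope (assuming a flat $0.40/kWh rate, which ignores time-of-use tiers):

    # Sanity-checking the power numbers above (flat-rate assumption).
    annual_spend_usd = 540_112
    rate_usd_per_kwh = 0.40

    kwh_per_year = annual_spend_usd / rate_usd_per_kwh  # ~1.35M kWh
    avg_draw_kw = kwh_per_year / (24 * 365)             # hours in a year

    print(f"{kwh_per_year:,.0f} kWh/year, ~{avg_draw_kw:.0f} kW average draw")
    # ~154 kW average vs. the stated 450 kW peak, i.e. roughly one third
    # of max load, which is plausible for bursty training workloads.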
Data centers need cool dry air. Typically this is achieved with a CRAC system, but they are power-hungry. San Diego has a mild climate and we opted for pure outside air cooling. This gives us less control of the temperature and humidity, but uses only a couple dozen kW. We have dual 48” intake fans and dual 48” exhaust fans to keep the air cool. To ensure low humidity (
The majority of our current compute is 600 GPUs in 75 TinyBox Pro machines. They were built in-house, which saves us money and ensures they suit our needs. Our self-built machines fail at a similar rate to pre-built machines we’ve bought, but we’re capable of fixing them ourselves quickly. They have 2 CPUs and 8 GPUs each, and work as both training machines and general compute workers.
For data storage we have a few racks of Dell machines (R630 and R730). They are filled with SSDs for a total of ~4PB of storage. We use SSDs for reliability and speed. Our main storage arrays have no redundancy and each node needs to be able to saturate the network bandwidth with random access reads. For the storage machines this means reading up to 20Gbps of each 80TB chunk.
Other than storage and compute machines we have several one-off machines to run services. This includes a router, climate controller, data ingestion machine, storage master servers, metric servers, redis servers, and a few more.
Running the network requires switches, but at this scale we don’t need to bother with complicated switch topologies. We have 3 100Gbps interconnected Z9264F switches, which serve as the main ethernet network. We have two more infiniband switches to interconnect the 2 tinybox pro groups for training all-reduce.
To effectively use all these compute and storage machines you need some infra. At this scale, services don’t need redundancy to achieve 99% uptime. We use a single master for all services, which makes things pretty simple.
All servers get ubuntu installed with pxeboot and are managed by salt.
All of our storage arrays use mkv. The main array is 3PB of non-redundant storage hosting our driving data we train on. We can read from this array at ~1TB/s, which means we can train directly on the raw data without caching. Redundancy is not needed since no specific data is critical.
We have an additional ~300TB non-redundant array to cache intermediate processed results. And lastly, we have a redundant mkv storage array to store all of our trained models and training metrics. Each of these 3 arrays have a separate single master server.
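A quick back-of-the-envelope using the figures above shows why no caching layer is needed in front of the main array:

    # Rough feasibility check for training directly on raw data,
    # using the ~3PB capacity and ~1TB/s read rate quoted above.
    array_bytes = 3e15  # 3 PB main array
    read_rate = 1e12    # ~1 TB/s aggregate read throughput

    full_scan_minutes = array_bytes / read_rate / 60
    print(f"Full pass over the array: ~{full_scan_minutes:.0f} minutes")
    # ~50 minutes per complete scan, so an epoch over the raw driving
    # data can stream straight off the array.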
We use Slurm to manage the compute nodes and compute jobs. We schedule two types of distributed compute: PyTorch training jobs and miniray workers.
To train models across multiple GPU nodes we use torch.distributed FSDP. We have 2 separate training partitions, each intra-connected with Infiniband for training across machines. We wrote our own training framework which handles the training loop boilerplate, but it’s mostly just pytorch.
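The shape of such a job is standard PyTorch. Below is a minimal sketch of a multi-node FSDP run launched with torchrun; the tiny model and random batches are stand-ins, since comma’s actual training framework and models aren’t shown in the post.

    # Minimal multi-node FSDP sketch (stand-in model and data).
    import os
    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def main():
        dist.init_process_group("nccl")  # torchrun supplies rank/world-size env vars
        torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

        model = torch.nn.Sequential(
            torch.nn.Linear(512, 2048),
            torch.nn.ReLU(),
            torch.nn.Linear(2048, 1),
        ).cuda()
        model = FSDP(model)  # shards parameters, gradients, and optimizer state

        opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
        for _ in range(100):
            x = torch.randn(32, 512, device="cuda")  # stand-in for a data batch
            loss = model(x).pow(2).mean()
            loss.backward()
            opt.step()
            opt.zero_grad(set_to_none=True)

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()  # launch: torchrun --nnodes 2 --nproc-per-node 8 train.py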
We have a custom model experiment tracking service (similar to wandb or tensorboard). It provides a dashboard for tracking experiments, and shows custom metrics and reports. It is also the interface for the mkv storage array that hosts the model weights. The training runs store the model weights there with a uuid, and they are available to download for whoever needs to run them. The metrics and reports for our latest models are also open.
Besides training we have many other compute tasks. This can be anything from running tests, running models, pre-processing data, or even running agent rollouts for on-policy training. We wrote a lightweight open-source task scheduler called miniray that allows you to run arbitrary python code on idle machines. This is a simpler version of dask, with a focus on extreme simplicity. Slurm will schedule any idle machine to be an active miniray worker, and accept pending tasks. All the task information is hosted in a central redis server.
Miniray workers with GPUs will spin up a triton inference server to run model inference with dynamic batching. A miniray worker can thus easily and efficiently run any of the models hosted in the model mkv storage array.
Miniray makes it extremely easy to scale parallel tasks to hundreds of machines. For example, the controls challenge record was set by just having ~1hr of access to our data center with miniray.
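For flavor, the general shape of a Redis-backed scheduler like this is sketched below. To be clear, this is not miniray’s actual API; it is an illustration of the pattern (a central Redis queue, workers executing arbitrary pickled Python) with hypothetical names throughout.

    # Illustrative sketch of a minimal Redis-backed task queue; not miniray's API.
    import pickle
    import redis

    r = redis.Redis(host="redis-master")  # hypothetical central Redis server

    def submit(fn, *args):
        """Producer side: enqueue a pickled (function, args) task."""
        r.rpush("tasks", pickle.dumps((fn, args)))

    def worker_loop():
        """Worker side: pop tasks, run them, store results back in Redis.
        The function must be importable on the worker for pickle to resolve it."""
        while True:
            _, blob = r.blpop("tasks")
            fn, args = pickle.loads(blob)
            r.rpush("results", pickle.dumps(fn(*args)))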
All our code is in a monorepo that we have cloned on our workstations. This monorepo is kept small (
The most complex thing we do at comma is train driving models on-policy: these training runs require training data to be generated during training by running simulated driving rollouts with the most recent model weights. Here’s a real-world command we just used to train such a model. This training run uses all of the infrastructure described above. While only this small command is needed to kick everything off, it orchestrates a lot of moving parts.
Does all this stuff sound exciting? Then build your own datacenter for yourself or your company! You can also come work here.
...
Read the original on blog.comma.ai »
LONDON (AP) — In France, civil servants will ditch Zoom and Teams for a homegrown video conference system. Soldiers in Austria are using open source office software to write reports after the military dropped Microsoft Office. Bureaucrats in a German state have also turned to free software for their administrative work.
Around Europe, governments and institutions are seeking to reduce their use of digital services from U.S. Big Tech companies and turning to domestic or free alternatives. The push for “digital sovereignty” is gaining attention as the Trump administration strikes an increasingly belligerent posture toward the continent, highlighted by recent tensions over Greenland that intensified fears that Silicon Valley giants could be compelled to cut off access.
Concerns about data privacy and worries that Europe is not doing enough to keep up with the United States and Chinese tech leadership are also fueling the drive.
The French government referenced some of these concerns when it announced last week that 2.5 million civil servants would stop using video conference tools from U.S. providers — including Zoom, Microsoft Teams, Webex and GoTo Meeting — by 2027 and switch to Visio, a homegrown service.
The objective is “to put an end to the use of non-European solutions, to guarantee the security and confidentiality of public electronic communications by relying on a powerful and sovereign tool,” the announcement said.
“We cannot risk having our scientific exchanges, our sensitive data, and our strategic innovations exposed to non-European actors,” David Amiel, a civil service minister, said in a press release.
Microsoft said it continues to “partner closely with the government in France and respect the importance of security, privacy, and digital trust for public institutions.”
The company said it is “focused on providing customers with greater choice, stronger data protection, and resilient cloud services — ensuring data stays in Europe, under European law, with robust security and privacy protections.”
Zoom, Webex and GoTo Meeting did not respond to requests for comment.
French President Emmanuel Macron has been pushing digital sovereignty for years. But there’s now a lot more “political momentum behind this idea now that we need to de-risk from U.S. tech,” said Nick Reiners, senior geotechnology analyst at the Eurasia Group.
“It feels kind of like there’s a real zeitgeist shift,” Reiners said.
It was a hot topic at the World Economic Forum’s annual meeting of global political and business elites last month in Davos, Switzerland. The European Commission’s official for tech sovereignty, Henna Virkkunen, told an audience that Europe’s reliance on others “can be weaponized against us.”
“That’s why it’s so important that we are not dependent on one country or one company when it comes to very critical fields of our economy or society,” she said, without naming countries or companies.
A decisive moment came last year when the Trump administration sanctioned the International Criminal Court’s top prosecutor, Karim Khan, after the tribunal, based in The Hague, Netherlands, issued an arrest warrant for Israeli Prime Minister Benjamin Netanyahu, an ally of President Donald Trump.
The sanctions led Microsoft to cancel Khan’s ICC email, a move that was first reported by The Associated Press and sparked fears of a “kill switch” that Big Tech companies can use to turn off service at will.
Microsoft maintains it kept in touch with the ICC “throughout the process that resulted in the disconnection of its sanctioned official from Microsoft services. At no point did Microsoft cease or suspend its services to the ICC.”
Microsoft President Brad Smith has repeatedly sought to strengthen trans-Atlantic ties, the company’s press office said, and pointed to an interview he did last month with CNN in Davos in which he said that jobs, trade and investment, as well as security, would be affected by a rift over Greenland.
“Europe is the American tech sector’s biggest market after the United States itself. It all depends on trust. Trust requires dialogue,” Smith said.
Other incidents have added to the movement. There’s a growing sense that repeated EU efforts to rein in tech giants such as Google with blockbuster antitrust fines and sweeping digital rule books haven’t done much to curb their dominance.
Billionaire Elon Musk is also a factor. Officials worry about relying on his Starlink satellite internet system for communications in Ukraine.
Washington and Brussels wrangled for years over data transfer agreements, triggered by former National Security Agency contractor Edward Snowden’s revelations of U.S. cyber-snooping.
With online services now mainly hosted in the cloud through data centers, Europeans fear that their data is vulnerable.
U.S. cloud providers have responded by setting up so-called “sovereign cloud” operations, with data centers located in European countries, owned by European entities and with physical and remote access only for staff who are European Union residents.
The idea is that “only Europeans can take decisions so that they can’t be coerced by the U.S.,” Reiners said.
The German state of Schleswig-Holstein last year migrated 44,000 employee inboxes from Microsoft to an open source email program. It also switched from Microsoft’s SharePoint file sharing system to Nextcloud, an open source platform, and is even considering replacing Windows with Linux and telephones and videoconferencing with open source systems.
“We want to become independent of large tech companies and ensure digital sovereignty,” Digitalization Minister Dirk Schrödter said in an October announcement.
The French city of Lyon said last year that it’s deploying free office software to replace Microsoft. Denmark’s government and the cities of Copenhagen and Aarhus have also been trying out open-source software.
“We must never make ourselves so dependent on so few that we can no longer act freely,” Digital Minister Caroline Stage Olsen wrote on LinkedIn last year. “Too much public digital infrastructure is currently tied up with very few foreign suppliers.”
The Austrian military said it has also switched to LibreOffice, a software package with word processor, spreadsheet and presentation programs that mirrors Microsoft 365’s Word, Excel and PowerPoint.
The Document Foundation, a nonprofit based in Germany that’s behind LibreOffice, said the military’s switch “reflects a growing demand for independence from single vendors.” Reports also said the military was concerned that Microsoft was moving file storage online to the cloud — the standard version of LibreOffice is not cloud-based.
Some Italian cities and regions adopted the software years ago, said Italo Vignoli, a spokesman for The Document Foundation. Back then, the appeal was not needing to pay for software licenses. Now, the main reason is to avoid being locked into a proprietary system.
“At first, it was: we will save money and by the way, we will get freedom,” Vignoli said. “Today it is: we will be free and by the way, we will also save some money.”
Associated Press writer Molly Quell in The Hague, Netherlands contributed to this report.
This version corrects the contribution line to Molly Quell instead of Molly Hague.
...
Read the original on apnews.com »
Today, we’re releasing Voxtral Transcribe 2, two next-generation speech-to-text models with state-of-the-art transcription quality, diarization, and ultra-low latency. The family includes Voxtral Mini Transcribe V2 for batch transcription and Voxtral Realtime for live applications. Voxtral Realtime is open-weights under the Apache 2.0 license.
We’re also launching an audio playground in Mistral Studio to test transcription instantly, powered by Voxtral Transcribe 2, with diarization and timestamps.
Voxtral Mini Transcribe V2: State-of-the-art transcription with speaker diarization, context biasing, and word-level timestamps in 13 languages.
Voxtral Realtime: Purpose-built for live transcription with latency configurable down to sub-200ms, enabling voice agents and real-time applications.
Best-in-class efficiency: Industry-leading accuracy at a fraction of the cost, with Voxtral Mini Transcribe V2 achieving the lowest word error rate, at the lowest price point.
Open weights: Voxtral Realtime ships under Apache 2.0, deployable on edge for privacy-first applications.
Voxtral Realtime is purpose-built for applications where latency matters. Unlike approaches that adapt offline models by processing audio in chunks, Realtime uses a novel streaming architecture that transcribes audio as it arrives. The model delivers transcriptions with delay configurable down to sub-200ms, unlocking a new class of voice-first applications.
Word error rate (lower is better) across languages in the FLEURS transcription benchmark.
At a 2.4-second delay, ideal for subtitling, Realtime matches Voxtral Mini Transcribe V2, our latest batch model. At a 480ms delay, it stays within 1-2% word error rate of the batch model, enabling voice agents with near-offline accuracy.
The model is natively multilingual, achieving strong transcription performance in 13 languages, including English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch. With a 4B parameter footprint, it runs efficiently on edge devices, ensuring privacy and security for sensitive deployments.
We’re releasing the model weights under Apache 2.0 on the Hugging Face Hub.
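For a rough sense of why a 4B-parameter model is edge-friendly, consider the weight memory alone (assuming 16-bit weights; quantized variants would roughly halve this):

    # Back-of-the-envelope weight memory for a 4B-parameter model.
    params = 4e9
    bytes_per_param = 2  # assuming bf16/fp16 weights
    print(f"~{params * bytes_per_param / 1e9:.0f} GB of weights")  # ~8 GB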
Average diarization error rate (lower is better) across five English benchmarks (Switchboard, CallHome, AMI-IHM, AMI-SDM, SBCSAE) and the TalkBank multilingual benchmark (German, Spanish, English, Chinese, Japanese).
Average word error rate (lower is better) across the top-10 languages in the FLEURS transcription benchmark.
Voxtral Mini Transcribe V2 delivers significant improvements in transcription and diarization quality across languages and domains. At approximately 4% word error rate on FLEURS and $0.003/min, Voxtral offers the best price-performance of any transcription API. It outperforms GPT-4o mini Transcribe, Gemini 2.5 Flash, Assembly Universal, and Deepgram Nova on accuracy, and processes audio approximately 3x faster than ElevenLabs’ Scribe v2 while matching on quality at one-fifth the cost.
Generate transcriptions with speaker labels and precise start/end times. Ideal for meeting transcription, interview analysis, and multi-party call processing. Note: with overlapping speech, the model typically transcribes one speaker.
Provide up to 100 words or phrases to guide the model toward correct spellings of names, technical terms, or domain-specific vocabulary. Particularly useful for proper nouns or industry terminology that standard models often miss. Context biasing is optimized for English; support for other languages is experimental.
Generate precise start and end timestamps for each word, enabling applications like subtitle generation, audio search, and content alignment.
Like Realtime, this model now supports 13 languages: English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch. Non-English performance significantly outpaces competitors.
Maintains transcription accuracy in challenging acoustic environments, such as factory floors, busy call centers, and field recordings.
Process recordings up to 3 hours in a single request.
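As a rough illustration of how these features might be exercised from the API, here is a sketch of a batch transcription request. The endpoint path and bearer auth follow Mistral’s existing API conventions, but the model identifier and the diarization and timestamp parameter names are assumptions based on this post, not confirmed API documentation.

    # Sketch of a batch transcription call; parameter names are assumed.
    import os
    import requests

    resp = requests.post(
        "https://api.mistral.ai/v1/audio/transcriptions",
        headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
        files={"file": open("meeting.mp3", "rb")},
        data={
            "model": "voxtral-mini-transcribe-v2",  # assumed model id
            "diarize": "true",                      # assumed parameter
            "timestamp_granularities": "word",      # assumed parameter
        },
        timeout=600,
    )
    print(resp.json())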
Word error rate (lower is better) across languages in the FLEURS transcription benchmark.
Test Voxtral Transcribe 2 directly in Mistral Studio. Upload up to 10 audio files, toggle diarization, choose timestamp granularity, and add context bias terms for domain-specific vocabulary. Supports .mp3, .wav, .m4a, .flac, .ogg up to 1GB each.
Transcribe multilingual recordings with speaker diarization that clearly attributes who said what and when. At Voxtral’s price point, annotate large volumes of meeting content at industry-leading cost efficiency.
Build conversational AI with sub-200ms transcription latency. Connect Voxtral Realtime to your LLM and TTS pipeline for responsive voice interfaces that feel natural.
Transcribe calls in real time, enabling AI systems to analyze sentiment, suggest responses, and populate CRM fields while conversations are still happening. Speaker diarization ensures clear attribution between agents and customers.
Generate live multilingual subtitles with minimal latency. Context biasing handles proper nouns and technical terminology that trip up generic transcription services.
Monitor and transcribe interactions for regulatory compliance, with diarization providing clear speaker attribution and timestamps enabling precise audit trails.
Both models support GDPR and HIPAA-compliant deployments through secure on-premise or private cloud setups.
Voxtral Mini Transcribe V2 is available now via API at $0.003 per minute. Try it now in the new Mistral Studio audio playground or in Le Chat.
Voxtral Realtime is available via API at $0.006 per minute and as open weights on Hugging Face.
If you’re excited about building world-class speech AI and putting frontier models into the hands of developers everywhere, we’d love to hear from you. Apply to join our team.
The next chapter of AI is yours.
...
Read the original on mistral.ai »
In 2024, Apple signed a deal with Taboola to serve ads in its apps, notably Apple News. John Gruber, writing in Daring Fireball said at the time:
If you told me that the ads in Apple News have been sold by Taboola for the last few years, I’d have said, “Oh, that makes sense.” Because the ads in Apple News — at least the ones I see — already look like chumbox Taboola ads. Even worse, they’re incredibly repetitious.
I use Apple News to keep up on topics that I don’t find in sources I pay for (The Guardian and The New York Times). But there’s no way I’m going to pay the exorbitant price Apple wants for Apple News+ — £13 — because, while you get more publications, you still get ads.
And those ads have gotten worse recently. Many if not most of them look like and probably are scams. Here are a few examples from Apple News today.
Here are three ads that are scammy; the first two were clearly generated by AI, and the third may have been created by AI.
Why are they scams? When I searched domain information for the domains, I found that they were registered very recently.
This recent registration doesn’t necessarily mean they are scams, but they don’t inspire much confidence.
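For reference, a check like this is easy to reproduce with the third-party python-whois package (pip install python-whois); tidenox.com is the domain from the ad discussed below.

    import whois  # third-party package: python-whois

    for domain in ["tidenox.com"]:
        record = whois.whois(domain)
        # creation_date may be a single datetime or a list, depending on the registrar
        print(domain, "registered:", record.creation_date)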
Here’s one example. This ad from Tidenox, whose website says “I am retiring,” shows a photo of an elderly woman, who says, “For 26 years, Tidenox has been port of your journey in creating earth and comfort at home.” The image of the retiring owner is probably made by AI. (Update: someone on Hacker News pointed out the partly masked Google Gemini logo on the bottom right. I hadn’t spotted that, in part because I don’t use any AI image generation tools.)
These fake “going out of business” ads have been around for a few years, and even the US Better Business Bureau warns about them, as they take people’s money and then shut down. Does Apple care? Does Taboola care? Does Apple care that Taboola serves ads like this? My guess: no, no, and no.
Note the registration date for the tidenox.com domain. It’s nowhere near 26 years old, and it’s registered in China:
Shame on Apple for creating a honeypot for scam ads in what they consider to be a premium news service. This company cannot be trusted with ads in its products any more.
...
Read the original on kirkville.com »
...
Read the original on notepad-plus-plus.org »
My experience adopting any meaningful tool is that I’ve necessarily gone through three phases: (1) a period of inefficiency (2) a period of adequacy, then finally (3) a period of workflow and life-altering discovery.
In most cases, I have to force myself through phase 1 and 2 because I usually have a workflow I’m already happy and comfortable with. Adopting a tool feels like work, and I do not want to put in the effort, but I usually do in an effort to be a well-rounded person of my craft.
This is my journey of how I found value in AI tooling and what I’m trying next with it. In an ocean of overly dramatic, hyped takes, I hope this represents a more nuanced, measured approach to my views on AI and how they’ve changed over time.
Immediately cease trying to perform meaningful work via a chatbot (e.g. ChatGPT, Gemini on the web, etc.). Chatbots have real value and are a daily part of my AI workflow, but their utility in coding is highly limited because you’re mostly hoping they come up with the right results based on their prior training, and correcting them involves a human (you) to tell them they’re wrong repeatedly. It is inefficient.
I think everyone’s first experience with AI is a chat interface. And I think everyone’s first experience trying to code with AI has been asking a chat interface to write code.
While I was still a heavy AI skeptic, my first “oh wow” moment was pasting a screenshot of Zed’s command palette into Gemini, asking it to reproduce it with SwiftUI, and being truly flabbergasted that it did it very well. The command palette that ships for macOS in Ghostty today is only very lightly modified from what Gemini produced for me in seconds.
But when I tried to reproduce that behavior for other tasks, I was left disappointed. In the context of brownfield projects, I found the chat interface produced poor results very often, and I found myself very frustrated copying and pasting code and command output to and from the interface. It was very obviously far less efficient than me doing the work myself.
To find value, you must use an agent. An agent is the industry-adopted term for an LLM that can chat and invoke external behavior in a loop.
At a bare minimum, the agent must have the ability to: read files, execute programs, and make HTTP requests.
The next phase on my journey: I tried Claude Code. I’ll cut to the chase: I initially wasn’t impressed. I just wasn’t getting good results out of my sessions. I felt I had to touch up everything it produced and this process was taking more time than if I had just done it myself. I read blog posts, watched videos, but just wasn’t that impressed.
Instead of giving up, I forced myself to reproduce all my manual commits with agentic ones. I literally did the work twice. I’d do the work manually, and then I’d fight an agent to produce identical results in terms of quality and function (without it being able to see my manual solution, of course).
This was excruciating, because it got in the way of simply getting things done. But I’ve been around the block with non-AI tools enough to know that friction is natural, and I can’t come to a firm, defensible conclusion without exhausting my efforts.
But, expertise formed. I quickly discovered for myself from first principles what others were already saying, but discovering it myself resulted in a stronger fundamental understanding.
* Break down sessions into separate clear, actionable tasks. Don’t try to “draw the owl” in one mega session.

* For vague requests, split the work into separate planning vs. execution sessions.

* If you give an agent a way to verify its work, it more often than not fixes its own mistakes and prevents regressions.
More generally, I also found the edges of what agents — at the time — were good at, what they weren’t good at, and for the tasks they were good at how to achieve the results I wanted.
All of this led to significant efficiency gains, to the point where I was starting to naturally use agents in a way that I felt was no slower than doing it myself (but I still didn’t feel it was any faster, since I was mostly babysitting an agent).
The negative space here is worth reiterating: part of the efficiency gains here were understanding when not to reach for an agent. Using an agent for something it’ll likely fail at is obviously a big waste of time, and having the knowledge to avoid that completely leads to time savings.
At this stage, I was finding adequate value with agents that I was happy to use them in my workflow, but still didn’t feel like I was seeing any net efficiency gains. I didn’t care though, I was content at this point with AI as a tool.
To try to find some efficiency, I next started up a new pattern: block out the last 30 minutes of every day to kick off one or more agents.
My hypothesis was that perhaps I could gain some efficiency if the agent can make some positive progress in the times I can’t work anyways. Basically: instead of trying to do more in the time I have, try to do more in the time I don’t have.
Similar to the previous task, I at first found this both unsuccessful and annoying. But, I once again quickly found different categories of work that were really helpful:
* Deep research sessions where I’d ask agents to survey some field, such as finding all libraries in a specific language with a specific license type and producing multi-page summaries for each on their pros, cons, development activity, social sentiment, etc.

* Parallel agents attempting different vague ideas I had but didn’t have time to get started on. I didn’t expect them to produce something I’d ever ship here, but perhaps could illuminate some unknown unknowns when I got to the task the next day.

* Issue and PR triage/review. Agents are good at using gh (GitHub CLI), so I manually scripted a quick way to spin up a bunch in parallel to triage issues. I would NOT allow agents to respond, I just wanted reports the next day to try to guide me towards high value or low effort tasks.
To be clear, I did not go as far as others went to have agents running in loops all night. In most cases, agents completed their tasks in less than half an hour. But, the latter part of the working day, I’m usually tired and coming out of flow and find myself too personally inefficient, so shifting my effort to spinning up these agents I found gave me a “warm start” the next morning that got me working more quickly than I would’ve otherwise.
I was happy, and I was starting to feel like I was doing more than I was doing prior to AI, if only slightly.
By this point, I was getting very confident about what tasks my AI was and wasn’t great at. I had really high confidence with certain tasks that the AI would achieve a mostly-correct solution. So the next step on my journey was: let agents do all of that work while I worked on other tasks.
More specifically, I would start each day by taking the results of my prior night’s triage agents, filter them manually to find the issues that an agent will almost certainly solve well, and then keep them going in the background (one at a time, not in parallel).
Meanwhile, I’d work on something else. I wasn’t going to social media (any more than usual without AI), I wasn’t watching videos, etc. I was in my own, normal, pre-AI deep thinking mode working on something I wanted to work on or had to work on.
Very important at this stage: turn off agent desktop notifications.
Context switching is very expensive. In order to remain efficient, I found that it was my job as a human to be in control of when I interrupt the agent, not the other way around. Don’t let the agent notify you. During natural breaks in your work, tab over and check on it, then carry on.
Importantly, I think the “work on something else” helps counteract the highly publicized Anthropic skill formation paper. Well, you’re trading off: not forming skills for the tasks you’re delegating to the agent while continuing to form skills naturally in the tasks you continue to work on manually.
At this point I was firmly in the “no way I can go back” territory. I felt more efficient, but even if I wasn’t, the thing I liked the most was that I could now focus my coding and thinking on tasks I really loved while still adequately completing the tasks I didn’t.
At risk of stating the obvious: agents are much more efficient when they produce the right result the first time, or at worst produce a result that requires minimal touch-ups. The most sure-fire way to achieve this is to give the agent fast, high quality tools to automatically tell it when it is wrong.
I don’t know if there is a broad industry-accepted term for this yet, but I’ve grown to calling this “harness engineering.” It is the idea that anytime you find an agent makes a mistake, you take the time to engineer a solution such that the agent never makes that mistake again. I don’t need to invent any new terms here; if another one exists, I’ll jump on the bandwagon.
This comes in two forms:
* Better implicit prompting (AGENTS.md). For simple things, like the agent repeatedly running the wrong commands or finding the wrong APIs, update the AGENTS.md (or equivalent). Here is an example from Ghostty. Each line in that file is based on a bad agent behavior, and it almost completely resolved them all. (An illustrative sketch of this kind of file follows this list.)
* Actual, programmed tools. For example, scripts to take screenshots, run filtered tests, etc. This is usually paired with an AGENTS.md change to let the agent know these tools exist.
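To make the first form concrete, here is the flavor of entry such a file tends to accumulate. These lines are illustrative, written for a hypothetical project, not copied from Ghostty’s actual AGENTS.md:

    # AGENTS.md (one entry per bad behavior observed)
    - Run tests with ./scripts/test --filter <name>; never run the full suite.
    - Build with `make app` from the repo root; do not invoke the compiler directly.
    - To verify UI changes, run ./scripts/screenshot.sh and inspect the output PNG.

Each entry exists because an agent once did the opposite; the file is a ratchet that only accumulates corrections.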
This is where I’m at today. I’m making an earnest effort whenever I see an agent do a Bad Thing to prevent it from ever doing that bad thing again. Or, conversely, I’m making an earnest effort for agents to be able to verify they’re doing a Good Thing.
Simultaneous to step 5, I’m also operating under the goal of having an agent running at all times. If an agent isn’t running, I ask myself “is there something an agent could be doing for me right now?”
I particularly like to combine this with slower, more thoughtful models like Amp’s deep mode (which is basically just GPT-5.2-Codex) which can take upwards of 30+ minutes to make small changes. The flip side of that is that it does tend to produce very good results.
I’m not [yet?] running multiple agents, and currently don’t really want to.
I find having the one agent running is a good balance for me right now between being able to do deep, manual work I find enjoyable, and babysitting my kind of stupid and yet mysteriously productive robot friend.
The “have an agent running at all times” goal is still just a goal. I’d say right now I’m maybe effective at having a background agent running 10 to 20% of a normal working day. But, I’m actively working to improve that.
And that’s where I’m at today.
Through this journey, I’ve personally reached a point where I’m having success with modern AI tooling and I believe I’m approaching it with the proper measured view that is grounded in reality. I really don’t care one way or the other if AI is here to stay; I’m a software craftsman that just wants to build stuff for the love of the game.
The whole landscape is moving so rapidly that I’m sure I’ll look back at this post very quickly and laugh at my naivete. But, as they say, if you can’t be embarrassed about your past self, you’re probably not growing. I just hope I’ll grow in the right direction!
I have no skin in the game here, and there are of course other reasons beyond utility to avoid using AI. I fully respect anyone’s individual decisions regarding it. I’m not here to convince you! For those interested, I just wanted to share my personal approach to navigating these new tools and give a glimpse of how I approach new tools in general, regardless of AI.
...
Read the original on mitchellh.com »
That’s right — this little device is what stood between me and the ability to run an even older piece of software that I recently unearthed during an expedition of software archaeology.
For a bit more background: I was recently involved in helping a friend’s accounting firm move away from an extremely legacy software package that they had locked themselves into using for the last four decades.
This software was built using a programming language called RPG (“Report Program Generator”), which is older than COBOL (!), and was used with IBM’s midrange computers such as the System/3, System/32, and all the way up to the AS/400. Apparently, RPG was subsequently ported to MS-DOS, so that the same software tools built with RPG could run on personal computers, which is how we ended up here.
This accounting firm was actually using a Windows 98 computer (yep, in 2026), running the RPG software inside a DOS console window. And it turned out that, in order to run, this software requires a special hardware copy-protection dongle attached to the computer’s parallel port! This was a relatively common practice in those days, particularly with “enterprise” software vendors who wanted to protect their very important™ software from unauthorized use.
Sadly, most of the text and markings on the dongle’s label have been worn or scratched off, but we can make out several clues:
The words “Stamford, CT”, and what’s very likely the logo of a company called “Software Security Inc”. The only evidence for the existence of this company is this record of them exhibiting their wares at SIGGRAPH conferences in the early 1990s, as well as several patents issued to them, relating to software protection.
A word that seems to say “RUNTIME”, which will become clear in a bit.
My first course of action was to take a disk image of the Windows 98 PC that was running this software, and get it running in an emulator, so that we could see what the software actually does, and perhaps export the data from this software into a more modern format, to be used with modern accounting tools. But of course all of this requires the hardware dongle; none of the accounting tools seem to work without it plugged in.
Before doing anything, I looked through the disk image for any additional interesting clues, and found plenty of fascinating (and archaeologically significant?) stuff:
We’ve got a compiler for the RPG II language (excellent!), made by a company called Software West Inc.
Even better, there are two versions of the RPG II compiler, released on various dates in the 1990s by Software West.
We’ve got the complete source code of the accounting software, written in RPG. It looks like the full accounting package consists of numerous RPG modules, with a gnarly combination of DOS batch files for orchestrating them, all set up as a “menu” system for the user to navigate using number combinations. Clearly the author of this accounting system was originally an IBM mainframe programmer, and insisted on bringing those skills over to DOS, with mixed results.
I began by playing around with the RPG compiler in isolation, and I learned very quickly that it’s the RPG compiler itself that requires the hardware dongle, and then the compiler automatically injects the same copy-protection logic into any executables it generates. This explains the text that seems to say “RUNTIME” on the dongle.
The compiler consists of a few executable files, notably RPGC.EXE, which is the compiler itself, and SEU.EXE, which is a source editor (“Source Entry Utility”). Here’s what we get when we launch SEU without the dongle, after a couple of seconds:

No dongle, no edit.
A bit rude, but this gives us an important clue: the program must be communicating with the parallel port during those few seconds (which gives us an opportunity to pause it for debugging and see what it’s doing), and it then exits with a message (which we can now find in a disassembly of the program, and trace how execution gets there).
A great tool for disassembling executables of this vintage is Reko. It understands 16-bit real mode executables, and even attempts to decompile them into readable C code that corresponds to the disassembly.
And so, looking at the decompiled/disassembled code in Reko, I expected to find in and out instructions, which would be the telltale sign of the program trying to communicate with the parallel port through the PC’s I/O ports. However… I didn’t see an in or out instruction anywhere! But then I noticed something: Reko disassembled the executable into two “segments”: 0800 and 0809, and I was only looking at segment 0809.
If we look at segment 0800, we see the smoking gun: in and out instructions, meaning that the copy-protection routine is definitely here, and best of all, the entire code segment is a mere 0x90 bytes, which suggests that the entire routine should be pretty easy to unravel and understand. For some reason, Reko was not able to decompile this code into a C representation, but it still produced a disassembly, which will work just fine for our purposes. Maybe this was a primitive form of obfuscation from those early days, which is now confusing Reko and preventing it from associating this chunk of code with the rest of the program… who knows.
Here is a GitHub Gist with the disassembly of this code, along with my annotations and notes. My x86 assembly knowledge is a little rusty, but here is the gist of what this code does:
It’s definitely a single self-contained routine, intended to be called using a “far” CALL instruction, since it returns with a RETF instruction.
It begins by detecting the address of the parallel port, by reading the BIOS data area. If the computer has more than one parallel port, the dongle must be connected to the first parallel port (LPT1).
It performs a loop where it writes values to the data register of the parallel port, and then reads the status register, and accumulates responses in the BH and BL registers.
At the end of the routine, the “result” of the whole procedure is stored in the BX register (BH and BL together), which will presumably be “verified” by the caller of the routine.
Very importantly, there doesn’t seem to be any “input” into this routine. It doesn’t pop anything from the stack, nor does it care about any register values passed into it. Which can only mean that the result of this routine is completely constant! No matter what complicated back-and-forth it does with the dongle, the result of this routine should always be the same.
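To summarize the routine’s structure in code: here is a rough Python model, reconstructed from my annotated disassembly. The challenge bytes and the bit-folding logic below are placeholders (the real ones are in the Gist); only the overall shape is faithful.

# Rough Python model of the ~0x90-byte check routine (shape only).

def peek16(addr):
    # Stand-in for reading a word of real-mode memory.
    if addr == 0x0408:          # BIOS data area 0040:0008 holds the LPT1 base port
        return 0x378            # a typical LPT1 I/O address
    raise NotImplementedError

def outb(port, value):          # stand-in for the x86 OUT instruction
    pass

def inb(port):                  # stand-in for the x86 IN instruction
    return 0x76                 # whatever the dongle answers on the status lines

CHALLENGE = [0x55, 0xAA, 0x0F]  # placeholder challenge bytes

def dongle_check():
    base = peek16(0x0408)           # 1. detect LPT1 from the BIOS data area
    bh = bl = 0
    for value in CHALLENGE:         # 2. challenge/response loop
        outb(base, value)           #    write to the data register
        status = inb(base + 1)      #    read the status register
        bh = (bh | status) & 0xFF   # 3. accumulate responses in BH and BL
        bl = (bl ^ status) & 0xFF   #    (placeholder folding logic)
    return (bh << 8) | bl           # 4. result lands in BX; with no inputs anywhere,
                                    #    it is the same constant on every run

Nothing variable feeds into the routine, which is exactly why the patch below works.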
With the knowledge that this routine must exit with some magic value stored in BX, we can now patch the first few bytes of the routine to do just that! Not yet knowing which value to put in BX, let’s start with 1234:
BB 34 12 MOV BX, 1234h
CB RETF
Only the first four bytes need patching — set BX to our desired value, and get out of there (RETF). Running the patched executable with these new bytes still fails (expectedly) with the same message of “No dongle, no edit”, but it fails immediately, instead of after several seconds of talking to the parallel port. Progress!
Stepping through the disassembly more closely, we get another major clue: The only value that BH can be at the end of the routine is 76h (this is hard-coded into the routine). So, our total value for the magic number in BX must be of the form 76xx. In other words, only the BL value remains unknown:
BB __ 76 MOV BX, 76__h
CB RETF
Since BL is an 8-bit register, it can only have 256 possible values. And what do we do when we have 256 combinations to try? Brute-force it! I whipped up a script (sketched after the patch below) that plugs each value from 0 to 255 into that particular byte, programmatically launches the executable in DOSBox, and observes the output. Lo and behold, it worked! The brute forcing didn’t take long at all, because the correct number turned out to be… 6. Meaning that the total magic number in BX should be 7606h:
BB 06 76 MOV BX, 7606h
CB RETF
Bingo!
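For the curious, here is roughly what that brute-force harness looked like. Mine was a quick throwaway, so treat this as a sketch: the patch offset, file names, and failure string are placeholders, not the real values.

#!/usr/bin/env python3
# Brute-force the unknown BL byte (the __ in "BB __ 76").
import shutil
import subprocess

PATCH_OFFSET = 0x1234       # placeholder: file offset of the BL immediate byte
FAILURE_TEXT = "No dongle"  # the message SEU prints when the check fails

for bl in range(256):
    # Start each attempt from a pristine copy of the BH-patched binary.
    shutil.copyfile("SEU.BAK", "SEU.EXE")
    with open("SEU.EXE", "r+b") as f:
        f.seek(PATCH_OFFSET)
        f.write(bytes([bl]))        # plug in the candidate byte

    # Run SEU inside DOSBox, redirecting its output to a file we can inspect.
    args = ["dosbox"]
    for cmd in ["mount c .", "c:", "SEU.EXE > OUT.TXT", "exit"]:
        args += ["-c", cmd]
    subprocess.run(args, capture_output=True, timeout=120)

    output = open("OUT.TXT", "rb").read().decode("cp437", errors="replace")
    if FAILURE_TEXT not in output:
        print(f"Found it: BL = {bl:#04x}")  # turned out to be 0x06
        break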
And then, proceeding to examine the other executable files in the compiler suite, the parallel port routine turns out to be exactly the same. All of the executables have the exact same copy protection logic, as if it was rubber-stamped onto them. In fact, when the compiler (RPGC.EXE) compiles some RPG source code, it seems to copy the parallel port routine from itself into the compiled program. That’s right: the patched version of the compiler will produce executables with the same patched copy protection routine! Very convenient.
I must say, this copy protection mechanism seems a bit… simplistic? A hardware dongle that just passes back a constant number? Defeatable with a four-byte patch? Is this really worthy of a patent? But who am I to pass judgment. It’s possible that I haven’t fully understood the logic, and the copy protection will somehow resurface in another way. It’s also possible that the creators of the RPG compiler (Software West, Inc) simply didn’t take proper advantage of the hardware dongle, and used it in a way that is easily bypassed.
In any case, Software West’s RPG II compiler is now free from the constraint of the parallel port dongle! And at some point soon, I’ll work on purging any PII from the compiler directories, and make this compiler available as an artifact of computing history. It doesn’t seem to be available anywhere else on the web. If anyone reading this was associated with Software West Inc, feel free to get in touch — I have many questions!
...
Read the original on dmitrybrant.com »