10 interesting stories served every morning and every evening.
10 interesting stories served every morning and every evening.
Please make sure your browser supports JavaScript and cookies and that you are not blocking them from loading. For more information you can review our Terms of Service and Cookie Policy.
TL;DR: Today, we are announcing that all GitHub Copilot plans will transition to usage-based billing on June 1, 2026.
Instead of counting premium requests, every Copilot plan will include a monthly allotment of GitHub AI Credits, with the option for paid plans to purchase additional usage. Usage will be calculated based on token consumption, including input, output, and cached tokens, using the listed API rates for each model.
This change aligns Copilot pricing with actual usage and is an important step toward a sustainable, reliable Copilot business and experience for all users.
To help customers prepare, we are also launching a preview bill experience in early May, giving users and admins visibility into projected costs before the June 1 transition. This will be available to users via their Billing Overview page when they log in to github.com.
Why we’re making this change
Copilot is not the same product it was a year ago.
It has evolved from an in-editor assistant into an agentic platform capable of running long, multi-step coding sessions, using the latest models, and iterating across entire repositories. Agentic usage is becoming the default, and it brings significantly higher compute and inference demands.
Today, a quick chat question and a multi-hour autonomous coding session can cost the user the same amount. GitHub has absorbed much of the escalating inference cost behind that usage, but the current premium request model is no longer sustainable.
Usage-based billing fixes that. It better aligns pricing with actual usage, helps us maintain long-term service reliability, and reduces the need to gate heavy users.
What’s changing
Starting June 1, premium request units (PRUs) will be replaced by GitHub AI Credits.
Credits will be consumed based on token usage, including input, output, and cached tokens, according to the published API rates for each model.
A few important details:
Base plan pricing is not changing. Copilot Pro remains $10/month, Pro+ remains $39/month, Business remains $19/user/month, and Enterprise remains $39/user/month.
Code completions and Next Edit suggestions remain included in all plans and do not consume AI Credits.
Fallback experiences will no longer be available. Today, users who exhaust PRUs may fall back to a lower-cost model and continue working. Under the new model, usage will instead be governed by available credits and admin budget controls.
Copilot code review will also consume GitHub Actions minutes, in addition to GitHub AI Credits. These minutes are billed at the same per-minute rates as other GitHub Actions workflows.
Last week, we also rolled out temporary changes to Copilot Individual plans, including Free, Pro, Pro+, and Student, and paused self-serve Copilot Business plan purchases. These were reliability and performance measures as we prepare for the broader transition to usage-based billing. We will loosen usage limits once usage-based billing is in effect.
What this means for individuals
Copilot Pro and Pro+ monthly subscriptions will include monthly AI Credits aligned to their current subscription prices:
Copilot Pro: $10/month, including $10 in monthly AI Credits
Copilot Pro+: $39/month, including $39 in monthly AI Credits
Users on a monthly Pro or Pro+ plan will automatically migrate to usage-based billing on June 1, 2026.
Users on annual Pro or Pro+ plans will remain on their existing plan with premium request-based pricing until their plan expires. Model multipliers will increase on June 1 (see table) for annual plan subscribers only. At expiration, they will transition to Copilot Free with the option to upgrade to a paid monthly plan. Alternatively, they may convert to a monthly paid plan before their annual plan expires, and we will provide prorated credits for the remaining value of their annual plan.
What this means for businesses and enterprises
Copilot Business and Copilot Enterprise monthly seat pricing remains unchanged:
Copilot Business: $19/user/month, including $19 in monthly AI Credits
Copilot Enterprise: $39/user/month, including $39 in monthly AI Credits
To support the transition, existing Copilot Business and Copilot Enterprise customers will automatically receive promotional included usage for June, July, and August:
Copilot Business: $30 in monthly AI Credits
Copilot Enterprise: $70 in monthly AI Credits
We are also introducing pooled included usage across a business, which helps eliminate stranded capacity. Instead of each user’s unused included usage being isolated, credits can be pooled across the organization.
Admins will also have new budget controls. They will be able to set budgets at the enterprise, cost center, and user levels. When the included pool is exhausted, organizations can choose whether to allow additional usage at published rates or cap spend.
The bottom line
Plan prices aren’t changing. You’ll have full control over what you spend, tools to track your usage, and the option to purchase more AI Credits if and when you need them.
If you have questions, visit our documentation for individuals and for businesses and enterprises, and our FAQ and related discussion.
Written by
Mario Rodriguez leads the GitHub Product team as Chief Product Officer. His core identity is being a learner and his passion is creating developer tools—so much so that he has spent the last 20 years living that mission in leadership roles across Microsoft and GitHub. Mario most recently oversaw GitHub’s AI strategy and the GitHub Copilot product line, launching and growing Copilot across thousands of organizations and millions of users. Mario spends time outside of GitHub with his wife and two daughters. He also co-chairs and founded a charter school in an effort to progress education in rural regions of the United States.
Related posts
Explore more from GitHub
Docs
Everything you need to master GitHub, all in one place.
Go to Docs
GitHub
Build what’s next on GitHub, the place for anyone from anywhere to build anything.
Start building
Customer stories
Meet the companies and engineering teams that build with GitHub.
Learn more
The GitHub Podcast
Catch up on the GitHub podcast, a show dedicated to the topics, trends, stories and culture in and around the open source developer community on GitHub.
Listen now
I came across a video by Simple Lucas describing a routine to improve focus and productivity.
The routine was basically:
Don’t use any screens/entertainment when trying to focus on work.
When you start to feel mentally drained, sit and stare at a wall for x minutes to recover focus.
I’ve been trying it, and it’s a very effective (but hard) routine.
The problem
The core problem is that most people by default are in an information overload.
A paper published in 2012 showed that in 2008 the average person was receiving 34 GB of information daily, with a daily information exposure growth rate of about 5.4% per year 1.
Extrapolating that trend, we would be at about 87 GB worth of data today.
This calculation includes audio, visual, and text data and incorporates quality into the measurement, i.e. 10 minutes of HD video has more information than 10 minutes of 480p video.
It’s unclear to me exactly how the quality impacts things, but regardless it is obvious that we are all being drowned in a sea of information.
I certainly go through periods of “brain fog” and lack of focus/motivation.
These periods usually go something like:
Get a bad night of sleep (up late for an event, kids keep waking me up).
Wake up very tired so consume large amounts of caffeine.
Have trouble focusing after 2/3 cups so use media while working to dull the pain (music/podcasts) or take more “breaks” (reading hackernews).
Stay up late because I’m wired on caffeine and dopamine from scrolling.
Go back to 2.
I find these cycles very hard to break out of when I’m in them.
The media consumption constitutes a small dopamine hit.
Large numbers of small hits puts you in a hole, where you need even more/stronger hits to feel good.
Disconnecting
The obvious solution is to disconnect from scrolling, but that doesn’t overcome the biggest issue.
When I’m in this “brain fog” cycle (and sometimes outside of it), I will find that around 1/2 pm I hit a wall.
My head will start hurting, my motivation will be trash, and my productivity significantly degrades.
My first instinct is to go for more coffee.
That usually lets me keep working, but at a slow/painful pace.
While looking for focusing strategies I came across the life-changing solution…
Stare at a Wall!
After watching Simple Lucas’ experience, I decided to try it when I hit my focus wall.
It worked.
In my attempts, I combined wall staring with a few other concepts I had heard about.
First was activating the parasympathetic nervous system by staring at the wall “out-of-focus” and using peripheral vision.
Second was incorporating mind blanking which means trying to think of nothing.
I tried intervals of 5 – 10 minutes and when I was done, my focus was back!
What I didn’t expect was how difficult it would be.
Sitting for 5 – 10 minutes staring at a wall without thinking of anything is hard!
I relate it somewhat to the feeling I have with working out.
Often times I want to avoid it because it’s hard, but I’m always happy when I push through and complete it.
It was the exact same experience with the wall staring.
So far I’ve been feeling significant focus/productivity improvements.
I’ve also been using some other strategies to improve focus, which I’ll be talking about in a future post.
I plan to continue this routine and will update to see how much it has impacted productivity/focus.
Thanks for reading!
https://ijoc.org/index.php/ijoc/article/view/1566 ↩︎
https://ijoc.org/index.php/ijoc/article/view/1566 ↩︎
Forensic intelligence // Breach analysis
4TB of voice samples were just stolen from 40,000 AI contractors. Here is how to verify if yours is being weaponized.
By the ORAVYS forensic desk
Published April 24, 2026
~7 min read
On April 4, 2026, the extortion group Lapsus$ posted Mercor on its leak site. The dump is reported at roughly four terabytes and bundles a payload that breach analysts have been warning about for two years: voice biometrics paired with the same person’s government-issued identity document. According to the leaked sample index, the archive covers more than 40,000 contractors who signed up to label data, record reading passages, and run through verification calls for AI training.
Five contractor lawsuits were filed within ten days of the post. The plaintiffs argue that the company collected voice prints under a “training data” framing without making clear they were also a permanent biometric identifier. The lawsuits matter, but the people whose voices were already exfiltrated have a more immediate question. What does an attacker actually do with thirty seconds of someone’s clean read voice plus a scan of their driver’s license?
Why this breach is different
Most voice leaks in the last decade fell into one of two buckets. Either a call center got popped and recordings were stolen with no easy way to map them back to identity. Or an ID-document broker leaked driver’s licenses and selfies without any audio attached. Mercor merged both columns. The contractor onboarding pipeline asked for a passport or driver’s license scan, then a webcam selfie, then a sit-down voice recording reading scripted prompts in a quiet room. That sequence, in one row of one database, is exactly what a synthetic voice cloning service needs as input.
The Wall Street Journal reported in February 2026 that high-quality voice cloning now requires roughly fifteen seconds of clean reference audio for tools available off the shelf. The Mercor recordings are reported to average two to five minutes of studio-clean speech per contractor. That is far past the threshold. Pair it with a verified ID document and the attacker has both the clone and the credential needed to put the clone to work.
What attackers can now do with stolen voice data
The threat models below are not speculative. Each is a documented technique already used in the wild before this breach.
Bank verification bypass. Several US and UK banks still treat voiceprint matching as one of two factors. A clone of the account holder reading a challenge phrase clears the audio gate, leaving only a knowledge question that often comes from the same leaked dataset.
Vishing the victim’s employer. Calling HR or finance pretending to be the employee to redirect payroll, request a wire, or unlock a workstation. The Krebs on Security archive lists more than two dozen confirmed cases since 2023.
Deepfake video calls in the Hong Kong Arup template. In 2024 a finance worker at Arup wired roughly 25 million dollars after a multi-person deepfake video call. The voices and faces had been built from public footage. Mercor leaked something better than public footage: studio audio plus a verified ID.
Insurance claim fraud. Pindrop reported a 475 percent year-over-year increase in synthetic voice attacks against insurance call centers across 2025. Auto, life, and disability claims are the prime targets because they are settled by phone.
Romance and grandparent scams targeting family members. The FBI Internet Crime Complaint Center logged 2.3 billion dollars in losses for victims aged 60 and over in calendar year 2026. The single fastest-growing category was emergency impersonation calls, where the synthetic voice claims to be a relative in trouble.
How to check if your voice is being misused
If you ever uploaded a voice sample to Mercor, or to any of the other AI training brokers that operated through 2025, treat your voice the way you would treat a leaked password. You cannot rotate it, but you can change what it unlocks. Here is the short list.
Self-audit your public audio footprint. Search YouTube, podcast directories, and old Zoom recordings for samples of your voice that are publicly indexable. Take down what you can. The less reference audio is in the open, the less robust an attacker’s clone.
Set up a verbal codeword with family and finance contacts. Pick a phrase that has never been spoken on a recording and never typed in chat. Brief the people who handle money on your behalf. If a call ever asks for a transfer, the codeword is mandatory.
Rotate where voiceprints are still in use. Google Voice Match, Amazon Alexa Voice ID, Apple personal voice, and any banking voiceprint enrollment can be deleted and replaced. Do that now, ideally from a new recording in a different acoustic environment than the leaked sample.
Tell your bank to disable voiceprint as a verification factor. Ask in writing for multi-factor authentication that combines an app token or hardware key with a knowledge factor. Many banks let you opt out of voice as a primary factor; few of them advertise it.
Run suspicious recordings through a forensic scanner. If you receive an audio file or voicemail that claims to be from someone you know and asks for money, access, or urgency, run it through a deepfake detector before acting. ORAVYS offers a free check for the first three samples submitted by breach victims (see the offer below).
The forensic checklist that experts use
When a sample lands on a forensic analyst’s desk, the following artifacts are the first pass. Each is something a synthetic voice tends to get slightly wrong, even when the perceptual quality is high.
Codec mismatch. The audio claims to come from a phone call but the spectral signature does not match any known telephony codec.
Breath patterns. Real speakers inhale at predictable points dictated by phrase length and lung capacity. Synthetic voices often skip breaths or insert them at the wrong syllabic boundary.
Micro-jitter. Natural vocal folds vibrate with small irregularities. Generated audio is often too clean at the millisecond level.
Formant trajectory. Vowel transitions follow physical articulator paths in a real mouth. Cloned voices sometimes take impossible shortcuts between formants.
Room acoustics inconsistency. The reverb signature should be identical from the start of the file to the end. Generated audio is often dry while the splice context is reverberant.
Prosody flatness. Synthetic speech often has narrower pitch and energy variance than the same speaker would have in real conditions.
Speech rate stability. Real humans speed up and slow down with content. Generated speech tends to hold a metronomic rate across long passages.
What ORAVYS does specifically
More than 3,000 forensic engines run in parallel on every submitted sample, covering signal, prosody, articulation, codec, and provenance domains.
AudioSeal watermark detection flags files generated by major commercial voice models when the watermark is preserved, giving a deterministic positive when present.
An anti-spoofing module trained against the ASVspoof public benchmarks scores the likelihood that a sample was synthesized rather than recorded.
Biometric processing is RGPD compliant. Audio is never used to train commercial models without explicit consent and is purged on a defined retention schedule.
Free verification for Mercor breach victims
If you were a Mercor contractor and you believe your voice may already be in circulation, ORAVYS will analyze the first three suspect samples free of charge. You will receive a forensic report covering watermark detection, anti-spoofing score, and the artifact checklist above. No card required, no quota gate.
Analyze a voice sample now →
Sources cited in this article: Lapsus$ leak site index (April 2026), Wall Street Journal voice cloning report (February 2026), Pindrop Voice Intelligence Report 2025, FBI IC3 Elder Fraud Report 2026, Krebs on Security archives. Lawsuit references are matters of public record. ORAVYS does not host or redistribute the leaked dataset and does not accept it as input.
NOTICE OF OBSOLESCENCE
TL;DR: pgBackRest is no longer being maintained. If you fork pgBackRest, please select a new name for your project.
After a lot of thought, I have decided to stop working on pgBackRest. I did not come to this decision lightly. pgBackRest has been my passion project for the last thirteen years, and I was fortunate to have corporate sponsorship for much of this time, but there were also many late nights and weekends as I worked to make pgBackRest the project it is today, aided by numerous contributors. Every open-source developer knows exactly what I mean and how much of your life gets devoted to a special project.
Since Crunchy Data was sold, I have been maintaining pgBackRest and looking for a position that would allow me to continue the work, but so far I have not been successful. Likewise, my efforts to secure sponsorship have also fallen far short of what I need to make the project viable.
Like everyone else, I need to make a living, and the range of pgBackRest-related roles is very limited. I can now consider a wider variety of opportunities, but those will not leave me time to work on pgBackRest, which requires a fair amount of time for maintenance, bug fixes, PR reviews, answering issues, etc. That does not even include time to write new features, which is what I really love to do. Rather than do the work poorly and/or sporadically, I think it makes more sense to have a hard stop.
I imagine at some point pgBackRest will be forked, but that will be a new project with new maintainers, and they will need to build trust the same way we did.
Again, many thanks to all the pgBackRest contributors over the years. It was a pleasure working with you!
Introduction
pgBackRest is a reliable backup and restore solution for PostgreSQL that seamlessly scales up to the largest databases and workloads.
pgBackRest v2.58.0 is the current stable release. Release notes are on the Releases page.
Features
Parallel Backup & Restore
Compression is usually the bottleneck during backup operations so pgBackRest solves this problem with parallel processing and more efficient compression algorithms such as lz4 and zstd.
Local or Remote Operation
A custom protocol allows pgBackRest to backup, restore, and archive locally or remotely via TLS/SSH with minimal configuration. An interface to query PostgreSQL is also provided via the protocol layer so that remote access to PostgreSQL is never required, which enhances security.
Multiple Repositories
Multiple repositories allow, for example, a local repository with minimal retention for fast restores and a remote repository with a longer retention for redundancy and access across the enterprise.
Full, Differential, & Incremental Backups (at File or Block Level)
Full, differential, and incremental backups are supported. pgBackRest is not susceptible to the time resolution issues of rsync, making differential and incremental backups safe without the requirement to checksum each file. Block-level backups save space by only copying the parts of files that have changed.
Backup Rotation & Archive Expiration
Retention polices can be set for full and differential backups to create coverage for any time frame. The WAL archive can be maintained for all backups or strictly for the most recent backups. In the latter case WAL required to make older backups consistent will be maintained in the archive.
Backup Integrity
Checksums are calculated for every file in the backup and rechecked during a restore or verify. After a backup finishes copying files, it waits until every WAL segment required to make the backup consistent reaches the repository.
Backups in the repository may be stored in the same format as a standard PostgreSQL cluster (including tablespaces). If compression is disabled and hard links are enabled it is possible to snapshot a backup in the repository and bring up a PostgreSQL cluster directly on the snapshot. This is advantageous for terabyte-scale databases that are time consuming to restore in the traditional way.
All operations utilize file and directory level fsync to ensure durability.
Page Checksums
If page checksums are enabled pgBackRest will validate the checksums for every file that is copied during a backup. All page checksums are validated during a full backup and checksums in files that have changed are validated during differential and incremental backups.
Validation failures do not stop the backup process, but warnings with details of exactly which pages have failed validation are output to the console and file log.
This feature allows page-level corruption to be detected early, before backups that contain valid copies of the data have expired.
Backup Resume
An interrupted backup can be resumed from the point where it was stopped. Files that were already copied are compared with the checksums in the manifest to ensure integrity. Since this operation can take place entirely on the repository host, it reduces load on the PostgreSQL host and saves time since checksum calculation is faster than compressing and retransmitting data.
Streaming Compression & Checksums
Compression and checksum calculations are performed in stream while files are being copied to the repository, whether the repository is located locally or remotely.
If the repository is on a repository host, compression is performed on the PostgreSQL host and files are transmitted in a compressed format and simply stored on the repository host. When compression is disabled a lower level of compression is utilized to make efficient use of available bandwidth while keeping CPU cost to a minimum.
Delta Restore
The manifest contains checksums for every file in the backup so that during a restore it is possible to use these checksums to speed processing enormously. On a delta restore any files not present in the backup are first removed and then checksums are generated for the remaining files. Files that match the backup are left in place and the rest of the files are restored as usual. Parallel processing can lead to a dramatic reduction in restore times.
Parallel, Asynchronous WAL Push & Get
Dedicated commands are included for pushing WAL to the archive and getting WAL from the archive. Both commands support parallelism to accelerate processing and run asynchronously to provide the fastest possible response time to PostgreSQL.
WAL push automatically detects WAL segments that are pushed multiple times and de-duplicates when the segment is identical, otherwise an error is raised. Asynchronous WAL push allows transfer to be offloaded to another process which compresses WAL segments in parallel for maximum throughput. This can be a critical feature for databases with extremely high write volume.
Asynchronous WAL get maintains a local queue of WAL segments that are decompressed and ready for replay. This reduces the time needed to provide WAL to PostgreSQL which maximizes replay speed. Higher-latency connections and storage (such as S3) benefit the most.
The push and get commands both ensure that the database and repository match by comparing PostgreSQL versions and system identifiers. This virtually eliminates the possibility of misconfiguring the WAL archive location.
Tablespace & Link Support
Tablespaces are fully supported and on restore tablespaces can be remapped to any location. It is also possible to remap all tablespaces to one location with a single command which is useful for development restores.
File and directory links are supported for any file or directory in the PostgreSQL cluster. When restoring it is possible to restore all links to their original locations, remap some or all links, or restore some or all links as normal files or directories within the cluster directory.
S3, Azure, and GCS Compatible Object Store Support
pgBackRest repositories can be located in S3, Azure, and GCS compatible object stores to allow for virtually unlimited capacity and retention.
Encryption
pgBackRest can encrypt the repository to secure backups wherever they are stored.
Compatibility with ten versions of PostgreSQL
pgBackRest includes support for ten versions of PostgreSQL, the five supported versions and the last five EOL versions. This allows ample time to upgrade to a supported version.
Getting Started
pgBackRest strives to be easy to configure and operate:
User guides for various operating systems and PostgreSQL versions.
User guides for various operating systems and PostgreSQL versions.
Command reference for command-line operations.
Command reference for command-line operations.
Configuration reference for creating pgBackRest configurations.
Configuration reference for creating pgBackRest configurations.
Sponsorship
pgBackRest would not exist without sponsors. Writing new features, fixing bugs, reviewing contributions, answering questions from the community, and maintenance all take a considerable amount of time.
Current sponsors: Supabase.
Past sponsors: Crunchy Data, Resonate.
Recognition
Armchair graphic by Alexander Skowalsky.
April 2026
This is a 24/7 live feed of Claude Sonnet 4.6 prompting talkie-1930 – 13b-it in order to explore its knowledge, capabilities, and inclinations. talkie’s outputs reflect the culture and values of the texts it was trained on, not the views of its authors.
Why vintage language models?
Have you ever daydreamed about talking to someone from the past? What would you ask someone with no knowledge of the modern world? What would they ask you? While we don’t have time machines yet, we can simulate this experience by training, in Owain Evans’s phrase, ‘vintage’ language models: LMs trained only on historical text.
These models are fascinating conversation partners (watch Claude prompt talkie, our 13B 1930 LM, in the widget above). But we are also excited by the possibility that the careful study of the behaviors and capabilities of vintage LMs will advance our understanding of AI in general.
Figure 1. In an early attempt to understand a vintage model’s anticipation of the future, we took nearly 5,000 historical event descriptions from the New York Times’s “On This Day” feature, calculated their surprisingness (measured as bits per byte of text) to our 13B model trained exclusively on pre-1931 text, and binned by decade.
For example, we can evaluate LMs’ ability to predict the future. Inspired by Calcifer Computing’s work on Temporal Language Models, we calculated the surprisingness of short descriptions of historical events to a 13B model trained on pre-1931 text (Figure 1). We can see an increase after the knowledge cutoff, particularly pronounced in the 1950s and 1960s, followed by a plateau. We will continue to develop evals to measure with greater confidence how forecasting performance improves with model size and decays at longer horizons. Training larger vintage language models will allow us to uncover these scaling trends.
Figure 2. Patents and a paper published after talkie’s knowledge cutoff. Left to right: helicopter patent (Sikorsky, 1935), Turing machines paper (Turing, 1936), xerography patent (Carlson, 1942).
Similarly, we can test LMs’ abilities to come up with new ideas by seeing if they can arrive at inventions or scientific discoveries we know would arise after their knowledge cutoffs, such as those pictured in Figure 2. As Demis Hassabis has asked, could a model trained up to 1911 independently discover General Relativity, as Einstein did in 1915?
Figure 3. We gave a Python programming test (HumanEval) to a series of pairs of vintage models (trained on pre-1931 text) and modern models (trained on the web), which have the same architecture. Left: This chart shows what percentage of problems each model would get right at least once, given 100 chances and randomly chosen Python functions as examples to learn from in-context. Right: An example of a successful solution to a Python coding problem produced by a vintage language model. The model had access to several other in-context examples to learn from.
Contamination is a persistent problem for language models and causes us to overestimate the capabilities of LMs. Vintage LMs are contamination-free by construction, enabling unique generalization experiments, like examining whether a model with no knowledge of digital computers can learn to code in a modern programming language. Figure 3 (left-hand side) shows an early example of such a test, measuring how well models trained on pre-1931 text can, when given a few demonstration examples of Python programs, write new correct programs. While vintage models dramatically underperform models trained on web data (which includes code), we’ve found that they are slowly but steadily improving at this task with scale.
There is still a long way to go before this capability is notable, however. All correct solutions generated by the vintage models are simple one-line programs (such as adding two inputs), or small modifications to in-context example programs. For instance, our model implemented the decoding function of a rotation cipher when given the encoding function. Although the solution (Figure 3, right-hand side) is only a single character edit (swapping an addition for a subtraction), this success suggests an understanding of inverse functions. We hope LMs with early knowledge cutoffs help the research community understand how well LMs can generalize beyond their pre-training data.
Vintage language models could also teach us about the impact of data diversity in AI development. While modern models vary in disposition, capability, and behavior, they are all closely related to one another by having been trained, whether directly or indirectly (via distillation and synthetic data), on the web. How does this shape and constrain what they are? How much of what we think we know about LMs is about human language and culture in general, or about this one dataset—the web—in particular? Training on different sources may lead to very different kinds of models being created. Studying the ways in which they are similar and different could improve our understanding of language model personas, behaviors, and dispositions.
Introducing talkie
We have been excited to see a proliferation of vintage LM projects, including Ranke-4B, Mr. Chatterbox, and Machina Mirabilis.
Alongside these efforts, we introduce talkie-1930 – 13b-base, a 13B language model trained on 260B tokens of historical pre-1931 English text. Additionally, we present a post-trained checkpoint turning our base model into a conversation partner without relying on modern chat transcripts or instruction-tuning data.
talkie is the largest vintage language model we are aware of, and we plan to continue scaling significantly. As a next step, we are training a GPT-3-level model, which we hope to release this summer. A preliminary estimate also suggests we can grow our corpus to well over a trillion tokens of historical text, which should be sufficient to create a GPT-3.5 level model—similar in capability to the original ChatGPT.
Benchmarking an LM from 1930
Figure 4. Evaluation accuracy vs. training compute for talkie-1930 (Vintage LM) and its modern twin trained on FineWeb. The vintage model underperforms the modern model on knowledge evals. Filtering out questions anachronistic from the perspective of 1930 roughly halves the performance gap between the vintage and modern models.
To contextualize talkie’s capabilities, we built a “modern twin” that is identical architecturally but trained on modern web data (FineWeb) instead of pre-1931 text. On average, talkie underperforms its modern counterpart in standard LM evaluations, even after correcting for question anachronism, despite being trained with the same number of FLOPs (see Figure 4). But we have been encouraged by its similar performance on core language understanding and numeracy tasks.
We suspect a combination of differences in data quality (poor optical character recognition) and corpus subject matter distribution explains why talkie-1930 underperforms on some benchmarks. To maximize the compute efficiency of future vintage language model training, we are developing a vintage optical character recognition (OCR) system to improve the quality of transcription of historical text.
Collecting vintage data
Piggybacking off the invaluable work of organizations like the Institutional Data Initiative and the Internet Archive and efforts like Common Pile, we have collected hundreds of billions of pre-1931 English-language tokens. These include books, newspapers, periodicals, scientific journals, patents, and case law. We chose the end of 1930 as the cutoff date because that is when works enter the public domain in the United States. For this version of the model, we also limited ourselves to primarily English-language texts, because validating the data pipeline requires deep familiarity with source documents, and we are native English speakers. But multilingual corpus expansion is a high priority, both to increase the size of the corpus and the diversity of perspectives it represents.
Training vintage language models
Developing vintage language models presents unique challenges. Here, we discuss some of them in brief. We will follow up in greater detail in the coming months as we continue our research.
Temporal leakage
Who was president of the United States in 1936, and what were the most significant pieces of legislation he signed?
Franklin D. Roosevelt was president of the United States from 1933 to 1937. The most important legislation signed by him was the National Recovery Act of 1933; the Agricultural Adjustment Act of 1935; and the Emergency Banking Act of 1935 (amended in 1936).
Figure 5. talkie-1930 – 13b’s knowledge of the Roosevelt presidency and New Deal is an example of imperfect filtering of the pre-training corpus.
The most important objective when training vintage language models is that no data leaks into the training corpus from after the intended knowledge cutoff (in our case, December 31st, 1930). There are several ways this can happen, such as including modern documents with faulty date metadata, or old documents with post hoc anachronistic insertions like editorial introductions or footnotes.
For talkie-1930, we developed a document-level n-gram-based anachronism classifier and used it to filter the pre-training corpus. However, this was not perfect. An earlier 7B version of talkie clearly knew about the Roosevelt presidency and New Deal legislation (Figure 5). talkie-1930 – 13b is additionally aware of some details related to World War II and the immediate postwar order (the United Nations and the division of Germany). For future versions of the model, we are developing new techniques for leakage detection and filtering using more advanced classifiers.
Data quality
Figure 6. OCR errors reduce language model learning efficiency. Left: Training LMs on pre-1931 texts transcribed using conventional OCR systems only shows 30% of the learning efficiency of a model trained on human-transcribed versions of the same texts. Regex cleaning of the OCR’d text recovers some performance. Right: Example of a messy machine transcription of The Wonderful Wizard of Oz (Baum, 1899).
Data quality is an important issue for all machine learning experiments. It is a special challenge when training vintage language models. Because there was no digital publishing in 1930, all text in our dataset had to be transcribed from a physical source, which introduces a form of noise not seen in natively digital text. While OCR was an early success story of machine learning and computer vision, the classic OCR systems often used to transcribe historical documents struggle on all but the simplest layouts and cleanest scans. Modern VLM-based systems have higher accuracy, but we have found they are prone to hallucinate modern facts into our corpus, poisoning the exercise.
In controlled experiments, we have found that when training an LM on pre-1931 texts transcribed using conventional OCR systems, for a given amount of compute, they only achieve 30% of the performance of a model trained on human-transcribed versions of the same texts (see Figure 6). Simple regex cleaning brings that number up to 70%—still a large discrepancy. We aim to shrink the remaining gap in performance by retranscribing the talkie corpus using our vintage OCR system.
Vintage post-training
Figure 7. Examples of historical reference texts with regular structure used for post-training. Left to right: etiquette manual (Beadle, 1859), practical knowledge book (Henley, 1914), parlor guide (Sandison, c. 1895), letter-writing manual (Chambers, 1900).
The lack of ready-made post-training data is another significant challenge. Fine-tuning our base model on off-the-shelf instruction-response pairs would bake in anachronistic knowledge, style, and expectations of what a chat assistant ought to be like. Rather than attempting to filter out these biases, we built a post-training pipeline from scratch.
First, we generated instruction-response pairs from historical texts with regular structure, such as etiquette manuals, letter-writing manuals, cookbooks, dictionaries, encyclopedias, and poetry and fable collections (see Figure 7), and fine-tuned our base model on them using a simple chat format.
Next, to improve instruction-following abilities, we generated synthetic prompts covering different types of tasks, such as summarizing documents, responding to direct information requests, and continuing multi-turn conversations coherently. We then ran online direct preference optimization on rollouts generated from these prompts, using Claude Sonnet 4.6 as a judge. Over the course of training, on a held-out eval set, the judge’s average instruction-following rating of talkie’s responses increased from 2.0 to 3.4 (on a five-point scale).
Finally, we did another round of supervised fine-tuning, this time on rejection-sampled multi-turn synthetic chats between Claude Opus 4.6 and talkie, to smooth out persistent rough edges in its conversational abilities.
While we have tried to post-train talkie free from modern influence, reinforcement learning with AI feedback inevitably shapes talkie’s behavior anachronistically. (The 7B version of talkie emerged from RL speaking in listicles.) As we scale up, we hope to be able to use our vintage base models themselves as judges to enable a fully bootstrapped era-appropriate post-training pipeline.
Scaling talkie
We plan to scale talkie rapidly in the coming months. This will entail:
Increasing the size of our English-language corpus, and expanding it beyond English.
Re-OCR’ing as much of pre-1931 text as is feasible using our new OCR system.
Strengthening the leakage detection pipeline by developing new anachronism classification techniques.
Expanding and refining the vintage post-training pipeline in collaboration with historians, including by developing methodologies for constructing accurate historical personas.
Join us
We are excited to collaborate with researchers and institutions to build the next generation of vintage language models. Please get in touch.
Are you a researcher or institution with historical texts? We’d love to discuss how we can help make them accessible to researchers and readers, including by applying our OCR model.
Are you an individual or institution interested in supporting vintage language model development with funding or compute? We can likely use either, or put you in touch with other teams working in the space.
Are you an academic in the humanities? We are excited to discuss how vintage language models, and the data and infrastructure used to train them, could be useful for your research.
Are you an AI researcher? We would love to support and collaborate on research on training and studying vintage language models.
Are you an artist or writer? We think vintage language models could be fruitful tools to experiment with.
Content considerations
talkie reflects the culture and values of the texts it was trained on. As such, it can produce outputs that will be offensive to users.
Acknowledgements
Thanks to Coefficient Giving and Anthropic for support with funding and compute.
For helpful discussions, we thank Pranav Anand, Benjamin Breen, Catherine Brobston, Collin Burns, Matteo Cargnelutti, Mackenzie Cooley, Brandon Duderstadt, Owain Evans, Chloë Farr, Ryan Greenblatt, Michael Hla, Mark Humphries, Sam Klein, Greg Leppert, Jack Lindsey, Christina Lu, Seoirse Murray, Jake Naviasky, Krishna Patel, Ethan Perez, Puria Radmard, Ludwig Schmidt, Buck Shlegeris, Benjamin Sturgeon, Daniel Tan, Ross Taylor, Cam Tice, Trip Venturella, Merlijn Wajer, and Tao Xu.
Citation
@article{levine2026talkie,
title={Introducing talkie: a 13B vintage language model from 1930},
author={Levine, Nick and Duvenaud, David and Radford, Alec},
year={2026},
month={April},
url={https://talkie-lm.com/introducing-talkie}
}
China’s state planner on Monday called for Meta to unwind its $2 billion acquisition of Manus, a Singaporean artificial intelligence startup with Chinese roots.
The decision to prohibit foreign investment in Manus was made in accordance with laws and regulations, the National Development and Reform Commission said in a brief statement. It added that it has asked the parties involved to withdraw the acquisition transaction.
Shares of Meta closed 0.53% higher on Monday.
The deal had attracted scrutiny from both China and Washington, as lawmakers in the U.S. have prohibited American investors from backing Chinese AI companies directly. Meanwhile, Beijing has increased efforts to discourage Chinese AI founders from moving business offshore.
watch now
The Chinese government’s intervention in the transaction drew alarm among tech founders and venture capitalists in the country who were hoping to take advantage of the so-called Singapore-washing model, where companies relocate from China to the city-state to avoid scrutiny from Beijing and Washington.
Manus was founded in China before relocating to Singapore. The company develops general-purpose AI agents and launched its first general AI agent in March last year, which can execute complex tasks such as market research, coding and data analysis. The release saw the startup lauded as the next DeepSeek.
Manus said it had passed $100 million in annual recurring revenue, or ARR, in December, eight months on from launching a product, which it claimed made it the fastest startup in the world at the time to hit the milestone from $0.
The company raised $75 million in a round led by U.S. VC Benchmark in April last year.
When Meta announced the deal late last year, the tech giant said it would look to accelerate artificial intelligence innovation for businesses and integrate advanced automation into its consumer and enterprise products, including its Meta AI assistant.
But in January, China’s Ministry of Commerce said it would conduct an assessment and investigation into how the acquisition complied with laws and regulations concerning export controls, technology import and export, and overseas investment.
A Meta spokesperson told CNBC that the transaction “complied fully with applicable law,” and that it anticipated “an appropriate resolution to the inquiry.”
When asked about China’s move to block Meta’s acquisition of Manus, APEC Senior Officials Meeting Chairman Chen Xu told reporters that it is “important that all parties act in a spirit of mutual benefit.”
While Chen said he did not know the specifics of the issue, he said that “if such an issue can be handled properly, it can help facilitate more substantive discussions in APEC.” That’s according to an official English translation.
— CNBC’s Anniek Bao and Dylan Butts contributed to this story.
Dirac - Accurate & Highly Token Efficient Open Source AI Agent
Dirac topped the Terminal-Bench-2 leaderboard for gemini-3-flash-preview with a 65.2% score!
Dirac topped the Terminal-Bench-2 leaderboard for gemini-3-flash-preview with a 65.2% score!
It is a well studied phenomenon that any given model’s reasoning ability degrades with the context length. If we can keep context tightly curated, we improve both accuracy and cost while making larger changes tractable in a single task.
Dirac is an open-source coding agent built with this in mind. It reduces API costs by 64.8% on average while producing better and faster work. Using hash-anchored parallel edits, AST manipulation, and a suite of advanced optimizations. Oh, and no MCP.
Our goal: Optimize for bang-for-the-buck on tooling with bare minimum prompting instead of going blindly minimalistic.
📊 Evals
Dirac is benchmarked against other leading open-source agents on complex, real-world refactoring tasks. Dirac consistently achieves 100% accuracy at a fraction of the cost. These evals are run on public github repos and should be reproducible by anyone.
🏆 TerminalBench 2.0 Leaderboard: Dirac recently topped the Terminal-Bench-2 leaderboard with a 65.2% score using gemini-3-flash-preview. This outperforms both Google’s official baseline (47.6%) and the top closed-source agent Junie CLI (64.3%). This was achieved without any benchmark-specific info or any AGENTS.md files being inserted.
🏆 TerminalBench 2.0 Leaderboard: Dirac recently topped the Terminal-Bench-2 leaderboard with a 65.2% score using gemini-3-flash-preview. This outperforms both Google’s official baseline (47.6%) and the top closed-source agent Junie CLI (64.3%). This was achieved without any benchmark-specific info or any AGENTS.md files being inserted.
Note on the cost table below: A bug was discovered in Cline, the parent repo, after running these evals (issue #10314). We have submitted a PR #10315 to fix this. This bug caused the evals for Dirac and Cline to slightly underreport the numbers ($0.03 vs $0.05 per million token cache read). Although there won’t be a large difference, we will update the evals soon.
Note on the cost table below: A bug was discovered in Cline, the parent repo, after running these evals (issue #10314). We have submitted a PR #10315 to fix this. This bug caused the evals for Dirac and Cline to slightly underreport the numbers ($0.03 vs $0.05 per million token cache read). Although there won’t be a large difference, we will update the evals soon.
All tasks for all models used gemini-3-flash-preview with thinking set to high
🟢 Success | 🟡 Incomplete | 🔴 Failure
🟢 Success | 🟡 Incomplete | 🔴 Failure
Cost Comparison: Dirac is 64.8% cheaper than the competition (a 2.8x cost reduction).
* Expected number of files to be modified/created to complete the task.
See evals/README.md for detailed task descriptions and methodology.
Cost Comparison: Dirac is 64.8% cheaper than the competition (a 2.8x cost reduction).
* Expected number of files to be modified/created to complete the task.
See evals/README.md for detailed task descriptions and methodology.
🚀 Key Features
Hash-Anchored Edits: Dirac uses stable line hashes to target edits with extreme precision, avoiding the “lost in translation” issues of traditional line-number based editing.
AST-Native Precision: Built-in understanding of language syntax (TypeScript, Python, C++, etc.) allows Dirac to perform structural manipulations like function extraction or class refactoring with 100% accuracy.
Multi-File Batching: Dirac can process and edit multiple files in a single LLM roundtrip, significantly reducing latency and API costs.
High-Bandwidth Context: Optimized context curation keeps the agent lean and fast, ensuring the LLM always has the most relevant information without wasting tokens.
Autonomous Tool Use: Dirac can read/write files, execute terminal commands, use a headless browser, and more - all while keeping you in control with an approval-based workflow.
Skills & AGENTS.md: Customize Dirac’s behavior with project-specific instructions using AGENTS.md files. It also seamlessly picks up Claude’s skills by automatically reading from .ai, .claude, and .agents directories.
Native Tool Calling Only: To ensure maximum reliability and performance, Dirac exclusively supports models with native tool calling enabled. (Note: MCP is not supported).
📦 Installation
VS Code Extension
Install Dirac from the VS Code Marketplace.
CLI (Terminal)
Install the Dirac CLI globally using npm:
npm install -g dirac-cli
🚀 CLI Quick Start
Authenticate:
dirac auth
dirac auth
Run your first task:
dirac “Analyze the architecture of this project”
dirac “Analyze the architecture of this project”
Configuration (Environment Variables)
You can provide API keys via environment variables to skip the dirac auth step. This is ideal for CI/CD or non-persistent environments:
ANTHROPIC_API_KEY
OPENAI_API_KEY
OPENROUTER_API_KEY
GEMINI_API_KEY
GROQ_API_KEY
MISTRAL_API_KEY
XAI_API_KEY (x.ai)
HF_TOKEN (HuggingFace)
… and others (see src/shared/storage/env-config.ts for the full list).
AWS Bedrock
Use Bedrock by setting AWS credentials and region. When any of these are present, Dirac automatically switches to the Bedrock provider:
AWS_REGION — AWS region (e.g. us-east-1)
AWS_ACCESS_KEY_ID — AWS access key
AWS_SECRET_ACCESS_KEY — AWS secret key
AWS_SESSION_TOKEN — session token (for temporary credentials)
AWS_BEDROCK_MODEL — model ID for both act and plan modes (e.g. us.anthropic.claude-sonnet-4 – 6)
AWS_BEDROCK_MODEL_ACT — model ID for act mode only
AWS_BEDROCK_MODEL_PLAN — model ID for plan mode only
Works seamlessly with aws-vault:
AWS_REGION=us-east-1 AWS_BEDROCK_MODEL=us.anthropic.claude-sonnet-4 – 6 \
aws-vault exec my-profile — dirac “your task”
Note: Newer Claude models on Bedrock (Sonnet 4.6+) require a cross-region inference profile prefix (us., eu., ap.). See the AWS docs for supported model IDs.
Note: Newer Claude models on Bedrock (Sonnet 4.6+) require a cross-region inference profile prefix (us., eu., ap.). See the AWS docs for supported model IDs.
Common Commands
dirac “prompt”: Start an interactive task.
dirac -p “prompt”: Run in Plan Mode to see the strategy before executing.
dirac -y “prompt”: Yolo Mode (auto-approve all actions, great for simple fixes).
git diff | dirac “Review these changes”: Pipe context directly into Dirac.
dirac history: View and resume previous tasks.
🛠️ Getting Started
Open the Dirac sidebar in VS Code.
Configure your preferred AI provider (Anthropic, OpenAI, OpenRouter, etc.).
Start a new task by describing what you want to build or fix.
Watch Dirac go!
📈 Star History
📄 License
Dirac is open source and licensed under the Apache License 2.0.
🤝 Acknowledgments
Dirac is a fork of the excellent Cline project. We are grateful to the Cline team and contributors for their foundational work.
Built with ❤️ by Max Trivedi at Dirac Delta Labs
De Nederlandsche Bank will sign a major contract tomorrow with Schwarz Digits, the IT arm of Lidl owner Schwarz Group. DNB aims to reduce its dependence on American cloud companies. As a major Dutch organization, it is opting for a European partner, but how will this play out?
Sales Director Bernd Wagner announced the news on Monday at the Hannover Messe, according to De Telegraaf. The move itself comes as no surprise. DNB Director Steven Maijoor announced last October that he intended to “set a good example” and switch to a European cloud, though he acknowledged that it “is not yet as robust or high-quality as the one from the U.S.”
That is precisely the consideration every company must make. Can a European alternative function well enough to meet the organization’s requirements and needs? Lidl’s platform has been in development for years, but those of Amazon, Google, and Microsoft have sometimes had as much as 20 years of development work behind them.
The fact that the transition to European alternatives does not always go smoothly is evident in Schleswig-Holstein, where the local government is already struggling with the migration from Microsoft to an open-source environment.
Large organizations are already connected to Lidl’s cloud. For example, Lidl and the German supermarket chain Kaufland use it. Deutsche Bahn also collaborates with the Schwarz Group. But now a Dutch organization from a highly regulated sector is opting for this cloud.
Concerns about cloud dependency
Last year, the Dutch Central Bank (DNB) and the Netherlands Authority for the Financial Markets (AFM) warned that the Dutch financial sector had become too dependent on foreign IT service providers, particularly American ones. These concerns were fueled by geopolitical tensions. For example, a prosecutor at the International Criminal Court in The Hague was cut off from his Microsoft email account by President Donald Trump. The ICC is now also switching to non-American systems.
Incidentally, when issuing that warning, DNB also had to admit that it itself is largely dependent on American service providers for its digital infrastructure.
Schwarz Digits and Stackit
Schwarz Digits, via the Stackit cloud platform, has long positioned itself as a European alternative to American hyperscalers. The Lidl-owned company is building a sovereign cloud where all data falls under European law. This sets it apart from American providers, who, under the Cloud Act, are required to hand over data to U.S. authorities. Schwarz Digits recently announced an investment of 11 billion euros in a large data center in Lübbenau.
The project originally began as an internal IT system for Lidl and Kaufland, but is now also attracting external clients, including SAP and Bayern Munich. Together with Deutsche Telekom, it is working on broader European IT alternatives.
A spokesperson for DNB confirmed concerns about cloud dependency on Monday but declined to comment on individual contracts. “That is why, with every new step toward the cloud, we explicitly assess geopolitical risks and explore how we can reduce our dependency,” the spokesperson said.
Spend your time writing
and don’t worry about the rest.
.docauthor {Jennifer Chu}
.pagemargin {topright}
.docauthor | MIT News
# X-ray flashes from a supermassive black hole
!(70%)[Black hole](img/blackhole.jpg)
.abstract
One supermassive black hole has kept astronomers glued to their scopes
for the last several years.
The black hole in question is `1ES 1927+654`, which is about as
massive as a million suns and sits in a galaxy that is 270 million
light-years away.
In 2018, astronomers at MIT and elsewhere observed that the black
hole’s corona — a cloud of whirling, white-hot plasma — suddenly
**disappeared**, before reassembling months later.
The brief though dramatic shut-off was a first in black hole astronomy.
> This would be the closest thing we know of around any black hole.
> - Megan Masterson, a graduate student in physics at MIT
Complete authoring experience
Write Markdown to reach flow state faster.Use Quarkdown’s extensions to achieve more.
One tool to rule them all
Whether you are writing a research paper, a quick report, a company-wide wiki, class notes, or preparing interactive slides for your next talk, there’s only one line you need.
Replaces
.doctype {paged}
For articles, books and reports.
Replaces
.doctype {plain}
For notes, knowledge bases and simple static websites.
Replaces
.doctype {docs}
For wikis, technical documentation and large knowledge bases.
Replaces
.doctype {slides}
For lectures, talks and interactive presentations.
Typesetting for the impatient
With blazing fast compilation and live preview, see results instantly as you type.
Don’t repeat yourself
Reuse your workflow thanks to powerful scripting capabilities.
.function {animal}
name ecosystem picture:
.row
.clip {circle}
.picture
- **Name**: .name
- **Ecosystem**: .ecosystem
.animal {Red panda} ecosystem:{Temperate forests}

.animal {Sea otter} ecosystem:{Kelp forests}

.animal {Clownfish} ecosystem:{Coral reefs}

To add this web app to your iOS home screen tap the share button and select "Add to the Home Screen".
10HN is also available as an iOS App
If you visit 10HN only rarely, check out the the best articles from the past week.
If you like 10HN please leave feedback and share
Visit pancik.com for more.