10 interesting stories served every morning and every evening.

Can we have the day off?

mlsu.io

May 26

So, ap­par­ently we are at the cusp of the en­tire world’s white col­lar work­force (and, by ex­ten­sion, much of the US work­force) un­der­go­ing a rev­o­lu­tion in pro­duc­tiv­ity. AI is the tech­nol­ogy that is go­ing to rev­o­lu­tion­ize the way we work, the way we in­ter­act with the world, the way we learn, the way we so­cial­ize, and all of this. This sounds great. Really, it does. Everything get­ting faster and eas­ier would be an ex­tra­or­di­nary boon to all of our lives.

Can we have a day off then?

If AI is go­ing to 10x our pro­duc­tiv­ity across the board, that means that I should be able to pro­duce the same amount of out­put by mid­day on Monday that, in the be­fore times, would have taken all week.

So can I just take Friday off? From here on out, I’ll work Monday, Tuesday, Wednesday, Thursday, and then take Friday off. We can even declare Friday to be something like an “AI workers’ day;” on Thursday I promise I’ll work my ass off writing great prompts and then the agents can churn on them all day on Friday. In that case, you’d hardly even lose Friday, right?

And like, this would ap­ply across the board, of course. So you, the board of di­rec­tors, and the C-suite, you guys could take Friday off to go to play an en­tire 18 holes at the golf course. It’ll be beau­ti­ful, can you imag­ine? You don’t have to be in the of­fice, be­cause I’m not go­ing to be in the of­fice. You don’t have to be at the of­fice, be­cause the AI agents are there. And nei­ther do I!

Just one ex­tra day. Seems rea­son­able and quite a small change re­ally, in light of the to­tal world rev­o­lu­tion across every swathe of hu­man pro­duc­tiv­ity.

(Yo, Elon: I’m try­ing to in­crease the fer­til­ity rate. Childcare for 3 small chil­dren is six thou­sand dol­lars a month here in California. Do I have to go into the of­fice all five days this week? Why not four?)

Introducing Claude Opus 4.8

www.anthropic.com

We’re up­grad­ing Claude Opus to a new ver­sion: Claude Opus 4.8. It builds on Opus 4.7 with im­prove­ments across bench­marks, and is a more ef­fec­tive col­lab­o­ra­tor. It’s avail­able to­day for the same price.

Opus 4.8 launches alongside several new features. Users on claude.ai now have control over the amount of effort Claude puts into a task. Claude Code has a new “dynamic workflows” feature that allows it to tackle very large-scale problems. And fast mode for Opus 4.8, where the model can work at 2.5× the speed, is now three times cheaper than it was for previous models.

Opus 4.8’s ca­pa­bil­i­ties

The table be­low shows how Opus 4.8 com­pares to its pre­de­ces­sor and to other mod­els on tests of cod­ing, agen­tic skills, rea­son­ing, and prac­ti­cal knowl­edge work tasks. More de­tails and a much wider range of ca­pa­bil­ity eval­u­a­tions are pro­vided in the Claude Opus 4.8 System Card.

Collaborating with Opus 4.8

Early testers have found Claude Opus 4.8 to be more re­li­able and sharper in its judge­ment when it’s per­form­ing agen­tic tasks. Below are quotes from many of these testers about their ex­pe­ri­ence col­lab­o­rat­ing with Opus 4.8:

Claude Opus 4.8 has no­tice­ably bet­ter judg­ment. In Claude Code, it asks the right ques­tions, catches its own mis­takes, pushes back when a plan is­n’t sound, and builds up con­fi­dence around com­plex, multi-ser­vice ex­plo­rations be­fore mak­ing big changes. It’s a great model to build with.

On our Super-Agent bench­mark, Claude Opus 4.8 is the only model to com­plete every case end-to-end, beat­ing prior Opus mod­els and GPT-5.5 at par­ity on cost. For agent prod­ucts in trans­la­tion, deep re­search, slide-build­ing, and analy­sis, it de­liv­ers pow­er­ful re­li­a­bil­ity.

On CursorBench, Claude Opus 4.8 ex­ceeds prior Opus mod­els across every ef­fort level. Tool call­ing is mean­ing­fully more ef­fi­cient, us­ing fewer steps for the same in­tel­li­gence, and it car­ries end-to-end tasks through.

Claude Opus 4.8 de­liv­ers the high­est score recorded on our Legal Agent Benchmark, and is the first model to break 10% over­all on the all-pass stan­dard. For sub­stan­tive le­gal work, that’s the kind of ac­cu­racy lift that trans­lates di­rectly into how much real at­tor­ney work our cus­tomers can hand off with con­fi­dence.

Claude Opus 4.8 feels like a ma­jor qual­ity-of-life up­date over Opus 4.7: faster, eas­ier to col­lab­o­rate with, and bet­ter at car­ry­ing con­text and style di­rec­tion across a long ses­sion. Opus 4.8 is the model I kept trust­ing for work where voice, taste, and tech­ni­cal ex­e­cu­tion all have to hap­pen side-by-side.

Claude Opus 4.8 is the strongest com­puter-use and browser-agent model we’ve tested, scor­ing 84% on Online-Mind2Web, which is a mean­ing­ful jump over both Opus 4.7 and GPT-5.5. It stays re­flec­tive and on-task in the way our cus­tomers’ agent work­loads need to be re­li­able end-to-end.

Claude Opus 4.8 uses tools cleanly and fol­lows in­struc­tions with the con­sis­tency our au­tonomous en­gi­neer­ing work­loads need to keep run­ning un­at­tended. It im­proves on Opus 4.6 and fixes the com­ment-ver­bosity and tool-call­ing is­sues we saw with Opus 4.7. This re­lease from Anthropic trans­lates di­rectly into faster ca­pa­bil­ity gains for en­gi­neers build­ing on Devin.

On our long-run­ning evals, Claude Opus 4.8’s analy­sis was con­sis­tently higher qual­ity than prior Opus mod­els. It fin­ished faster and pro­duced richer, more in­for­ma­tion dense out­puts. Overall, a no­tice­ably bet­ter sig­nal to noise ra­tio. The biggest dif­fer­en­tia­tor was Opus 4.8’s ten­dency to proac­tively flag is­sues with the in­puts and out­puts of an analy­sis, some­thing other mod­els rou­tinely missed and left to the users to catch.

Across CoCounsel Legal, Claude Opus 4.8 de­liv­ered mean­ing­ful im­prove­ments in con­sis­tency and rea­son­ing qual­ity com­pared to prior Opus mod­els. For the high-stakes pro­fes­sional work­flows our cus­tomers de­pend on, that re­li­a­bil­ity mat­ters. As we build fidu­ciary-grade AI sys­tems for le­gal and tax pro­fes­sion­als, ad­vances like these help raise the stan­dard for trusted AI per­for­mance in real-world work­flows.

Claude Opus 4.8 sets a new bar for en­ter­prise AI. In Genie, Databricks’ AI agent for data and knowl­edge work, the new Opus model un­locks a step change in agen­tic rea­son­ing, tack­ling deeper, mul­ti­step ques­tions faster than any prior Opus. Its mul­ti­modal strength also lets Genie rea­son di­rectly over PDFs, di­a­grams, and other un­struc­tured con­tent at 61% cheaper to­ken cost than Opus 4.7.

For fi­nan­cial-doc­u­ment work­flows in Hebbia’s or­ches­tra­tor, Claude Opus 4.8 de­liv­ers the same strong qual­ity as Opus 4.7 with no­tice­ably bet­ter ci­ta­tion pre­ci­sion and more to­ken ef­fi­ciency on re­trieval, which works in­cred­i­bly well for the kinds of dense fil­ings our cus­tomers run every day.

One of the most promi­nent im­prove­ments in Opus 4.8 is its hon­esty. We train all our mod­els to be hon­est—for in­stance, to avoid mak­ing claims that they can’t sup­port. But a gen­eral prob­lem with AI mod­els is that they some­times jump to con­clu­sions, con­fi­dently claim­ing to have made progress in their work de­spite the ev­i­dence be­ing thin. Early testers re­port that Opus 4.8 is more likely to flag un­cer­tain­ties about its work and less likely to make un­sup­ported claims. This is borne out in our eval­u­a­tions, which show that Opus 4.8 is around four times less likely than its pre­de­ces­sor to al­low flaws in code it has writ­ten to pass un­re­marked.

As always, we ran a detailed alignment assessment on the model before release. In terms of positive traits, our Alignment team concluded that Opus 4.8 “reaches new highs on our measures of prosocial traits like supporting user autonomy and acting in the user’s best interest.” The assessment also showed Opus 4.8 to have rates of misaligned behavior (such as deception or cooperation with misuse) that are substantially lower than Opus 4.7, and similar to our best-aligned model, Claude Mythos Preview. The full alignment assessment, accompanied by a suite of pre-deployment safety tests, is reported in the Claude Opus 4.8 System Card.

Also launch­ing to­day

In ad­di­tion to Claude Opus 4.8, we’re mak­ing the fol­low­ing up­dates:

Dynamic work­flows. This new fea­ture, avail­able in re­search pre­view, al­lows Claude to take on even big­ger tasks in Claude Code. Claude can plan the work and then run hun­dreds of par­al­lel sub­agents in a sin­gle ses­sion (and with Opus 4.8, the agents can run for even longer). It then ver­i­fies its out­puts be­fore re­port­ing back to the user. For ex­am­ple, Claude Code with Opus 4.8 can now carry out code­base-scale mi­gra­tions across hun­dreds of thou­sands of lines of code from kick­off to merge, with the ex­ist­ing test suite as its bar. You can read more about dy­namic work­flows—avail­able in Claude Code for Enterprise, Team, and Max plans—in this post.

Effort con­trol in claude.ai and Cowork. A new con­trol along­side the model se­lec­tor lets users choose how much ef­fort Claude puts into a re­sponse. On higher ef­fort set­tings, Claude will think more fre­quently and more deeply to give bet­ter re­sponses. On lower ef­fort set­tings, Claude will re­spond faster and use up a user’s rate lim­its more slowly. Users now have this choice—the ef­fort con­trol is avail­able on all plans.

The Messages API now ac­cepts sys­tem en­tries in­side the mes­sages ar­ray. Developers can up­date Claude’s in­struc­tions mid-task with­out break­ing the prompt cache or rout­ing the up­date through a user turn. This can be used in a given har­ness to up­date per­mis­sions, to­ken bud­gets, or en­vi­ron­ment con­text as an agent runs.

A note on ef­fort

Opus 4.8 defaults to high effort, which we judge to be the best overall balance of quality and user experience. On coding tasks, this effort level spends a similar number of tokens as Opus 4.7’s default, but with better performance. Users can choose “extra” (“xhigh” in Claude Code) or “max,” and the model will spend more tokens to get better results; we recommend using “extra” for difficult tasks and long-running asynchronous workflows. We have increased rate limits in Claude Code to accommodate the higher token usage of higher effort levels; users can select whichever makes sense for their particular project.

What’s next?

Users will find Opus 4.8 to be a mod­est but tan­gi­ble im­prove­ment on its pre­de­ces­sor. There’s still more to be done: we’re work­ing on de­vel­op­ing and re­leas­ing mod­els that pro­vide many of the same ca­pa­bil­i­ties as Opus at a lower cost.

Not only that, but we plan to re­lease a new class of model with even higher in­tel­li­gence than Opus. As part of Project Glasswing, a small num­ber of or­ga­ni­za­tions are cur­rently us­ing Claude Mythos Preview for cy­ber­se­cu­rity work. Models of this ca­pa­bil­ity level re­quire stronger cy­ber safe­guards be­fore they can be gen­er­ally re­leased. We’re mak­ing swift progress on de­vel­op­ing these safe­guards and ex­pect to be able to bring Mythos-class mod­els to all our cus­tomers in the com­ing weeks.

Availability

Claude Opus 4.8 is available everywhere today. Pricing for regular usage is unchanged from Opus 4.7: $5 per million input tokens and $25 per million output tokens. Pricing for fast mode is $10 per million input tokens and $50 per million output tokens. Developers can use claude-opus-4-8 via the Claude API.
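As a rough illustration of what the quoted per-token prices mean for a single request, here is a minimal Python sketch. The `PRICES` table and the `estimate_cost` helper are hypothetical names introduced for this example; the sketch uses only the four list prices stated above and ignores any other billing factors the announcement does not describe.

```python
# USD per million tokens, copied from the pricing paragraph above.
PRICES = {
    "regular": {"input": 5.0, "output": 25.0},
    "fast": {"input": 10.0, "output": 50.0},
}


def estimate_cost(input_tokens: int, output_tokens: int, mode: str = "regular") -> float:
    """Estimate the USD cost of one request (hypothetical helper, list prices only)."""
    rates = PRICES[mode]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000
```

For example, a request with 200,000 input tokens and 20,000 output tokens would come to about $1.50 in regular mode and $3.00 in fast mode under these assumptions.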

Beyond Benchmarks: Disagreement Among Frontier LLMs on Real-World Fact-Checks

lenz.io

Lenz Research · Snapshot v1.0 · data as of May 21, 2026

67%

of real fact-checks, top AI mod­els don’t agree on the an­swer.

1,000 claims, rated by 5 fron­tier LLMs.

Jordanov, Kosta · Lenz Research · kosta@lenz.io

We pre­sented 1,000 re­cent real user claims to the five top fron­tier LLMs and asked each one for a ver­dict. These aren’t bench­mark items with pub­lic an­swer keys — they’re claims real users sub­mit­ted for ver­i­fi­ca­tion to a fact-check­ing plat­form. Only one ver­dict bucket can be cor­rect per claim, so any dis­agree­ment among the panel means at least one mod­el’s ver­dict is la­bel-in­con­sis­tent un­der this 4-bucket rubric (True / Mostly True / Misleading / False). On 67% of claims, the panel splits.

Key find­ings

67% of claims (672 / 1,000; 95% CI: 64–70%) have at least one frontier model dissenting from the panel majority, or no majority forms at all.

34% of claims (343 / 1,000; 95% CI: 31–37%) involve a 2+ bucket gap between the most-disagreeing pair of frontier verdicts: a substantive disagreement on the answer, not just a calibration shift.

Krippendorff’s α (ordinal) = 0.639 across 5 raters on 1,000 items — non­triv­ial but lim­ited agree­ment.

The panel con­verges on de­fin­i­tive ver­dicts; the mid­dle of the rubric is where it frac­tures. Within the 328 unan­i­mous claims, only 4 are unan­i­mous-Mis­lead­ing and 0 are unan­i­mous-Mostly-True.

Some mod­els con­cen­trate ver­dicts at the True/False poles; oth­ers spread across the mid­dle two buck­ets.

Contents

How of­ten the fron­tier dis­agrees

Substantive vs nu­ance dis­agree­ment

Model-vs-model agree­ment

Per-model be­hav­ior (verdict dis­tri­b­u­tion + agree­ment-with-rest)

Detailed re­sults (by do­main, ma­jor­ity ver­dict, unan­i­mous)

Data

Methodology

Reproducibility

Limitations

FAQ

Ethics & data use

Changelog

Appendix: Example claims where the fron­tier frac­tures

1 How often the frontier disagrees

On 67% of claims (672 / 1,000; 95% CI: 64–70%), the frontier panel doesn’t agree: at least one model dissents from the majority verdict, or no strict majority forms at all. The breakdown:

For each claim we looked at the five fron­tier ver­dicts and asked: did at least three pick the same an­swer (a strict ma­jor­ity)? If yes, how many of the re­main­ing mod­els dis­sented? If no clear ma­jor­ity emerged at all — ver­dicts split across three or four dif­fer­ent buck­ets — the claim falls in the Models split, no ma­jor­ity row. Most of these claims are un­likely to ap­pear in any train­ing cor­pus with a gold la­bel at­tached — there’s no canon­i­cal an­swer key to pat­tern-match against, no bench­mark leader­board to an­chor to.
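The bucketing described above can be sketched in a few lines of Python. This is an illustration, not the authors' code: the function name `classify_split` and the row labels other than "no majority" are assumptions made for this example.

```python
from collections import Counter


def classify_split(verdicts):
    """Label the shape of a five-verdict panel split (hypothetical row labels)."""
    # Sizes of each verdict group, largest first, e.g. [3, 1, 1].
    counts = sorted(Counter(verdicts).values(), reverse=True)
    top = counts[0]
    if top == 5:
        return "unanimous"
    if top == 4:
        return "4-1"
    if top == 3:
        # A strict majority exists; the two dissenters either agree or differ.
        return "3-2" if counts[1] == 2 else "3-1-1"
    # No bucket reaches three votes: 2-2-1 or 2-1-1-1.
    return "no majority"
```

A panel such as True, True, False, False, Misleading has no bucket with three votes, so it lands in the no-majority row.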

We refer below to the “majority” and to “dissent from the majority.” A majority of frontier models is not ground truth. The majority verdict is sometimes wrong; an individual dissenting model is sometimes right. We use the majority as a structural reference point for measuring disagreement, not as a stand-in for correctness.

Panel agree­ment: Krippendorff’s α (ordinal) = 0.639 (n=1000 claims, 5 raters). This in­di­cates non­triv­ial but lim­ited agree­ment: the mod­els’ ver­dicts are struc­tured rather than ran­dom, but not con­sis­tent enough to treat the panel as a sin­gle in­ter­change­able judge. Ordinal α is the stan­dard Krippendorff vari­ant for an or­dered cat­e­gor­i­cal scale (True / Mostly True / Misleading / False). See §7.5 Statistical analy­sis for choice of met­ric.

Lower bound on model er­ror. For each claim, ex­actly one of the four ver­dict buck­ets is the cor­rect an­swer. If we as­sume the pan­el’s most pop­u­lar bucket is the cor­rect one — the most char­i­ta­ble as­sump­tion — the min­i­mum num­ber of mod­els that picked a wrong ver­dict is:

≥1 model wrong on 67% of claims (any non-unan­i­mous panel)

≥2 wrong on 45% of claims (3-2, 3-1-1, or no-majority splits)

≥3 wrong on 13% of claims (no bucket reaches a ma­jor­ity, so at most 2 can be right)

Relaxing the “most popular is correct” assumption can only raise these counts, never lower them. The actual error rates are likely higher still: even the 33% of cases where all five agree can, and likely do, include shared blind spots.
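The lower bound above is simple counting: if the most popular bucket is assumed correct, every model outside it is wrong. A minimal Python sketch follows; the helper name `min_wrong` is an assumption made for this example.

```python
from collections import Counter


def min_wrong(verdicts):
    """Minimum number of wrong verdicts if the most popular bucket is correct."""
    largest_group = max(Counter(verdicts).values())
    return len(verdicts) - largest_group
```

A 3-2 split gives a bound of 2, and any panel where no bucket reaches three votes gives a bound of 3, matching the three tiers listed above.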

2 Substantive vs nuance disagreement

On 34% of claims (343 / 1,000; 95% CI: 31–37%), at least two frontier models pick verdicts that are 2 or more buckets apart in our 4-bucket rubric, a disagreement that goes beyond calibration.

Not every disagreement is equal. A “True” vs “Mostly True” split is a confidence-calibration shift. A “True” vs “False” split is a substantive disagreement about the answer. We measure this as the max pairwise bucket distance across the 5 verdicts on each claim, where the verdicts are ordered True (0) → Mostly True (1) → Misleading (2) → False (3).

Caveat. Bucket distance treats True / Mostly True / Misleading / False as an ordinal scale; an equal-spaced interpretation is a simplification. A 2-bucket gap can still reflect rubric ambiguity, temporal-framing differences, or differing interpretations of “Misleading.” We report it as a coarse “substantive vs nuance” indicator, not a metric of error magnitude.
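To make the distance measure concrete, here is a small Python sketch using the bucket ordering given above. The names `BUCKET` and `max_pairwise_distance` are assumptions made for this example. Because the buckets sit on a single ordered scale, the largest pairwise gap equals the highest rank minus the lowest.

```python
# Ordinal ranks as stated in the text: True (0) ... False (3).
BUCKET = {"True": 0, "Mostly True": 1, "Misleading": 2, "False": 3}


def max_pairwise_distance(verdicts):
    """Largest bucket gap between any two verdicts on one claim."""
    ranks = [BUCKET[v] for v in verdicts]
    # On a one-dimensional scale the farthest pair is (min, max).
    return max(ranks) - min(ranks)
```

Under this measure a claim counts toward the 34% figure when the result is 2 or more.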

3 Model-vs-model agreement

Highest peer agree­ment: Gemini 3 Pro × Gemini 3 Pro + Search (75%) — un­sur­pris­ing, since they share a base model. Lowest: Claude Opus 4.7 × Gemini 3 Pro, Claude Opus 4.7 × Gemini 3 Pro + Search and Gemini 3 Pro × Sonar Pro (53%) — three pairs tie at the floor.

How of­ten each pair of fron­tier mod­els picked the same ver­dict la­bel, across all claims in the cor­pus.
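A pairwise agreement rate of this kind can be computed as follows. This is a sketch under assumptions: each claim is represented as a dict mapping a model name to its verdict label, and `pairwise_agreement` is a hypothetical helper, not the page's own code.

```python
from itertools import combinations


def pairwise_agreement(rows):
    """Share of claims on which each pair of models chose the same label.

    rows: list of dicts, one per claim, mapping model name -> verdict label.
    """
    models = sorted(rows[0])
    return {
        (a, b): sum(row[a] == row[b] for row in rows) / len(rows)
        for a, b in combinations(models, 2)
    }
```

With five models this yields ten pairs, each scored over the same set of claims.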

4 Per-model behavior

Two an­gles on the same five mod­els: how each one dis­trib­utes its ver­dicts (4.1), and how of­ten each one’s ver­dict matches the strict ma­jor­ity of the other four (4.2).

4.1 Verdict dis­tri­b­u­tion

Some mod­els con­cen­trate ver­dicts at the True/False poles; oth­ers dis­trib­ute more broadly across the mid­dle two buck­ets. This re­flects model-level de­ci­sion pri­ors in­ter­act­ing with the spe­cific claims — with­out ground truth, we can’t sep­a­rate the two. The table be­low shows the share of claims each model as­signed to each bucket, with 95% Wilson CIs un­der­neath each cell.

4.2 Agreement with the rest of the panel

Across the five mod­els, peer-ma­jor­ity agree­ment ranges from 69% to 81%. This is peer-align­ment in this cor­pus, not cor­rect­ness — no model is treated as ground truth here, and el­i­gi­ble n dif­fers per row.

For each model, how of­ten does its ver­dict match the strict ma­jor­ity (≥3/4) of the other four? A claim is el­i­gi­ble only when a ≥3/4 ma­jor­ity ex­ists among the other four.
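The eligibility rule above can be sketched in Python. The data layout (one dict per claim, mapping model name to verdict) and the name `peer_majority_agreement` are assumptions made for this example.

```python
from collections import Counter


def peer_majority_agreement(rows, model):
    """Return (matches, eligible) for one model against the other four.

    A claim is eligible only when at least 3 of the other 4 models
    share a verdict; that shared verdict is the peer majority.
    """
    eligible = 0
    matches = 0
    for row in rows:
        others = [verdict for name, verdict in row.items() if name != model]
        label, count = Counter(others).most_common(1)[0]
        if count >= 3:
            eligible += 1
            if row[model] == label:
                matches += 1
    return matches, eligible
```

Returning the eligible count alongside the matches reflects the point made above that the denominator differs per model.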

5 Detailed results

5.1 Per-domain fron­tier dis­agree­ment

Denominator per row: claims in that do­main (the Claims col­umn).

5.2 Per-verdict panel agree­ment

When the panel does land on a middle bucket, it almost never converges. Mostly True and Misleading majorities reach unanimity at most 5% of the time, vs 43–47% for True and False majorities.

Consistent with this, work on a dif­fer­ent real-world cor­pus (17,856 PolitiFact claims with a sin­gle-fam­ily Llama-3 ab­la­tion, Schwab et al. 2025) finds nu­anced la­bels are where fact-check ver­dict mod­els con­cen­trate their er­rors — a re­lated ob­ser­va­tion from a dif­fer­ent method­olog­i­cal setup (single-family ab­la­tion, not a fron­tier panel).

Denominator: claims with a strict ≥3/5 frontier majority on this verdict. Reveals which verdict zones the panel is most/least confident about.

Viewed from the other di­rec­tion — of the 328 claims where all 5 fron­tier mod­els con­verged on the same ver­dict, the dis­tri­b­u­tion across ver­dicts:

6 Data

1,000 claims — the most re­cent real-world user sub­mis­sions to a fact-check­ing plat­form that pass every el­i­gi­bil­ity fil­ter listed un­der Exclusions be­low. None of these claims is older than February 15, 2026. Unless oth­er­wise stated, every met­ric on this page uses this set as its de­nom­i­na­tor; ta­bles that use a dif­fer­ent de­nom­i­na­tor (e.g. claims with a strict ≥3/5 fron­tier ma­jor­ity on a ver­dict) state it in­line.

Provenance

These claims were sub­mit­ted to Lenz, a fact-check­ing plat­form. We chose this cor­pus be­cause it rep­re­sents or­ganic real-world fact-check re­quests rather than cu­rated bench­mark items. Lenz’s own ver­dict on each claim is not used in this analy­sis — this pa­per mea­sures fron­tier-model dis­agree­ment only, not Lenz vs the fron­tier.

Claim nor­mal­iza­tion

The atomic_claim field in the CSV is not the user’s raw submission. It’s the output of Lenz’s framing step, which strips emotional language and bias and distills the input into a single neutral, testable proposition anchored to the submission date. Frontier models were rated against the framed claim, not the raw text. A user who types “Canadian authorities are throwing Christians in jail for quoting the Bible!!!” is rated on the proposition “As of April 4, 2026, Canadian authorities have jailed individuals for publicly quoting the Bible because of their Christian beliefs.”

Exclusions

The cor­pus ex­cludes:

Claims marked pri­vate by the sub­mit­ting user

Claims from platform staff, internal accounts, or agent/API submissions (only real user web submissions appear in the corpus)

Claims with ed­i­to­r­ial sta­tus pend­ing (not yet re­viewed) or hid­den — ei­ther depub­lished af­ter ed­i­to­r­ial re­view or auto-flagged at sub­mis­sion time by Lenz’s PII screen­ing step (containing per­sonal in­for­ma­tion about non-pub­lic in­di­vid­u­als)

Near-duplicate claims: pairs within a cosine distance of 0.2 on OpenAI text-embedding-3-small embeddings (1536-dim) of the atomic_claim are collapsed to a single canonical row. The newer claim becomes canonical when the proposition is time-dependent; otherwise the existing claim with the most views on Lenz wins. Only canonicals appear in this corpus.

Claims where at least one of the five fron­tier mod­els failed to pro­duce a parseable ver­dict, even af­ter one retry. Most of these resid­ual er­rors come from Gemini’s grounded-search API, which oc­ca­sion­ally re­turns mal­formed re­sponses; the rest are rare Anthropic re­fusals. Only claims with all five mod­els suc­cess­ful are in­cluded in the co­hort.

Claims older than 180 days (recency win­dow ap­plied at har­vest time)
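The near-duplicate rule in the list above rests on a cosine-distance threshold. The following Python sketch shows that check on plain lists of floats. It makes several assumptions: the embeddings have already been obtained, the threshold is treated as inclusive (the text does not say), and the helper names are hypothetical. The canonical-row selection described above is not modelled.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def is_near_duplicate(u, v, threshold=0.2):
    """True when two embeddings fall within the stated distance threshold.

    Treating the boundary as inclusive is an assumption of this sketch.
    """
    return cosine_distance(u, v) <= threshold
```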

7 Methodology

7.1 Model se­lec­tion

Five fron­tier mod­els, cho­sen to cover two ca­pa­bil­ity sur­faces:

Parametric (training-only): GPT-5.4 (OpenAI), Claude Opus 4.7 (Anthropic), Gemini 3 Pro (Google)

Retrieval-augmented: Gemini 3 Pro + Search (Google), Sonar Pro (Perplexity)

7.2 Prompt

Each claim is presented with an “as of YYYY-MM-DD” anchor matching the submission date, asking the model to pick one of four verdicts:

“Classify this claim as of <date>: <atomic claim>”

Output ex­actly one la­bel: True, Mostly True, Misleading, or False. No ex­pla­na­tions, no qual­i­fiers.

Verbatim prompt template version: usr_v2. No Abstain option is offered (a forced choice keeps the comparison symmetric across models). Unparseable outputs are not reclassified into a verdict bucket; claims with any parse error are excluded from the complete-claim cohort.

7.3 LLM call con­fig­u­ra­tion

All five mod­els re­ceived the same sys­tem place­holder (.) and the same user prompt tem­plate (usr_v2). No struc­tured-out­put schema, tool-call schema, seed, top-p, or logit-bias con­trols were used. The har­vester re­quested de­ter­min­is­tic de­cod­ing where sup­ported (temperature=0.0); GPT-5.4 and Claude Opus 4.7 were called with­out an ex­plicit tem­per­a­ture be­cause their provider adapters re­ject cus­tom tem­per­a­ture set­tings. Output length was capped at 16 to­kens for GPT-5.4, Claude Opus 4.7, and Sonar Pro; Gemini 3 Pro and Gemini 3 Pro + Search used a 1024-token cap (lower caps pro­duced provider-side er­rors dur­ing har­vester de­vel­op­ment). Gemini 3 Pro + Search en­abled Google Search ground­ing; Sonar Pro was treated as re­trieval-aug­mented through Perplexity’s search-backed API. Parseable out­puts had to equal ex­actly one of the four la­bels af­ter nor­mal­iza­tion.
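The final parsing rule (an output must equal exactly one of the four labels after normalization) could look roughly like the Python sketch below. The normalization steps shown here (trimming whitespace, surrounding quotes and full stops, collapsing inner whitespace, ignoring case) are assumptions; the page does not specify what its normalization does, and `parse_verdict` is a hypothetical name.

```python
LABELS = ("True", "Mostly True", "Misleading", "False")


def parse_verdict(raw):
    """Map a raw model output to one of the four labels, or None if unparseable.

    The normalization below is an assumed, illustrative choice.
    """
    trimmed = raw.strip().strip(".\"'")
    cleaned = " ".join(trimmed.split()).casefold()
    for label in LABELS:
        if cleaned == label.casefold():
            return label
    # Anything else (extra words, qualifiers, empty output) is a parse error.
    return None
```

Consistent with §7.2, an output that fails to parse is returned as `None` rather than being forced into a bucket.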

7.4 Grading

No LLM grader. All mea­sure­ments de­rive from di­rect parsed-la­bel equal­ity be­tween the 5 fron­tier ver­dicts on the same claim. No ref­er­ence la­bel or ground truth is used.

7.5 Statistical analy­sis

Sampling frame & inferential target. The corpus is the 1,000 most recent eligible claims submitted to this single fact-checking platform (per the filters in §6). It is not a probability sample from any wider population, and not a complete enumeration (older eligible claims exist but are excluded by the cap). Reported Wilson 95% CIs are nominal binomial intervals under a model where each claim is an independent draw from a hypothetical stream of similar eligible submissions to this same platform under the same screening rules. They are not coverage statements about “all real-world fact-checks.”

Non-iid caveat. Lenz claims are not in­de­pen­dently and iden­ti­cally dis­trib­uted: users clus­ter sub­mis­sions around news events, screen­ing se­lects for cer­tain top­ics, and in­di­vid­ual users of­ten sub­mit mul­ti­ple re­lated claims in a sin­gle ses­sion. True sam­pling vari­abil­ity un­der a more hon­est clus­ter model (e.g. clus­ter boot­strap) would likely be larger than what Wilson re­ports. We sur­face CIs as a min­i­mum pre­ci­sion floor, not a guar­an­teed cov­er­age in­ter­val.

Wilson 95% confidence intervals on every reported rate. We use the Wilson score interval [1] rather than the Wald (normal-approximation) interval because it has better small-N behavior and handles boundary cases (p=0/n, p=n/n) without producing degenerate zero-width intervals. It is the de-facto standard in modern ML evaluation literature. Wilson CIs appear inline next to every rate in §1, §2, §3, §4.2, §5, and the appendix; the printed bounds are exact, not centered on the raw point estimate.
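For readers who want to check the intervals, the Wilson score interval has a closed form that is easy to sketch in Python. The function name `wilson_interval` is an assumption made for this example, and the default `z` is the usual two-sided 95% normal quantile.

```python
import math


def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (default: 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width
```

Applied to the headline counts, 672 of 1,000 gives roughly 64% to 70% and 343 of 1,000 gives roughly 31% to 37%, in line with the intervals printed in §1 and §2. Note that the interval is centered slightly toward 0.5 rather than on the raw proportion.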

Inter-rater re­li­a­bil­ity — Krippendorff’s α (ordinal). The ver­dict scale (True / Mostly True / Misleading / False) is or­di­nal, so we score with Krippendorff’s α at the or­di­nal level of mea­sure­ment [2] rather than Fleiss’ κ (which treats cat­e­gories as nom­i­nal and would un­der­es­ti­mate agree­ment — a True ↔ Mostly True 1-bucket dis­agree­ment is much smaller than a True ↔ False po­lar split, and the or­di­nal met­ric re­flects that). α is re­ported as a sin­gle panel-level num­ber along­side the §1 re­sults table.
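As an illustration of the metric, here is a compact Python sketch of Krippendorff's α with the ordinal distance, following the standard coincidence-matrix formulation. It is not the authors' code, and the name `krippendorff_alpha_ordinal` is an assumption. It expects verdicts already mapped to integer ranks (True = 0 through False = 3).

```python
from collections import Counter


def krippendorff_alpha_ordinal(units, num_categories=4):
    """Krippendorff's alpha (ordinal) for a list of per-claim rank lists."""
    k_total = num_categories
    # Coincidence matrix: each ordered pair of values within a unit
    # contributes 1 / (m - 1), where m is the number of ratings in the unit.
    coincidence = [[0.0] * k_total for _ in range(k_total)]
    for values in units:
        m = len(values)
        if m < 2:
            continue  # a single rating cannot be paired
        counts = Counter(values)
        for c, n_c in counts.items():
            for k, n_k in counts.items():
                pairs = n_c * (n_c - 1) if c == k else n_c * n_k
                coincidence[c][k] += pairs / (m - 1)

    marginals = [sum(row) for row in coincidence]
    n = sum(marginals)

    def delta_sq(c, k):
        # Ordinal distance: ranks between c and k, minus half of each endpoint.
        return (sum(marginals[c:k + 1]) - (marginals[c] + marginals[k]) / 2) ** 2

    pairs_ck = [(c, k) for c in range(k_total) for k in range(c + 1, k_total)]
    observed = sum(coincidence[c][k] * delta_sq(c, k) for c, k in pairs_ck)
    expected = sum(marginals[c] * marginals[k] * delta_sq(c, k) for c, k in pairs_ck)
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")
    return 1.0 - (n - 1) * observed / expected
```

A value of 1 means perfect agreement and 0 means agreement no better than chance given the label frequencies; for production analysis an established statistics library would be a safer choice than this sketch.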

No model-vs-model sig­nif­i­cance test­ing. We re­port pair­wise agree­ment rates with 95% Wilson CIs as de­scrip­tive sta­tis­tics rather than treat­ing the page as a model leader­board. Pairwise sig­nif­i­cance tests are sen­si­tive to the com­par­i­son tar­get and el­i­gi­bil­ity set: for ex­am­ple, peer-ma­jor­ity agree­ment is a paired claim-level out­come, but each model has a dif­fer­ent set of claims where the other four mod­els form a strict ma­jor­ity.

References. [1] Wilson, E.B. (1927). “Probable inference, the law of succession, and statistical inference.” Journal of the American Statistical Association 22, 209–212. [2] Krippendorff, K. (2004). “Reliability in Content Analysis: Some Common Misconceptions and Recommendations.” Human Communication Research 30(3), 411–433.

8 Reproducibility

Full per-claim data: down­load CSV. One row per claim — claim ID and URL, atomic claim text, the 5 fron­tier ver­dicts, max pair­wise bucket dis­tance, do­main, and cre­ation date. Strictly rec­tan­gu­lar, no pre­am­ble com­ments. The claim_url col­umn links each row back to the orig­i­nal claim page on Lenz; some pages may be un­avail­able if the user who sub­mit­ted the claim later deleted or pri­va­tized it.

PDF ar­ti­fact: down­load PDF. Browser-independent ren­der­ing of this page for of­fline read­ing, ci­ta­tion, or arxiv-style preprint host­ing. Hash-pinned in the snap­shot man­i­fest (pdf_sha256) so the PDF served at /v1.0/pdf is byte-iden­ti­cal across re-de­ploys.
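A reader who wants to confirm that a downloaded copy matches the pinned hash could do something like the sketch below. The manifest is assumed here to be a plain mapping with a pdf_sha256 key holding a hex digest; the real manifest format is not described on the page, and the function names are hypothetical.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_manifest(path: str, manifest: dict) -> bool:
    """Compare a local file against the (assumed) pdf_sha256 manifest entry."""
    return sha256_of_file(path) == manifest["pdf_sha256"].lower()
```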

This snap­shot is v1.0, data as of May 21, 2026. The archival URL /research/llm-disagreement/v1.0 per­ma­nently serves the v1.0 snap­shot — ci­ta­tion-sta­ble even when the bare URL bumps to a fu­ture ver­sion.

Harvester prompt version: usr_v2. Grader: direct parsed-label equality across the 5 frontier verdicts. No LLM grader, no reference verdict.
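As a rough illustration of what label-equality grading and the "max pairwise bucket distance" column amount to, here is a short sketch. The scale order comes from the verdict scale described earlier (True / Mostly True / Misleading / False); the integer bucket encoding and the function names are assumptions, not the page's implementation.

```python
# Verdict scale in ordinal order; the 0..3 bucket indices are an assumed encoding.
SCALE = ["True", "Mostly True", "Misleading", "False"]
RANK = {label: index for index, label in enumerate(SCALE)}


def is_unanimous(verdicts: list[str]) -> bool:
    """Direct label equality: all parsed verdicts are the same string."""
    return len(set(verdicts)) == 1


def max_pairwise_bucket_distance(verdicts: list[str]) -> int:
    """Largest rank gap between any two verdicts (equals max rank minus min rank)."""
    ranks = [RANK[v] for v in verdicts]
    return max(ranks) - min(ranks)


if __name__ == "__main__":
    row = ["True", "True", "Mostly True", "True", "False"]
    print(is_unanimous(row), max_pairwise_bucket_distance(row))
```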

Permanent record & citation: doi.org/10.5281/zenodo.20344847. The Zenodo deposit mirrors the PDF artifact under a permanent DOI for citation in academic and preprint contexts.

9 Limitations

nytimes.com

www.nytimes.com


Massively Multiplayer Online Rave

hallucinate.site

University of California math professors demand return of SAT for STEM admissions - Los Angeles Times

www.latimes.com

More than 600 University of California fac­ulty mem­bers, led by math­e­mati­cians at UC Berkeley, are call­ing on the sys­tem to re­in­state stan­dard­ized test­ing re­quire­ments for sci­ence, tech­nol­ogy, en­gi­neer­ing and math­e­mat­ics ap­pli­cants, say­ing that six years of test-free ad­mis­sions has not re­li­ably as­sessed readi­ness and pro­fes­sors are of­ten teach­ing mid­dle school math to in­com­ing stu­dents.

Without stan­dard­ized test­ing in ad­mis­sions, pro­fes­sors said they don’t know whether in­com­ing stu­dents can han­dle col­lege-level math. The open let­ter, ad­dressed to top UC lead­ers, asks for SAT or ACT ex­ams to be re­quired be­gin­ning in fall 2027 and for STEM fac­ulty to be given for­mal over­sight of readi­ness stan­dards in their ma­jors.

“We now observe preparation gaps so severe that instructors must reteach middle-school mathematics while simultaneously teaching the material students need for sciences, engineering, economics, and other quantitatively demanding fields,” they warned.

Over three years, from fall 2021 to fall 2023, the letter said, at least 20% of Berkeley first-semester calculus students who took a diagnostic exam showed deficits. “Basic mathematical fluency is analogous to literacy; without it, success in university-level STEM becomes structurally unattainable for students,” faculty wrote.

The let­ter lands days be­fore the UC Academic Senate’s Board of Admissions and Relations with Schools is sched­uled to dis­cuss sys­tem-wide ad­mis­sions changes, which could be the first step to­ward a pos­si­ble re­turn of stan­dard­ized test­ing at the na­tion’s largest pub­lic re­search uni­ver­sity sys­tem.

A land­mark de­ci­sion un­der scrutiny

UC gained na­tional at­ten­tion in May 2020 when re­gents unan­i­mously voted to sus­pend SAT and ACT test­ing re­quire­ments and elim­i­nate them en­tirely by 2025. Board mem­bers cited con­cerns the tests were bi­ased against stu­dents of color and those from lower-in­come fam­i­lies — in­clud­ing stu­dents who did not have ac­cess to prep courses.

At the time, some hailed the vote as a bold and vi­sion­ary move to ex­pand ac­cess and eq­uity.

But the vote went against the UC Academic Senate’s own Standardized Testing Task Force, which said use of test scores could ac­tu­ally boost ad­mis­sion rates for stu­dents from dis­ad­van­taged back­grounds and school dis­tricts. The re­port also found that test scores are a bet­ter pre­dic­tor of col­lege per­for­mance than high school grades, but that UC weighed grades more heav­ily in ad­mis­sion de­ci­sions.

Then in 2020, a California state court judge is­sued an in­junc­tion in a law­suit brought by stu­dents, which forced UC to stop us­ing the scores ear­lier than planned.

In the midst of the COVID-19 pan­demic, cam­puses across the coun­try also sus­pended ad­mis­sions test­ing re­quire­ments, in­clud­ing many of the na­tion’s most pres­ti­gious in­sti­tu­tions. The re­quire­ment has largely re­sumed at elite uni­ver­si­ties.

Harvard, Brown, Dartmouth, the University of Pennsylvania, Stanford and Caltech each re­stored stan­dard­ized test­ing re­quire­ments for ap­pli­cants in 2024 or 2025. USC is test-op­tional and scores are con­sid­ered as part of holis­tic re­view, but stu­dents are not pe­nal­ized if they do not sub­mit them.

UC’s policy, as well as California State University’s, permits applicants to submit scores for course placement purposes, but only after admissions decisions have been made.

UC lead­er­ship has not for­mally en­dorsed the fac­ulty let­ter on test­ing, but sys­tem lead­ers said Wednesday that they were lis­ten­ing to the un­der­ly­ing con­cerns.

Rachel Zaentz, a UC spokesper­son, said in a state­ment that the sys­tem will con­tinue to fo­cus on strength­en­ing in­struc­tion, col­lab­o­ra­tion and sup­port” for math readi­ness in part­ner­ship with K-12 and higher ed­u­ca­tion in­sti­tu­tions.

Ahmet Palazoglu, chair of the UC systemwide Academic Senate, said in a statement that he has heard “concerns raised by UC faculty about student preparedness for undergraduate study,” and that he has called on the system-wide admissions board to address “timely topics tied to students’ college readiness and UC’s admission process.”

The board, he said, is in the process of propos­ing a roadmap of pol­icy work and part­ner­ship build­ing with other state and K-12 ed­u­ca­tion lead­ers in the next aca­d­e­mic year and be­yond.”

Mounting UC con­cerns over math

Fissures have erupted within UC over ad­mis­sions tests and math readi­ness. In November, a UC San Diego Academic Senate work group re­port said it doc­u­mented a roughly thirty-fold in­crease be­tween 2020 and 2025 in in­com­ing first-year stu­dents whose math skills tested be­low high school level. The re­port said 70% of those stu­dents fell be­low mid­dle school lev­els.

Work group members advocated for a “systemwide reexamination of standardized testing, as many peer institutions have already done.”

Zvezda Stankova, a teach­ing pro­fes­sor in the Berkeley math­e­mat­ics de­part­ment who is one of the let­ter’s lead or­ga­niz­ers, said the im­pe­tus to pub­licly speak out came in part from her own class­rooms. She de­scribed a chal­leng­ing spring 2023 cal­cu­lus II class, which stood out in her nearly 30 years of teach­ing.

“Something had changed drastically. The bottom was taken out, and there were 25 to 30% of the students who were in free fall. There was nothing you could do for them. They were just not prepared.”

Stankova said her colleagues were bracing for sharp criticism. “Our letter is going to be attacked from all sides,” she said. The math professor argued that the SAT push was in aid of disadvantaged students.

“I don’t see SAT hurting diversity. I actually see it helping it, because you have right now the lack of SATs hurting the underrepresented minorities. You give them a ticket, an entrance ticket to a great university system like UC, only that they fail. How is that diversity?” Stankova said.

Not all see a return to testing as the best path. A September 2025 report by Saul Geiser, of the UC Berkeley Center for Studies in Higher Education and a former senior UC admissions official, said the SAT is “a poor fit for America’s public universities.”

Geiser ar­gued that the high school GPA out­per­forms the SAT in pre­dict­ing first-year stu­dent suc­cess once in­come and race are con­trolled. He also ar­gued that rank­ing ap­pli­cants by SAT scores ends up dis­ad­van­tag­ing high-achiev­ing low-in­come, first-gen­er­a­tion and un­der­rep­re­sented mi­nori­ties.

How pre­pared are California high school stu­dents in math?

California’s ag­gre­gate test­ing data com­pli­cate the pic­ture.

Overall, in math, the state’s stu­dents are about a quar­ter-year in in­struc­tion be­hind where they were prior to the start of the COVID-19 pan­demic in March 2020. A quar­ter-year of in­struc­tion trans­lates to about 45 school days or about nine weeks of the school year.

Statewide, 37.3% of stu­dents meet math learn­ing stan­dards in the grades that are tested.

In 11th grade, the most rel­e­vant grade re­lat­ing to col­lege readi­ness, 30.5% of stu­dents met or ex­ceeded math learn­ing stan­dards. Of these, nearly half ex­ceeded the learn­ing stan­dard — mark­ing them as likely to be the best pre­pared for a col­lege STEM ma­jor.

Any change to UC admissions requirements must move through the Academic Senate admissions board committee before going to the Board of Regents. Minutes from the admissions board’s March 6 meeting show members signaled tentative interest in eventually requiring 11th-grade Smarter Balanced assessment scores for California residents and SAT or ACT scores for nonresidents.

The board plans to submit an initial draft by Sunday and a “final road map” by June 30.

Times staff writer Howard Blume con­tributed to this re­port.


I'm Getting Into Mesh Networks... (Meshtastic, MeshCore, and Reticulum)

www.jonaharagon.com

I love net­work­ing, a lot. So much so that I’ve run my own ISP since 2024, com­plete with its own ASN, IPv4/6 ad­dress space, fiber op­tics, etc.

However, do this and you quickly realize how reliant you still remain on central service providers. The internet is a mesh, but the real players on that mesh are few, far between, and easy to coerce into censorship and other bad things. Even after ascending to the lofty realms of direct BGP peering myself, my access to those resources is locked behind yearly fees from ARIN. Ownership of the “real estate” of the internet, IP addresses, no longer exists.

The tragedy of mod­ern com­put­ing is that the lo­cal com­pute we own in our of­fices, on our laps, or in the palm of our hands is mas­sively, mas­sively pow­er­ful, but Big Tech com­pa­nies ac­tively refuse to take ad­van­tage of that fact. Why are you, I, and our neigh­bors largely rel­e­gated to con­sum­ing ac­cess from big play­ers when the com­put­ers we have are ca­pa­ble of so much more?

Mesh net­work­ing, send­ing pack­ets of data through many di­rectly in­ter­con­nected peers in­stead of through cen­tral dat­a­cen­ters, promises to free us from our re­liance on cen­tral ser­vice providers, and it’s some­thing I’ve been re­ally ex­cited about lately.

Of course, there is good rea­son for how the in­ter­net is cur­rently de­signed. High band­width con­nec­tions are costly, and for some ap­pli­ca­tions low­er­ing la­tency as much as pos­si­ble is very im­por­tant, which re­al­is­ti­cally re­quires con­ti­nent- and ocean-span­ning fiber op­tics with as few mid­dle-men as pos­si­ble.

This does­n’t mean we need to put all our eggs in this bas­ket though. While band­width-in­ten­sive ser­vices like Netflix or la­tency-sen­si­tive ser­vices like gam­ing are not likely to come to mesh net­works any­time soon, there are a vast num­ber of ap­pli­ca­tions which are per­fectly suited to mesh net­work­ing:

Messaging, so­cial net­work­ing, and gen­eral in­for­ma­tion shar­ing are very prac­ti­cal uses for mesh net­work­ing where ac­cess, cen­sor­ship re­sis­tance, and re­siliency are in­creas­ingly crit­i­cal for many peo­ple around the world.

For ap­pli­ca­tions like this, we don’t need to trench fiber con­nec­tions through the ground to get every­one con­nected. In the mod­ern mesh net­work­ing space, much of the in­no­va­tion is hap­pen­ing in the LoRa ra­dio space.

LoRa radios use license-free, sub-gigahertz radio bands that are available for public use in nearly all countries around the world. Compared to the license-free 2.4 GHz or 5 GHz radio you’d recognize from Wi-Fi (which is also used by many other technologies), LoRa operates at much lower power and, importantly and simultaneously, at much longer range.

Mesh net­work­ing over the air­waves pre­sents a very unique op­por­tu­nity for our so­ci­eties. We could build a re­silient, peer-to-peer net­work that co­ex­ists with the in­ter­net, en­abling con­nec­tiv­ity in cur­rently un­der­served re­gions and in­creas­ing our per­sonal sov­er­eignty on­line by main­tain­ing a func­tional backup to the in­ter­net for our most crit­i­cal needs.

It’s also just a free­ing feel­ing to be able to send a mes­sage to some­one else re­ly­ing only on de­vices that you and peo­ple in your net­work own out­right, in­stead of rent­ing the ca­pa­bil­ity to do so from your lo­cal ISP or Elon Musk’s Starlink.

Meshtastic

The ob­vi­ous fron­trun­ner in the mesh net­work­ing space is Meshtastic, mostly be­cause they were the first in the con­sumer LoRa mesh space, or at the very least the first to do it pretty well.

It’s easy to see why Meshtastic has quite a bit of pop­u­lar­ity: it’s a real prod­uct de­signed with a spe­cific use-case in mind (mobile mes­sag­ing and de­vice track­ing, pri­mar­ily), not a tech­ni­cal pro­ject just try­ing to build a net­work and hop­ing the use-case comes later.

This is very ap­peal­ing for most peo­ple who just want some­thing they can buy and use out of the box, like a set of walkie-talkies from Wal-Mart. Unfortunately, much like those cheap walkie-talkies in com­par­i­son to more se­ri­ous tech­nol­ogy like am­a­teur ra­dio, Meshtastic’s core de­sign holds the plat­form back from achiev­ing its full po­ten­tial.

Meshtastic’s first-mover ad­van­tage is pretty hard to over­come, es­pe­cially when it al­ready works rea­son­ably well for small, pri­vate groups like hik­ers or event-go­ers.

For a very large and pub­lic mesh, how­ever, it’s be­come clear to most peo­ple that Meshtastic by de­sign is a fairly un­ten­able so­lu­tion. Some pub­lic mesh groups have in­creased the band­width avail­able to Meshtastic by giv­ing up some range, but it’s a stop-gap so­lu­tion that does­n’t fix the prob­lems Meshtastic has in this en­vi­ron­ment at the end of the day.

💡

To be perfectly upfront with you, this post will be glossing over many Meshtastic and MeshCore features, because I feel they are both non-serious solutions compared to Reticulum for reasons I will explain later on in this post. I can almost guarantee I have been running Meshtastic and MeshCore for longer and with more infrastructure than you, and in fact I still do, despite not really believing in their long-term success, so… Any omissions or lack of “problem solving” in this post are not due to a lack of knowledge, but simply because fully detailing the many problems they have is outside the scope of this article.

I think most peo­ple who are se­ri­ous about pub­lic mesh net­work­ing have moved on to re­search­ing other so­lu­tions, or will have to soon.

MeshCore

MeshCore is one of these po­ten­tial so­lu­tions some pub­lic mesh groups have be­gun switch­ing to.

While Meshtastic’s orig­i­nal de­sign es­sen­tially floods the net­work with every mes­sage be­ing sent, hop­ing it even­tu­ally reaches the cor­rect des­ti­na­tion, MeshCore has an ac­tual rout­ing sys­tem that can send mes­sages only through a path of spe­cific de­vices on the mesh that in­clude the sender and re­cip­i­ent.

This results in a monumental reduction in radio transmissions, the advantages of which can’t be overstated. It makes the network less congested and more reliable, and for people who are mainly interested in messaging as opposed to sharing sensor/location data that Meshtastic remains well-suited for, it’s no surprise that many larger groups have begun to shift to MeshCore.
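To get a feel for the difference, here’s a toy model I’d use to count transmissions under the two approaches. It is a deliberate simplification under my own assumptions: every node rebroadcasts a new message exactly once until the hop limit runs out, and a routed message only uses one shortest path. Neither Meshtastic’s nor MeshCore’s real firmware logic is reproduced here, and the grid layout is made up.

```python
from collections import deque


def flood_transmissions(graph: dict, source, hop_limit: int) -> int:
    """Count transmissions when every node rebroadcasts a new message once.

    A node that receives the message with a remaining hop limit of 0
    still gets the message but does not rebroadcast it.
    """
    seen = {source}
    queue = deque([(source, hop_limit)])
    transmissions = 0
    while queue:
        node, remaining = queue.popleft()
        transmissions += 1  # one broadcast, heard by all neighbors
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                if remaining > 0:
                    queue.append((neighbor, remaining - 1))
    return transmissions


def routed_transmissions(graph: dict, source, dest):
    """Count transmissions when only nodes on one shortest path forward."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == dest:
            return distance[node]  # one transmission per hop on the path
        for neighbor in graph[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return None  # destination unreachable


def grid(rows: int, cols: int) -> dict:
    """Build a rows x cols grid where each node hears its direct neighbors."""
    graph = {}
    for r in range(rows):
        for c in range(cols):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            graph[(r, c)] = [(a, b) for a, b in candidates if 0 <= a < rows and 0 <= b < cols]
    return graph


if __name__ == "__main__":
    mesh = grid(3, 3)
    print(flood_transmissions(mesh, (0, 0), hop_limit=3))
    print(routed_transmissions(mesh, (0, 0), (2, 2)))
```

Even on a tiny 3x3 grid the flood uses about twice the airtime of the routed path, and the gap grows with the number of nodes that can hear each other.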

Unfortunately, MeshCore is not a true mesh in the way pub­lic mesh en­thu­si­asts would prob­a­bly like it to be. At a very high level, de­vices in MeshCore are bro­ken up into two cat­e­gories: com­pan­ions and re­peaters.

Companion devices would be what most people use to send and receive messages, while repeaters are the devices which actually mesh with each other and extend the overall network’s range. What this means is that a companion always has to be in range of a repeater to access the network; companions never relay messages between themselves on behalf of other companions.

There are ad­van­tages to this ap­proach. MeshCore al­lows mes­sages to tra­verse up to 64 hops away, which is an enor­mous real-world scale when LoRa re­peaters can be many miles apart in ideal con­di­tions. Even in the best pos­si­ble case, Meshtastic’s de­fault 3-hop limit (albeit con­fig­urable up to 7) places a real limit on how far mes­sages can spread.

It’s very true that any­one can par­tic­i­pate in MeshCore as a re­peater, so all the tools to build your own mesh are cer­tainly there. It’s just that it re­quires some ad­di­tional plan­ning, co­or­di­na­tion, and cen­tral­iza­tion that I don’t view as to­tally nec­es­sary.

The bigger problem I have with MeshCore is that many parts of it are proprietary. While the underlying protocol and the firmware for some radios are open source, all of the official MeshCore clients are proprietary, and even have features paywalled.

Proprietary soft­ware is sim­ply not dis­as­ter-ready, and that goes dou­bly so for soft­ware re­liant on cen­tral pay­ment proces­sors.

I am not the type of per­son who needs to use open-source soft­ware 100% of the time by any means (although it’d be nice), but the only point in my mind of hav­ing an off-grid mesh net­work in the first place is to­tal free­dom and con­trol, so in this par­tic­u­lar case I sim­ply can­not ever sup­port a closed-source so­lu­tion.

Efforts are al­ready be­ing made to cre­ate un­of­fi­cial open source clients for MeshCore. I won’t dis­count this fact, but at the end of the day most peo­ple in the MeshCore ecosys­tem will be in the of­fi­cial, pro­pri­etary ecosys­tem, and I don’t think MeshCore has enough ad­van­tages, users, or re­li­a­bil­ity to war­rant adopt­ing at this very early stage of mesh net­work­ing.

We have a unique opportunity as enthusiasts to adopt the best mesh networking solution we can, before the “network effect” truly sets in and locks people in place to a particular platform. I think we can do better than Meshtastic and MeshCore.

Problems

Unfortunately, both Meshtastic and MeshCore are highly limited, and don’t scale well. Meshtastic can barely scale to a regional mesh in ideal scenarios, and while MeshCore fares better here, it’s still unlikely to scale to the size of many larger regions, much less countries or planets.

The thing is, Meshtastic and MeshCore are both more ap­pli­ca­tions than they are pro­to­cols. They en­able sim­ple in­stant mes­sag­ing com­mu­ni­ca­tion over LoRa, but they don’t give much thought to mesh net­work­ing ap­pli­ca­tions be­yond what their client apps of­fi­cially sup­port.

They’re geared to­wards com­mu­ni­cat­ing with a small, lo­cal group, and any pub­lic meshes on these net­works are re­ally ex­cep­tions rather than the stan­dard use-case.

Another prob­lem is that Meshtastic and MeshCore both rely on LoRa pretty much ex­clu­sively.

LoRa is very cool for build­ing ad-hoc, low-band­width mesh net­works, and it’s a bless­ing that we have it avail­able in most coun­tries as an un­li­censed op­tion we can op­er­ate on with­out a ham ra­dio li­cense, and with mod­ern dig­i­tal tech­nolo­gies like en­cryp­tion that are mostly for­bid­den on am­a­teur ra­dio.

It’s hardly the per­fect so­lu­tion for many sce­nar­ios though, and it is quite slow.

In a per­fect world, the mesh net­work­ing/​rout­ing soft­ware would be com­pletely in­de­pen­dent of the phys­i­cal net­work that con­nects all these de­vices to­gether.

For ex­am­ple, I want to be able to build cheap, lo­cal LoRa net­works in neigh­bor­hoods and other re­gional com­mu­ni­ties, and in­ter­con­nect them with more pow­er­ful point-to-point mi­crowave con­nec­tions, or even fiber or the in­ter­net.

Meshtastic (and I believe MeshCore via some unofficial “gateways”) have some hacky methods of interconnecting different meshes with MQTT, but the experience is quite poor and it’s clear that this type of connectivity is not a first-class experience on the network. Especially on Meshtastic, bridging to the internet with MQTT can degrade the network so much that it becomes impractical to use among any more than a handful of people.

It should be pos­si­ble to build a mesh rout­ing so­lu­tion that in­tel­li­gently routes pack­ets be­tween nodes over many dif­fer­ent types of con­nec­tions, com­pletely seam­lessly so that the ex­pe­ri­ence of us­ing the mesh never changes based on your spe­cific in­ter­face.

Luckily for us, this has al­ready been done!

Reticulum

Reticulum is ex­tremely cool.

It’s a net­work­ing stack that in­tel­li­gently pro­vides strongly-en­crypted rout­ing over a wide va­ri­ety of phys­i­cal net­works, in­clud­ing LoRa. Like MeshCore, it has au­to­matic rout­ing over paths on the net­work, but un­like MeshCore, those paths can tra­verse not only over LoRa but over any sup­ported in­ter­face.

Additionally, like Meshtastic, Reticulum es­sen­tially works out of the box with de­vices op­er­at­ing on the same lo­cal net­work. Connect two de­vices on the same LoRa fre­quency and you have a func­tional mesh right away, with no ad­vanced net­work­ing skills or ded­i­cated re­peaters re­quired.

This makes Reticulum very well suited both for small, pri­vate net­works like Meshtastic, and for very large net­works that MeshCore tends to work bet­ter with, likely go­ing be­yond MeshCore’s ca­pa­bil­i­ties with promises of plan­e­tary-level scal­a­bil­ity.

The best part is that you can eas­ily start small with Reticulum and every­thing will work per­fectly fine, but once a mem­ber of your small net­work in­ter­con­nects with a dif­fer­ent Reticulum net­work si­mul­ta­ne­ously, the net­works can seam­lessly com­bine, en­abling com­plete in­ter­op­er­abil­ity with­out the need to fid­dle with set­tings or all agree on some com­mon ra­dio pro­to­col as in Meshtastic-land.

You can mix-and-match Reticulum con­nec­tions over LoRa ra­dio, your lo­cal LAN net­work, point-to-point Wi-Fi or mi­crowave con­nec­tions, the in­ter­net, Tor or I2P, and even over net­works like packet ra­dio for the ham en­thu­si­asts out there.

Many Networks

Essentially, Reticulum can the­o­ret­i­cally sup­port any net­work you can in­ter­act with via TCP, UDP, or just a sim­ple se­r­ial in­ter­face. It takes the band­width of all the net­works you con­nect to into ac­count when de­ter­min­ing the best paths for mes­sages to tra­verse, op­ti­miz­ing both for dis­tance and to use the re­sources of each phys­i­cal net­work ef­fi­ciently.

Heterogeneous con­nec­tiv­ity is Reticulum’s bread and but­ter, ba­si­cally. The pro­jec­t’s doc­u­men­ta­tion prob­a­bly says it best:

In con­ven­tional net­work­ing, mix­ing dif­fer­ent trans­port medi­ums typ­i­cally re­quires gate­ways, trans­la­tion lay­ers, and care­ful con­fig­u­ra­tion. A WiFi net­work does­n’t na­tively in­ter­op­er­ate with a packet ra­dio net­work with­out ad­di­tional in­fra­struc­ture, and you can’t just down­load a car over a se­r­ial port, or send an en­crypted mes­sage in a QR code. Reticulum treats het­ero­gene­ity as a core premise. The pro­to­col is de­signed to seam­lessly mix medi­ums with vastly dif­fer­ent char­ac­ter­is­tics […]

For net­work de­sign­ers, this means you are free to use what­ever medi­ums are avail­able, af­ford­able, or ap­pro­pri­ate for your sit­u­a­tion. You might use LoRa for wide-area low-band­width cov­er­age, WiFi for lo­cal high-ca­pac­ity links, I2P for anony­mous Internet con­nec­tiv­ity, and Ethernet for in­fra­struc­ture back­hauls, all within the same net­work. Reticulum han­dles the trans­la­tion and co­or­di­na­tion au­to­mat­i­cally.

While I don’t think home­grown mesh net­works should be re­liant on the in­ter­net or I2P in the long run, I think first-class sup­port for con­nec­tiv­ity over TCP and other in­ter­net pro­to­cols is still a sig­nif­i­cant ad­van­tage for peo­ple striv­ing to build lo­cal, pub­lic mesh net­works.

Interconnected Local Meshes

Distinct lo­cal groups be­ing able to in­ter­con­nect is a huge boon for con­tent avail­abil­ity on the net­work, and the beauty is that all these net­work links in Reticulum au­to­mat­i­cally be­come re­dun­dant as more con­nec­tions are made.

A lo­cal mesh in Minneapolis could in­ter­con­nect with a lo­cal mesh in Chicago over the in­ter­net, for ex­am­ple, but per­haps in the fu­ture some ded­i­cated net­work op­er­a­tors are also able to es­tab­lish a di­rect con­nec­tion via mi­crowave or LoRa be­tween those cities. Connections may nor­mally con­tinue to tra­verse the in­ter­net at higher speeds, but in the event of an out­age those al­ter­nate/​ad-hoc paths can take over seam­lessly, be­cause they’re all just paths on the same, sin­gle Reticulum net­work.
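Here’s a rough sketch of that failover idea as plain shortest-path routing over mixed links. To be clear about what’s assumed: the link costs, the “lora_relay” hop, and the use of Dijkstra’s algorithm are all mine for illustration. This is not how Reticulum’s transport layer is actually implemented, just the general shape of “prefer the fast path, fall back when it disappears.”

```python
import heapq


def best_path(links: dict, src: str, dst: str):
    """Dijkstra over undirected links given as {(a, b): cost}.

    Returns (total_cost, [nodes...]) or None if dst is unreachable.
    """
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    heap = [(0, src, [src])]
    visited = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(heap, (cost + link_cost, neighbor, path + [neighbor]))
    return None


if __name__ == "__main__":
    # Hypothetical links: low cost = fast internet, high cost = slow radio.
    links = {
        ("minneapolis", "internet"): 1,
        ("internet", "chicago"): 1,
        ("minneapolis", "lora_relay"): 10,
        ("lora_relay", "chicago"): 10,
    }
    print(best_path(links, "minneapolis", "chicago"))

    # Simulate an internet outage by dropping every link that touches it.
    outage = {pair: cost for pair, cost in links.items() if "internet" not in pair}
    print(best_path(outage, "minneapolis", "chicago"))
```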

In the worst case sce­nario, a lo­cal Reticulum mesh lack­ing any con­nec­tions to other Reticulum net­works is go­ing to fall back to only re­gional con­tent avail­abil­ity, which is still the most that you’ll re­al­is­ti­cally get from Meshtastic and MeshCore.

Perhaps just as im­por­tantly, Reticulum en­ables con­nec­tiv­ity across bor­ders. LoRa has a bit of a prob­lem, which is that it op­er­ates on dif­fer­ent fre­quen­cies in dif­fer­ent ju­ris­dic­tions. While LoRa op­er­ates on 915 MHz at up to 1 watt of power in the United States, it runs at 868 MHz (or 433 MHz) at much lower power lev­els in much of Europe, or 923 MHz in Asia, etc.

This means that a Meshtastic or MeshCore network in Asia will never natively connect to one in Europe. Again this could be worked around with “hacky” bridge solutions like MQTT, but Reticulum would natively interconnect two different LoRa networks seamlessly as long as one common gateway point could be found: for example, an 868 MHz radio in one country connected to a 923 MHz radio in another via some other means like a fiber link, or a 2.4 GHz microwave connection, or the internet, or even packet radio. As long as some connectivity point (or multiple) can be found, then Reticulum routing between different physical networks would work seamlessly without any central servers required.

This is all pos­si­ble with­out any sort of cen­tral co­or­di­na­tion, which is the most im­por­tant part of build­ing a truly de­cen­tral­ized net­work. Network op­er­a­tors are free to cre­ate seg­ments of the net­work in any way they see fit, and if/​when those seg­ments be­come con­nected, Reticulum han­dles the con­ver­gence of those net­works au­to­mat­i­cally.

Reticulum’s ad­dress space is global, and every node has a unique ad­dress guar­an­teed by the en­cryp­tion Reticulum uses. There’s no po­ten­tial for dif­fer­ent Reticulum net­works to have over­lap­ping ad­dresses, and no need for a cen­tral au­thor­ity to hand out ad­dresses in the fash­ion that IANA/ARIN/RIPE/etc. hand out IP ad­dresses on the tra­di­tional in­ter­net.
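The general trick behind addresses like this is deriving them from a hash of a node’s public key, so anyone can generate one locally and collisions are astronomically unlikely. The sketch below is a simplified illustration of that idea only: the 16-byte truncation and the raw bytes standing in for a public key are my assumptions, and this is not the actual Reticulum (RNS) API or its exact derivation.

```python
import hashlib
import os

ADDRESS_BYTES = 16  # assumed truncation length for this illustration


def address_from_public_key(public_key: bytes) -> str:
    """Derive a short hex address by truncating a SHA-256 hash of the key."""
    return hashlib.sha256(public_key).digest()[:ADDRESS_BYTES].hex()


if __name__ == "__main__":
    # Random bytes stand in for a real public key here.
    stand_in_key = os.urandom(64)
    print(address_from_public_key(stand_in_key))
```

Because the address is a pure function of the key, no registry has to hand it out, and nobody else can claim it without holding the matching private key.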

Reticulum Applications

Although build­ing the net­work well gives Reticulum a dis­tinct ad­van­tage over Meshtastic and MeshCore, it is not enough to just build a net­work alone.

The de­vel­oper of Reticulum re­al­ized this, of course, and built a num­ber of ap­pli­ca­tions that work on Reticulum seam­lessly: NomadNet is one of the most pop­u­lar, pro­vid­ing mes­sag­ing, file shar­ing, and text-based brows­ing in a ter­mi­nal app (which has mouse sup­port too, thank­fully).

Using a ter­mi­nal won’t ap­peal to many peo­ple though, so you can also use Sideband, a GUI app for Android and PC, or Meshchat to com­mu­ni­cate, as well as many other apps which use Reticulum.

Fortunately, many of these com­mu­ni­ca­tions apps work with each other, so you can choose be­tween them at will. While you can re­ally build es­sen­tially any app or pro­to­col on top of Reticulum, many of the mes­sen­gers stan­dard­ize on a hand­ful of home­made pro­to­cols: LXMF, LXST, and RRC.

While un­der­stand­ing what these pro­to­cols mean is not par­tic­u­larly im­por­tant for peo­ple just look­ing to use the net­work, the im­por­tant thing to know is that Reticulum al­ready has an ecosys­tem of apps that con­nect with each other gen­er­ally us­ing the same un­der­ly­ing pro­to­cols, and pro­vide sim­i­lar func­tion­al­ity to Meshtastic’s and MeshCore’s apps for mes­sag­ing and other func­tions.

Reticulum’s Biggest Problem

Unfortunately, de­spite be­ing what I’d con­sider per­haps the per­fect pub­lic mesh net­work­ing plat­form, Reticulum has a big draw­back hold­ing it back from sup­plant­ing pub­lic MeshCore and Meshtastic net­works to­day, and it’s not the apps or any­thing to do with the soft­ware.

Reticulum’s main prob­lem is that it does not have ded­i­cated firmware for LoRa ra­dios in the same way Meshtastic and MeshCore do.

Installing Meshtastic on a cheap de­vice like a Heltec V3 cre­ates a full, stand­alone Meshtastic node ca­pa­ble of send­ing & re­ceiv­ing mes­sages, and re­lay­ing data through­out the net­work.

With Reticulum you can use the same cheap hard­ware with firmware called RNode to make LoRa con­nec­tions. However, Reticulum’s RNode firmware func­tions ba­si­cally as a LoRa mo­dem for a con­nected com­puter, not as a stand­alone mesh node it­self.

RNode is com­pletely un­in­tel­li­gent, re­quir­ing a con­nec­tion to a com­puter run­ning Reticulum to send and re­ceive mes­sages and to route mes­sages to other nodes on the Reticulum net­work.

For most peo­ple who would sim­ply be us­ing the net­work, this is not ac­tu­ally a prob­lem. Even with Meshtastic, it is very rare for most peo­ple to try and com­mu­ni­cate us­ing only a stand­alone de­vice (one ex­cep­tion be­ing the LILYGO T-Deck).

Instead, peo­ple com­monly con­nect their Meshtastic-enabled LoRa ra­dios to their phones or their com­put­ers, all of which are very ca­pa­ble de­vices which could eas­ily run Reticulum while con­nected to RNode if peo­ple wanted to switch.

Where this does pre­sent a real prob­lem to Reticulum is where it comes to in­fra­struc­ture.

In the world of Meshtastic and MeshCore, many peo­ple are run­ning re­mote and of­ten so­lar-pow­ered nodes on high hill­tops or build­ings to self­lessly in­crease the ca­pac­ity of the net­work.

When it comes to Reticulum, these re­mote nodes would not only need a LoRa ra­dio run­ning RNode, but a full com­puter run­ning Reticulum to en­able the mesh ca­pa­bil­i­ties. This com­puter could be as sim­ple as a Raspberry Pi Zero, but even that level of ad­di­tional price and added power con­sump­tion makes this setup fairly un­ten­able for many un­at­tended in­stal­la­tions, es­pe­cially so­lar-pow­ered ones.

Progress is be­ing made on this front. In par­tic­u­lar, a port called mi­croRetic­u­lum for ESP32 and bet­ter de­vices is con­tin­u­ally be­ing de­vel­oped. I re­ally hope it suc­ceeds, be­cause the abil­ity for cur­rent Meshtastic and MeshCore op­er­a­tors to switch to Reticulum rout­ing with no ad­di­tional hard­ware nec­es­sary could in­stantly, rad­i­cally boost the adop­tion of a much more ca­pa­ble pub­lic mesh net­work like Reticulum in many com­mu­ni­ties.

What’s Next

There’s work to be done here, but at the end of the day Reticulum is the only so­lu­tion that promises to let peo­ple cre­ate lo­cal net­works, large and small, and in­ter­con­nect those net­works or­gan­i­cally into a seam­less, global mesh.

It’s a killer fea­ture that I think many peo­ple in­tu­itively want from Meshtastic and MeshCore, but that those net­works will never ac­tu­ally be able to pro­vide.

I see a lot of ad­van­tages to all three of these apps. Meshtastic is re­ally slick for a group of hik­ers who just want to text and share GPS eas­ily in­stead of us­ing voice walkie-talkies. MeshCore has some com­pelling fea­tures for lo­cal/​neigh­bor­hood mes­sag­ing, per­haps off-grid mes­sag­ing at a large event like DEF CON.

If that’s your use-case, then cool, but I am beg­ging peo­ple with grander am­bi­tions in the mesh net­work­ing space to fo­cus on the most prac­ti­cal so­lu­tion to the prob­lem here.

There are many groups cre­at­ing a re­gion-wide or larger pub­lic Meshtastic net­work, when it is sim­ply the wrong so­lu­tion to the prob­lem at hand, and this should be self-ev­i­dent to any­one who’s tried to use Meshtastic in this sce­nario.

I think many people get swept up in “collecting nodes” on Meshtastic and the thrill of seeing as many “connected” devices as they possibly can, but at least in my local area I see little consideration given to how well-connected and functional the network actually is. When you look beyond being simply aware of nodes in your local area, and actually try to interact with those nodes, failed messages and communication issues are common.

Reticulum of­fers not just a mes­sen­ger app or a way to share GPS & other sen­sor data, but a full-fledged al­ter­na­tive to the in­ter­net it­self. This en­ables crit­i­cal ap­pli­ca­tions that would not be pos­si­ble on Meshtastic and MeshCore.

For ex­am­ple, I can share ac­cess to any Kiwix file, in­clud­ing the en­tirety of Wikipedia, to any­one on Reticulum, which would make quick in­for­ma­tion shar­ing a breeze in a dis­as­ter sce­nario.

All of this is why I’m settling on Reticulum for what I want to build and see in the world. If other people are interested, I’ll share more details here about what I’m building myself, along with other mesh topics.

And cer­tainly if you’re in Northeast Minneapolis, get in touch. God only knows how crit­i­cal tools like this could quickly be­come around here, and it’s bet­ter to pre­pare to­day than be forced to learn all of this to­mor­row 🙂

AMD Pulls a Bait-and-Switch on Linux Users with Vivado Licensing Changes

itsfoss.com

Big tech com­pa­nies have a habit of of­fer­ing some­thing for free, watch­ing the user base grow, and then qui­etly walk­ing it back once peo­ple are too in­vested to leave eas­ily. A bait-and-switch, so to speak.

Redis did exactly this back in March 2024, dropping its long-standing BSD license for a more restrictive dual-licensing model, and the blowback was severe enough that the community forked it into Valkey almost immediately.

Linux tends to get hit hard­est by these moves. Its com­par­a­tively smaller user base means less com­mer­cial pres­sure, mak­ing it an easy tar­get to throw un­der the bus when­ever com­pa­nies feel like cut­ting costs or boost­ing prof­its.

One such case has now sur­faced that will make you won­der if this par­tic­u­lar com­pa­ny’s de­ci­sion was re­ally short­sighted or is just a cash grab.

It does­n’t make sense!

Vivado is AMD’s design suite for its FPGAs and adaptive SoCs. It is what engineers, students, and hardware hobbyists use to write, synthesize, and test their FPGA designs. Until now, it has been available for free on both Windows and Linux under what AMD called the Standard Edition.

Starting with the 2026.1 release, AMD is switching to a tiered licensing model. The free Basic tier covers entry-level devices but is restricted to Windows only. Linux support does not show up until the “Core” tier, which costs somewhere between $1,200 and $1,800 per year.

AMD framed all of this on its down­load page as a move to­ward more flex­i­ble li­cens­ing. On its ded­i­cated li­cens­ing op­tions page, the com­pany told free-tier users the only thing chang­ing was a sim­ple an­nual li­cense re­newal.

That’s not all. 🤷

When users went to AMD’s support forum asking for an explanation, forum moderator Anatoli Curran showed up in the thread. His first order of business was to warn people about “bad language or abusive behaviour towards AMD,” before getting around to addressing anything of substance.

When pushed for a real an­swer, Anatoli pointed un­happy users to­ward Vivado 2025.2, sug­gest­ing they sim­ply stick with it if they did not want to pay. He did men­tion that 2025.2 loses of­fi­cial sup­port once Vivado 2026.3 ships, but that de­tail was buried in a thread re­ply, leav­ing users with lit­tle more than a dead-end rec­om­men­da­tion.

Anatoli also started putting out numbers, claiming that 70% of their customers still use Windows. As expected, someone pushed back, asking why, if most of their users were on Windows, Linux support was being locked away behind a paywall.

To which he replied in a very PR-coded man­ner, com­pletely dis­re­gard­ing what was brought up:

From Core and higher tiers, both Windows and Linux are sup­ported plat­forms. As stated al­ready, AMD ex­pec­ta­tion is that the BASIC tier is used for sim­ple, en­try‑level needs. While more ad­vanced, pro­duc­tion work­flows are aligned with paid tiers. These tiers are specif­i­cally de­signed to de­liver the full flex­i­bil­ity and ca­pa­bil­i­ties needed for se­ri­ous de­vel­op­ment.

Hence, all paid tier lev­els have op­tions of both Windows and Linux plat­form us­age. Only BASIC tier lim­ited to Windows ONLY plat­form sup­port.

That re­ally does­n’t in­still one bit of clar­ity and shows how ap­a­thetic tech gi­ants like AMD can some­times get. The con­ver­sa­tion in the thread then con­tin­ued along the lines of how Xilinx, and later, AMD, had gained the trust of Linux users by keep­ing an open out­look to­wards the com­mu­nity.

But pulling off such a thing with­out con­sid­er­ing how peo­ple ben­e­fited from hav­ing Vivado on Linux tells you a lot about what this com­pany ac­tu­ally thinks of its non-en­ter­prise user base.

Students, hard­ware tin­ker­ers, and aca­d­e­mic re­searchers who have re­lied on a na­tive Linux work­flow are now left hang­ing. Keep in mind that many of those peo­ple even­tu­ally end up in en­gi­neer­ing and pro­cure­ment roles where they have real in­flu­ence over hard­ware-re­lated de­ci­sions.

What now?

As of writing, AMD hasn’t put out a statement on the matter, and the stonewalling has continued. Of course, more and more people are finding out about it. It is just a matter of time before someone in PR has to respond.

Plus, with the kind of flak they have been get­ting over some of their most bizarre choices lately, I would han­dle this now in­stead of later.

Until then, you can par­tic­i­pate in the con­ver­sa­tion ei­ther on the main thread where this shady be­hav­ior was first re­ported, or you could head over to Hacker News and join the oth­ers in call­ing out AMD.

About the au­thor

Sourav Rudra

A nerd with a pas­sion for open source soft­ware, cus­tom PC builds, mo­tor­sports, and ex­plor­ing the end­less pos­si­bil­i­ties of this world.

MyBrickLog – Free LEGO® Collection Tracker & Price Guide

www.mybricklog.com

MyBrickLog is a free website for LEGO® collectors to track their sets, check prices, and manage wishlists. Browse over 20,000 LEGO sets across every theme ever released.

Track which LEGO® sets you own and how many copies

Track minifigures for every set in your collection

Browse every LEGO theme and subtheme ever released

Google employee charged with $1M Polymarket insider trading bet on search term

www.cnbc.com

Signage at the Situation Room by Polymarket pop-up bar in Washington, DC, US, on Friday, March 20, 2026.

Graeme Sloan | Bloomberg | Getty Images

Federal pros­e­cu­tors charged a Google em­ployee with fraud on Wednesday, al­leg­ing that he made $1.2 mil­lion off of bets us­ing in­sider in­for­ma­tion on Polymarket.

Prosecutors claim that Michele Spagnuolo, a staff in­for­ma­tion se­cu­rity en­gi­neer at Google, used con­fi­den­tial in­for­ma­tion to place trades cor­rectly bet­ting that singer d4vd would be Google’s most searched per­son in 2025.

Spagnuolo has been charged with money laun­der­ing, com­modi­ties fraud and wire fraud. The com­plaint, filed in the Southern District of New York, was un­sealed on Wednesday.

ABC News first re­ported on the com­plaint. Spagnuolo was ar­rested Wednesday morn­ing in New York, ABC re­ported.

“Spagnuolo had access to Google’s internal data systems, including a particular Google internal software tool that provided him access to confidential, nonpublic Year in Search data,” the prosecutors said in their complaint.

Some observers of the Polymarket platform flagged the user “AlphaRaccoon” back in December for suspicious trades on the most searched person contracts. The complaint Wednesday said that Spagnuolo was the person behind that account.

“Google officially and publicly announced its Year in Search 2025 results on or about December 4, 2025. Soon after it did so, Spagnuolo’s AlphaRaccoon account, profited approximately $1.2 million on his Google Year in Search 2025-related bets,” the complaint said.

Spagnuolo appeared before a federal magistrate judge Wednesday. He did not enter a plea and was released on a $2.25 million bond, ABC reported.

“We’re working with law enforcement on their investigation,” Google said in a statement. “The employee accessed our marketing material using a tool available to all employees, but using such confidential information to place bets is a serious breach of our policies.”

“We’ve placed the employee on leave and will take the appropriate action,” the company added.

“Polymarket worked closely with the U.S. Attorney’s Office for the Southern District of New York and the CFTC, and is the only prediction platform to date whose cooperation has led to insider trading charges in the United States,” a Polymarket spokesperson said in a statement. “We are committed to maintaining accurate, fair, and transparent markets as well as enforcing our rules and working with our regulators and law enforcement.”

Spagnuolo is also facing a civil case from the Commodity Futures Trading Commission, where he’s charged with insider trading. The complaint detailed that Spagnuolo correctly predicted the outcomes of a slew of other search markets, including contracts like “Will Zohran Mamdani rank in the Top 5 most searched” and “Will Squid Game be the #1 searched TV show.”

“Spagnuolo misappropriated the material Confidential Information by knowingly or recklessly using it to trade the 2025 Year in Search List Contracts in breach of his duties of trust and confidentiality,” the CFTC complaint alleged.

The fed­eral com­plaint marks the sec­ond high-pro­file in­sider trad­ing case on Polymarket in just over a month.

In April, then-ac­tive U.S. Army Special Forces mas­ter sergeant Gannon Ken Van Dyke was ar­rested over charges that he used clas­si­fied in­for­ma­tion to bet on con­tracts re­lated to the U.S. op­er­a­tion to cap­ture Venezuela President Nicolás Maduro. Prosecutors said Van Dyke made more than $400,000 off his trades.
