10 interesting stories served every morning and every evening.

Daily links from Cory Doctorow

pluralistic.net

Today’s links

Spying on kids to save kids from spy­ing is very, very stu­pid: First they came for the VPNs.

Hey look at this: Delights to delec­tate.

Object per­ma­nence: RIP Darwin’s tor­toise; ISPs con­spire to cre­ate copy­right jail; Waxy v fair use; Broken Windows is BS; Google is a ma­chine-learn­ing com­pany; Writing the Other”; Canadian wealth-tax.

Upcoming ap­pear­ances: Toronto, NYC, Philadelphia, Chicago, London, Edinburgh, Sydney, Melbourne, Brighton, London, South Bend.

Recent ap­pear­ances: Where I’ve been.

Latest books: You keep read­in’ em, I’ll keep writ­in’ em.

Upcoming books: Like I said, I’ll keep writ­in’ em.

Colophon: All the rest.

Spying on kids to save kids from spy­ing is very, very stu­pid (permalink)

The lit­er­a­ture on harms to kids from on­line plat­forms is com­plex and nu­anced, rife with peo­ple cit­ing small, am­bigu­ous stud­ies as iron-clad ev­i­dence that kids are be­ing de­stroyed by the in­ter­net:

https://​www.youtube.com/​watch?v=Ype6c6D­dHQY

It’s a weird coali­tion of anti-Big Tech cam­paign­ers (who are rightly an­gry at the plat­forms’ cal­lous dis­re­gard for user wel­fare) and Heritage Foundation-backed cul­ture war­riors (who think that if their kids aren’t ex­posed to LGBTQ con­tent they won’t come out as queer). While there’s plenty these groups dis­agree about, they share one con­sen­sus: there should be a minimum age” for cer­tain kinds of in­ter­net use.

The prob­lem is, there’s no such thing as age ver­i­fi­ca­tion” for the in­ter­net. What we call age ver­i­fi­ca­tion” is ac­tu­ally mass sur­veil­lance, so in­va­sive and per­va­sive that it makes the ad-tech in­dus­try’s com­mer­cial sur­veil­lance look like some kind of cypher­punk dark­net pi­rate utopia:

https://​plu­ral­is­tic.net/​2025/​08/​14/​bellovin/#​wont-some­one-think-of-the-cryp­tog­ra­phers

Age ver­i­fi­ca­tion” means that every­one who does any­thing on­line will have to sub­mit to fine-grained track­ing and record­ing of all their on­line ac­tiv­i­ties. This night­mare is the sur­veil­lance ad­ver­tis­ing in­dus­try’s fond­est dream, a world where it’s lit­er­ally il­le­gal to avoid their track­ing, all in the name of sav­ing kids…from them!

So it’s not just a weird al­liance of anti-Big Tech cru­saders and the con­spir­a­to­r­ial right that’s push­ing for age ver­i­fi­ca­tion — they are un­wit­ting al­lies of the very tech in­dus­try they think they’re fight­ing. Those tech in­dus­try in­sid­ers are fully aware that an age ver­i­fi­ca­tion” man­date is re­ally a way for the gov­ern­ment to teach every child how to use a VPN. They’re also fully aware that the next move is to ban VPNs:

https://​www.ex­press.co.uk/​news/​uk/​2217934/​vpn-ban-table-july-labour

Tech bosses are the ones sit­ting on our shoul­ders say­ing, Go ahead, swal­low that fly — it’ll be fine. And if you do have to swal­low a spi­der af­ter­ward, well, that’ll surely be the end of it”:

https://​plu­ral­is­tic.net/​2026/​05/​19/​shes-dead-of-course/#​con­sen­sus-hal­lu­ci­na­tion

Behind them is a long line of caliper-wield­ing grifters who claim they can use your phone’s cam­era to dis­tin­guish a child who is 17 years, 364 days old from an adult who’s just turned 18:

https://​www.gov.uk/​gov­ern­ment/​pub­li­ca­tions/​fa­cial-age-es­ti­ma­tion

It’s be­yond farce. After all, what­ever harms you be­lieve the in­ter­net is in­flict­ing on kids — and there’s ab­solutely some kids who are be­ing harmed by their in­ter­net use — those harms all start with sur­veil­lance. Your kids can’t be tar­geted by al­go­rithms with­out the sur­veil­lance data that’s be­ing used to tar­get them. They can’t be fun­neled into pro-anorexia con­tent or ex­treme misog­yny fo­rums with­out that fun­nel be­ing primed by com­mer­cial spy­ing.

Why do tech com­pa­nies spy on your kids? The same rea­son your dog licks its balls: be­cause they can, and no one stops them:

https://​plu­ral­is­tic.net/​2026/​03/​10/​ice-tech/#​fore­see­able-out­comes

America has­n’t up­dated its con­sumer pri­vacy laws since 1988 (when Congress banned the dis­clo­sure of your VHS rentals). The EU has the GDPR, but it also has Ireland, the coun­try where all GDPR cases against Big Tech go to die, be­cause any tax haven in­evitably be­comes a crime haven:

https://​plu­ral­is­tic.net/​2025/​10/​31/​los­ing-the-crypto-wars/#​sur­veil­lance-mo­nop­o­lism

Other coun­tries have pri­vacy laws to vary­ing de­grees, but are grossly out­matched by US tech gi­ants, who have fused with the Trump regime, to the ex­tent that Trump will im­pose penal­ties on your coun­try if you at­tempt to reg­u­late his tech com­pa­nies — he’ll even have your top of­fi­cials cut off from the in­ter­net in re­tal­i­a­tion:

https://​plu­ral­is­tic.net/​2026/​04/​04/​dig­i­tal-sub­ju­ga­tion/#​green­lands-next

Any at­tempt to save kids from on­line harms should start with sav­ing kids from on­line sur­veil­lance, but that’s the op­po­site of what we’re do­ing to­day. After decades of fail­ing to pass and en­force pri­vacy con­trols for the in­ter­net, those same gov­ern­ments are break­ing all land-speed records to pass age ver­i­fi­ca­tion” laws that make pri­vacy il­le­gal:

https://​bsky.app/​pro­file/​re­bec­ca­w­illiams.info/​post/​3moviqzdit22z

The fact that these bills have the firm back­ing of the tech in­dus­try’s most con­trol­ling, most spy­ing com­pa­nies tells you every­thing you need to know about them:

https://​web.archive.org/​web/​20260315022337/​https://​tbotepro­ject.com/

Kids are be­ing harmed by on­line spy­ing, and so are the rest of us. Whether you think that the al­go­rithm made Grampy go Qanon or you’re sus­pi­cious that on­line sur­veil­lance data was used to deny you a loan, a job, or a lease, you should want pri­vacy:

https://​plu­ral­is­tic.net/​2023/​12/​06/​pri­vacy-first/#​but-not-just-pri­vacy

Online sur­veil­lance is be­ing used to raise the prices you pay and lower the wages you’re of­fered:

https://​plu­ral­is­tic.net/​2026/​04/​06/​em­piri­cism-wash­ing/#​veena-dubal

And the same data that’s be­ing used to verify age” to­day will be used by ICE to­mor­row to fig­ure out who to round up for a con­cen­tra­tion camp:

https://​www.wired.com/​story/​ice-asks-com­pa­nies-about-ad-tech-and-big-data-tools/

You can’t pro­tect kids from on­line sur­veil­lance by spy­ing on them. You just can’t. Anyone who tells you oth­er­wise is try­ing to get you to swal­low a fly so they can sell you a spi­der, a bird, a cat, and an ICE chud in a gaiter, Oakleys and plate car­rier (beneath which lurks a stick-and-poke Totenkopf tat­too).

Hey look at this (permalink)

AI doomerism is mis­placed. Here’s what it will take to pop the bub­ble https://​www.sa­lon.com/​2026/​06/​22/​ai-doomerism-is-mis­placed-heres-what-it-will-take-to-pop-the-bub­ble/

Visa and Mastercard: The Original Gangsters of Electronic Collusion https://​www.thes­ling.org/​visa-and-mas­ter­card-the-orig­i­nal-gang­sters-of-elec­tronic-col­lu­sion/

Visa and Mastercard: The Original Gangsters of Electronic Collusion https://​www.thes­ling.org/​visa-and-mas­ter­card-the-orig­i­nal-gang­sters-of-elec­tronic-col­lu­sion/

Has it hap­pened yet? https://​ha­sithap­penedyet.org/

Has it hap­pened yet? https://​ha­sithap­penedyet.org/

Platform-Controlled Search and Distortions in Attention Allocation https://​tin­ber­gen.nl/​dis­cus­sion-pa­per/​6496/​26 – 035-vii-plat­form-con­trolled-search-and-dis­tor­tions-in-at­ten­tion-al­lo­ca­tion

Platform-Controlled Search and Distortions in Attention Allocation https://​tin­ber­gen.nl/​dis­cus­sion-pa­per/​6496/​26 – 035-vii-plat­form-con­trolled-search-and-dis­tor­tions-in-at­ten­tion-al­lo­ca­tion

Object per­ma­nence (permalink)

#20yrsago Darwin’s tor­toise dead at 176 https://​web.archive.org/​web/​20060704143750/​http://​news.ya­hoo.com/​s/​afp/​20060623/​od_afp/​aus­trali­aan­i­mal_060623102146;_ylt=Ave_b4P­s2r9T­GX­qs5nZIV­Io­FO7gF;_ylu=X3oDM­TA5bGV­na3N­hB­HN­lY­wNzc3JlbA–zoo

#15yrsago Major US ISPs set to limit re­peat in­fringers with throt­tling, lim­it­ing ac­cess to 200 web­sites, and copy­right reed­u­ca­tion school https://​web.archive.org/​web/​20111105225114/​http://​news.cnet.com/​8301 – 31001_3 – 20073522-261/​ex­clu­sive-top-isps-poised-to-adopt-grad­u­ated-re­sponse-to-piracy/

#15yrsago Why fair use does­n’t work un­less you’ve got a huge war-chest for pay­ing lawyers https://​waxy.org/​2011/​06/​kind_of_screwed/

#15yrsago Model net neu­tral­ity rule for mu­nic­i­pal­i­ties https://​web.archive.org/​web/​20110626114610/​http://​en­vi­sion­seat­tle.org/​2011/​06/​model-net-neu­tral­ity-or­di­nance-for-seat­tle.html

#15yrsago Campus hookups: col­lege sex is­n’t new, but hookups are dif­fer­ent https://​the­so­ci­ety­pages.org/​socim­ages/​2011/​06/​21/​the-promise-and-per­ils-of-hook-up-cul­ture/

#15yrsago A Brief History of the Corporation: un­der­stand­ing what an at­ten­tion econ­omy is and where it comes from https://​rib­bon­farm.com/​2011/​06/​08/​a-brief-his­tory-of-the-cor­po­ra­tion-1600-to-2100/

#15yrsago Eliza: what makes you think I’m a psy­chother­a­peu­tic chat­bot? https://​www.fil­fre.net/​2011/​06/​eliza-part-1/

#10yrsago Broken Windows polic­ing is non­sense https://​www.nyc.gov/​as­sets/​oignypd/​down­loads/​pdf/​Qual­ity-of-Life-Re­port-2010 – 2015.pdf

#10yrsago How it feels to be un­der DDoS at­tack https://​www.or­eilly.com/​radar/​ddos-emo­tions/

#10yrsago 2016: the first pres­i­den­tial elec­tion in 50 years with­out Voting Rights Act pro­tec­tions https://​www.rolling­stone.com/​pol­i­tics/​pol­i­tics-news/​wel­come-to-the-first-pres­i­den­tial-elec­tion-since-vot­ing-rights-act-gut­ted-179737/​3/

#10yrsago Google is re­struc­tur­ing to put ma­chine learn­ing at the core of all it does https://​web.archive.org/​web/​20180530051703/​https://​www.wired.com/​2016/​06/​how-google-is-re­mak­ing-it­self-as-a-ma­chine-learn­ing-first-com­pany/

#10yrsago Misconfigured data­base ex­poses sen­si­tive data for 154 mil­lion US vot­ers https://​dai­ly­dot.com/​pol­i­tics/​154-mil­lion-voter-files-ex­posed-l2

#10yrsago To un­der­stand the Trump cam­paign, study real-es­tate de­vel­oper hus­tle https://​web.archive.org/​web/​20161028030522/​https://​storify.com/​KC_EDM/​trump-is-run­ning-his-cam­paign-like-a-real-es­tate-d

#10yrsago Writing the Other: in­tensely prac­ti­cal ad­vice for rep­re­sent­ing other cul­tures in fic­tion https://​memex.craphound.com/​2016/​06/​23/​writ­ing-the-other-in­tensely-prac­ti­cal-ad­vice-for-rep­re­sent­ing-other-cul­tures-in-fic­tion/

#1yrago The case for a Canadian wealth tax https://​plu­ral­is­tic.net/​2025/​06/​23/​bil­lion­aires-eh/#​galen-we­ston-is-a-rat

Upcoming ap­pear­ances (permalink)

Toronto: The Sovereignty Debate (IAB Canada’s State of the Nation), Jun 23 https://​iab­canada.com/​state-of-the-na­tion-2026

Toronto: The Reverse Centaur’s Guide to Life After AI (Osler Records/Type Books), Jun 23 https://​www.eventbrite.com/​e/​cory-doc­torow-book-launch-and-talk-tick­ets-1991501299998

Toronto: The Reverse Centaur’s Guide to Life After AI (Osler Records/Type Books), Jun 23 https://​www.eventbrite.com/​e/​cory-doc­torow-book-launch-and-talk-tick­ets-1991501299998

NYC: The Reverse Centaur’s Guide to Life After AI with Jonathan Coulton (The Strand), Jun 24 https://​www.strand­books.com/​cory-doc­torow-the-re­verse-cen­taur-s-guide-to-life-af­ter-ai.html

NYC: The Reverse Centaur’s Guide to Life After AI with Jonathan Coulton (The Strand), Jun 24 https://​www.strand­books.com/​cory-doc­torow-the-re­verse-cen­taur-s-guide-to-life-af­ter-ai.html

Philadelphia: The Reverse Centaur’s Guide to Life After AI with David Williams (Fitler Club/Philadelphia Citizen), Jun 25 https://​www.eventbrite.com/​e/​cory-doc­torow-book-event-tick­ets-1990110326559

Philadelphia: The Reverse Centaur’s Guide to Life After AI with David Williams (Fitler Club/Philadelphia Citizen), Jun 25 https://​www.eventbrite.com/​e/​cory-doc­torow-book-event-tick­ets-1990110326559

Chicago: The Reverse Centaur’s Guide to Life After AI with Rick Perlstein (Exile in Bookville), Jun 26 https://​ex­ilein­bookville.com/​events/​50628

Chicago: The Reverse Centaur’s Guide to Life After AI with Rick Perlstein (Exile in Bookville), Jun 26 https://​ex­ilein­bookville.com/​events/​50628

London: Idler Festival, Jul 11 https://​www.idler.co.uk/​fes­ti­val/

London: Idler Festival, Jul 11 https://​www.idler.co.uk/​fes­ti­val/

Edinburgh International Book Festival with Jimmy Wales, Aug 17 https://​www.ed­book­fest.co.uk/​events/​the-front-list-cory-doc­torow-and-jimmy-wales

Edinburgh International Book Festival with Jimmy Wales, Aug 17 https://​www.ed­book­fest.co.uk/​events/​the-front-list-cory-doc­torow-and-jimmy-wales

Sydney: The Festival of Dangerous Ideas, Aug 23 – 24 https://​fes­ti­val­of­dan­ger­ousideas.com/​cory-doc­torow/

Sydney: The Festival of Dangerous Ideas, Aug 23 – 24 https://​fes­ti­val­of­dan­ger­ousideas.com/​cory-doc­torow/

Melbourne: Enshittification at the Wheeler Centre, Aug 25 https://​www.wheel­er­centre.com/​events-tick­ets/​sea­son-2026/​cory-doc­torow-en­shit­ti­fi­ca­tion

Melbourne: Enshittification at the Wheeler Centre, Aug 25 https://​www.wheel­er­centre.com/​events-tick­ets/​sea­son-2026/​cory-doc­torow-en­shit­ti­fi­ca­tion

Brighton: The Reverse Centaur’s Guide to Life After AI with Carole Cadwalladr (Brighton Dome), Sep 8 https://​brighton­dome.org/​whats-on/​LSC-cory-doc­torow-the-re­verse-cen­taurs-guide-to-life-af­ter-ai/

Brighton: The Reverse Centaur’s Guide to Life After AI with Carole Cadwalladr (Brighton Dome), Sep 8 https://​brighton­dome.org/​whats-on/​LSC-cory-doc­torow-the-re­verse-cen­taurs-guide-to-life-af­ter-ai/

London: The Reverse Centaur’s Guide to Life After AI with Riley Quinn (Foyle’s Picadilly), Sep 9 https://​www.foyles.co.uk/​events/​en­shit­ti­fi­ca­tion-cory-doc­torow-ri­ley-quinn

London: The Reverse Centaur’s Guide to Life After AI with Riley Quinn (Foyle’s Picadilly), Sep 9 https://​www.foyles.co.uk/​events/​en­shit­ti­fi­ca­tion-cory-doc­torow-ri­ley-quinn

South Bend: An Evening With Cory Doctorow (Notre Dame), Oct 6 https://​franco.nd.edu/​events/​2026/​10/​06/​an-evening-with-cory-doc­torow/

South Bend: An Evening With Cory Doctorow (Notre Dame), Oct 6 https://​franco.nd.edu/​events/​2026/​10/​06/​an-evening-with-cory-doc­torow/

Recent ap­pear­ances (permalink)

How to Mess with Big Tech Oligarchs (Fighting Fascism) https://​pod­casts.ap­ple.com/​us/​pod­cast/​how-to-mess-with-big-tech-oli­garchs-w-cory-doc­torow/​id1888647397?i=1000773711479

Reverse Centaur with Angie Coiro (Kepler’s Books) https://​www.youtube.com/​live/​cWN6XBa73xA

Reverse Centaur with Angie Coiro (Kepler’s Books) https://​www.youtube.com/​live/​cWN6XBa73xA

How to Think About AI Before It’s Too Late (Galaxy Brain) https://​www.youtube.com/​watch?v=SPQN­PJ0­CEPo

How to Think About AI Before It’s Too Late (Galaxy Brain) https://​www.youtube.com/​watch?v=SPQN­PJ0­CEPo

The fu­ture of world gov­er­nance, with Kim Stanley Robinson (UN Independent Expert on International Order) https://​www.youtube.com/​live/​wJvB­vY­daAMY

The fu­ture of world gov­er­nance, with Kim Stanley Robinson (UN Independent Expert on International Order) https://​www.youtube.com/​live/​wJvB­vY­daAMY

How to Think About Artificial Intelligence (KUER) https://​ra­diow­est.kuer.org/​show/​ra­diow­est/​2026 – 06-16/​cory-doc­torow-on-how-to-think-about-ar­ti­fi­cial-in­tel­li­gence

How to Think About Artificial Intelligence (KUER) https://​ra­diow­est.kuer.org/​show/​ra­diow­est/​2026 – 06-16/​cory-doc­torow-on-how-to-think-about-ar­ti­fi­cial-in­tel­li­gence

Latest books (permalink)

Canny Valley”: A lim­ited edi­tion col­lec­tion of the col­lages I cre­ate for Pluralistic, self-pub­lished, September 2025 https://​plu­ral­is­tic.net/​2025/​09/​04/​il­lus­tri­ous/#​chair­man-bruce

GitHub - future-file-format/F3: [SIGMOD 2026] F3: The Open-Source Data File Format for the Future

github.com

F3 is a data file for­mat that is de­signed with ef­fi­ciency, in­ter­op­er­abil­ity, and ex­ten­si­bil­ity in mind. It pro­vides a data or­ga­ni­za­tion that rec­ti­fies the lay­out short­com­ings of the last-gen­er­a­tion for­mats like Parquet, while at the same time main­tain­ing good in­ter­op­er­abil­ity and ex­ten­si­bil­ity (a.k.a fu­ture-proof) via em­bed­ded Wasm de­coders.

⚠️ This pro­ject is a re­search pro­to­type ver­i­fy­ing the ideas in the pa­per. You should not use it in pro­duc­tion.

⚠️ This pro­ject is a re­search pro­to­type ver­i­fy­ing the ideas in the pa­per. You should not use it in pro­duc­tion.

Build in­struc­tions

We only tested on an Intel ma­chine with Debian 12.

git sub­mod­ule up­date –init –recursive ./scripts/setup_debian.sh # build the PoC pack­age of F3 cargo build -p fff-poc # run unit test for F3 cargo test -p fff-poc

Important di­rec­to­ries

for­mat: FlatBuffer de­f­i­n­i­tion of the file for­mat.

fff-poc: The main code of the F3 for­mat. It ref­er­ences other sub­dirs like fff-core, fff-en­cod­ing, fff-for­mat, and fff-ude-wasm.

fff-bench: Benchmarks and ex­per­i­ments ap­peared in the pa­per. Specifically, fff-bench/​ex­am­ples should con­tain most ex­per­i­ments, both mi­cro and e2e.

fff-ude*: ude stand for User-Defined-Encoding and code in those di­rec­to­ries re­lates to the Wasm de­cod­ing im­ple­men­ta­tion.

scripts and ex­p_scripts: scripts re­lated to run the ex­per­i­ments.

Reproduction steps for the ex­per­i­ment re­sults in the pa­per

Please re­fer to doc/​pa­per_re­pro­duc­tion.md for the de­tailed steps.

License

This pro­ject is li­censed un­der the MIT License. See LICENSE for de­tails.

Citation

If you find this pro­ject use­ful, please con­sider cit­ing:

@article{zeng2025f3, au­thor = {Zeng, Xinyu and Meng, Ruijun and Prammer, Martin and McKinney, Wes and Patel, Jignesh M. and Pavlo, Andrew and Zhang, Huanchen}, ti­tle = {F3: The Open-Source Data File Format for the Future}, year = {2025}, is­sue_­date = {September 2025}, pub­lisher = {Association for Computing Machinery}, ad­dress = {New York, NY, USA}, vol­ume = {3}, num­ber = {4}, url = {https://​doi.org/​10.1145/​3749163}, doi = {10.1145/3749163}, ab­stract = {Columnar stor­age for­mats are the foun­da­tion for mod­ern data an­a­lyt­ics sys­tems. The pro­lif­er­a­tion of open-source file for­mats (i.e., Parquet, ORC) al­lows seam­less data shar­ing across dis­parate plat­forms. However, these for­mats were cre­ated over a decade ago for hard­ware and work­load en­vi­ron­ments that are much dif­fer­ent from to­day. Although these for­mats have in­cor­po­rated some up­dates to their spec­i­fi­ca­tion to adapt to these changes, not all de­ploy­ments sup­port those mod­i­fi­ca­tions, and too of­ten sys­tems can­not over­come the for­mats’ de­fi­cien­cies and lim­i­ta­tions with­out a rewrite.In this pa­per, we pre­sent the Future-proof File Format (F3) pro­ject. It is a next-gen­er­a­tion open-source file for­mat with in­ter­op­er­abil­ity, ex­ten­si­bil­ity, and ef­fi­ciency as its core de­sign prin­ci­ples. F3 ob­vi­ates the need to cre­ate a new for­mat every time a shift oc­curs in data pro­cess­ing and com­put­ing by pro­vid­ing a data or­ga­ni­za­tion struc­ture and a gen­eral-pur­pose API to al­low de­vel­op­ers to add new en­cod­ing schemes eas­ily. Each self-de­scrib­ing F3 file in­cludes both the data and meta-data, as well as WebAssembly (Wasm) bi­na­ries to de­code the data. Embedding the de­coders in each file re­quires min­i­mal stor­age (kilobytes) and en­sures com­pat­i­bil­ity on any plat­form in case na­tive de­coders are un­avail­able. To eval­u­ate F3, we com­pared it against legacy and state-of-the-art open-source file for­mats. Our eval­u­a­tions demon­strate the ef­fi­cacy of F3′s stor­age lay­out and the ben­e­fits of Wasm-driven de­cod­ing.}, jour­nal = {Proc. ACM Manag. Data}, month = sep, ar­ti­cleno = {245}, numpages = {27}, key­words = {columnar stor­age, com­pres­sion, ex­ten­si­bil­ity, file for­mat} }

FUTO Swipe

swipe.futo.tech

Fast, ac­cu­rate swipe typ­ing sys­tem. Use it to­day in FUTO Keyboard, our fully of­fline Android key­board app. Or down­load the mod­els and build with it.

This is a server­side demo to keep this web­page small. In pro­duc­tion, it runs on-de­vice, with much lower la­tency.

For a long time, good mo­bile swipe typ­ing was locked be­hind pri­vacy-in­va­sive key­board apps or un­li­censed pri­vate li­braries.

FUTO Swipe is our fam­ily of open mod­els and al­go­rithms that aims to solve this prob­lem. We de­vel­oped this pri­mar­ily for FUTO Keyboard, but we also wel­come the broader com­mu­nity to make use of the FUTO Swipe mod­els. As this has been a long-term in­vest­ment for us, we ask that an at­tri­bu­tion is made vis­i­ble to end-users. Read li­cense

Dataset

In August 2024, we launched a dataset col­lec­tion ef­fort on the swipe.futo.org do­main to col­lect QWERTY English swipes. Users would vol­un­tar­ily visit the web­page on their mo­bile phone and be given in­struc­tions and in­for­ma­tion about the dataset. After con­sent­ing, they would be given sen­tences, pri­mar­ily from Wikipedia, and would be asked to swipe them word-by-word.

In the end, this pro­duced over 1 mil­lion swipes. We fil­tered out a small set of low-qual­ity swipes. In March 2025, we re­leased a dataset of 1 mil­lion swipes un­der the MIT li­cense, and it is avail­able to­day on HuggingFace.

We made heavy use of this data to train our mod­els and to eval­u­ate dif­fer­ent swipe typ­ing sys­tems.

Models

Our ar­chi­tec­ture in­cludes three model types.

The Encoder model is a uni­ver­sal lay­out-ag­nos­tic and lan­guage-ag­nos­tic, and is used for mak­ing swipe typ­ing pre­dic­tions in the gen­eral case. However, it does not of­fer cut­ting-edge ac­cu­racy.

The ContextLM model is a very small lan­guage model that is trained for a sin­gle lan­guage. It’s used to im­prove the qual­ity of pre­dic­tions by elim­i­nat­ing non­sen­si­cal words given the pre­ced­ing words in the sen­tence. It only re­quires text data for train­ing.

Finally, the de­coder is a lan­guage-spe­cific and lay­out-spe­cific model that learns lay­out’s pe­cu­liar­i­ties and achieves lead­ing ac­cu­racy. As it re­quires swipe typ­ing data for a spe­cific lay­out and lan­guage for train­ing, we only have a QWERTY English de­coder for now.

With all 3 mod­els and with a beam width of 300, we achieve a top-4 fail rate of only ~4% on our test set. Ignoring out-of-vo­cab­u­lary cases, the er­ror rate is be­low 1%.

Note: These num­bers heav­ily de­pend on the bench­mark, so real-world use may vary, but we be­lieve we match big tech’s key­boards.

Footprint

The en­coder model is just 635,140 pa­ra­me­ters, and the de­coder is 304,155 ex­tra. The biggest one is the ContextLM at 1.5 mil­lion, but 1.1 mil­lion of that is just em­bed­dings. This brings us to 1,364,271 ac­tive pa­ra­me­ters, or 2,494,767 to­tal pa­ra­me­ters.

This means the foot­print of the mod­els are very small, and the model can run on low-end de­vices in mil­lisec­onds. In ad­di­tion, the en­vi­ron­men­tal costs in­volved in train­ing the mod­els were also very low, be­cause we never needed more than 1 work­sta­tion GPU!

C++ Library

The mod­els them­selves are only half of the story when go­ing from a swipe to word pre­dic­tions. The model pre­dic­tions are not very use­ful on their own and it’s nec­es­sary to per­form a dic­tio­nary-con­strained beam search to score a set of words and find the most likely can­di­dates.

For this, we re­lease swipe-li­brary, a li­brary writ­ten in C++ that han­dles the en­tire in­fer­ence, de­cod­ing, and beam search part so you can eas­ily go from swipe paths to word pre­dic­tions.

Make some­thing cool!

…or on a lap­top track­pad

Want to build with FUTO Swipe?

The FUTO Swipe mod­els are avail­able un­der the FUTO Model License, and the in­fer­ence li­brary is un­der GPL. We are work­ing on a pa­per that will de­tail more on the train­ing and ar­chi­tec­ture.

The Map — Jerry's Map

www.jerrysmap.com

Landing

The Map

Exhibitions

Videos

Sales

About/Contact

.

What is it?

In the sum­mer of 1963 Jerry be­gan draw­ing a map of an imag­i­nary city. The work started as a doo­dle done in the spare time he had while work­ing at a te­dious job. He con­tin­ued to add to that map through the years un­til, in 1983, he set it aside to put his free time to other use.

It was stored in the at­tic of his home in Cold Spring, New York. It gath­ered dust. Jerry’s son, Henry, found it one day while rum­mag­ing around. He brought it down and asked what it was. Seeing it then trig­gered Jerry to dust it off and con­tinue the pro­ject.

Years later, the Map is now a two-di­men­sional virtual world” art pro­ject which is now com­prised of over 4000 in­di­vid­ual eight by ten inch pan­els. When as­sem­bled, these pan­els form an ap­prox­i­mate cir­cle. The panel lo­ca­tions are de­fined by N, S, E, and W co­or­di­nates that orig­i­nate at the cen­ter of the cir­cle. The lo­ca­tions in the ma­trix do not change, but the pan­els them­selves are con­tin­u­ally re­vised based on in­struc­tions drawn from the artist’s cus­tom deck of cards.

Its ex­e­cu­tion, in acrylic, marker, col­ored pen­cil, ink, col­lage, and inkjet print on heavy pa­per, is dic­tated by the in­ter­play be­tween an elab­o­rate set of rules and ran­domly gen­er­ated in­struc­tions.

Jerry main­tained a blog about the pro­ject for many years. He no longer up­dates it, but the old posts are still avail­able on Blogger. And also be sure to check out r/​jer­rymap­ping,  an in­ter­est­ing sub­red­dit de­voted to map mak­ing in the style of Jerry’s Map.

Time lapse show­ing 20 years of changes to an 8 by 16 panel por­tion of the map. Sharp eyes will note that there are not ac­tu­ally 20 changes in this an­i­ma­tion. This is be­cause for some of those years, no cards were drawn that called for changes on these par­tic­u­lar pan­els.

The Creative Process

The Card Deck

The en­tire process is dri­ven by in­struc­tions on a card drawn from a spe­cial deck cre­ated by the artist. Each cy­cle be­gins only when the artist’s tasks from the pre­vi­ous card are com­plete. This could take any­where from a few min­utes to a few days.

The cards were first in­tro­duced as a sim­ple ran­dom num­ber gen­er­a­tor. When Jerry was first cre­at­ing the map it was sim­ple enough to work sheet to sheet, but as the map grew to hun­dreds of in­di­vid­ual pan­els it be­came very te­dious to make his way through the set.

I wanted to move through the stack faster, and the eas­i­est ran­dom num­ber sys­tem I could come up with was a deck of cards. I’d draw a card and move down that many pan­els in the stack.”

As Jerry be­gan work­ing on ways of sys­tem­atiz­ing the process of work­ing on the map he be­gan to in­cor­po­rate in­struc­tions on the cards. The con­tem­po­rary deck of cards has been adapted from play­ing cards and the to­tal num­ber varies as cards have been added, re­vised, and re­moved. Cur­rently there are ap­prox­i­mately 100 cards.

Sometimes I have feel­ings about the deck of cards. There’s a mes­sage in those cards. There’s no big man with a beard who has or­dered the cards, but I’m very in­ter­ested in see­ing what comes out of it. There’s a re­al­ity in there wait­ing to get out. It’s the map’s fu­ture pre­dic­tor and as it is al­ways chang­ing its alive…My hand puts the paint on the pa­per, I’ll step back and look at the sheets as though I was­n’t the per­pe­tra­tor but merely the ob­server.”

Interpreting the Cards

The in­struc­tions on each card have these three el­e­ments:

Card in­struc­tions for the Artist are in these five gen­eral cat­e­gories:

Next higher di­men­sion (void, red, black, zig­gu­rat)Spat­ter paint four con­tigu­ous pan­els (current panel plus the 3 clos­est to the cen­ter of that panel)Cre­ate a new seed pan­elMix a new paint col­orScreen print the 9 con­tigu­ous pan­els (current panel plus 8 sur­round­ing pan­els)

Next higher di­men­sion (void, red, black, zig­gu­rat)

Next higher di­men­sion (void, red, black, zig­gu­rat)

Spatter paint four con­tigu­ous pan­els (current panel plus the 3 clos­est to the cen­ter of that panel)

Spatter paint four con­tigu­ous pan­els (current panel plus the 3 clos­est to the cen­ter of that panel)

Create a new seed panel

Create a new seed panel

Mix a new paint color

Mix a new paint color

Screen print the 9 con­tigu­ous pan­els (current panel plus 8 sur­round­ing pan­els)

Screen print the 9 con­tigu­ous pan­els (current panel plus 8 sur­round­ing pan­els)

Update and copy the mas­ter (map el­e­ment) on the top of the stack of mas­tersCopy the cur­rent panel on la­bel pa­per so por­tions can be used in col­lage

Update and copy the mas­ter (map el­e­ment) on the top of the stack of mas­ters

Update and copy the mas­ter (map el­e­ment) on the top of the stack of mas­ters

Copy the cur­rent panel on la­bel pa­per so por­tions can be used in col­lage

Copy the cur­rent panel on la­bel pa­per so por­tions can be used in col­lage

Use a patch­work of re-used printed pa­per­board (e.g ce­real boxes)Use a photo from the artist’s file­sUse a lu­men print (objects scanned and printed)

Use a patch­work of re-used printed pa­per­board (e.g ce­real boxes)

Use a patch­work of re-used printed pa­per­board (e.g ce­real boxes)

Use a photo from the artist’s files

Use a photo from the artist’s files

Use a lu­men print (objects scanned and printed)

Use a lu­men print (objects scanned and printed)

Add or sub­tract from the num­ber on a spec­i­fied num­ber cards (e.g. add 3 to the num­ber on the last 2 cards”)Elim­i­nate a card from or add a card to the deck (eliminated cards are retired” for­ever, but the in­struc­tions are kept on a list for pos­si­ble re-use on a fu­ture card)Copy and re­tire the last 9 cards (physical cards are re­tired but the copies stay in the deck)Shuf­fle the deck

Add or sub­tract from the num­ber on a spec­i­fied num­ber cards (e.g. add 3 to the num­ber on the last 2 cards”)

Add or sub­tract from the num­ber on a spec­i­fied num­ber cards (e.g. add 3 to the num­ber on the last 2 cards”)

Eliminate a card from or add a card to the deck (eliminated cards are retired” for­ever, but the in­struc­tions are kept on a list for pos­si­ble re-use on a fu­ture card)

Eliminate a card from or add a card to the deck (eliminated cards are retired” for­ever, but the in­struc­tions are kept on a list for pos­si­ble re-use on a fu­ture card)

Copy and re­tire the last 9 cards (physical cards are re­tired but the copies stay in the deck)

Copy and re­tire the last 9 cards (physical cards are re­tired but the copies stay in the deck)

Shuffle the deck

Shuffle the deck

Do a blog en­tryDo a jour­nal en­try (also print and make col­lage ma­te­r­ial of the en­try)Do a Reddit post­Cal­cu­late the sales value of the en­tire set of pan­els (based lat­est eBay sale)

Do a blog en­try

Do a blog en­try

Do a jour­nal en­try (also print and make col­lage ma­te­r­ial of the en­try)

Do a jour­nal en­try (also print and make col­lage ma­te­r­ial of the en­try)

Do a Reddit post

Do a Reddit post

Calculate the sales value of the en­tire set of pan­els (based lat­est eBay sale)

Calculate the sales value of the en­tire set of pan­els (based lat­est eBay sale)

Card in­struc­tions for the Artist’s Helper are re­lated to:

The card in­di­cates the num­ber of pan­els to be scanned and added to the dig­i­tal li­brary.

The card in­di­cates the num­ber of pan­els to be scanned and added to the dig­i­tal li­brary.

The card in­di­cates the num­ber of pan­els to be scanned and added to the dig­i­tal li­brary.

The card asks the helper to sort re­tired pan­els and archive them.

The card asks the helper to sort re­tired pan­els and archive them.

The card asks the helper to sort re­tired pan­els and archive them.

The helper makes copies of the num­ber of cur­rent pan­els in­di­cated on the card, and the orig­i­nal pan­els are re­tired and archived.

The helper makes copies of the num­ber of cur­rent pan­els in­di­cated on the card, and the orig­i­nal pan­els are re­tired and archived.

The helper makes copies of the num­ber of cur­rent pan­els in­di­cated on the card, and the orig­i­nal pan­els are re­tired and archived.

The card asks the helper to up­date the in­ven­tory of the archives.

The card asks the helper to up­date the in­ven­tory of the archives.

The card asks the helper to up­date the in­ven­tory of the archives.

A TYPICAL DAY’S WORK

Jerry draws a card and works through the tasks it de­fines. This video gives some in­sight into what a typ­i­cal day’s work looks like.

The Principles

These are the in­struc­tions and rules which guide the Artist in the cre­ation of the map:

Each card has a large black or red num­ber in an up­per cor­ner. A task” is de­fined as the com­ple­tion of the num­ber of work units as spec­i­fied by the num­ber on the card that is drawn. A work unit is the num­ber of one inch squares to be cov­ered. The num­ber drawn and the ef­fort re­quired can be highly vari­able, so a day’s work could con­sist of one card’s work units, or just a por­tion of one. Work on an in­com­plete work unit con­tin­ues at the next work ses­sion.

Each card has a large black or red num­ber in an up­per cor­ner. A task” is de­fined as the com­ple­tion of the num­ber of work units as spec­i­fied by the num­ber on the card that is drawn. A work unit is the num­ber of one inch squares to be cov­ered. The num­ber drawn and the ef­fort re­quired can be highly vari­able, so a day’s work could con­sist of one card’s work units, or just a por­tion of one. Work on an in­com­plete work unit con­tin­ues at the next work ses­sion.

When a card is drawn you must fol­low the spe­cific in­struc­tions on the card, but those in­struc­tions may be changed for the next time that card is drawn.

When a card is drawn you must fol­low the spe­cific in­struc­tions on the card, but those in­struc­tions may be changed for the next time that card is drawn.

Work di­rec­tion is de­ter­mined by color of the drawn card - black is clock­wise, red is counter-clock­wise.

Work di­rec­tion is de­ter­mined by color of the drawn card - black is clock­wise, red is counter-clock­wise.

Every page has a center” point from which the work em­anates. The center” of the new page is the same as the par­en­t’s.

Every page has a center” point from which the work em­anates. The center” of the new page is the same as the par­en­t’s.

New pan­els are gen­er­ated by draw­ing a new panel” card, or a new panel is re­quired to com­plete a sec­tion of art.

New pan­els are gen­er­ated by draw­ing a new panel” card, or a new panel is re­quired to com­plete a sec­tion of art.

When a new page is added, the new page will use the color of the day”.

When a new page is added, the new page will use the color of the day”.

The lo­ca­tion of the new page is de­ter­mined by plac­ing a com­pass point in the center” of the par­ent page and de­ter­min­ing the clos­est edge of the map (this keeps the map roughly cir­cu­lar and grow­ing gen­er­ally equally in all di­rec­tions).

The lo­ca­tion of the new page is de­ter­mined by plac­ing a com­pass point in the center” of the par­ent page and de­ter­min­ing the clos­est edge of the map (this keeps the map roughly cir­cu­lar and grow­ing gen­er­ally equally in all di­rec­tions).

Master map shows the lo­ca­tions of the pan­els as de­fined by co­or­di­nates.

Master map shows the lo­ca­tions of the pan­els as de­fined by co­or­di­nates.

Colors are more ab­stract and do not nec­es­sar­ily rep­re­sent the phys­i­cal world. Colors may be ap­plied with ei­ther paint or mark­ers, or by us­ing col­lage. The 42 col­ors are con­tin­u­ally remixed to en­sure a spec­trum of paints.

Colors are more ab­stract and do not nec­es­sar­ily rep­re­sent the phys­i­cal world. Colors may be ap­plied with ei­ther paint or mark­ers, or by us­ing col­lage. The 42 col­ors are con­tin­u­ally remixed to en­sure a spec­trum of paints.

GitHub - baidu/Unlimited-OCR: Unlimited OCR Works: Welcome the Era of One-shot Long-horizon Parsing.

github.com

Welcome the Era of One-shot Long-horizon Parsing.

Release

[2026/06/23] 📄 Our pa­per is now avail­able on arXiv.

[2026/06/23] 🤝 Thanks to the ModelScope com­mu­nity for their sup­port. Our model is now avail­able at ModelScope.

[2026/06/22] 🚀 We pre­sent Unlimited-OCR, aim­ing to push Deepseek-OCR one step fur­ther.

Inference

Transformers

Inference us­ing Huggingface trans­form­ers on NVIDIA GPUs. Requirements tested on python 3.12.3 + CUDA12.9:

torch==2.10.0 torchvi­sion==0.25.0 trans­form­ers==4.57.1 Pillow==12.1.1 mat­plotlib==3.10.8 einops==0.8.2 ad­dict==2.4.0 easy­dict==1.13 pymupdf==1.27.2.2 psu­til==7.2.2

im­port os im­port torch from trans­form­ers im­port AutoModel, AutoTokenizer

mod­el_­name = baidu/Unlimited-OCR’

to­k­enizer = AutoTokenizer.from_pretrained(model_name, trust_re­mote_­code=True) model = AutoModel.from_pretrained( mod­el_­name, trust_re­mote_­code=True, use_safeten­sors=True, torch_d­type=torch.bfloat16, ) model = model.eval().cuda()

# ── Single im­age sup­ports two con­figs: gun­dam or base ── # gun­dam: base_­size=1024, im­age_­size=640, crop_­mode=True # base: base_­size=1024, im­age_­size=1024, crop_­mode=False model.in­fer( to­k­enizer, prompt=‘<im­age>doc­u­ment pars­ing.’, im­age_­file=‘your_im­age.jpg’, out­put_­path=‘your/​out­put/​dir’, base_­size=1024, im­age_­size=640, crop_­mode=True, max_length=32768, no_re­peat_n­gram_­size=35, ngram_win­dow=128, save_re­sults=True, )

# ── Multi page / PDF only uses base (image_size=1024) ── model.in­fer­_­multi( to­k­enizer, prompt=‘<im­age>Multi page pars­ing.’, im­age_­files=[‘page1.png’, page2.png’, page3.png’], out­put_­path=‘your/​out­put/​dir’, im­age_­size=1024, max_length=32768, no_re­peat_n­gram_­size=35, ngram_win­dow=1024, save_re­sults=True, )

# ── PDF (convert pages to im­ages, then multi-page pars­ing) ── im­port temp­file, fitz # PyMuPDF

def pdf_­to_im­ages(pdf_­path, dpi=300): doc = fitz.open(pdf_­path) tm­p_dir = temp­file.mkdtemp(pre­fix=‘pdf_ocr_’) mat = fitz.Ma­trix(dpi / 72, dpi / 72) paths = [] for i, page in enu­mer­ate(doc): out = os.path.join(tm­p_dir, f’­page_{i+1:04d}.png’) page.get_pixmap(ma­trix=mat).save(out) paths.ap­pend(out) doc.close() re­turn paths

model.in­fer­_­multi( to­k­enizer, prompt=‘<im­age>Multi page pars­ing.’, im­age_­files=pdf_­to_im­ages(‘your_­doc.pdf’, dpi=300), out­put_­path=‘your/​out­put/​dir’, im­age_­size=1024, max_length=32768, no_re­peat_n­gram_­size=35, ngram_win­dow=1024, save_re­sults=True, )

SGLang

Set up the en­vi­ron­ment (uv-managed vir­tualenv). Install the lo­cal SGLang wheel first, then pin ker­nels==0.9.0 and in­stall PyMuPDF for PDF-to-image con­ver­sion:

uv venv –python 3.12 source .venv/bin/activate

uv pip in­stall wheel/​sglang-0.0.0.de­v11416+g92e8b­b79e-py3-none-any.whl uv pip in­stall ker­nels==0.11.7 uv pip in­stall pymupdf==1.27.2.2

Start the SGLang server:

python -m sglang.launch_server \ –model baidu/​Un­lim­ited-OCR \ –served-model-name Unlimited-OCR \ –attention-backend fa3 \ –page-size 1 \ –mem-fraction-static 0.8 \ –context-length 32768 \ –enable-custom-logit-processor \ –disable-overlap-schedule \ –skip-server-warmup \ –host 0.0.0.0 \ –port 10000

Send stream­ing re­quests to the OpenAI-compatible API:

im­port base64 im­port json im­port os im­port temp­file

im­port fitz im­port re­quests from sglang.srt.sam­pling.cus­tom_log­it_proces­sor im­port DeepseekOCRNoRepeatNGramLogitProcessor

serv­er_url = http://​127.0.0.1:10000

ses­sion = re­quests.Ses­sion() ses­sion.trust_env = False

def pdf_­to_im­ages(pdf_­path, dpi=300): doc = fitz.open(pdf_­path) tm­p_dir = temp­file.mkdtemp(pre­fix=“pdf_ocr_”) mat = fitz.Ma­trix(dpi / 72, dpi / 72) im­age_­paths = [] for i, page in enu­mer­ate(doc): im­age_­path = os.path.join(tm­p_dir, f”page_{i + 1:04d}.png”) page.get_pixmap(ma­trix=mat).save(im­age_­path) im­age_­paths.ap­pend(im­age_­path) doc.close() re­turn im­age_­paths

def en­code_im­age(im­age_­path): ext = os.path.spli­text(im­age_­path)[1].lower() mime = image/jpeg” if ext in (”.jpg”, .jpeg”) else f”im­age/{​ext.lstrip(‘.’)}” with open(im­age_­path, rb”) as f: data = base64.b64en­code(f.read()).de­code(“utf-8″) re­turn {“type”: image_url”, image_url”: {“url”: f”data:{mime};base64,{data}“}}

def build_­con­tent(prompt, im­age_­paths): re­turn [{“type”: text”, text”: prompt}] + [encode_image(path) for path in im­age_­paths]

def gen­er­ate(prompt, im­age_­paths, im­age_­mode, ngram_win­dow): pay­load = { model”: Unlimited-OCR”, messages”: [{“role”: user”, content”: build_­con­tent(prompt, im­age_­paths)}], temperature”: 0, skip_special_tokens”: False, images_config”: {“image_mode”: im­age_­mode}, custom_logit_processor”: DeepseekOCRNoRepeatNGramLogitProcessor.to_str(), custom_params”: { ngram_size”: 35, window_size”: ngram_win­dow, }, stream”: True, } re­sponse = ses­sion.post( f”{serv­er_url}/​v1/​chat/​com­ple­tions”, head­ers={“Con­tent-Type”: application/json”}, data=json.dumps(pay­load), time­out=1200, stream=True, ) re­sponse.raise_­for_s­ta­tus()

chunks = [] for line in re­sponse.iter_­lines(chunk_­size=1, de­code_u­ni­code=True): if not line or not line.startswith(“data: ): con­tinue data = line[len(“data: ):] if data == [DONE]”: break event = json.loads(data) delta = event[“choices”][0].get(“delta”, {}).get(“content”, ”) if delta: print(delta, end=“”, flush=True) chunks.ap­pend(delta) print() re­turn ”.join(chunks)

# Single im­age sup­ports two con­figs: gun­dam or base. Example be­low uses gun­dam. gen­er­ate(“doc­u­ment pars­ing.”, [“your_image.jpg”], im­age_­mode=“gun­dam”, ngram_win­dow=128)

# Multi im­age (base only) gen­er­ate(“Multi page pars­ing.”, [“page1.png”, page2.png”], im­age_­mode=“base”, ngram_win­dow=1024)

# PDF (base only) gen­er­ate(“Multi page pars­ing.”, pdf_­to_im­ages(“your_­doc.pdf”, dpi=300), im­age_­mode=“base”, ngram_win­dow=1024)

For batch in­fer­ence, in­fer.py starts the SGLang server au­to­mat­i­cally and sends con­cur­rent re­quests for an im­age di­rec­tory or PDF:

# Image di­rec­tory python in­fer.py \ –image_dir ./examples/images \ –output_dir ./outputs \ –concurrency 8 \ –image_mode gun­dam

# PDF pages python in­fer.py \ –pdf ./examples/document.pdf \ –output_dir ./outputs \ –concurrency 8 \ –image_mode gun­dam

Useful op­tions:

–model_dir baidu/​Un­lim­ited-OCR # Local path or Hugging Face model ID –gpu 0 # CUDA_VISIBLE_DEVICES value –server_log ./log/sglang_server.log

Visualization

Acknowledgement

We would like to thank Deepseek-OCR, Deepseek-OCR-2, PaddleOCR for their valu­able mod­els and ideas.

Citation

@misc{yin2026unlimitedocrworks, ti­tle={Un­lim­ited OCR Works}, au­thor={Youyang Yin and Huanhuan Liu and YY and Qunyi Xie and Chaorun Liu and Shiqi Yang and Shaohua Wang and Zhanlong Liu and Hao Zou and Jinyue Chen and Shu Wei and Jingjing Wu and Mingxin Huang and Zhen Wu and Guibin Wang and Tengyu Du and Lei Jia}, year={2026}, eprint={2606.23050}, archivePre­fix={arXiv}, pri­ma­ryClass={cs.CV}, url={https://​arxiv.org/​abs/​2606.23050}, }

Mistral OCR 4 : SOTA OCR for Document Intelligence

mistral.ai

Today, we’re re­leas­ing Mistral OCR 4, fea­tur­ing bound­ing boxes, block clas­si­fi­ca­tion, and in­line con­fi­dence scores along­side ex­tracted text. The model sup­ports 170 lan­guages across 10 lan­guage groups, runs in a sin­gle con­tainer for fully self-hosted de­ploy­ments, and serves as an in­ges­tion com­po­nent for en­ter­prise search, RAG, and do­main-spe­cific re­trieval pipelines. OCR 4 is a small, fo­cused model, and this post cov­ers what’s new, how it per­forms on pub­lic and in­ter­nal bench­marks, the known lim­i­ta­tions of those bench­marks, and guid­ance on when to use the model API ver­sus Document AI.

Highlights

Breakthrough per­for­mance. Independent an­no­ta­tors pre­fer OCR 4 over every lead­ing OCR and doc­u­ment-AI sys­tem tested, with win rates av­er­ag­ing 72%, along­side the top over­all score on OlmOCRBench (85.20). See Benchmarks be­low for method­ol­ogy and known scor­ing lim­i­ta­tions.

Breakthrough per­for­mance. Independent an­no­ta­tors pre­fer OCR 4 over every lead­ing OCR and doc­u­ment-AI sys­tem tested, with win rates av­er­ag­ing 72%, along­side the top over­all score on OlmOCRBench (85.20). See Benchmarks be­low for method­ol­ogy and known scor­ing lim­i­ta­tions.

Segmentation, not just text. Alongside the ex­tracted text, OCR 4 re­turns bound­ing boxes, typed-block clas­si­fi­ca­tion (titles, ta­bles, equa­tions, sig­na­tures, and more), and in­line con­fi­dence scores. Bounding boxes, our most-re­quested ca­pa­bil­ity, lo­cal­ize text for in-con­text high­light­ing and re­li­able data pipelines. At the same time, block types and con­fi­dence scores drive source-grounded ci­ta­tions, redac­tions, and hu­man-in-the-loop ver­i­fi­ca­tion.

Segmentation, not just text. Alongside the ex­tracted text, OCR 4 re­turns bound­ing boxes, typed-block clas­si­fi­ca­tion (titles, ta­bles, equa­tions, sig­na­tures, and more), and in­line con­fi­dence scores. Bounding boxes, our most-re­quested ca­pa­bil­ity, lo­cal­ize text for in-con­text high­light­ing and re­li­able data pipelines. At the same time, block types and con­fi­dence scores drive source-grounded ci­ta­tions, redac­tions, and hu­man-in-the-loop ver­i­fi­ca­tion.

Integrated with Mistral Search Toolkit (public pre­view). OCR 4 is an in­ges­tion com­po­nent of Search Toolkit, Mistral’s open-source, com­pos­able search frame­work, an­nounced at the AI Now Summit. Its struc­tured out­put sup­plies ci­ta­tion-ready in­puts to the toolk­it’s in­ges­tion, re­trieval, and eval­u­a­tion work­flow for RAG and en­ter­prise search.

Integrated with Mistral Search Toolkit (public pre­view). OCR 4 is an in­ges­tion com­po­nent of Search Toolkit, Mistral’s open-source, com­pos­able search frame­work, an­nounced at the AI Now Summit. Its struc­tured out­put sup­plies ci­ta­tion-ready in­puts to the toolk­it’s in­ges­tion, re­trieval, and eval­u­a­tion work­flow for RAG and en­ter­prise search.

Multilingual cov­er­age. Support for 170 lan­guages across 10 lan­guage groups, with mea­sur­able gains on spe­cial­ized and low-re­source lan­guages where sev­eral com­pet­ing sys­tems de­grade.

Multilingual cov­er­age. Support for 170 lan­guages across 10 lan­guage groups, with mea­sur­able gains on spe­cial­ized and low-re­source lan­guages where sev­eral com­pet­ing sys­tems de­grade.

Run on your own in­fra­struc­ture. OCR 4 is com­pact enough to de­ploy on a sin­gle con­tainer, keep­ing doc­u­ment data in your en­vi­ron­ment for res­i­dency, sov­er­eignty, and com­pli­ance, while sup­port­ing cost-ef­fi­cient, high-through­put batch pro­cess­ing. Self-managed de­ploy­ment is avail­able to en­ter­prise cus­tomers.

Run on your own in­fra­struc­ture. OCR 4 is com­pact enough to de­ploy on a sin­gle con­tainer, keep­ing doc­u­ment data in your en­vi­ron­ment for res­i­dency, sov­er­eignty, and com­pli­ance, while sup­port­ing cost-ef­fi­cient, high-through­put batch pro­cess­ing. Self-managed de­ploy­ment is avail­able to en­ter­prise cus­tomers.

Overview

Mistral OCR 4 ex­tracts and struc­tures con­tent from a wide range of doc­u­ments. Where pre­vi­ous gen­er­a­tions fo­cused on con­vert­ing a page into clean text and ta­bles, OCR 4 re­turns a struc­tured rep­re­sen­ta­tion of the doc­u­ment. Each block is lo­cal­ized with a bound­ing box, clas­si­fied by type, and in­line con­fi­dence scores are gen­er­ated per-page and per-word. Downstream sys­tems, there­fore, have ac­cess not only to what the doc­u­ment says but also to where each el­e­ment sits, what role it plays, and how con­fi­dent the model is in each re­gion.

This struc­ture sup­ports sev­eral down­stream work­loads:

Semantic chunk­ing for RAG: clean, clas­si­fied blocks be­come bet­ter re­trieval units.

Semantic chunk­ing for RAG: clean, clas­si­fied blocks be­come bet­ter re­trieval units.

Structural prim­i­tives for agents: agents move from read­ing doc­u­ments to act­ing on them (form fill­ing, in­voice pro­cess­ing, com­pli­ance checks).

Structural prim­i­tives for agents: agents move from read­ing doc­u­ments to act­ing on them (form fill­ing, in­voice pro­cess­ing, com­pli­ance checks).

Structured con­tent for con­nec­tors: con­sis­tent, typed out­put for in­ges­tion and in­dex­ing pipelines.

Structured con­tent for con­nec­tors: con­sis­tent, typed out­put for in­ges­tion and in­dex­ing pipelines.

OCR 4 ac­cepts com­mon en­ter­prise for­mats, in­clud­ing PDF, DOC, PPT, and OpenDocument, and sup­ports 170 lan­guages across 10 lan­guage groups, in­clud­ing spe­cial­ized and low-re­source lan­guages that many sys­tems han­dle poorly. As a com­pact model de­ploy­able in a sin­gle con­tainer, it is suited to both cost-sen­si­tive and high-vol­ume de­ploy­ments. It can run fully self-hosted, al­low­ing or­ga­ni­za­tions with data-sov­er­eignty re­quire­ments to keep doc­u­ment data within their own in­fra­struc­ture.

Developers in­te­grate the model via API, and teams can use Document AI in Mistral Studio for an ap­pli­ca­tion-level, no-code path to the same en­gine. Mistral OCR 4 through the API is priced at $4 per 1,000 pages, with a 50% Batch-API dis­count, re­duc­ing the cost to $2 per 1,000 pages. Document AI is priced at $5 per 1,000 pages.

Benchmarks

We bench­marked Mistral OCR 4 against the lead­ing agen­tic doc­u­ment parsers across a chart and fig­ure dense fi­nan­cial QA dataset and reached equiv­a­lent ac­cu­racy at roughly 8x lower cost and 17x lower la­tency. For pro­duc­tion use cases at scale, that delta com­pounds fast.” - Aidan Donohue, AI Engineer, Rogo

To eval­u­ate OCR 4, we com­pared it against lead­ing AI-native OCR mod­els, fron­tier gen­eral-pur­pose mod­els, en­ter­prise doc­u­ment ser­vices, and our own Mistral OCR 3.

Human Preference Evaluations

Automated bench­marks carry the scor­ing ar­ti­facts de­scribed above, so we com­ple­mented them with a head-to-head hu­man eval­u­a­tion on doc­u­ments cho­sen to re­flect real us­age. We as­sem­bled 600+ doc­u­ments across 12+ lan­guages, sourced from third-party ven­dors to rep­re­sent real in­dus­try use cases, and asked in­de­pen­dent an­no­ta­tors to blindly rank each com­peti­tor’s out­put against OCR 4′s, doc­u­ment by doc­u­ment.

Annotators pre­ferred OCR 4 in the ma­jor­ity of doc­u­ments across all sys­tems tested. Because these are hu­man judg­ments on re­al­is­tic doc­u­ments rather than string com­par­isons against fixed ref­er­ences, they side­step much of the an­no­ta­tion and for­mat­ting noise that af­fects au­to­mated scores.

Overall Performance

Mistral OCR is roughly 4x faster per page than our in­cum­bent provider, an im­pres­sive re­sult for the high-vol­ume dock­et­ing work­flows where speed is crit­i­cal to man­ag­ing our cus­tomers’ IP time­lines.”  - Ivan Mihailov, AI en­gi­neer, Anaqua

In ad­di­tion to plac­ing first in our hu­man pref­er­ences, OCR 4 achieves the top over­all score amongst the mod­els we tested on the pub­lic OlmOCRBench (85.20) and leads our in­ter­nal Crawl Multilingual eval­u­a­tion (.98), ahead of both AI-native and en­ter­prise so­lu­tions.

On OmniDocBench, OCR 4 achieves a score of 93.07. We re­port this fig­ure with a caveat: both OlmOCRBench and OmniDocBench have known lim­i­ta­tions in how they score cer­tain out­puts, and a sin­gle ag­gre­gate num­ber can both un­der­state and over­state real-world per­for­mance.

When we au­dited the mis­matches be­hind our scores, most were not model er­rors but ar­ti­facts of how the bench­marks com­pare out­put. The re­cur­ring cat­e­gories:

Ground-truth er­rors. Some ref­er­ence an­no­ta­tions are them­selves in­cor­rect: miss­ing or ex­tra text, tran­scrip­tions of redacted re­gions, or ty­pos (for ex­am­ple, a cited au­thor’s name mis­spelled in the ref­er­ence but read cor­rectly by the model from the page). The out­put matches the source doc­u­ment, yet it is still marked wrong.

Ground-truth er­rors. Some ref­er­ence an­no­ta­tions are them­selves in­cor­rect: miss­ing or ex­tra text, tran­scrip­tions of redacted re­gions, or ty­pos (for ex­am­ple, a cited au­thor’s name mis­spelled in the ref­er­ence but read cor­rectly by the model from the page). The out­put matches the source doc­u­ment, yet it is still marked wrong.

Equivalent math no­ta­tion. Different LaTeX that ren­ders iden­ti­cally is counted as a mis­match, The ren­dered equa­tion is cor­rect; the string com­par­i­son is not.

Equivalent math no­ta­tion. Different LaTeX that ren­ders iden­ti­cally is counted as a mis­match, The ren­dered equa­tion is cor­rect; the string com­par­i­son is not.

Equation seg­men­ta­tion. Whether an ex­pres­sion is emit­ted as a sin­gle equa­tion or split into sev­eral in­line frag­ments af­fects the match, even when the ren­dered con­tent is iden­ti­cal, be­cause the matcher can­not align the pieces.

Equation seg­men­ta­tion. Whether an ex­pres­sion is emit­ted as a sin­gle equa­tion or split into sev­eral in­line frag­ments af­fects the match, even when the ren­dered con­tent is iden­ti­cal, be­cause the matcher can­not align the pieces.

Multi-column read­ing or­der. Words split across a col­umn bound­ary (for ex­am­ple, certifi-cates”) and col­umn-or­der­ing as­sump­tions cause cor­rect ex­trac­tions to be scored as read­ing-or­der fail­ures.

Multi-column read­ing or­der. Words split across a col­umn bound­ary (for ex­am­ple, certifi-cates”) and col­umn-or­der­ing as­sump­tions cause cor­rect ex­trac­tions to be scored as read­ing-or­der fail­ures.

Block-type at­tri­bu­tion. The bench­mark does not ex­pect head­ers/​foot­ers in the out­put. To re­solve this we strip head­ers foot­ers from our out­put be­fore scor­ing. But the test then checks for a string that also hap­pens to be the ti­tle of the page which should ac­tu­ally be pre­sent and flags it in­cor­rectly.

Block-type at­tri­bu­tion. The bench­mark does not ex­pect head­ers/​foot­ers in the out­put. To re­solve this we strip head­ers foot­ers from our out­put be­fore scor­ing. But the test then checks for a string that also hap­pens to be the ti­tle of the page which should ac­tu­ally be pre­sent and flags it in­cor­rectly.

These ar­ti­facts con­cen­trate in math­e­mat­i­cal, sci­en­tific, and multi-col­umn doc­u­ments, and they more of­ten pe­nal­ize cor­rect out­put than re­ward in­cor­rect out­put. We there­fore treat the ag­gre­gate score as di­rec­tional rather than de­fin­i­tive.

These bench­marks are di­rec­tional. All com­peti­tor scores re­flect in­ter­nal re­pro­duc­tions. We rec­om­mend eval­u­at­ing on your own doc­u­ments.

Performance Details

Crawl Multilingual break­down. On our in­ter­nal mul­ti­lin­gual eval­u­a­tion, OCR 4 leads across all eight lan­guage groups — English, Western Europe, Eastern Europe, Middle Eastern, Chinese, East Asian, Southeast Asian, and spe­cial­ized lan­guages (Hindi, Japanese, Georgian, Bengali, Armenian, Hebrew, Greek, Gujarati, Tamil, Malayalam, Kannada, Telugu). The gap is widest for spe­cial­ized and low-re­source lan­guages, where many com­pet­ing sys­tems de­grade sharply, while OCR 4 main­tains high ac­cu­racy.

Recommended use cases

OCR 4 sup­ports both high-vol­ume pipelines and in­ter­ac­tive doc­u­ment work­flows, in­clud­ing:

Document pars­ing and ex­trac­tion: com­plex, mul­ti­lin­gual doc­u­ments.

Document pars­ing and ex­trac­tion: com­plex, mul­ti­lin­gual doc­u­ments.

Retrieval-Augmented Generation (RAG): struc­tured, clas­si­fied, ci­ta­tion-ready con­tent for se­man­tic chunk­ing and source-grounded an­swers. With Search Toolkit, OCR 4 out­put can be fed di­rectly into re­trieval pipelines.

Retrieval-Augmented Generation (RAG): struc­tured, clas­si­fied, ci­ta­tion-ready con­tent for se­man­tic chunk­ing and source-grounded an­swers. With Search Toolkit, OCR 4 out­put can be fed di­rectly into re­trieval pipelines.

Agentic work­flows: pro­vid­ing agents with the struc­tural prim­i­tives to com­plete tasks such as form fill­ing, in­voice pro­cess­ing, and com­pli­ance checks, es­pe­cially in le­gal, fi­nan­cial ser­vices, and health­care.

Agentic work­flows: pro­vid­ing agents with the struc­tural prim­i­tives to com­plete tasks such as form fill­ing, in­voice pro­cess­ing, and com­pli­ance checks, es­pe­cially in le­gal, fi­nan­cial ser­vices, and health­care.

Structured data pipelines us­ing con­fi­dence scores to en­able ef­fi­cient use of hu­man ver­i­fiers: form/​in­voice ex­trac­tion, redac­tions, and com­pli­ance-dri­ven processes.

Structured data pipelines us­ing con­fi­dence scores to en­able ef­fi­cient use of hu­man ver­i­fiers: form/​in­voice ex­trac­tion, redac­tions, and com­pli­ance-dri­ven processes.

Enterprise search and knowl­edge bases: OCR as a data-source com­po­nent for cus­tom in­ges­tion and en­tity ex­trac­tion.

Enterprise search and knowl­edge bases: OCR as a data-source com­po­nent for cus­tom in­ges­tion and en­tity ex­trac­tion.

Early users are ap­ply­ing OCR 4 to turn in­voices into struc­tured fields, dig­i­tize com­pany archives, ex­tract clean text from tech­ni­cal and sci­en­tific re­ports, and power en­ter­prise search.

A note on out-of-scope use. OCR 4 is a doc­u­ment-un­der­stand­ing model, not a de­ci­sion-maker. It is not in­tended for med­ical di­ag­no­sis, le­gal ad­vice or judg­ment, high-stakes fi­nan­cial de­ci­sions, safety-crit­i­cal sys­tems, real-time/​la­tency-sen­si­tive pro­cess­ing, or non-doc­u­ment in­puts (raw au­dio, video, etc.).

OCR 4 API: Understanding Your Options

Mistral’s OCR 4 is avail­able through a sin­gle API end­point. Every re­quest runs the same un­der­ly­ing OCR model and al­ways re­turns ex­tracted con­tent, bound­ing boxes, block types, con­fi­dence scores, and mark­down-struc­tured text. What varies is how much you layer on top.

Use OCR 4 in pure ex­trac­tion mode when you want to:

Embed fast, ac­cu­rate doc­u­ment ex­trac­tion di­rectly into your ap­pli­ca­tion, agent, or data pipeline.

Embed fast, ac­cu­rate doc­u­ment ex­trac­tion di­rectly into your ap­pli­ca­tion, agent, or data pipeline.

Work di­rectly with the raw re­sponse, bound­ing boxes, block types, and con­fi­dence scores to drive cus­tom down­stream logic.

Work di­rectly with the raw re­sponse, bound­ing boxes, block types, and con­fi­dence scores to drive cus­tom down­stream logic.

Run high-vol­ume or batch in­ges­tion with full con­trol over through­put and cost via the Batch API.

Run high-vol­ume or batch in­ges­tion with full con­trol over through­put and cost via the Batch API.

Self-host for strict data-pri­vacy, sov­er­eignty, or com­pli­ance re­quire­ments.

Self-host for strict data-pri­vacy, sov­er­eignty, or com­pli­ance re­quire­ments.

Activate Document AI ca­pa­bil­i­ties (same end­point, ad­di­tional pa­ra­me­ters) when you want to:

Return struc­tured JSON in a schema you de­fine — pass a JSON schema along­side your doc­u­ment, and the OCR out­put is fed to mis­tral-small-2603 to gen­er­ate con­tent shaped to your spec.

Return struc­tured JSON in a schema you de­fine — pass a JSON schema along­side your doc­u­ment, and the OCR out­put is fed to mis­tral-small-2603 to gen­er­ate con­tent shaped to your spec.

Annotate de­tected im­ages with struc­tured JSON by pass­ing an im­age an­no­ta­tion schema, trig­ger­ing an ad­di­tional vi­sion-lan­guage model call per im­age.

Annotate de­tected im­ages with struc­tured JSON by pass­ing an im­age an­no­ta­tion schema, trig­ger­ing an ad­di­tional vi­sion-lan­guage model call per im­age.

Use a cus­tom prompt along­side a JSON schema to guide how the ex­tracted con­tent of the full doc­u­ment is in­ter­preted or sum­ma­rized.

Use a cus­tom prompt along­side a JSON schema to guide how the ex­tracted con­tent of the full doc­u­ment is in­ter­preted or sum­ma­rized.

Enable busi­ness users, so­lu­tions teams, or pi­lots to pro­duce struc­tured re­sults with­out writ­ing down­stream pars­ing logic.

Enable busi­ness users, so­lu­tions teams, or pi­lots to pro­duce struc­tured re­sults with­out writ­ing down­stream pars­ing logic.

The prac­ti­cal de­ci­sion rule: if you need raw ex­tracted con­tent, use OCR 4 as-is. If you need the out­put re­shaped into a struc­tured for­mat, an­no­tated with do­main-spe­cific fields, or processed with a cus­tom in­struc­tion, add the Document AI pa­ra­me­ters to the same call. You al­ways get the OCR re­sult re­gard­less; Document AI sim­ply adds struc­tured lay­ers on top of it.

Now avail­able

The avail­abil­ity of Mistral Document AI with OCR 4 in Microsoft Foundry marks an im­por­tant mile­stone in our part­ner­ship. Together, we’re en­abling cus­tomers to bring ad­vanced, struc­tured doc­u­ment un­der­stand­ing di­rectly into their AI work­flows, com­bin­ing Mistral’s in­no­va­tion with Microsoft’s en­ter­prise plat­form to de­liver scal­able, trusted so­lu­tions for real-world busi­ness needs.”-Kimmi Grewal, VP, AI Ecosystem Partnerships, Microsoft

Both Mistral OCRv4 and Document AI (powered by OCRv4) are avail­able via API through Mistral Studio, Amazon SageMaker, Microsoft Foundry, and com­ing soon Snowflake Parse Document. For or­ga­ni­za­tions with strin­gent data-pri­vacy re­quire­ments, OCR 4 also of­fers a self-host­ing op­tion so sen­si­tive in­for­ma­tion stays within your own in­fra­struc­ture. To ex­plore self-de­ploy­ment, let us know.

Get started

We of­fer a few ways to get started and learn more quickly.

Try OCR 4. The new Getting Started with OCR 4 Cookbook walks through a first ex­trac­tion, work­ing with bound­ing boxes, and block clas­si­fi­ca­tion.

Try OCR 4. The new Getting Started with OCR 4 Cookbook walks through a first ex­trac­tion, work­ing with bound­ing boxes, and block clas­si­fi­ca­tion.

OCR 4 we­bi­nar. We’ll cover what’s new in OCR 4 with demos and Q&A on July 7th at 6:00 PM CET. Register for the OCR4 in Production we­bi­nar.

OCR 4 we­bi­nar. We’ll cover what’s new in OCR 4 with demos and Q&A on July 7th at 6:00 PM CET. Register for the OCR4 in Production we­bi­nar.

Contact Sales for more in­for­ma­tion.

Contact Sales for more in­for­ma­tion.

OCR 4

Premier

The world’s best doc­u­ment ex­trac­tion and un­der­stand­ing model.

OCR

Multimodal

Text-to-text

In memory of the man who put red and green squiggles under words

devblogs.microsoft.com

I re­cently learned of the pass­ing of some­one whose work nearly every­body knows, but no­body knows his name.

Tony Krueger is re­mem­bered in Wikipedia as the per­son who ported the game Chip’s Challenge to Windows for the Windows Entertainment Pack.¹ But that’s prob­a­bly not the code he wrote that touched the most peo­ple.

Tony worked on Word 1.0, 1.1, 2.0, then on Word for OS/2 and Word for Mac, then re­turned to Word 6.0 and sev­eral ver­sions be­yond that. He prob­a­bly holds the record for most ver­sions of Word shipped.”

In early ver­sions of Word, the Spell Check fea­ture was some­thing that you ex­plic­itly in­voked, and then you had to sit and wait while the pro­gram looked for all your po­ten­tially-mis­spelled words, and then showed them to you one at a time for a de­ci­sion on what to do for each one. Word did in­tro­duce an Auto Spell Check fea­ture to run spell check when the user was idle, so that when you hit the Spell Check but­ton, the re­sults were ready to go. However, the Auto Spell Check was still a block­ing op­er­a­tion. As a re­sult, a lot of users turned it off be­cause it al­ways seemed to de­cide Now would be a good time to spell-check the doc­u­ment” just as you wanted to do some­thing, forc­ing you to wait for the spell check pass to com­plete be­fore you could, say, save and exit.

Tony made the spell checker much more un­ob­tru­sive so that it did­n’t in­ter­fere with your fore­ground work. And when it found a prob­lem, in­stead of wait­ing for you to trig­ger a spell check, it im­me­di­ately drew red squig­gles un­der po­ten­tially-mis­spelled words (and later green squig­gles un­der po­ten­tial gram­mat­i­cal er­rors).

Tony was an early fan of the magic/​com­edy team Penn and Teller. A friend and col­league at­tended a show and hung out af­ter­ward to ask the duo to sign a photo for his friend Tony. He was on the team that did the red and green squig­gles in Word.”

Upon hear­ing this, Penn Jillette an­nounced in his sten­to­rian voice which filled the en­tire the­ater: The red and green squig­gles!? I love the red and green squig­gles!” Teller silently con­curred.

Tony re­ceived that au­to­graphed photo for his birth­day, and it was­n’t clear which he was more happy about, the au­to­graphed photo or the fact that Penn and Teller loved his fea­ture.

Many years later, Weird Al” Yankovic recorded a par­ody video ti­tled Word Crimes, in which the Word red squig­gles make a brief ap­pear­ance. That same friend got Weird Al” to au­to­graph the screen shot.

Today, there are red (and even green and blue) squig­gles in nearly every word proces­sor, and of­ten out­side word proces­sors. Tony did it first. The next time a red squig­gle catches one of your mis­takes, say thanks to Tony. I think he’d ap­pre­ci­ate it.

¹ Probably not as widely doc­u­mented is that he ac­com­plished this with­out the source code: He re­verse-en­gi­neered the MS-DOS ver­sion and then reim­ple­mented it for Windows.

Author

Raymond has been in­volved in the evo­lu­tion of Windows for more than 30 years. In 2003, he be­gan a Web site known as The Old New Thing which has grown in pop­u­lar­ity far be­yond his wildest imag­i­na­tion, a de­vel­op­ment which still gives him the hee­bie-jee­bies. The Web site spawned a book, co­in­ci­den­tally also ti­tled The Old New Thing (Addison Wesley 2007). He oc­ca­sion­ally ap­pears on the Windows Dev Docs Twitter ac­count to tell sto­ries which con­vey no use­ful in­for­ma­tion.

TikZ Editor

tikz.dev

The Coming Loop

lucumr.pocoo.org

writ­ten on June 23, 2026

I don’t prompt Claude any­more. I have loops run­ning that prompt Claude and fig­ur­ing out what to do. My job is to write loops. — Boris Cherny

I don’t prompt Claude any­more. I have loops run­ning that prompt Claude and fig­ur­ing out what to do. My job is to write loops.

— Boris Cherny

Over the last months I have watched more and more peo­ple build some­thing on top of cod­ing agents that feels mean­ing­fully dif­fer­ent from just us­ing a cod­ing agent. Some of this hap­pens on top of Pi which is cool to see for sure! The pat­tern is the same every­where though: work is put into a queue of sorts, a ma­chine picks it up, at­tempts it, stops, and then some har­ness de­cides whether that was ac­tu­ally the end.

If not, the har­ness con­tin­ues the same ses­sion, in­jects an­other mes­sage, starts a fresh ses­sion with mod­i­fied con­text, or sends the task to an­other ma­chine. The task stays alive be­yond the point where the model by it­self would nor­mally have said: I am done.”

I think about that type of loop more than I want to ad­mit.

There is al­ready an agent loop in­side every cod­ing agent. The model calls a tool, in­cor­po­rates the re­sult, calls an­other tool, reads a file, ed­its a file, runs tests, and even­tu­ally pro­duces some an­swer. That loop is one we have been quite fa­mil­iar with for a long time. The other loop is the har­ness level loop: the loop out­side the agent loop. That loop is also not new. We have been do­ing ver­sions of this since early Claude Code days, but that loop is be­com­ing ever more pre­sent in agen­tic en­gi­neer­ing and in re­cent weeks it has started to dom­i­nate the Twitter dis­course.

I Am Not Good At This Yet

My cur­rent sta­tus is that I have not had much suc­cess with this way of work­ing for code I deeply care about which turns out to be quite a lot of code.

Part of that is taste and part of it is con­trol. I at­tempt to set a high bar for what I want code to look like, and I want to un­der­stand the code I ship. Under pres­sure, or in a dis­cus­sion with an­other hu­man, I want to be able to ex­plain what the sys­tem does with­out first hav­ing to ask a clanker to ex­plain it to me. Now there is ob­vi­ously a ques­tion if this de­sire to un­der­stand the code is one that I will still have a few years from now. For now I have not moved past the point of com­pre­hen­sion be­ing im­por­tant to me.

Given this de­sire, there is some­thing I lack with my ex­pe­ri­ence of code writ­ten with­out me pay­ing at­ten­tion, par­tic­u­larly from loops. Present-day mod­els tend to pro­duce code that is too de­fen­sive, too com­plex, too lo­cal in its rea­son­ing. They avoid strong in­vari­ants. They add fall­backs in­stead of mak­ing bad states im­pos­si­ble. They du­pli­cate code, in­vent bad ab­strac­tions, and pa­per over un­clear de­sign with more ma­chin­ery. Worse though: I so far see very lit­tle progress of this im­prov­ing. If any­thing, on that front it feels to me that we might even be mak­ing steps in the wrong di­rec­tion. At least for my taste, pre­sent-day hands-off har­nesses like Claude Code with ul­tra­code pro­duce worse code than what we were pro­duc­ing last au­tumn. That’s be­cause Claude Code, with Fable for in­stance will be work­ing un­in­ter­rupted on a prob­lem for thirty min­utes or more, when pre­vi­ously the process would have been much more hu­man in the loop.

Furthermore it’s well un­der­stood that mod­els tend to ob­serve some lo­cal fail­ure and add a lo­cal de­fense. Karpathy men­tioned how they are mortally ter­ri­fied of ex­cep­tions”. In sys­tems with im­por­tant in­vari­ants, es­pe­cially per­sisted data for­mats or core in­fra­struc­ture, the right fix is not handle every mal­formed case.” The right fix is to make the mal­formed case un­rep­re­sentable or im­pos­si­ble to write in the first place. Yet even with a lot of man­ual steer­ing, that type of code does not come out of LLMs nat­u­rally, and even if the code comes out nat­u­rally like that, they will still at­tempt to han­dle now im­pos­si­ble er­rors.

When you take that be­hav­ior and you put it be­hind loops, you tend to am­plify it. If each it­er­a­tion adds an­other small de­fense, the sys­tem slowly be­comes less un­der­stand­able while ap­pear­ing more ro­bust. The more hands-off you are, the more that hap­pens. It also teaches re­ally bad prac­tices when tools like this are given to ju­niors with­out clear guid­ance. Because if you ask them, why they are do­ing all that, they will con­vinc­ingly ar­gue their case.

Where Loops Work

At the same time, it would be dis­hon­est to pre­tend the loop pat­tern does not work be­cause it al­ready works as­ton­ish­ingly well in some do­mains.

Porting code one of them. There are al­ready im­pres­sive ex­am­ples of large au­to­matic port­ing ef­forts, in­clud­ing the re­ported work around mov­ing parts of Bun from Zig to Rust. I have used it with suc­cess my­self to port MiniJinja to Go. Performance ex­plo­rations are an­other case where this works beau­ti­fully. A ma­chine can try ex­per­i­ments, bench­mark them, dis­card fail­ures, and keep search­ing. Security scan­ning fits nat­u­rally too and so does al­most any type of re­search: ask­ing a sys­tem to ex­plore a com­plex prob­lem space and re­port back with­out nec­es­sar­ily com­mit­ting last­ing code. One thing that many of these have in com­mon is that they ei­ther do not gen­er­ate new code, but trans­form code that al­ready ex­ists, or they pro­duce code that in­ten­tion­ally does not have a long shelf life. They ei­ther pro­duce proof of con­cepts or ideas, sur­face find­ings or are more akin to mech­ni­cal trans­for­ma­tion.

I be­lieve that loops that pro­duce ar­ti­facts with­out ne­ces­sity of longevity or that cre­ate some form of clearly ver­i­fi­able mech­ni­cal trans­la­tion mat­ters more than the gen­eral abil­ity of a har­ness to me­chan­i­cally mea­sure a goal. Many suc­cess­ful ap­pli­ca­tions of loops use an­other LLM as a judge or as an or­ches­tra­tor. The mech­ni­cal trans­la­tion case can be ver­i­fied with a bi­nary test case, but it can also be judged by an LLM in­stead!

Claude Code, for in­stance, is in­creas­ingly good at cre­at­ing en­tire ex­per­i­men­tal work­flows that it will then ex­e­cute. Sure, the code it pro­duces is slop, but that’s more the fault of the model than the har­ness not be­ing a good judge on if a step in the work­flow re­sulted in a net im­prove­ment or com­ple­tion.

The har­ness just needs some sig­nal that lets it con­tinue. It does not have to be ob­jec­tive or bi­nary, it just has to be use­ful enough to drive an­other it­er­a­tion.

I ab­solutely love loops al­ready that take the bor­ing parts out of my day to ex­per­i­ment and mea­sure and to give me ideas.

Software As Organism

On the other hand us­ing that same loop­ing method­ol­ogy to write last­ing code does not yet sit well with me. The metaphor I like to reach for is one of mov­ing from soft­ware as a de­ter­min­is­tic ma­chine to soft­ware as an or­gan­ism.

I be­came a soft­ware en­gi­neer in an en­v­iorn­ment that en­cour­aged me to un­der­stand the ma­chine. There was al­ways a layer you could peel off to deepen your un­der­stand­ing. Machines that did not ex­hibit de­ter­min­is­tic ob­serv­able be­hav­ior were maybe ac­cepted, but gen­er­ally seen as not ex­actly op­ti­mal. Software ar­chi­tec­ture-wise, I saw it as de­sire­able to push fur­ther to­wards more de­ter­min­ism rather than less. Likewise the abil­ity to un­der­stand the code has been an un­de­ni­able goal. In prac­tice not al­ways pos­si­ble we still took pride in writ­ing code so that it be­came pos­si­ble even for new en­gi­neers to nav­i­gate com­plex code bases through clever ar­chi­tec­ture. On well de­signed sys­tems there were al­ways en­gi­neers that knew where the in­vari­antes lived, which parts were load-bear­ing and which changes were safe. Ideally all of that was also well doc­u­mented. Where that un­der­stand­ing was lack­ing, it was gen­er­ally re­garded as some­thing to im­prove upon.

Obviously that ideal has al­ways been strained. Many soft­ware sys­tems, es­pe­cially very suc­cess­ful ones had pe­ri­ods where en­gi­neers on the team were able to keep them clean. Large soft­ware sys­tems are not in­fre­quently too big, too dy­namic and too de­pen­dent on ex­ter­nal ser­vices to fit into any­one’s head. Even with­out LLMs we al­ready di­ag­nose dis­trib­uted sys­tems some­what like doc­tors in that we ob­serve symp­toms, form hy­pothe­ses, order more tests”, try some reme­dies, and ob­serve again.

Yet with LLMs we’re push­ing much fur­ther in that di­rec­tion and much quicker. We use them to write the code and we also use them for di­ag­no­sis and rem­edy. There are plenty of en­gi­neers that al­ready live in a world in which the first step af­ter the oc­cur­rence of a pro­duc­tion is­sue is fol­lowed by hav­ing a clanker read logs, pro­pose root causes and proac­tively put up a patch. The re­sult­ing patch is then of­ten picked up by an­other ma­chine that re­views, some­times even land­ing it on main with­out any hu­man su­per­vi­sion.

Obviously that is pow­er­ful and I can­not deny that it sounds ap­peal­ing. But giv­ing in to that idea, par­tic­u­larly with less and less hu­man over­sight means ac­cept­ing that we may no longer un­der­stand the whole sys­tem in the same way. We treat it, we mon­i­tor it, we sta­bi­lize it, but we do not nec­es­sar­ily com­pre­hend it.

I have no doubts that for some soft­ware, that is okay. Not every line of code de­serves hu­man au­thor­ship and worse code might have been writ­ten in the past.

But do I want all soft­ware to be au­thored this way?

You Cannot Quite Opt Out

What’s very un­com­fort­able is that opt­ing out of this fully ma­chine-dri­ven fu­ture may not be an op­tion.

Security is the clear­est ex­am­ple to­day. Even if you do not use loops to build your soft­ware, other peo­ple will use loops against your soft­ware. Attackers will run ma­chines con­tin­u­ously and even if it’s not at­tack­ers, then se­cu­rity re­searchers will and some of that au­to­mated work will throw up dust but also find real is­sues. And both the sig­nal and the noise will come your way at a vol­ume that makes it al­most im­pos­si­ble to deal with un­less you your­self throw a ma­chine at the prob­lem.

Daniel Stenberg’s post about curl’s sum­mer of bliss is a good ex­am­ple of the pres­sure main­tain­ers are al­ready un­der. As far as I know, AI does not play a tremen­dous role in the core de­vel­op­ment of curl to­day. Yet de­spite all of this, main­tain­ers are over­whelmed by re­ports, most of which are now AI-generated ones.

If at­tack­ers and re­porters loop, de­fend­ers will even­tu­ally need to loop too to keep up. Maybe not to write patches di­rectly, maybe just to triage and re­pro­duce and pres­sure will in­crease.

The same is true com­pet­i­tively as some teams will out-build oth­ers through raw speed. Some pro­jects will sud­denly move faster be­cause a tiny group fig­ures out how to or­ches­trate ma­chines ef­fec­tively. Some star­tups will do with five peo­ple what used to re­quire fifty. Some peo­ple might lit­er­ally put a ma­chine against your prod­uct in a loop and ask it to make it like the other one.” And if their users are happy, does it re­ally mat­ter?

Not all soft­ware will be equally af­fected. Some do­mains will pun­ish slop­pi­ness and de­mand trust and re­spon­si­bil­ity, but a lot of soft­ware lives in a world where raw speed, quick ex­per­i­men­ta­tion, and vast cov­er­age mat­ter enor­mously.

Building New Dependencies

The scari­est part to me is that we be­come de­pen­dent on these new ma­chines in new ways. Software has al­ways de­pended on tools. I re­mem­ber the time when I had to pay for com­pil­ers. These new tools are a flash­back to times where cre­at­ing soft­ware came with real costs. But now it’s no longer a one-time pay­ment, it’s a con­stant de­pen­dency. Not just a de­pen­dency on a filled wal­let, but also a cog­ni­tive de­pen­dency.

If a code­base is pro­duced by loops, re­viewed by loops, patched by loops, and kept alive by loops, what hap­pens when you no longer have ac­cess to the same class of sys­tems? What hap­pens when some trade re­stric­tions take away ac­cess to the most pow­er­ful mod­els? What if just the cost be­comes un­bear­able? What if you and your team just lose the last re­main­ing abil­ity to un­der­stand the code with­out us­ing the ma­chine?

We may cre­ate code­bases that are not merely hard to main­tain by hu­mans, but that as­sume ma­chine par­tic­i­pa­tion as part of their main­te­nance model. This is al­ready hap­pen­ing! It’s not hap­pen­ing every­where, and it might not even be hap­pen­ing in ways that are seen as prob­lem­atic, but we see more and more of it. People more and more merge code they can­not fully ex­plain. People lose their abil­ity to cre­ate is­sue re­ports or dis­cuss things in chat, with­out aug­ment­ing or rephras­ing their mes­sages with the con­text pro­vided by a clanker. Too many peo­ple in­creas­ingly rely on a ma­chine to sum­ma­rize or con­tex­tu­al­ize it. More and more do I en­counter peo­ple who con­verse with me through the in­di­rec­tion of an LLM.

Again, maybe that is not even go­ing to be wrong, but it’s a mas­sive change to how we did things.

Future Harnesses

I have lit­tle doubt that this is where things are go­ing but go­ing there will re­quire us to do some­thing about our tool­ing every­where, and not just in the cod­ing agents.

Just or­ches­trat­ing more loops won’t be enough. Better vi­su­al­iza­tions of changes or or­ches­tra­tion or agents will not re­store our un­der­stand­ing. Either we need to find clever ways to jolt the hu­man back into the loop and make the changes of the loops leg­i­ble long term, or we need to find bet­ter ways to com­pose these ever more com­plex sys­tems.

This is also where my think­ing about the role of Pi is chang­ing. Pi has been cau­tious, and I think that cau­tion is good. I do not want a fu­ture where every in­ter­ac­tion turns into an un­con­trolled swarm of ma­chines mak­ing changes I can­not fol­low. I would not want Pi to be­come an un­main­tain­able mess in an ef­fort to win the race to­wards soft­ware that writes it­self and I would not want Pi to pro­mote this type of en­gi­neer­ing ei­ther. At the same time Pi is a har­ness and har­nesses are at the cen­ter of peo­ple run­ning these new types of ex­per­i­ments.

Task queues for cod­ing tasks, or­ches­tra­tion of agents, sub­agents, durable ses­sions will mat­ter more and more. Even those of us who have their reser­va­tions and are not blindly em­brac­ing loops will have to start do­ing those ex­per­i­ments. We need to, be­cause we need to un­der­stand how to make this fu­ture bounded and sur­viv­able.

Controlling Loops

As you can read from this post, I’m very un­easy about this fu­ture. Not cause of fear, but be­cause of cau­tion given ex­pe­ri­ences with this tech­nol­ogy so far.

Adopting the idea of har­ness loops means that the har­ness de­cides when work is fin­ished. In the agent loop, the model even­tu­ally says done” and I re­view. Even be­fore that, I usu­ally steer along the way. I am in­volved and I en­joy learn­ing along the way. In the har­ness op­er­ated loop I’m not sure what my role even is. Even the done” sig­nal loses all mean­ings and just be­comes com­mu­ni­cated to yet an­other ma­chine that judges. My role is re­duced to that of a mes­sen­ger.

Today I do not like much of the code that I see from sys­tems built that way and nei­ther do I en­joy in­ter­act­ing with too much of soft­ware built with AI as­sis­tence. Looping is pow­er­ful but it re­moves re­spon­si­bil­ity more and more, and it at least to­day very much en­cour­ages us to give in to the ma­chine.

And yet I have no doubts that this loop­ing fu­ture is go­ing to be our fu­ture de­spite the fact that I presently re­sent it. I al­ready see as­ton­ish­ingly small teams build­ing at im­pos­si­ble speed and I see code­bases turn­ing more and more into ob­scure and con­fus­ing or­gan­isms that can only be di­ag­nosed by more ma­chines. Those code­bases are si­mul­tan­iously use­ful and messy.

So I guess I’m com­ing to terms with that the ques­tion is not whether we will loop be­cause clearly we will. Maybe the ques­tion is that in a fu­ture of loops, how do we don’t ab­di­cate judg­ment, how we can re­tain rules of good en­gi­neer­ing, how we can en­sure that re­spon­si­ble hu­man can con­tinue to su­per­vise, how we need to re-think how we ar­chi­tect code to re­tain san­ity along the way.

This en­try was tagged

ai and pi

copy as / view mark­down

Extreme Heat: Improving governance and strengthening action around the world - cancelled - Grantham Research Institute on climate change and the environment

www.lse.ac.uk

Credit: mam­muth/​is­tock

We re­gret that this event has been can­celled due to the red ex­treme heat warn­ing is­sued by the UK Met Office.

This London Climate Action Week event will open with the an­nounce­ment of the in­au­gural Adeline Stuart-Watt Award win­ner and will be fol­lowed by a ses­sion fo­cused on im­prov­ing ex­treme heat gov­er­nance and ac­tion around the world. Hosted in col­lab­o­ra­tion with the Zurich Climate Resilience Alliance.

The Adeline Stuart-Watt Award cel­e­brates the legacy of Ade­line Stuart-Watt, a highly re­spected and very sadly missed friend and col­league at the LSEs Grantham Research Institute on Climate Change and the Environment and the Zurich Climate Resilience Alliance. The Award recog­nises out­stand­ing, pol­icy-rel­e­vant re­search con­tri­bu­tions to the field of cli­mate adap­ta­tion and re­silience by post­grad­u­ate stu­dents.

An overview of the Award process will be pro­vided by Candice Howarth, fol­lowed by the Award win­ner be­ing an­nounced by Professor Lord Nicholas Stern. The win­ner will then re­ceive their award be­fore pre­sent­ing an overview of the win­ning topic. The Adeline Stuart-Watt Award is gen­er­ously sup­ported by the Z Zurich Foundation.

The sec­ond part of the event will fo­cus on Extreme Heat: Improving Governance and strength­en­ing ac­tion around the world.

Chaired by Swenja Suminski, this sec­tion of the event will ex­plore the crit­i­cal need to im­prove ex­treme heat gov­er­nance glob­ally. The event will bring to­gether ex­per­tise from the Grantham Research Institute and Zurich Climate Resilience Alliance part­ners Mercy Corps, Practical Action and the IFRC along­side in­put from lead­ing global part­ners.

The event will share new analy­sis of ex­treme heat gov­er­nance progress and chal­lenges across coun­tries where the Zurich Climate Resilience Alliance op­er­ates along with sto­ries from county pro­grammes.

The event will fin­ish with a fire side chat ses­sion where speak­ers will re­flect on key chal­lenges and op­por­tu­ni­ties for ad­vanc­ing ex­treme heat gov­er­nance glob­ally.

Meet our con­trib­u­tors:

Professor Lord Nicholas Stern, Chair of the LSE Grantham Research Institute

Candice Howarth, Research Director at Quadrature Climate Foundation and Visiting Professor in Practice at the LSE Grantham Research Institute

Swenja Surminski, Managing Director Climate and Sustainability at Marsh and Professor in Practice at the LSE Grantham Research Institute

Anna Beswick, Senior Policy Fellow (Adaptation and Resilience) at the LSE Grantham Research Institute

Martina Podesta, Policy Officer (Adaptation Policy and Governance) at the LSE Grantham Research Institute

Marc Gordon, Global Lead, Extreme Heat Risk Reduction & Senior Coordinator of the Midterm Review of the Sendai Framework (MTR SF), Centre of Excellence for Climate and Disaster Resilience at the United Nations

Ninni Ikkala Nyman, Lead, Climate Change, International Federation of Red Cross and Red Crescent Societies.

Olivia Shears, Head of CCRA and adap­ta­tion progress re­port­ing, UK Climate Change Committee.

Mary McBryde, Chief Program Officer, HERA.

To add this web app to your iOS home screen tap the share button and select "Add to the Home Screen".

10HN is also available as an iOS App

If you visit 10HN only rarely, check out the the best articles from the past week.

Visit pancik.com for more.