10 interesting stories served every morning and every evening.

Enabling ai co author by default by cwebster-99 · Pull Request #310226 · microsoft/vscode

github.com


Merged


Pull request overview

This PR changes the Git extension's git.addAICoAuthor setting so that AI co-author trailers are enabled by default: a Co-authored-by trailer is now automatically added when AI-generated code contributions are detected.

Changes:

Updates the git.addAICoAuthor configuration default from "off" to "all".

Copilot's findings

Files re­viewed: 1/1 changed files

Comments generated: 1



VideoLAN / dav2d · GitLab

code.videolan.org

NetHack 5.0.0: Release Notes

nethack.org

The NetHack DevTeam is announcing the release of NetHack 5.0.0 on May 2, 2026.

NetHack 5.0 is an enhancement to the dungeon exploration game NetHack, which is a distant descendent of Rogue and Hack, and a direct descendent of NetHack 3.6.

NetHack 5.0.0 is a release of NetHack. As a .0 version, there may be some bugs encountered. Constructive suggestions, GitHub pull requests, and bug reports are all welcome and encouraged.

Along with the game improvements and bug fixes, NetHack 5.0 strives to make some general architectural improvements to the game or to its building process. Among them, 5.0:

Has its source code compliant with the C99 standard.

Removes barriers to building NetHack on one platform and operating system, for later execution on another (possibly quite different) platform and/or operating system. That capability is generally known as "cross-compiling." See the file "Cross-compiling" in the top-level folder for more information on that.

Replaces the build-time "yacc and lex"-based level compiler, the "yacc and lex"-based dungeon compiler, and the quest text file processing previously done by NetHack's "makedefs" utility, with Lua text alternatives that are loaded and processed by the game during play.

A list of over 3100 fixes and changes can be found in the game's sources in the file doc/fixes5-0-0.txt. The text in there was written for the development team's own use and is provided "as is". Some entries might be considered "spoilers", particularly in the "new features" section.

Existing saved games and bones files will not work with NetHack 5.0.0.

Checksums (sha256) of binaries that you have downloaded from nethack.org can be verified on Windows platforms using:

  certUtil -hashfile nethack-500-win-x64.zip SHA256

or

  certUtil -hashfile nethack-500-win-arm64.zip SHA256
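A rough equivalent on Linux or macOS (a sketch, not from the release notes; the expected checksum would be copied from nethack.org, and the placeholder below is hypothetical):

```shell
# Verify a downloaded archive's SHA-256 on Linux (sha256sum) or macOS (shasum).
file=nethack-500-win-x64.zip
expected=PASTE_EXPECTED_CHECKSUM_HERE
actual=$(sha256sum "$file" | cut -d' ' -f1)   # on macOS: shasum -a 256 "$file"
if [ "$actual" = "$expected" ]; then
  echo "checksum OK"
else
  echo "checksum MISMATCH"
fi
```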

The following command can be used on most platforms to help confirm the location of various files that NetHack may use:

  nethack --showpaths

As with all releases of the game, we appreciate your feedback. Please submit any bugs using the problem report form. Also, please check the "known bugs" list before you log a problem - somebody else may have already found it.

Happy NetHacking!

This Month in Ladybird - April 2026 - Ladybird

ladybird.org

Hello friends! In April we merged 333 PRs from 35 contributors, 7 of whom made their first-ever commit to Ladybird! Here's what we've been up to.

Ladybird is entirely funded by the generous support of companies and individuals who believe in the open web. This month, we're excited to welcome the following new sponsors:

Human Rights Foundation (via the "AI for Individual Rights" program) with $50,000

Jakub Stęplowski with $1,000

We're incredibly grateful for their support. If you're interested in sponsoring the project, please contact us.

Inline PDF viewer

PDFs now render inline through the bundled pdf.js viewer (#9132). pdf.js is a full-featured PDF viewer written entirely in JavaScript, HTML, and CSS, with page navigation, text selection, zoom, and find-in-document. Profiling pdf.js loading the Intel ISA Manual also drove improvements to our typed-array view cache and :has() invalidation.

Browsing history and rich address bar autocomplete

Type in the address bar and you now get rich, history-aware suggestions: previously visited pages with favicons and titles, a search-engine shortcut, and plain URL completions (#8933). Behind the scenes, a SQLite-backed HistoryStore persists every navigation along with its title, favicon, visit count, and last-visit time, and "Clear browsing history" is wired up in the Privacy settings page. Both the Qt and AppKit UIs render the new rich rows.

Speculative and incremental HTML parsing

The HTML parser now consumes the response body incrementally (#9151). Bytes flow through a streaming text decoder into the tokenizer one chunk at a time, the tokenizer pauses when it runs out of input, and resumes when more arrives. This replaces a model where we waited for the full body before starting to parse.

We also implemented the speculative HTML parser (#9114). When the main parser blocks on a synchronous external script, a separate tokenizer scans ahead through the unparsed input and issues speculative fetches for the resources it finds: <script src>, <link rel=stylesheet|preload>, and <img src>. It tracks <base href> and skips into templates and foreign content correctly. A follow-up wired the speculative parser into the document's preload map (#9164), so resources discovered speculatively get deduplicated against the regular parser's later fetches instead of being requested twice.

Off-thread JavaScript compilation

Bytecode generation for fetched scripts' top-level code now runs on a background thread pool (#9118). Worker threads produce the bytecode and the data needed to build an Executable, while everything that touches the VM or GC heap stays on the main thread. This covers classic scripts, modules, and top-level IIFEs, and shifts roughly 200ms of main thread time onto background threads while loading YouTube alone.

Per-Navigable rasterization

Each Navigable now rasterizes independently on its own thread (#8793). Previously, iframes were painted synchronously as nested display lists inside their parent's display list, which meant only the top-level traversable's rendering thread was ever active. The parent's display list now references each iframe's rasterized output through an ExternalContentSource, so iframe invalidations no longer require re-recording the parent. Beyond the parallelism, this is prep work for moving iframes into separate sandboxed processes.

JavaScript engine

With the C++/Rust transition behind us, we spent April cashing in.

Faster JS-to-JS calls. A multi-part series (#8891, #8909, #8912) made Call, Return, and End instructions stay entirely in the AsmInt assembly interpreter for the common case, with hand-tuned ARM64 paired load/store (ldp/stp) for register save/restore. Native function calls also dispatch directly from AsmInt now, via a new RawNativeFunction variant that holds a plain function pointer instead of an AK::Function (#8922).

O(1) bytecode register allocator. Generator::allocate_register used to scan the free pool to find the lowest-numbered register. We were spending ~800ms in this function alone while loading x.com. With the C++/Rust pipeline parity period over, the allocator is now a plain LIFO stack (#9007).

Cached for-in iteration. for (key in obj) sites now cache the flattened enumerable key snapshot and reuse it as long as the receiver's shape, indexed storage, and prototype chain still match (#8856). Speedometer 2 went from 67.7 to 73.6, and Speedometer 3 from 4.11 to 4.22!

A grab-bag of other improvements:

The parser uses zero-copy identifier name sharing across the lexer, parser, and scope collector. On a corpus of website JS, parsing is 1.14x faster and uses 282 MB less RSS. (#8801)

Short string concatenations skip the rope representation when the result is going to be observed as a flat string anyway. 2.13x speedup on a tight a + b loop. (#9184)

Lexical-this arrow functions no longer allocate a function environment per call. Another 2.13x on a microbenchmark. (#9192)

Sparse arrays no longer pay an eager cost for their holes: Array(20_000_000) stays mostly metadata instead of doing work proportional to twenty million imaginary elements. (#8847)

A new lazy JS::Substring type backs regexp captures and string builtins like slice, split, and indexed access, gaining 1.066x on Octane's regexp benchmark. (#8863)

Source positions are preserved end-to-end in bytecode source maps, saving ~250ms on x.com. (#9027)

Zero-copy TransferArrayBuffer saves ~130ms on YouTube load. (#9088)

Cached typed-array views switched from a WeakHashSet to an intrusive list, saving ~250ms loading the Intel ISA PDF in pdf.js. (#9180)

Every Promise allocated two PromiseResolvingFunction cells with AK::Function closures that didn't actually capture anything. They're now static functions dispatched by a Kind enum, dropping a per-resolver allocation across every promise the engine creates. (#9188)

Skipping property-table marking for non-dictionary shapes cut 1.3 seconds off GC time while loading maptiler.com. (#9044)

A fast path for Array.prototype.indexOf on packed arrays (#9123)

Array.prototype.sort reuses cached UTF-16 instead of re-transcoding on every comparison (#9036)

Imports for WASM, JSON, and CSS modules (#6029)

Removed ShadowRealm support, since the proposal has stalled in the standards process (#8753)

GTK4 / libadwaita frontend

Ladybird has a new Linux frontend built on GTK4 and libadwaita, sitting alongside the existing Qt frontend (#8691). It's inspired by GNOME Web (Epiphany) and follows GNOME's design guidelines: no menubar, a hamburger menu, and AdwTabView for tabs. Out of the box you get autocomplete and security icons in the URL bar, find-in-page, fullscreen, context menus, alert/confirm/prompt/color/file dialogs, clipboard, multi-window, light/dark theme, and DPR scaling. It's still early, so not yet at feature parity with the Qt and AppKit frontends.

Bookmarks

Last month we got bookmarks. This month they got a proper management UI:

An about:bookmarks page for managing bookmarks and folders (#8825)

Bookmark import and export from the new page (#8938)

Context menus for editing bookmarks and folders (#8715)

A date_added timestamp on every bookmark and folder (#8867)

Bookmarks bar QoL: open in new tab, copy URL, middle-click and Ctrl/Cmd+click to open in new tab (#8758)

The HTML5 drag-and-drop API is now wired up (#8783). about:bookmarks uses it for reordering, and it works on regular web pages too.

Cache and CacheStorage

We implemented Cache and CacheStorage end to end, with all nine methods (open, has, delete, keys, match, matchAll, add, addAll, put) backed by an ephemeral in-memory store (#8745).

CSS features

image-set(): Basic support for the standard and -webkit- prefixed forms. At paint time we pick the candidate whose resolution best matches the device pixel ratio, skipping unsupported MIME types. This makes header images show up on gocomics.com. (#9090)

position-anchor and CSS anchor positioning: Initial support for anchor-positioned elements, fixing the hand and gun positioning on cssdoom.wtf. (#8686)

Color interpolation rewrite: Aligned with css-color-4. We now interpolate in float instead of u8, handle missing and powerless components correctly, deal with out-of-gamut sRGB, and apply alpha multipliers consistently. (#8934)

Presentational hints through the cascade: Legacy presentational HTML attributes (align, bgcolor, etc.) used to bypass the regular CSS cascade and write directly into the element's cascaded properties. They now go through the cascade as normal author declarations, so var() substitution and the invalid-at-computed-value-time fallback work correctly. Fixes a crash on html.spec.whatwg.org. (#9176)

align on table sections and rows: <thead>, <tbody>, <tfoot>, and <tr> honor the align presentational attribute, fixing button placement on bricklink.com. (#9177)

stroke-dasharray interpolation: SVG dashes finally animate smoothly. (#9133)

autofocus: Elements with the autofocus attribute actually receive focus on page load now. (#9016)

List markers in RTL text: Bullets now sit on the right side of right-to-left text, fixing list rendering on Arabic Wikipedia. (#9099)

Inline flex/grid baselines: An inline flex or grid container now derives its baseline from its child's first line box, not its last wrapped line. Fixes link text and icon alignment on nos.nl. (#9183)

Networking

getaddrinfo no longer blocks the event loop. LibDNS now runs lookups on a thread pool, fires A and AAAA queries in parallel (RFC 8305-ish), and coalesces concurrent lookups for the same name. RequestServer's preconnect path was sneaking past our resolver and letting libcurl spawn its own threaded resolver that would pthread_join us on the main thread; that's now routed through the same DNS pool. (#9109)

Profile of loading x.com when DNS is slow, before and after:

Over in RequestServer, draining queued response data was O(n²) when WebContent was slower than the network. RequestServer was spending ~30 seconds in memcpy and 3 seconds in Vector::remove while opening a YouTube video! Switching AllocatingMemoryStream to a singly-linked chunk list made consumption O(1). (#9028)

We now advertise AVIF and WebP in our Accept header for image requests, matching other engines. Some CDNs use the Accept header to decide whether to serve modern formats or fall back to JPEG. (#9046)

Style invalidation

Selector invalidation used to be straightforward: selectors always looked downward. :host ruined that. :has() made it way worse. Any descendant change can now force you to walk up the tree finding ancestors whose :has() arguments just flipped, and a lot of this month's invalidation work is about making that walk less wasteful.

Four big wins this month:

Reddit rule cache rebuilds: 13.2s → 3.2s. Stylesheet mutations no longer rebuild every style scope's cache when only one scope changed. (#9138)

Reddit infinite scroll: 11% fewer pointless recomputes. Sibling structural invalidation stopped fanning out to descendants that don't observe the position. (#9155)

:has() mutation invalidation skips unaffected anchors, with substantial reductions measured on azure.com. (#9168)

:has() child-list visits on the Intel ISA PDF: 71k → 1.6k. Coalesced when pending data already covers every concrete feature bucket the scope cares about, saving ~650ms on the pdf.js load. (#9179)

A large new structural-invalidation test battery exposed and fixed several invalidation holes (#9095), and a string of smaller tightenings landed around hover, stylesheet mutation scope, custom-property maps, and computed-style diffing (#9077, #9049, #9079, #9080, #9141).

Linux GPU painting via dmabuf

On Linux Vulkan builds, GPU-backed painting was being secretly undone every frame: WebContent painted into a GPU-backed Skia surface, but the buffer it shared with the UI process was a CPU bitmap, which forced a full GPU-to-CPU readback on every flush. SharedImage can now carry a Linux dmabuf handle, so the front and back buffers stay GPU-resident the whole way to the UI process. (#8917, #8920)

mimalloc as the main allocator

Our C++ and Rust code now share a single allocator instance, mimalloc v2, instead of each going through the system allocator separately (#8752). We don't override malloc() system-wide, so third-party libraries keep their own allocator contracts. JS benchmarks improved across the board.

Sites that work better

The biggest visible wins this month are on Reddit and YouTube.

Reddit image gallery carousels actually work now, after fixing two unrelated layout bugs around ::slotted() matching and absolutely positioned descendants of split inlines (#9148). And thanks to TextDecoderStream, the SPA stops swallowing link clicks, so you can finally open the comments! Infinite scroll also benefits from the structural-invalidation work covered above.

YouTube benefits from a stack of unrelated improvements: off-thread top-level JS compile, off-thread WOFF2 decompression (saves ~170ms on Gmail too, #8976), reduced @font-face fetch fanout (177 → ~9 fetches on initial load, #9032), the RequestServer memory churn fix, and zero-copy TransferArrayBuffer.

A handful of smaller fixes:

gocomics.com: Header images show up, thanks to image-set().

yandex.com/maps: Vector-tile WebGL rendering works after a small pile of WebGL fix-ups, including the WEBGL_debug_renderer_info extension (#9043).

strava.com: Login works now that Navigator.getBattery throws the spec-mandated error type instead of one of our own (#8770).

GitHub Insights: Loads ~100ms faster thanks to the Element.matches() and .closest() selector cache (#8987).

tweakers.net: The laptop comparison page is ~31% faster from indexed HTMLFormElement property name lookups (#9009).

neon.com: No longer crashes (#8812).

channel4.com: Vertically misaligned category text fixed in flex auto-margin resolution (#9050).

Cloudflare Turnstile: Still doesn't pass, but we fail it much faster now thanks to auth-scheme handling, Array.prototype.shift() optimizations, and a pile of UA event handler hardening on <input> range and number elements (#9063).

Web Platform Tests (WPT)

Our WPT score went from 2,003,537 to 2,067,263 this month, a headline gain of 63,726 subtests. There's an asterisk on that number: WPT imported test262, the official ECMAScript conformance suite, upstream this month, which added 53,207 JavaScript subtests to the count. We pass 52,045 of them (a 97.8% pass rate), since we've been running test262 independently for years and LibJS conformance is in great shape. So roughly 52k of the 63.7k gain is from the import, and the remaining ~11.7k is genuine new browser-platform progress, in the same ballpark as January's 13,690.

DO_NOT_TRACK

donottrack.sh

Many CLI tools, SDKs, and frameworks collect telemetry data by default. Each one has its own way to opt out:

You get the idea. There are too many, and they are all different.

A single, standard environment variable that clearly and unambiguously expresses a user's wish to opt out of any of the following:

Non-essential-to-functionality requests to the creator of the software or a third party

We just want local software.
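A minimal sketch of that shell line, assuming the conventional form of the variable as described below:

```shell
# Opt out of telemetry for tools that honor the DO_NOT_TRACK convention.
export DO_NOT_TRACK=1
```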

Add the line above to your shell configuration file so it applies to all your terminal sessions:

If you develop tools that collect telemetry, analytics, or make non-essential network requests, please check for this variable:

If DO_NOT_TRACK is set to 1, disable all tracking

Consider making telemetry opt-in rather than opt-out
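A minimal sketch of such a check in shell; tools in other languages would read the same environment variable through their own APIs:

```shell
# Hedged sketch: honor the DO_NOT_TRACK opt-out before doing anything non-essential.
if [ "${DO_NOT_TRACK:-}" = "1" ]; then
  echo "telemetry disabled"
else
  echo "telemetry enabled"
fi
```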

Six Years Perfecting Maps on watchOS

www.david-smith.org

I love going on wilderness adventures. I am rarely happier than when I am far off into the mountains without a soul in sight. As a result, I have spent a lot of time learning how to safely explore and navigate when I'm away from civilization. The most important habit I've found for not getting lost is to be very regular in checking your location as you go, and the best way I've found to do that is to have a map on my wrist.

For more than six years I've been working towards creating the best possible mapping experience on the Apple Watch. With yesterday's launch of Pedometer++ 8, I feel like this design journey has reached a meaningful destination. I would contend that Pedometer++'s watchOS mapping support is the absolute best available on the App Store.

So I wanted to walk through the journey it took to get here.

Early Efforts

I have wanted a good map on my wrist since the Apple Watch launched. This wasn't realistically possible until watchOS 6, which brought SwiftUI to the platform and, for the first time, made "real" apps possible. But in those early days, the screens were tiny, and the processors slow. I couldn't quite get to where I wanted.

This was my very first attempt that shipped in Pedometer++. These maps were generated completely on the server, which involved sending the relevant workout data roundtrip every time I wanted to refresh the display. This system let me validate the idea, but it was never going to be practically useful for navigation or regular use, and could never work offline.

Custom Mapping Engine

I knew that if I wanted to make progress towards this goal, I'd need to work at a lower level, so I got to work building a fully SwiftUI-native map rendering engine. SwiftUI was the only choice because it's all that watchOS supported, and proved to be helpful for putting maps into widgets, which also only support SwiftUI.

In 2021, I got this engine to a place where I could reliably and performantly render a map on watchOS. With it, I can render any tile-based maps and overlay location information on top.

Map Designs

Next came the question of how best to surface data to users. App design on watchOS is a really fun — but frustrating — challenge. You are designing for a relatively tiny screen, which must be operated one-handed. In this case, I want the user to be able to read the map and use it to navigate, while also having access to other workout-related information.

This began a long series of design attempts, most of which (if I'm being honest) were kinda awful.

In the end, I settled on a "modal" approach where the user can switch between a map screen and a metrics screen using a button in the top-left corner.

This interface provides one context where the user can freely pan/zoom around the map and another where I can use the more standard watchOS tabbed page interface for metrics and controls. I shipped this in Pedometer++, but there was always something that didn't quite sit right with me about it.

This design felt like a compromise, and not in a good way. I felt that in order to achieve the goal of making the map interactive, I couldn't have the map be part of any UI structure that involved swipes. As the screens on Apple Watches got larger, this compromise felt less necessary to give the map enough space to be useful.

So I set about trying alternative designs. SO many designs.

For a while, I thought that I needed to find a way to put the metrics at the bottom of the screen. However, that would lead to other problems on longer outings or for workouts that aren't navigation-focused. So I kept iterating and came up with even more designs.

All of these designs suffered from the same fundamental issue: they required the app to display only a fixed set of fields at a time.

I could make the interface configurable, but one of the fundamental rules of watchOS design is that you should avoid any interaction that takes more than a few seconds on the watch. Any user-configurable setup is inherently fiddly, so I didn't like this approach.

Dark Mode, Liquid Glass, & Cartography

While I was still wrestling with the design challenge of how best to structure the app, Apple announced watchOS 26, and the arrival of Liquid Glass. One of the core design aspects of Liquid Glass is layering stacked elements on top of each other, but another is the types of colors that work best with each other.

I was previously using Thunderforest Outdoors as my basemap for the app. I love the content this map includes, but when I started overlaying glassy elements over it I found that it wasn't well-suited for Liquid Glass.

So… I commissioned a custom map. Working with the incredible cartographer Andy Allen, we created a completely new basemap that would look fantastic with Liquid Glass.1

We simplified the map visually, increased the contrast of the elements, and made the map elements more saturated to prevent them from becoming a muddy mess when shown below glass.

With this work done, I had another opportunity: I could finally have a dark mode variant of the map tiles. While helpful on iOS, this really shines on watchOS. Andy and I really worked toward something which would be incredibly legible at arm's length.

The result of these efforts is that now I have a great map for watchOS… but a design that didn't match that greatness.

Striving for Great

I kept trying. To get me out of my design rut, I enlisted the help of the fantastic designer Rafa Conde. I needed a fresh set of eyes on this, and very quickly this partnership paid off. They proposed a variety of alternative layouts, but when I saw this one I knew it was the one.

The layering of the metrics on the top-left corner, with the map being the top page of a vertical stack, was the correct answer. This design handles interactivity by requiring a tap on the map first to enter "browse mode".

Tweaking and Polishing

Now that I had the overall concept locked in, the real fun began: actually building the app and dialing in all the details. I fairly quickly took Rafa's concept and turned it into a working prototype. This let me validate the idea in the field… literally. After walking a few hundred miles with it, I was confident it was the correct approach.

Next, I needed to dial in the font and make more subtle design choices.

After a bit more iteration, I arrived at the design that shipped yesterday. It is legible, useful, and (in my humble opinion) beautiful.

It feels really good to be able to cap off this six-year journey with a design I couldn't be more proud of. This screen represents so much accumulated effort and learning. It finally gives me a design which feels native on the platform, but also novel and unique.

Here is the evolution of this design over the last six years:

Postscript: Considering MapKit

While my work on watchOS mapping massively predates the arrival of Apple's MapKit on the platform, it is probably worth explaining why I decided to do all of this custom work to avoid using it.

Fundamentally, I find that MapKit is great for basic uses, but doesn't provide nearly the level of configurability and utility which I want Pedometer++ to offer. For example:

MapKit on watchOS always shows in dark mode, which is generally a good default, but closes the door on some accessibility and user-choice grounds. I needed it to be a user-selectable option.

While MapKit on watchOS has gotten better over time in terms of what you can do with it, I still find it a bit limiting in terms of animations and overlays.

MapKit's coverage is improving with regards to topographic contours and trail marking, but there are far too many places where the MapKit map is essentially blank where I know there should be more rich detail available. For example, here is my map vs MapKit at the trailhead of one of my favorite hikes in Scotland.

I still find it so cool that my work on this allows me to say that I "commissioned a cartographer" to work on something for me. 😁 ↩


A Couple Million Lines of Haskell: Production Engineering at Mercury | The Haskell Programming Language's blog

blog.haskell.org

A Couple Million Lines of Haskell: Production Engineering at Mercury

The editors of the Haskell Blog are happy to announce a new series of articles called "Haskellers from the trenches", where we invite experienced engineers to talk about their subjects of expertise, best practices, and production tales. Engineering rigour and artistic creativity are a fantastic combination, and this series aims to be the synthesis of these two aspects within the Haskell world.

I first heard about Haskell when I was sixteen, sitting in a high school computer science class where we were writing Java and learning, among other things, that NullPointerException was apparently a lifestyle choice if you decided to go into software development. While looking at the /r/programming subreddit after school, I stumbled across a reference to a language where null pointer exceptions simply could not happen, where the type system could prevent an entire category of bugs that I had been fighting with every week. Haskell. I was immediately, hopelessly enamored with the idea.

I have been writing Haskell for nearly two decades now, and I still think the value proposition I fell in love with at sixteen was basically right. What took me longer to learn is what that promise looks like after a codebase gets large, the company grows faster than its documentation, and the system is allowed to touch money. Haskell earns its keep there in numerous, sturdy ways. It lets you pack operational knowledge into APIs, put dangerous machinery behind tight boundaries, and make the safe path the easy one. At a growing company, those aren't just matters of taste; they are how you keep a system understandable after the people who first understood it have moved on.

Fast forward to today: I work at Mercury, a fintech company that provides banking services.* We serve over 300,000 businesses. We processed $248 billion in transaction volume in 2025 on $650 million in annualized revenue, and are, at the time of writing, in the process of obtaining a national bank charter in the USA from the OCC. We have around 1,500 employees. Our engineering organization largely hires generalists, and most of them have never written a line of Haskell before joining.

Fast for­ward to to­day: I work at Mercury, a fin­tech com­pany that pro­vides bank­ing ser­vices.* We serve over 300,000 busi­nesses. We processed $248 bil­lion in trans­ac­tion vol­ume in 2025 on $650 mil­lion in an­nu­al­ized rev­enue, and are, at the time of writ­ing, in the process of ob­tain­ing a na­tional bank char­ter in the USA from the OCC. We have around 1,500 em­ploy­ees. Our en­gi­neer­ing or­ga­ni­za­tion largely hires gen­er­al­ists, and most of them have never writ­ten a line of Haskell be­fore join­ing.

My time work­ing at Mercury has changed how I think about the lan­guage more than any ser­mon about pu­rity ever did. Elegance is pleas­ant, but keep­ing your busi­ness alive is com­pul­sory.

Our code­base is roughly 2 mil­lion lines of Haskell, once you strip out com­ments and such.

This is the part where you are sup­posed to re­coil in hor­ror.

A cou­ple mil­lion lines of Haskell, main­tained by peo­ple who learned the lan­guage on the job, at a com­pany that moves huge amounts of money? The con­ven­tional wis­dom says this should be a dis­as­ter, but sur­pris­ingly, it is­n’t. The sys­tem we’ve built has worked well for years, through hy­per­growth, through the SVB cri­sis that sent $2 bil­lion in new de­posits our way in five days,1 through reg­u­la­tory ex­am­i­na­tions, through all the or­di­nary and ex­tra­or­di­nary things that hap­pen to a fi­nan­cial sys­tem at scale.

This ar­ti­cle is about why it works. Not in the Haskell is beau­ti­ful” sense, though it is. Not in the the com­piler will save us from our­selves” sense, though I fre­quently feel grat­i­tude in that di­rec­tion. I mean in the much less ro­man­tic and much more use­ful sense that we run this lan­guage in pro­duc­tion, at scale, with a rapidly chang­ing team, and have learned some hard lessons about what it takes to keep the whole en­ter­prise afloat. The beauty of Haskell is charm­ing enough, but there is a whole swath of op­er­a­tional and or­ga­ni­za­tional re­al­ity be­yond it, and if you ig­nore that re­al­ity for too long, your com­pany will likely fire the whole Haskell team2 and start writ­ing PHP or some­thing in­stead.

How We Think About Reliability

Before div­ing into prac­ti­cal ad­vice, a note on phi­los­o­phy.

There is a tra­di­tional way of think­ing about sys­tem re­li­a­bil­ity that fo­cuses on pre­vent­ing fail­ures. You enu­mer­ate the things that can go wrong. You add checks. You write tests for each bad case. You hunt for bugs. This is, of course, nec­es­sary work, and we do it. But it is not suf­fi­cient, and if you ori­ent en­tirely around it you de­velop a spe­cific blind spot: you get very good at cat­a­logu­ing the ways things break and very bad at un­der­stand­ing why they or­di­nar­ily work.3

We try to think about it dif­fer­ently. A sys­tem op­er­ates re­li­ably be­cause it can ab­sorb vari­a­tion: it de­grades grace­fully, its op­er­a­tors can un­der­stand and ad­just it, and the ar­chi­tec­ture makes the right thing easy and the wrong thing dif­fi­cult.4 Reliability is not just the ab­sence of fail­ure. It is the pres­ence of adap­tive ca­pac­ity. It is a sys­tem’s abil­ity to keep func­tion­ing while re­al­ity con­tin­ues its long­stand­ing and re­gret­table habit of re­fus­ing to hold still.

When you have hundreds of engineers working in a multi-million-line codebase, many of whom are six months into their Haskell careers, “adaptive capacity” stops being a nifty phrase from a resilience engineering paper and starts being a daily concern. Patrick McKenzie has observed that in a company growing at 2x per year, half of your coworkers will always have less than a year of experience. A year later, half of your coworkers will still have less than a year of experience. For very successful companies, this never stops being true.5 You become organizationally ancient very quickly, whether you like it or not, and the things you know become institutional dark matter: load-bearing, but invisible to most of the people around you.

So the ques­tions we ask are op­er­a­tional. Can the new hire on your team read this mod­ule and un­der­stand what it does? If the data­base is slow, does this ser­vice de­grade or does it fall over and take its neigh­bors with it? If some­one mis­uses an in­ter­face, does the com­piler tell them, or do we find out when the on-call gets paged? If you don’t have an­swers to those ques­tions, you have a fu­ture in­ci­dent qui­etly un­fold­ing.

This is why I in­creas­ingly think of the type sys­tem as an op­er­a­tional aid more than a cor­rect­ness proof. Its value is not merely that it rules out cer­tain classes of er­rors, though it does. Its value is that it en­codes in­sti­tu­tional knowl­edge in a form that sur­vives the de­par­ture of the per­son who wrote it. In a fast-grow­ing com­pany, peo­ple leave, peo­ple trans­fer teams, peo­ple go on va­ca­tion or parental leave, peo­ple join, and the churn means that things peo­ple knew walk out the door with them un­less you have writ­ten them down some­where. Ideally, you have writ­ten them down in a form that the com­piler can read, be­cause the com­piler is much more dis­ci­plined than the av­er­age wiki page.

This extends beyond code. On our stability engineering team, we constantly investigate the prospective production behavior of features and products. We do not do this to slow down product development, but in partnership with the team shipping the feature, to make sure we are prepared to deal with the fallout when it breaks and, if possible, to make that fallout boring rather than exciting. We ask things like: what is the blast radius if this fails? Which operations must be idempotent, and how? What does the rollback look like? What happens to in-flight work? Which systems will absorb the failure, and which ones will amplify it? The point is to have the conversation early enough that it shapes the design rather than merely auditing the launch after all the important decisions have already become expensive to revisit.6

Our philosophy, stated plainly: we are not the quality police. We are the people who would like to help you avoid being woken up at 4 AM to deal with the fallout of a broken feature. This is not a deeply ideological stance; we simply want to help people.

So, in light of that, how do we make Haskell work in pro­duc­tion?

Purity Is a Boundary, Not a Property

My hot take: the first and most consequential misunderstanding about Haskell is that purity is not something the language is; it is something your interfaces enforce.

Under the hood, Haskell is not a magical machine that performs side effects despite being pure. Behind every “pure” function in bytestring, text, and vector lies a cheerful little hellscape of mutable allocation, buffer writes, unsafe coercions, and other behavior that would alarm you if you discovered it in a junior engineer’s side project. Behind the ST monad lies in-place mutation and side effects, observable within the computation. What makes it acceptable is that the side effects are encapsulated such that the boundary cannot be violated.

runST :: (forall s. ST s a) -> a

The rank-2 type of runST (that is, the type s is scoped within the parentheses and can’t escape) ensures that the mutable references created inside the computation cannot escape, because they are tagged with the type s. Internally, all sorts of imperative nonsense may occur. Externally, the function is pure. The world outside the boundary gets none of the mutation, only the result.
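As a small illustration (mine, not the article’s), here is a function that is imperative on the inside and pure on the outside:

```haskell
import Control.Monad.ST (runST)
import Data.STRef (newSTRef, readSTRef, modifySTRef')

-- Internally imperative, externally pure: sum a list with a mutable
-- accumulator. The 's' tag guarantees the STRef cannot leak past runST.
sumST :: [Int] -> Int
sumST xs = runST $ do
  acc <- newSTRef 0                  -- mutable state, local to this scope
  mapM_ (modifySTRef' acc . (+)) xs  -- in-place updates
  readSTRef acc                      -- only the final value escapes
```

Callers see an ordinary `[Int] -> Int`; the mutation is invisible to them by construction.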

This is, I think, a wonderful design principle writ large: you can permit arbitrarily dangerous operations within a scope, provided the scope’s exit is typed narrowly enough that the danger cannot leak. That principle applies everywhere in production. Your database layer uses connection pooling, retry logic, and mutable state internally. Your cache uses concurrent mutable maps. Your HTTP client probably has circuit breakers, pooled connections, and a small municipal government’s worth of bookkeeping. None of this is a problem if the interface is tight enough to prevent misuse and the boundary holds.

In production, the goal is often not to avoid mutation entirely, because that is not a serious proposition for most real systems. The goal is to contain mutation, make the containment legible, and verify that it stays contained. Often the right question is not “is this pure?” but “where is the impurity, and how much of the codebase is allowed to know about it?”

For a new engineer who learned Haskell three months ago, “purity is a boundary you try to maintain” is much more useful than “Haskell is pure.” One tells them what to do when they sit down to design a module. The other mostly sits there looking profound.

This bound­ary-ori­ented view of pu­rity sets up a more gen­eral pat­tern that re­curs through­out pro­duc­tion en­gi­neer­ing in Haskell: dan­ger­ous things are tol­er­a­ble when they are fenced in, care­fully ex­posed, and hard to mis­use. That is true of mu­ta­tion. It is true of re­tries, trans­ac­tions, state ma­chines, dis­trib­uted work­flows, and type-level ma­chin­ery. Much of what fol­lows is re­ally just this same idea, wear­ing dif­fer­ent hats.

Make the Right Thing Easy

There is a pat­tern in large code­bases where cor­rect­ness de­pends on per­form­ing op­er­a­tions in a par­tic­u­lar or­der, or in­clud­ing a par­tic­u­lar step that has no vis­i­ble con­nec­tion to the main work.

“Remember to flush the audit log after every transaction.”

“Always check the feature flag before calling this endpoint.”

“Make sure to enqueue the notification inside the database transaction, not after it.”

These are the incantations of operational lore. They live in wiki pages, onboarding documents, half-forgotten design reviews, and the memories of senior engineers who are now three teams away and booked solid until Thursday. In a company that is hiring aggressively, the half-life of tribal knowledge is alarmingly short. When an engineer leaves, the incantations fade. When a deadline approaches, they are the first thing skipped. When a new engineer joins, they often have no way to know the incantation exists at all. Nothing says “robust system design” quite like a critical invariant living in a Slack thread from nine months ago.7

Haskell gives you tools to en­code these in­can­ta­tions in types so they can­not be for­got­ten. This is, for my money, the sin­gle most valu­able thing the lan­guage of­fers a pro­duc­tion en­gi­neer­ing or­ga­ni­za­tion.

Consider a sim­pli­fied ver­sion of a real pat­tern: you need to en­sure that cer­tain side ef­fects (sending a no­ti­fi­ca­tion, pub­lish­ing an event) hap­pen trans­ac­tion­ally with a data­base write. Not be­fore, not af­ter, and not in a sep­a­rate trans­ac­tion. Together, or not at all.

The naïve ap­proach is to tell peo­ple to use the right func­tion:

-- Please use this one, not the other one
writeWithEvents :: Transaction -> [Event] -> IO ()

-- Don't use this directly (but we can't stop you)
writeTransaction :: Transaction -> IO ()
publishEvents :: [Event] -> IO ()

This is rookie-level engineering. It works until it doesn’t, and “until it doesn’t” tends to arrive on a Friday afternoon when the person who wrote the wiki page is on vacation and everybody else is discovering, in real time, that the wiki page was load-bearing.

A bet­ter ap­proach re­struc­tures the types so that the only way to com­mit work is through a path that in­cludes event pub­li­ca­tion:

data Transact a  -- opaque; cannot be run directly

record :: Transaction -> Transact ()
emit :: Event -> Transact ()

-- The *only* way to execute a Transact: commit and publish atomically
commit :: Transact a -> IO a

Now the in­can­ta­tion is the only door in the room. You can­not for­get it be­cause there is noth­ing else to do. The type sys­tem has not proven any­thing es­pe­cially deep about your events. It has done some­thing more prac­ti­cal: it has made the cor­rect op­er­a­tional pro­ce­dure the path of least re­sis­tance.

That dis­tinc­tion mat­ters. In pro­duc­tion, there are plenty of places where we do not need a the­o­rem. We need a de­sign that makes it dif­fi­cult for an or­di­nary busy en­gi­neer to ac­ci­den­tally do the wrong thing while try­ing to do a dozen other per­fectly rea­son­able things. The com­piler is not merely check­ing logic here; it is pre­serv­ing in­sti­tu­tional mem­ory and turn­ing it into a hard-edged in­ter­face.

When a new engineer joins and asks “how do I write a transaction?”, the type system answers them. When a senior engineer leaves, the answer remains. The institutional knowledge survived not because someone documented it beautifully, though documentation is pleasant when available, but because someone encoded it in a form the compiler enforces. Again, the compiler is a better custodian of operational lore than the average wiki, and less prone to people forgetting to update it as the reality of the system changes.
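To make the pattern concrete, here is a minimal, self-contained sketch of how such an opaque Transact type could be implemented. The Writer-based internals and the stubbed database and event effects are my own illustrative assumptions, not Mercury’s actual code:

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Control.Monad.Trans.Writer (Writer, runWriter, tell)

-- Hypothetical stand-ins for the real domain types.
newtype Transaction = Transaction String
newtype Event = Event String

-- Opaque: in a real module, the Transact constructor is not exported,
-- so 'commit' is the only door out of the room.
newtype Transact a = Transact (Writer ([Transaction], [Event]) a)
  deriving (Functor, Applicative, Monad)

record :: Transaction -> Transact ()
record t = Transact (tell ([t], []))

emit :: Event -> Transact ()
emit e = Transact (tell ([], [e]))

-- The *only* way to execute a Transact: writes and events go together.
commit :: Transact a -> IO a
commit (Transact w) = do
  let (a, (txns, events)) = runWriter w
  mapM_ writeToDatabase txns  -- imagine: inside one DB transaction
  mapM_ publishEvent events   -- published atomically with the writes
  pure a
  where
    writeToDatabase (Transaction _) = pure ()  -- stub for illustration
    publishEvent (Event _) = pure ()           -- stub for illustration
```

Because the constructor is hidden, a caller who has a `Transact a` has no option other than `commit`, which is exactly the point.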

Durable Execution

The pat­tern above — struc­tur­ing types so that the cor­rect op­er­a­tional pro­ce­dure is the only pro­ce­dure — works well within a sin­gle trans­ac­tion. Financial sys­tems, un­for­tu­nately, have never felt ob­lig­ated to re­main in­side a sin­gle trans­ac­tion.

They are full of processes that span mul­ti­ple steps, mul­ti­ple ser­vices, and mul­ti­ple fail­ure modes. Send a pay­ment, wait for a part­ner to ac­knowl­edge it, up­date the ledger, no­tify the cus­tomer, han­dle can­cel­la­tion, han­dle time­out, han­dle the case where the part­ner said yes but your worker died be­fore record­ing the an­swer, han­dle the case where the part­ner said noth­ing be­cause the net­work briefly en­tered a higher plane of ex­is­tence and de­clined to tell you about it. If any step fails, you need to know where you were, what has al­ready hap­pened, and what still needs to hap­pen. You need state. You need re­tries. You need time­outs. You need idem­po­tence. You need all of these things to keep work­ing across process crashes and de­ploy­ments. Very quickly, what be­gan as just some busi­ness logic” amasses a re­mark­able amount of one-off re­peats of com­mon op­er­a­tional con­cerns.

Previously at Mercury, we co­or­di­nated these processes with data­base-backed state ma­chines dri­ven by cron jobs and back­ground work­ers, with retry logic and time­out han­dling scat­tered across the code­base. It worked. It also re­quired the sort of vig­i­lance usu­ally as­so­ci­ated with de­fus­ing un­ex­ploded ord­nance. It was frag­ile, dif­fi­cult to rea­son about, and the source of a dis­pro­por­tion­ate share of our op­er­a­tional in­ci­dents.

Temporal is our durable ex­e­cu­tion frame­work, and adopt­ing it was one of the bet­ter in­fra­struc­ture de­ci­sions we have made. You write your work­flow as or­di­nary se­quen­tial code, and the plat­form records every step in an event his­tory. If a worker crashes mid-work­flow, an­other worker re­plays the de­ter­min­is­tic pre­fix to re­con­struct the state, then con­tin­ues from where it left off. Retries, time­outs, can­cel­la­tion, and er­ror han­dling are pro­vided by the plat­form rather than each team reim­ple­ment­ing them poorly.

I think of Temporal as Frankenstein’s monster, in the flattering sense: assembled from excellent parts, animated by improbable effort, and smarter than many of the people alarmed by it. It takes durable history, replay, and determinism (things some platforms get natively) and bolts them onto runtimes that were never born knowing how to do any of this. Most of us are not going to rewrite our companies in Erlang. Temporal is a prosthetic for the rest of us. It gives ordinary languages a shot at some of the same operational virtues by slightly mad but highly effective means.

Temporal also dovetails nicely with the virtues often attributed to Haskell: a Temporal workflow is, in an important sense, a pure function over its event history. Temporal workflows have a determinism requirement — a replayed workflow must produce the same sequence of commands as the original — which is exactly the same constraint Haskell imposes on pure code: same inputs, same outputs. Side effects are isolated into activities, which are the workflow’s equivalent of IO. The workflow orchestrates; the activities execute. If you have spent any time thinking about pure core / impure shell, this is that model with the platform enforcing the separation rather than relying on sheer discipline.
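The “pure function over its event history” framing can be sketched in a toy model. This illustrates the idea only; it is not the hs-temporal-sdk API, and the event and state types are invented for the example:

```haskell
-- A toy model of Temporal-style replay: workflow state is a pure fold
-- over the recorded event history, so any worker can reconstruct it.
data HistoryEvent
  = ActivityCompleted String  -- an activity finished with this result
  | TimerFired
  deriving (Eq, Show)

data WorkflowState = WorkflowState
  { results :: [String]
  , timersFired :: Int
  } deriving (Eq, Show)

-- Deterministic: same history in, same state out. A replacement worker
-- replays the prefix and lands exactly where the crashed one left off.
replay :: [HistoryEvent] -> WorkflowState
replay = foldl step (WorkflowState [] 0)
  where
    step s (ActivityCompleted r) = s { results = results s ++ [r] }
    step s TimerFired            = s { timersFired = timersFired s + 1 }
```

The determinism requirement is just the demand that `replay` be a function in the mathematical sense, which is the property Haskell’s pure code gives you for free.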

We built and open-sourced hs-tem­po­ral-sdk, our Haskell SDK for Temporal, which wraps the of­fi­cial Core SDK (Rust, via FFI) and pro­vides a Haskell-native API for defin­ing work­flows, ac­tiv­i­ties, and work­ers.

I gave a talk about our adop­tion pat­terns at Temporal’s Replay con­fer­ence, and the short ver­sion is: Temporal has let us re­place frag­ile chains of cron jobs and data­base-backed state ma­chines with durable work­flows where the plat­form han­dles the co­or­di­na­tion. The op­er­a­tional im­prove­ment has been sub­stan­tial. It is dif­fi­cult to over­state how pleas­ant it is to delete a hand-rolled dis­trib­uted state ma­chine and re­place it with some­thing whose fail­ure se­man­tics were not im­pro­vised dur­ing sprint plan­ning.

This, again, is adap­tive ca­pac­ity in a dif­fer­ent cos­tume. A sys­tem that can sur­vive worker crashes, process restarts, and long-lived co­or­di­na­tion with­out los­ing its place is a sys­tem whose op­er­a­tors have more lever­age and fewer mys­ter­ies to un­wind dur­ing an in­ci­dent.

Design for Your Domain, Not Your Transport

As your pro­duc­tion sys­tem grows, a com­mon mis­take I’ve ob­served is let­ting the in­vok­ing sys­tem leak into the do­main model.

We have code that throws HTTP status code exceptions which return those results directly to the user on the frontend. This made sense when the code was written, because it ran in an HTTP request handler. Then, as happens in any growing codebase, pieces of that code got extracted and reused. Now it also runs in cron jobs. It runs in queued background workers. It runs in Temporal workflows. And it still throws “StatusCodeException 409 Conflict” when something goes wrong, which is an absolutely unhinged thing for a cron job to do. A cron job does not have a caller waiting for a 409. Nobody is reading that status code. The error has propagated through the system simply because the original abstraction was coupled to its transport layer.

The fix is con­cep­tu­ally sim­ple: model your do­main er­rors as do­main types. A pay­ment that fails be­cause of in­suf­fi­cient funds should be an InsufficientFunds, not a 402. A du­pli­cate re­quest should be a DuplicateRequest, not a 409. These are things your busi­ness logic can match on, retry against, log mean­ing­fully, and han­dle dif­fer­ently de­pend­ing on con­text.

Then you write thin trans­la­tion lay­ers at each bound­ary:

data PaymentError
  = InsufficientFunds
  | DuplicateRequest RequestId
  | PartnerTimeout Partner

toHttpError :: PaymentError -> HttpResponse
toHttpError InsufficientFunds    = err402 "Insufficient funds"
toHttpError (DuplicateRequest _) = err409 "Duplicate request"
toHttpError (PartnerTimeout _)   = err502 "Partner unavailable"

toWorkerStrategy :: PaymentError -> WorkerAction
toWorkerStrategy InsufficientFunds    = Fail "Insufficient funds"
toWorkerStrategy (DuplicateRequest _) = Skip
toWorkerStrategy (PartnerTimeout _)   = RetryWithBackoff

Nothing novel about it. But it is remarkable how often it gets skipped in practice. The first version of any code is written for one context, and by the time you realize it will be called from three others, the status code exceptions are load-bearing: someone has caught them as part of business logic, and nobody wants to refactor them.

The ear­lier you make this sep­a­ra­tion, the less it costs. The later you make it, the stranger the re­sult­ing be­hav­ior be­comes. Eventually you wind up with cron jobs hurl­ing 409s at Sentry and back­ground work­ers in­ter­pret­ing HTTP-specific ex­cep­tions as busi­ness se­man­tics, which is how ab­strac­tions let you know they have es­caped con­tain­ment.

This is the same prin­ci­ple as pu­rity, ex­pressed in do­main lan­guage rather than op­er­a­tional lan­guage. Your trans­port con­cerns be­long at the edges. Your do­main model should sur­vive be­ing in­voked from a web han­dler, a CLI, a cron job, a back­ground worker, or a work­flow en­gine with­out hav­ing to drag an HTTP sta­tus code be­hind it like a tin can tied to a wed­ding car.

The Type Encoding Tradeoff

Here is the part where I tell you not to do too much of the thing I just told you to do.

Encoding in­vari­ants into types is pow­er­ful. It is also ex­pen­sive. Not at run­time, but in cog­ni­tive over­head, in the rigid­ity it in­tro­duces, and in the dif­fi­culty of chang­ing things later when the re­quire­ments shift. And the re­quire­ments will shift. If you work at a com­pany where they do not, I would like to know your se­cret, and also your stock ticker.

Every invariant you push into the type system is a constraint on every future engineer who touches that code. If violating the constraint would cause data loss, financial errors, regulatory trouble, or a poor soul’s pager to go off, then the cost is justified. If the constraint is “we currently happen to do things this way,” or “I read this article about dependent types and I simply must apply that to my authorization logic,” you have likely just made your codebase harder to change for no operational benefit. The next person to encounter it will either spend a week refactoring the types or, more likely, find a way around them that is worse than what you were trying to prevent.

There is a spec­trum, and pro­duc­tion code­bases must live on it hon­estly.

At one end: you en­code every­thing. Your types are a faith­ful model of your do­main. Illegal states are un­rep­re­sentable. Refactoring takes weeks be­cause chang­ing a busi­ness rule means thread­ing a type change through fifty mod­ules. New en­gi­neers stare at the type sig­na­tures and won­der what they have done to de­serve this, then qui­etly be­gin dis­cussing their ca­reer op­tions with a ther­a­pist. You have built a cathe­dral. Cathedrals are beau­ti­ful. They are also ex­pen­sive, cold, and not es­pe­cially fa­mous for how quickly one ren­o­vates the plumb­ing.

At the other end: you en­code noth­ing. Your types are String and IO () and, in the worst case, Dynamic. The code is easy to change be­cause there are no con­tracts to vi­o­late. The sys­tem works be­cause the peo­ple who built it are still around and re­mem­ber what the strings mean. When they leave, it stops work­ing, and no­body knows why. You have built a tent. Tents are flex­i­ble, portable, and, un­der cer­tain weather con­di­tions, a very di­rect way to learn about the sky. This is, of course, one of the rea­sons many Haskell de­vel­op­ers flee to Haskell in the first place — to avoid the suf­fer­ing that comes from this ap­proach.

The sweet spot is some­where in the mid­dle. A few heuris­tics I think are use­ful:

Encode in­vari­ants that pro­tect against silent cor­rup­tion. If a vi­o­la­tion would pro­duce wrong data with­out any im­me­di­ate er­ror (a trans­ac­tion com­mit­ted with­out its events, a pay­ment processed with­out an au­dit log, a state tran­si­tion that looks plau­si­ble but is se­man­ti­cally im­pos­si­ble), put it in the types. The feed­back loop for silent fail­ures is too long to rely on hu­man dili­gence.

Use run­time checks for in­vari­ants that fail loudly. If a vi­o­la­tion would pro­duce an im­me­di­ate, ob­vi­ous er­ror (a 500 re­sponse, a failed as­ser­tion, a type mis­match at a JSON bound­ary), a run­time check with a good er­ror mes­sage may be enough. You will catch it be­fore pro­duc­tion, or very quickly af­ter.

Resist the urge to model your en­tire do­main in types. Your do­main is messy. It has edge cases, grand­fa­ther clauses, rules that con­tra­dict each other, and spe­cial be­hav­ior for three spe­cific cus­tomers that dates back to 2018 and that no­body fully un­der­stands. The type sys­tem wants crisp­ness. Your busi­ness does not pro­vide it. Nor will it ever.

Remember that types are for the team, not just for the com­piler. The com­piler is one tool among many. Tests, doc­u­men­ta­tion, code re­view, ex­am­ples, play­books: these all com­bine to pro­vide de­fense in depth. The goal is not to win an ar­gu­ment with the type checker. The goal is to build a sys­tem that a team of hu­mans, in­clud­ing hu­mans who learned Haskell this year, can op­er­ate, ex­tend, and main­tain.
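A small sketch of the first two heuristics side by side; the names (PositiveAmount, parsePageSize) and limits are hypothetical, chosen only to illustrate the distinction:

```haskell
-- Heuristic 1: invariants that fail silently go in the types. A zero or
-- negative amount would corrupt data quietly, so the smart constructor
-- is the only way to build one.
newtype PositiveAmount = PositiveAmount Integer deriving (Eq, Show)

mkPositiveAmount :: Integer -> Maybe PositiveAmount
mkPositiveAmount n
  | n > 0     = Just (PositiveAmount n)
  | otherwise = Nothing

-- Heuristic 2: invariants that fail loudly can stay as runtime checks
-- with a good error message; a bad page size surfaces immediately at
-- the API boundary, long before it can corrupt anything.
parsePageSize :: Int -> Either String Int
parsePageSize n
  | n < 1 || n > 500 = Left ("page size must be 1..500, got " ++ show n)
  | otherwise        = Right n
```

The first invariant is enforced forever by the module boundary; the second costs nothing to change when the requirements shift.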

That said, in­tense type-level ma­chin­ery is some­times ex­actly what you need. We have in­ter­nal li­braries where the types are gen­uinely hairy: GADTs, type fam­i­lies, phan­tom types track­ing state tran­si­tions. These tend to be mech­a­nisms where get­ting it wrong means money goes to the wrong place or a reg­u­la­tory in­vari­ant is vi­o­lated. The com­plex­ity is ab­solutely es­sen­tial here.

The key thing, if you want to do this sus­tain­ably, is that we en­cap­su­late the com­plex­ity. The mod­ule that im­ple­ments the type-level state ma­chine typ­i­cally has a small num­ber of au­thors who un­der­stand it deeply and, ide­ally, a thor­ough test suite. The mod­ule that uses it has a sur­face API that looks like five nor­mal func­tions with nor­mal types. A prod­uct en­gi­neer on an­other team can call those func­tions with­out know­ing or car­ing that un­der­neath there is a small type-level the­o­rem prover en­sur­ing they can­not com­mit a trans­ac­tion in the wrong state. The proof oblig­a­tions are dis­charged in­side the bound­ary, not leaked across it.

This is the same con­tain­ment prin­ci­ple as pu­rity, ap­plied one level up. The com­plex­ity it­self is fine, be­cause it buys you some­thing valu­able. What causes prob­lems is com­plex­ity that leaks across mod­ule bound­aries into code main­tained by peo­ple who did not sign up for it. We catch this in code re­view more than any­where else. Someone opens a PR that touches a mod­ule they do not own, and the diff is full of type an­no­ta­tions they had to cargo-cult from a neigh­bor­ing file just to make the com­piler stop scream­ing. That is usu­ally a sign the ab­strac­tion has be­come a form of friendly fire.
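As a toy sketch of that containment idea (the states and names here are hypothetical and far simpler than the real machinery): a phantom type tracks the transaction’s state inside the module, while callers see one ordinary function:

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}

-- Inside the boundary: a phantom type tracks the transaction's state,
-- so committing twice, or before opening, is a compile-time error.
data TxState = Open | Committed

newtype Tx (s :: TxState) = Tx Int  -- hypothetical handle

openTx :: Int -> Tx 'Open
openTx = Tx

commitTx :: Tx 'Open -> Tx 'Committed
commitTx (Tx n) = Tx n

txId :: Tx s -> Int
txId (Tx n) = n

-- Outside the boundary: product engineers call one ordinary-looking
-- function. The proof obligations are discharged here, not leaked out.
withTransaction :: Int -> (Tx 'Open -> Tx 'Open) -> Tx 'Committed
withTransaction n body = commitTx (body (openTx n))
```

A caller of `withTransaction` never sees the state index; trying to pass a `Tx 'Committed` where an open handle is required simply does not compile.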

An open-weights Chinese model just beat Claude, GPT-5.5, and Gemini in a programming challenge - ThinkPol

thinkpol.ca

By Rohana Rezel

I’m run­ning the on­go­ing AI Coding Contest where I pit ma­jor lan­guage mod­els against each other in real-time pro­gram­ming tasks with ob­jec­tive scor­ing. Day 12 was the Word Gem Puzzle. Ten mod­els en­tered. The re­sults were not what most peo­ple would have pre­dicted.

Kimi K2.6, an open-weights model from Chinese startup Moonshot AI, won the challenge outright: 22 match points on a 7–1–0 record. MiMo V2-Pro from Xiaomi came second. GPT-5.5 was third. Claude Opus 4.7 finished fifth. Every model from the Western frontier labs landed below the top two.

The chal­lenge

The Word Gem Puzzle is a slid­ing-tile let­ter puz­zle. The board is a rec­tan­gu­lar grid (10×10, 15×15, 20×20, 25×25, or 30×30) filled with let­ter tiles and one blank space. Bots can slide any ad­ja­cent tile into the blank and at any point claim valid English words formed in straight hor­i­zon­tal or ver­ti­cal lines. Diagonals don’t count. Backwards does­n’t count.

The scor­ing re­wards longer words and pun­ishes short ones. Words un­der seven let­ters cost points: a five-let­ter word loses you one point, a three-let­ter word costs three. Seven let­ters or more score their length mi­nus six, so an eight-let­ter word is worth two points. The same word can only be claimed once; if an­other bot gets there first, you get noth­ing. Each pair of mod­els played five rounds, one per grid size, with a ten-sec­ond wall-clock limit per round.
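Taken together, the rules above collapse into a single formula: every word is worth its length minus six. A quick sketch:

```haskell
-- The contest's scoring rule as described: a word scores its length
-- minus six, so a five-letter word costs one point, a three-letter
-- word costs three, and an eight-letter word earns two.
scoreWord :: String -> Int
scoreWord w = length w - 6
```

This is why every serious competitor filtered its dictionary to words of seven letters or more: anything shorter has a negative expected value.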

The grids are seeded with real dic­tio­nary words in a cross­word-style lay­out, then the re­main­ing cells are filled with let­ters weighted by Scrabble tile fre­quen­cies, and fi­nally the blank is scram­bled, more ag­gres­sively on larger boards. On a 10×10, many seed words sur­vive in­tact. On a 30×30, al­most none do. That turns out to mat­ter a lot.

The code pro­duced by Nvidia’s Nemotron Super 3 con­tained a syn­tax er­ror, so it never con­nected to the game server. Nine mod­els ac­tu­ally com­peted.

Kimi K2.6 is open-weights, publicly available from Moonshot AI, a Chinese startup founded in 2023. MiMo V2-Pro is currently API-only; the tweet linked here is Xiaomi confirming that weights for their newer V2.5 Pro model are dropping soon (https://x.com/XiaomiMiMo/status/2047840164777726076). The models from Anthropic, OpenAI, Google, and xAI placed third through seventh. GLM 5.1, from Chinese lab Zhipu AI, placed fourth. DeepSeek finished eighth. This isn’t a clean China-beats-West story; it’s two specific models that won.

What I saw

The move logs tell the story. Kimi won by slid­ing ag­gres­sively. Its ap­proach was greedy: score each pos­si­ble move by what new pos­i­tive-value words it un­locks, ex­e­cute the best one, re­peat. When no move un­locked a pos­i­tive word, it fell back to the first le­gal di­rec­tion al­pha­bet­i­cally. This caused some in­ef­fi­cient edge-os­cil­la­tion, a 2-cycle pat­tern where the bot bounced the blank back and forth with­out progress. On smaller grids where seed words were still largely in­tact, that hurt. On the 30×30 grids, where the scram­ble had bro­ken up nearly every­thing and re­con­struc­tion was the only path to points, the sheer slide vol­ume even­tu­ally paid off. Kimi’s cu­mu­la­tive score of 77 was the high­est in the tour­na­ment.

MiMo's sliding code exists in the repo, but its "best value greater than zero" threshold never triggered, so in practice it never slid once. It went straight to scanning the initial grid for words of seven letters or more and blasted all its claims in a single TCP packet. A brittle strategy, entirely dependent on the scramble leaving seed words intact. On grids where words survived, MiMo cleaned up fast. On grids where they didn't, it scored nothing. Final tally: 43 cumulative points, second place.
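The scan half of that strategy is straightforward to sketch; the single-packet send is shown too, though the "CLAIM <word>" wire format is my invention, since the actual contest protocol isn't reproduced here:

```python
import socket

def scan_long_words(grid: list[str], dictionary: set[str]) -> set[str]:
    """Find dictionary words of 7+ letters in rows and columns."""
    lines = grid + ["".join(col) for col in zip(*grid)]
    found = set()
    for line in lines:
        n = len(line)
        for i in range(n):
            for j in range(i + 7, n + 1):   # substrings of length >= 7
                if line[i:j] in dictionary:
                    found.add(line[i:j])
    return found

def blast_claims(sock: socket.socket, words: set[str]) -> None:
    """Send every claim in one write, as one TCP payload."""
    payload = "".join(f"CLAIM {w}\n" for w in sorted(words))
    sock.sendall(payload.encode())
```

The brittleness is visible in the code: nothing here moves a tile, so the bot's entire score is bounded by whatever the scramble happened to leave behind.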

Claude also didn't slide. The move logs show it holding up well on 25×25 boards where scramble density was still manageable, then falling apart on 30×30 where actual tile movement was needed. Not sliding is a real limitation in a puzzle built around sliding.

GPT-5.5 was more conservative, roughly 120 slides per round with a cap to avoid thrashing, and showed the strongest numbers on 15×15 and 30×30 grids. Grok never slid either, yet scored reasonably on the larger boards. GLM was the most aggressive slider in the whole tournament, over 800,000 total slides, but stalled badly whenever it ran out of positive moves.

DeepSeek sent malformed data every round. Zero useful output. At least it didn't make things worse by playing.

Muse made things worse by playing.

The scoring penalizes short words: three-letter words cost three points, four-letter words cost two, five-letter words cost one. The intent is to stop bots from carpet-bombing the board with "the" and "and" and "it." Every serious competitor filtered their dictionary to words of seven letters or more. Muse claimed everything: every word it could find, regardless of length, it fired off as a claim. On a 30×30 grid with hundreds of short valid words visible at any moment, Muse found them all and claimed every one.
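The filter the serious competitors applied, and the arithmetic Muse ignored, fit in a few lines (both function names are mine):

```python
def playable_words(dictionary: set[str]) -> set[str]:
    """Keep only words that can score positively: seven letters or more."""
    return {w for w in dictionary if len(w) >= 7}

def net_score(claims: list[str]) -> int:
    """Total points for a list of claimed words, at length minus six each."""
    return sum(len(w) - 6 for w in claims)
```

Claiming "the", "and", and "it" nets minus three, minus three, and minus four; scale that to hundreds of short words per round and a five-figure negative total follows.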

Its cumulative score was −15,309. It lost all eight matches and won zero rounds. A version of Muse that simply connected to the server and did nothing would have scored zero, a 15,309-point improvement. The gap between Muse and eighth place was larger than the gap between eighth and first.

DeepSeek's malformed output tells you something about how it handles novel protocol specs under time pressure. Muse's spiral tells you something different: it saw valid words and claimed them, with no apparent model of what "valid" meant given the scoring rules. It read the task partially and executed that partial reading in full. Worth noting for anyone deploying these models on structured tasks with penalties.

What surprised me

I design these challenges, so I have a reasonable sense of what they test. What I didn't fully anticipate was how starkly the 30×30 grids would separate the field. On smaller boards, the difference between a static scanner and an active slider was modest. At full scale, models that could only find what was already there ran out of road. Kimi's greedy loop, flawed as it was, kept producing output when the static scanners had nothing left to claim.

The other thing worth noting: MiMo and Kimi finished two points apart despite doing almost opposite things. Two different theories of the same puzzle, nearly identical results. That means the gap between first and second was partly seed variance, not just capability difference.

The bigger picture

One fair counterargument: this scoring system rewards aggressive word claiming, and heavily safety-tuned models may be more conservative about that kind of carpet-bombing. If so, the results reflect a mismatch between task design and aligned model behaviour, not raw capability. It's a reasonable objection. It doesn't change the outcome.

One challenge doesn't overturn general benchmarks. This puzzle tests real-time decision-making and whether a model can write clean functional code that connects to a TCP server and plays a novel game correctly. It doesn't test long-context reasoning or code generation from a spec.

But I've been running these challenges long enough to notice what's changing. A year ago, the assumption was that the Western frontier labs had a capability lead open-weights couldn't close. Kimi K2.6 now scores 54 on the Artificial Analysis Intelligence Index. GPT-5.5 scores 60, Claude 57. That's not parity, but it's close, and it's coming from a model anyone can download.

When models within a few index points of the frontier are also freely available to run locally, that's a different competitive situation than the one that existed a year ago. This challenge is one data point in that shift. The gap is small enough now that it shows up in results like this one.

Rohana Rezel runs the AI Coding Contest and is a technologist, researcher, and community leader based in Vancouver, BC.


California to begin ticketing driverless cars that violate traffic laws

www.bbc.com

1 day ago

Grace Eliza Goodwin


Driverless cars are becoming more common in some California cities, but when the autonomous vehicles violate traffic laws, police haven't been able to ticket them. Until now.

The state's Department of Motor Vehicles (DMV) has announced new regulations on autonomous vehicles (AVs), including a process for police to issue a "notice of AV noncompliance" directly to the car's manufacturer.

The new rules, which will go into effect 1 July, are part of a larger 2024 law that imposed deeper regulation on the technology.

There have been a number of reports of the cars breaking traffic laws, including during a San Francisco blackout last year.

The California DMV is calling the new rules "the most comprehensive AV regulations in the nation".

Under the new rules, police can cite AV companies when their vehicles commit moving violations. The rules will also require the companies to respond to calls from police and other emergency officials within 30 seconds, and will impose penalties if their vehicles enter active emergency zones.

"California continues to lead the nation in the development and adoption of AV technology, and these updated regulations further demonstrate the state's commitment to public safety," DMV Director Steve Gordon said in a press release.

Waymo is one of the main operators of fully self-driving robotaxis in the San Francisco Bay Area and Los Angeles County, but several companies, including Tesla, also have permits to test their AVs in some California cities. The BBC has contacted Waymo and Tesla for comment.

When the vehicles violate traffic laws, some police have been stumped as to how to hold the driverless cars accountable.

In an incident last September, police officers in San Bruno, a city south of San Francisco, noticed a Waymo AV making an illegal U-turn at a light directly in front of them, the San Bruno Police Department said at the time. But when officers stopped the car, they were not able to issue a ticket without a driver to give it to. Instead, they contacted the company about the "glitch".

San Francisco Fire Department officials have also repeatedly complained about robotaxis getting in the way of emergency responses.

This Tesla owner won $10k in court for Tesla's FSD lies. Tesla is still fighting him.

electrek.co

For over a decade now, Tesla has sold a promise of vehicles that can drive themselves, even stating that every car it produced had all the hardware for self-driving.

But after years of the company being unable to deliver, some owners want their money back. Ben Gawiser is one of those owners, who recently won a $10,600 judgment due to Tesla's failure to deliver. But Tesla is still fighting to delay payment, even just a few days at a time.

Gawiser purchased a Tesla Model 3 in August of 2021, and paid $10,000 for the company's Full Self-Driving software. At the time, the price of the software had been gradually increasing, which Tesla said it would continue to do as the software gained more capabilities and got closer to release.

(Later, Tesla lowered prices and eventually moved to a subscription-only model, where it stands now, though Tesla is still charging some owners for hardware they already bought.)

After five years, Gawiser's purchase should have allowed his vehicle to drive all by itself. After all, Tesla's software was continually getting better, and the company's CEO had promised in January 2021 that "the car will drive itself for the reliability in excess of a human this year."

However, that did not happen. Tesla has still not delivered software capable of Level 5 full self-driving to any owner. Even in its own fleet of robotaxis, only a few run at Level 4 autonomy in limited circumstances.

(Tesla previously said you'd be able to use your car as a robotaxi, too, but even though the company is now making revenue with FSD software in its "Robotaxi" fleet, it still doesn't let you do it.)

With all these false promises and what amounts to a five-digit, nearly five-year loan given to Tesla, Gawiser had had enough, and decided to do something about it. So he reached out to Tesla's resolutions email address in November 2025 to ask for a refund for his nonfunctional software… albeit with some aggressive language.

He cited instances of his vehicle stopping in the middle of the road, asking him to take over within minutes of activation, and failing to slow for a school zone. But overall, the software simply does not deliver what it promised: he was sold a Level 5 system, and FSD is still Level 2.

He was given the cold shoulder, and asked again in January, at which point he was told that the only remedy available would be to visit a service center to make sure the system is working properly. But that wouldn't have upgraded it to the Level 5 system he paid for.

Then he filed a lawsuit in small claims court in Travis County, Texas, where he lives and where Tesla moved its headquarters. Gawiser isn't a lawyer, but small claims courts are designed to be used by the public rather than lawyers. While Tesla's purchase agreement has an arbitration clause, it is also possible to take disputes to small claims court.

All it took was finding Tesla's "registered agent" (listed under "service of process" on Tesla's legal page), then filing a small claims lawsuit online with the Texas justice of the peace. This cost him $72.88, including the cost to send certified mail to serve Tesla with the court documents.

After being served with the lawsuit, Tesla again did not respond. So a court date was set for a default judgment hearing, which is what happens when one party does not respond to a court case. The hearing happened over video call, where Gawiser provided evidence showing how much he paid for FSD and that it had not yet been delivered, and the court made a judgment in his favor in the amount of $10,672.88, the amount Gawiser paid for FSD, including taxes and court fees.

After the default judgment was filed on April 1, Tesla had three weeks to file a response, and didn't do so by the April 22 deadline (and no, in a real court case, you can't say it was April Fools when you miss a deadline). This is when we first heard from Gawiser.

However, that wasn't the end of the saga. Tesla waited five more days and filed a request for an extension, stating that it had not received notice of the default judgment hearing and therefore couldn't show up. But rather than requesting a rehearing, Tesla merely requested the deadline be pushed back by five days, and then didn't submit any additional evidence showing its side of the story (which it has to do if requesting a retrial).

In Gawiser's response to Tesla's most recent request, he took a swipe at Tesla's lack of defense, using one of Musk's statements during this quarter's earnings call as evidence:

Tesla, Inc. does not have "meritorious defense" for this action as their CEO as recently as April 22nd, 2026 said that Tesla could not deliver a working version of "Full Self-Driving" for the vehicle that the Plaintiff purchased required by the contract. Unless their counsel happens to know Tesla's own products better than their CEO, they have no defense to this cause of action. The requirement for a "meritorious defense" is laid out in Craddock v. Sunshine Bus Lines. Tesla, Inc. has not presented a "prima facie meritorious defense", nor do they have one.

In that call, Musk finally admitted that HW3 cars like Gawiser's would never be able to drive themselves, and would require Tesla to build factories just to upgrade them. This would only add more wait for Gawiser if he did want to continue waiting patiently for FSD, as there is no indication that Tesla has started building those factories to deliver the hardware needed to make the promised software work. (Further, the current hardware, HW4, also has not yet delivered Level 5 autonomy to customers.)

In short, Gawiser's argument is: I bought this software, it wasn't delivered, and there is no legal argument that can get around those facts as long as it remains undelivered.

The court has not yet responded to this most recent back and forth, but Gawiser is confident that he will prevail.

Then comes the matter of payment. Gawiser filed a "writ of execution" (another $240 in court fees) just yesterday, which would allow Texas law enforcement to seize and sell off enough of Tesla's property to pay the judgment against it. If it comes to that, we hope he brings cameras.

Electrek's Take

Gawiser's case is one of a few we've heard of where owners were able to get refunds from Tesla, either in small claims court or in multiple cases of arbitration. But these cases still remain rather rare, given the scale of the broken promise.

There are currently millions of vehicles on the road that have no hope of getting the full self-driving hardware they were sold with… and yes, we think the Tesla HW3 retrofit "microfactory" plan is unlikely to actually happen.

But in addition to these small claims cases, there are a number of class action cases working their way through legal systems around the world. We've heard of class action suits already in process in the US, China and Australia, and in Europe thousands of owners have signed up on a Dutch collective claim site (Tesla's response to this effort was: "be patient").

Collectively, these suits could result in billions in liability for the company, and none of this would have been necessary if not for the CEO's constant false promises.

So it's possible that all owners will eventually receive some sort of rectification for this issue that Tesla has created, but Gawiser's case shows how one owner took the matter into his own hands and got it taken care of once and for all (or so we think; we'll update this post if Tesla decides to employ any more delaying tactics).

But Gawiser's case may not be repeatable. Since this case went through small claims and Tesla failed to respond, the default judgment wasn't really a judgment on the merits of the case itself. And small claims cases do not set binding legal precedent, so there's no guarantee another court would rule the same way.

However, it's still useful for gauging how these cases can work in practice. One more owner was able to get his due from Tesla, and whether through intentional tactics or the internal disorganization the company is often known for, Tesla barely even put up a defense. If Tesla found these cases easy to defend, it presumably would have done so, but it didn't.

