10 interesting stories served every morning and every evening.

Moving away from Tailwind, and learning to structure my CSS

jvns.ca

Hello! 8 years ago, I wrote ex­cit­edly about dis­cov­er­ing Tailwind.

At that time I re­ally had no idea how to struc­ture my CSS code and given the choice be­tween a pile of com­plete chaos and Tailwind, I was re­ally happy to choose Tailwind. It helped me make a lot of tiny sites!

I spent the last week or so mi­grat­ing a cou­ple of sites away from Tailwind and to­wards more se­man­tic HTML + vanilla CSS, and it was SO fun and SO in­ter­est­ing, so here are some things I learned!

As usual I’m not a full-time fron­tend de­vel­oper and so all of my CSS learn­ing has hap­pened in fits and starts over many years.

it turns out Tailwind taught me a lot

When I started think­ing about struc­tur­ing CSS, I was in­tim­i­dated at first: I’m not very good at struc­tur­ing my CSS! But then I started read­ing blog posts talk­ing about how to struc­ture CSS (like A whole cas­cade of lay­ers or How I write CSS in 2024) and I re­al­ized a cou­ple of things:

Every CSS code base has a bunch of dif­fer­ent things go­ing on (layouts! fonts! colours! com­mon com­po­nents!)

It’s ex­tremely use­ful to have sys­tems or guide­lines to man­age each of those things, oth­er­wise things de­scend into chaos

Tailwind has sys­tems for some of these, and I al­ready know those sys­tems! Maybe I can im­i­tate the sys­tems I like!

For ex­am­ple, Tailwind has:

a re­set stylesheet

a colour palette

a font scale

the sys­tems I’m go­ing to talk about

I’m go­ing to talk about a few as­pects of my CSS code­base and my thoughts so far what kind of rules I want to im­pose on the code­base for each one. Some of them are copied from Tailwind and some aren’t.

re­set

com­po­nents

colours

font sizes

util­ity classes

the base

spac­ing

re­spon­sive de­sign

the build sys­tem

1. re­set

I just copied Tailwind’s “preflight styles” by going into tailwind.css and copying the first 200 lines or so.

I no­ticed that I’ve de­vel­oped a re­la­tion­ship with Tailwind’s CSS re­set over time, for ex­am­ple Tailwind sets box-siz­ing: bor­der-box on every el­e­ment (which means that an el­e­men­t’s width in­cludes its padding):

* { box-sizing: border-box; }

I think it would be a real ad­just­ment for me to switch to writ­ing CSS with­out these, and I’m sure there are lots of other things in the Tailwind re­set (like html {line-height: 1.5;}) that I’m sub­con­sciously used to and don’t even re­al­ize are there.

2. com­po­nents

This next part is the bulk of the CSS!

The idea here is to organize CSS by “components”, in a way that’s spiritually related to Vue or React components (though there might not actually be any JavaScript at all in the site).

Basically the idea is that:

Each “component” has a unique class

The CSS for one com­po­nent never over­rides the CSS for any other com­po­nent

Each com­po­nent has its own CSS file

So edit­ing the CSS for one com­po­nent won’t mys­te­ri­ously break some­thing in an­other com­po­nent. And prob­a­bly like 80% of the CSS that I would ac­tu­ally want to change is in var­i­ous com­po­nent files, so if I’m edit­ing a 100-line com­po­nent, I just have to think about those 100 lines. It’s way eas­ier for me to think about.

For example, this HTML might be the “.zine component”.

<figure class="zine horizontal">
  <img src="whatever.jpg">
</figure>

And the CSS looks some­thing like this, us­ing nested se­lec­tors:

.zine {
  …
  &.horizontal { … }
  &.vertical { … }
  &:hover { … }
}

I haven’t done any­thing pro­gram­matic (like web com­po­nents or @scope) that en­sures that com­po­nents won’t in­ter­fere with each other, but just hav­ing a con­ven­tion and try­ing my best al­ready feels like a big im­prove­ment.
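For what it’s worth, a minimal sketch of what @scope-based isolation could look like for a component like .zine (untested, and the properties here are placeholders, not from the real site):

```css
/* Rules inside @scope only match elements within a .zine
   subtree, so they can't leak into other components. */
@scope (.zine) {
  img { max-width: 100%; }
  figcaption { font-style: italic; }
}
```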

Next: con­ven­tions to main­tain some con­sis­tency across the site and keep these com­po­nents in line with each other!

3. colours

colours.css has a bunch of variables like this, which I can use as necessary:

:root {
  --pink: #fea0c2;
  --pink-light: #F9B9B9;
  --red: #f91a55;
  --orange: rgb(222, 117, 31);
  …
}

Colour is really hard and I didn’t want to revisit my use of colour in this refactor, so I left this alone. The only guideline I’m trying to enforce here is that all colours used in the site are listed in this file.

4. font sizes

One thing I appreciated about Tailwind was that if I wanted to set a font size, I could just think “hm, I want the text to be big”, write text-lg, and be done with it! And maybe if it’s not big enough I’d use xl or 2xl instead. No trying to remember whether I’m using em or px or rem.

So I de­fined a bunch of vari­ables, taken from Tailwind, like this:

--size-xs: 0.75rem;
--line-height-xs: 1rem;

--size-sm: 0.875rem;
--line-height-sm: 1.25rem;

Then if I want to set a font size, I can do it like this. It’s a lit­tle more ver­bose than Tailwind but I’m happy with it for now.

h3 {
  font-size: var(--size-lg);
  line-height: var(--line-height-lg);
}

5. util­i­ties

There are some things like buttons that appear in many different components. I’m calling these “utilities”.

I copied some util­ity classes from Tailwind (like .sr-only for things that should only ap­pear for screen­reader users).

This sec­tion is pretty small and I try to be care­ful about mak­ing changes here.

6. the base

“base” styles are styles that apply across the whole site that I chose myself. I have to keep this section really small because I’m not confident enough to enforce a lot of styles across the whole site. These are the only two I feel okay about right now, and I might change the <section> one:

/* put a 950px column in the middle of each <section> */
section {
  --inner-width: 950px;
  padding: 3rem max(1rem, (100% - var(--inner-width)) / 2);
}

a { color: var(--orange); }

I think for the base styles it’s go­ing to be eas­i­est for me to work kind of bot­tom up — first start with al­most noth­ing in the base styles, and then move some styles from the com­po­nents into base styles as I iden­tify com­mon things I want.

7. spac­ing

I haven’t com­pletely worked out an ap­proach to man­ag­ing padding and mar­gins yet. I’m def­i­nitely try­ing to be more prin­ci­pled than how I was do­ing it in Tailwind though, where I would just hap­haz­ardly put padding and mar­gins every­where un­til it looked the way I wanted.

Right now I’m work­ing to­wards mak­ing the outer lay­out com­po­nents in charge of spac­ing as much as pos­si­ble. For ex­am­ple if I have a <section> with a bunch of chil­dren that I want to have space be­tween them, I might use this to space the chil­dren evenly:

section > * + * { margin-top: 1rem; }

Some in­spi­ra­tion blog posts:

the owl se­lec­tor

“no outer margin”

8. re­spon­sive de­sign: use more grid!

The way I was doing responsive design in Tailwind was to use a lot of media queries. Tailwind has this md:text-xl syntax that means “apply the text-xl style at sizes md or larger”.

I’m try­ing some­thing pretty dif­fer­ent now, which is to make more flex­i­ble CSS grid lay­outs that don’t need as many break­points. This is hard but it’s re­ally in­ter­est­ing to learn about what’s pos­si­ble with grid, and it’s a good ex­am­ple of some­thing that I don’t think is pos­si­ble with Tailwind.

For ex­am­ple, I’ve been learn­ing about how to use auto-fit to au­to­mat­i­cally use 2 columns on a big screen and 1 col­umn on a small screen like this:

display: grid;
grid-template-columns: repeat(auto-fit, minmax(min(100%, 400px), max-content));
justify-content: center;

I also used grid-tem­plate-ar­eas a lot which is an amaz­ing fea­ture that I don’t think you can use with Tailwind.
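Since the original layouts aren’t shown here, a generic sketch of how grid-template-areas works (the area names and column widths are made up for illustration):

```css
/* Name the regions of the grid, then assign children to them. */
.layout {
  display: grid;
  grid-template-areas:
    "header header"
    "nav    main";
  grid-template-columns: 200px 1fr;
}
.layout > header { grid-area: header; }
.layout > nav    { grid-area: nav; }
.layout > main   { grid-area: main; }
```

Rearranging the layout at a breakpoint is then just redefining the grid-template-areas string, without touching the children.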

Some in­spi­ra­tion:

A re­spon­sive grid lay­out with no me­dia queries from CSS Tricks

9. the build sys­tem: es­build

In development, I don’t need a build system: CSS now has both built-in import statements, like this:

@import "reset.css";
@import "typography.css";
@import "colors.css";

and built-in nested selectors, like this:

.page {
  h2 { … }
}

If I want, I can use esbuild to bundle the CSS file for production. That looks something like this:

esbuild style.css --bundle --loader:.svg=dataurl --loader:.woff2=file --outfile=/tmp/out.css

Even though I usu­ally avoid us­ing CSS and JS build sys­tems, I don’t mind us­ing es­build (which I wrote about in 2021 here) be­cause it’s based on web stan­dards and be­cause it’s a sta­tic Go bi­nary.

why mi­grate away from Tailwind?

A few peo­ple asked why I was mi­grat­ing away from Tailwind. A few fac­tors that con­tributed are:

Tailwind has become much more reliant on a build system since 2018; I think it’s impossible (?) to use newer versions of Tailwind without using a build system. So I’ve been using Tailwind v2 for years. (There’s also litewind, apparently.)

It’s al­ways been true that you’re sup­posed to use Tailwind with a build sys­tem, but I’ve never re­ally done that, so I have 2.8MB tail­wind.min.css files (270K gzipped) in a lot of my pro­jects and it feels a lit­tle silly.

I’m a lot bet­ter at CSS than I was when I started us­ing Tailwind

Ultimately Tailwind is lim­it­ing: if you want to do Weird Stuff in your CSS, it’s not al­ways pos­si­ble with Tailwind. Those lim­its can be ex­tremely use­ful (a lot of this post is about me reim­ple­ment­ing some of Tailwind’s lim­its!) but at this point I’d like to be able to pick and choose.

I ended up with sites that mixed both vanilla CSS and Tailwind in the same pro­ject and that was not fun to main­tain

I got cu­ri­ous about what writ­ing more se­man­tic HTML would feel like.

CSS fea­tures I’m cu­ri­ous about

While do­ing this I learned about a lot of CSS fea­tures that I did­n’t use but am cu­ri­ous about learn­ing about one day:

@layer (from A Whole Cascade of Layers)

@scope

con­tainer queries

sub­grid

The CTF scene is dead

kabir.au

What makes me qual­i­fied to say this?

I started playing CTFs in 2021, the same year I started university. My first CTF was HCKSYD, a 48-hour solo CTF. I full-solved it and won in 2 hours. I was completely hooked. That led me to win DownUnderCTF, Australia’s largest CTF, multiple times with Blitzkrieg, one of Australia’s strongest teams at the time. I later joined TheHackersCrew, an international top-tier team consistently ranked highly on CTFTime, the main global ranking and event calendar the scene uses as its scoreboard. With them, I competed in some of the most prestigious CTFs in the world, consistently placing well within the top 10 until the end of 2025.

I am not say­ing this be­cause I dis­like CTFs. I am say­ing it be­cause CTFs were the thing that made me fall in love with se­cu­rity. They taught me how to learn, gave me a way to mea­sure my­self, and in­tro­duced me to many of the peo­ple I re­spect most in the field. Watching peo­ple pre­tend the for­mat is still fine is frus­trat­ing be­cause the old game is not there any­more.

What changed?

As AI tools ramped up in ca­pa­bil­ity, es­pe­cially when GPT-4 first came out, a sig­nif­i­cant per­cent­age of medium dif­fi­culty CTF chal­lenges started be­com­ing one-shot­table, mean­ing a sin­gle prompt from a user could pro­duce the solve and flag. You could paste a cryp­tog­ra­phy chal­lenge into ChatGPT, come back in 10 min­utes, and have the so­lu­tion. At the time, we did not think too much of it. Hard chal­lenges went mostly un­touched, and the time save was not large enough to ruin the com­pe­ti­tion.

The is­sue was never that AI could help. CTF play­ers have al­ways used tools. The is­sue is when the model does the rea­son­ing, writes the solve, and leaves the hu­man with noth­ing mean­ing­ful to do be­sides copy the flag.

Enter Claude Opus 4.5

When Opus 4.5 dropped, the tone changed. Almost every medium dif­fi­culty chal­lenge, and some hard chal­lenges, be­came agent-solv­able. Claude Code pack­aged every­thing into a CLI and made it easy to con­nect other CLI and MCP tools. It be­came triv­ial to build an or­ches­tra­tor that used the CTFd API to spin up a Claude in­stance for every chal­lenge. You could let the sys­tem run for the first hour, then only start work­ing on what­ever was left.

That changed the game. Teams that re­fused to use AI were not just miss­ing a con­ve­nience; they were play­ing a slower ver­sion of the com­pe­ti­tion. Open on­line CTFs started be­com­ing a ques­tion of how quickly you could au­to­mate the easy and medium work, then how much hu­man at­ten­tion you had left for the hard­est chal­lenges. The score­board started mea­sur­ing or­ches­tra­tion and will­ing­ness to use fron­tier mod­els along­side, and some­times above, se­cu­rity skill.

The ef­fects were ob­vi­ous. The CTFTime leader­board started feel­ing wrong. Some leg­endary teams that were con­sis­tently near the top ap­peared less of­ten. Player ac­tiv­ity felt lower. Challenge de­vel­op­ers who treated CTFs as an art­form had less rea­son to spend weeks build­ing some­thing beau­ti­ful if it was go­ing to be eaten by an agent in min­utes.

GPT-5.5 seals the deal

I have been work­ing heav­ily with GPT-5.5 and GPT-5.5 Pro af­ter launch. By bench­mark met­rics, 5.5 is close to Claude Mythos’ ca­pa­bil­ity, and Pro likely sur­passes it. These mod­els can one-shot Insane dif­fi­culty ac­tive leak­less heap pwn chal­lenges on HackTheBox. They can solve a large por­tion of what a smaller CTF or­gan­iser can re­al­is­ti­cally pro­duce. If you or­ches­trate Pro against Insane chal­lenges in a 48-hour CTF, there is a good chance you get the flag be­fore the event ends.

That makes open CTFs pay-to-win. The more tokens you can throw at a competition, the faster you can burn down the board. Specialised cybersecurity models like alias1 by Alias Robotics are becoming less relevant compared to general frontier LLMs. The competition is turning into “who can afford to run enough agents, with enough context, for long enough.”

CTFs feel much more like a cheesable mess than a com­pe­ti­tion. Your per­for­mance in a CTF no longer de­fines your skill the way it used to. Recruiting se­cu­rity prac­ti­tion­ers by CTF per­for­mance is be­com­ing less mean­ing­ful. It is not even a par­tic­u­larly good mea­sure of AI skill, be­cause most of the or­ches­tra­tion needed for CTFs is al­ready open source or vibe code­able.

The “beginners are fine” take

I have seen var­i­ous takes that be­gin­ners can still learn from CTFs as they al­ways have. These takes miss the score­board. CTFs were not just a set of puz­zles. They were a lad­der. Even as a be­gin­ner, you had some­thing to climb. You could see your­self im­prove, solve more chal­lenges, place higher, join bet­ter teams, and be­come more com­pet­i­tive over time.

That feed­back loop is break­ing. If the vis­i­ble score­board is dom­i­nated by teams us­ing AI, a be­gin­ner is pushed to­ward us­ing AI be­fore they have built the in­stincts the AI is re­plac­ing. That is an anti-pat­tern. It pre­vents ac­tive learn­ing, and ac­tive strug­gle is the bit that ac­tu­ally teaches you. It is also com­pletely de­mo­ti­vat­ing to put in real ef­fort and see no vis­i­ble progress be­cause the lad­der above you has been au­to­mated.

It also changes what chal­lenge au­thors want to build. If be­gin­ner CTFs be­come an­other place where peo­ple qui­etly paste prompts and climb a score­board, au­thors have more rea­son to put their ef­fort into learn­ing plat­forms in­stead. At least on plat­forms like pic­o­Gym and HackTheBox, the ex­pec­ta­tion is ed­u­ca­tion, and be­gin­ners are less in­cen­tivised to cheat them­selves out of learn­ing.

Beginners are bet­ter off us­ing pic­o­Gym, HackTheBox, and other lab en­vi­ron­ments where the point is ac­tu­ally learn­ing in­stead of pre­tend­ing the pub­lic score­board still re­flects hu­man growth.

“CTF isn’t dead”

I have seen some hopium posts about how CTF is not dead, it is just aug­mented by AI. They of­ten point at CTFs like DEF CON to ar­gue that AI still can­not solve every­thing. That is true, but it is the wrong de­fence.

The hard­est top-tier fi­nals have very few par­tic­i­pants, and they are usu­ally gated be­hind qual­i­fiers that are eas­ier than the fi­nals them­selves. If those qual­i­fiers fall to agents, fewer gen­uinely qual­i­fied peo­ple reach the chal­lenges that still re­sist AI. A tiny num­ber of elite fi­nals does not save the open on­line for­mat that most peo­ple ac­tu­ally play.

The claim is not that every chal­lenge is solved. The claim is that enough of the score­board has been au­to­mated that the score­board no longer means what it used to mean.

The “AI is useful for security research” take

CTFs were never meant to be se­cu­rity re­search. They can show­case new and in­ter­est­ing tech­niques, but the CTF it­self is not the point of dis­cov­ery. Just be­cause AI is use­ful within a field does not mean it be­longs in the com­pet­i­tive land­scape of that field.

In CTFs, un­re­stricted AI re­moves the hu­man from the puz­zle al­most en­tirely and re­duces the art of se­cu­rity to a prompt. Sure, LLMs will keep get­ting bet­ter at se­cu­rity as long as CTFs are around, but that does not mean the com­pet­i­tive for­mat is healthy. CTFs were an art­form, a way to share tech­niques with nerds, and a way to push the hu­man bounds of se­cu­rity skill. That pur­pose is be­ing stripped away.

The LLMs are chess en­gines for cy­ber” take

Chess has been dom­i­nated by com­put­ers for well over a decade. People use chess en­gines as an anal­ogy for LLMs in CTFs, but they miss the point: chess en­gines are not al­lowed dur­ing com­pet­i­tive play. They are used for analy­sis, train­ing, com­men­tary, and prac­tice. They en­rich the game around the com­pe­ti­tion with­out re­plac­ing the per­son com­pet­ing.

Imagine giv­ing every com­pet­i­tive chess player the best chess en­gine and let­ting them use it freely dur­ing matches. Would that be con­sid­ered fair? Would it be fun to watch? Would it jus­tify prize pools? Would it push the hu­man lim­its of what could be achieved in chess? The same ques­tions ap­ply to CTFs.

Organisers can’t fight back

CTF or­gan­is­ers have tried tech­niques to break or de­ter LLM so­lu­tions, but they are tem­po­rary fric­tion at best. Claude Code does not mean­ing­fully care about old re­fusal-string tricks any­more. Frontier mod­els are get­ting bet­ter at notic­ing prompt in­jec­tions. Web search ca­pa­bil­i­ties weaken chal­lenges based on tech­nolo­gies re­leased af­ter the train­ing cut­off. Rules that ask peo­ple not to use LLMs are ig­nored and al­most im­pos­si­ble to en­force in open on­line events.

That leaves or­gan­is­ers in a bad po­si­tion. If they make nor­mal chal­lenges, agents solve too much. If they make chal­lenges de­lib­er­ately hos­tile to agents, the chal­lenges of­ten be­come guessy, ov­erengi­neered, or un­pleas­ant for hu­mans too. That is not a real fix. It just makes CTFs worse for every­one.

“just adapt bro”

This take is in­fu­ri­at­ing. People I have al­ways looked up to in the com­mu­nity have said it. To me, it is com­pletely non­sen­si­cal un­less you ex­plain what we are adapt­ing into.

If adap­ta­tion means build­ing bet­ter tool­ing, CTF play­ers al­ready did that. If adap­ta­tion means writ­ing harder chal­lenges, or­gan­is­ers al­ready tried. If adap­ta­tion means ac­cept­ing that the score­board is now an AI or­ches­tra­tion bench­mark, then we should say that hon­estly in­stead of pre­tend­ing the old com­pe­ti­tion still ex­ists.

Even if or­gan­is­ers cre­ate guessier or more ov­erengi­neered chal­lenges that cur­rent LLMs can­not solve, there are no good paths for play­ers to learn the re­quired skills while stay­ing com­pet­i­tive. A few mod­els from now, that point may be ir­rel­e­vant any­way. The tra­jec­tory of LLM se­cu­rity ca­pa­bil­ity is mov­ing too quickly for chal­lenge de­sign to stay ahead for long.

The af­ter­math

The scene that grew my love for CTFs is emp­ty­ing out. The CTFTime leader­board has al­most no sem­blance of his­tory or hu­man skill any­more. The 2026 score­board is un­recog­nis­able com­pared to every year be­fore it. TheHackersCrew, along­side many other large and rep­utable teams, ei­ther do not play, play with far fewer peo­ple, or strug­gle to cut into the top 10. Unregulated cheat­ing is through the roof. Some of the best CTFs, like Plaid CTF, are not run­ning any­more.

These sen­ti­ments are not only mine. Many mem­bers of my lo­cal team, Emu Exploit, feel sim­i­larly. These are peo­ple who con­sis­tently at­tend the International Cybersecurity Championship, per­form at the top level in bug bounty pro­grammes, com­pete in Pwn2Own, and pre­sent at con­fer­ences in­clud­ing Black Hat. The peo­ple los­ing in­ter­est are not ca­sual ob­servers. They are ex­actly the kind of peo­ple the scene used to pro­duce and re­tain.

The fun of CTFing is gone for many of the peo­ple who cared most. The loss is not just a score­board. It is the lad­der from be­gin­ner cu­rios­ity to elite com­pe­ti­tion. It is the craft of chal­lenge de­sign. It is the feel­ing that a clever hu­man solved some­thing dif­fi­cult be­cause they un­der­stood it deeply.

That legacy is not be­ing car­ried for­ward by open on­line CTFs in their cur­rent form. The for­mat is dead. Something else may re­place it, but pre­tend­ing noth­ing fun­da­men­tal has changed only makes the loss harder to talk about hon­estly. It also gives AI shills more room to cap­i­talise on the de­cline by sell­ing mediocre wrap­pers back to the com­mu­nity that made the train­ing data valu­able in the first place.

What now?

While a lot of what’s hap­pen­ing in the CTF/AI space is su­per com­mer­cialised and out of our con­trol, CTF has had a hugely pos­i­tive im­pact on the in­dus­try. I have met so many kind, smart, and pas­sion­ate peo­ple through CTFs. I have played some of the most beau­ti­fully crafted chal­lenges and found some of the most in­trigu­ing un­in­tended so­lu­tions.

The com­mu­nity around CTFing has been an amaz­ing place to learn, grow, and con­nect. That’s some­thing we should­n’t lose, no mat­ter where the com­pe­ti­tion goes. As a com­mu­nity, we should strive to stay to­gether and build new av­enues to stay pas­sion­ate and keep learn­ing. Security-adjacent so­cial events like SecTalks, stu­dent con­fer­ences, and lo­cal mee­tups are great ways to stay con­nected and stay in­volved. Learning plat­forms and the com­mu­ni­ties they pro­vide through plat­forms like Discord are also a valu­able re­source.

While it may be a struggle to find an alternative to what we had, the amazing community we have built around it is more important now than ever as we find new ways to keep the competitive spirit alive.

Six SQL patterns I use to catch transaction fraud

analytics.fixelsmith.com

Quick dis­claimer: I do data work on a pro­gram-in­tegrity team. Examples be­low use generic trans­ac­tion ta­bles and made-up sce­nar­ios. Nothing here comes from any­thing I’ve ac­tu­ally worked on or seen. Views are mine, not my em­ploy­er’s.

Fraud de­tec­tion in trans­ac­tion data is mostly SQL. Not ma­chine learn­ing, not graph data­bases, not what­ever Gartner is hyp­ing this year. SQL, run against the right ta­bles, with the right joins, look­ing for the right shapes.

I work mostly with gov­ern­ment-funded ben­e­fit pro­grams, but the pat­terns be­low port over to any­thing with a trans­ac­tions table: credit cards, health­care claims, e-com­merce, point-of-sale. If money moves and gets logged, these queries will find weird things in the log.

Six pat­terns. Roughly in the or­der I’d build them out on a new dataset.

1. Velocity

The sim­plest one. Someone with a stolen card wants to drain it be­fore the holder no­tices. So they hit the card fast.

SELECT
  cardholder_id,
  date_trunc('hour', timestamp) AS hour_bucket,
  count(*) AS tx_count,
  min(timestamp) AS first_tx,
  max(timestamp) AS last_tx
FROM transactions
WHERE timestamp >= current_date - INTERVAL '30 days'
GROUP BY 1, 2
HAVING count(*) > 10;

Tune two knobs: the win­dow size and the count thresh­old. I usu­ally run a 1-minute, 5-minute, and 1-hour ver­sion in par­al­lel and com­pare. Different fraud shows up at dif­fer­ent scales — a card-test­ing ring hits a server in sec­onds; a ben­e­fits-traf­fick­ing ring might take an af­ter­noon.

A few card­hold­ers will le­git­i­mately blow past the thresh­old. Route op­er­a­tors ser­vic­ing vend­ing ma­chines. People re­load­ing pre­paid cards in bulk. Your false pos­i­tives. Worth keep­ing a whitelist af­ter the first pass.

For slid­ing-win­dow ve­loc­ity, this is the form I use:

SELECT
  cardholder_id,
  timestamp,
  count(*) OVER (
    PARTITION BY cardholder_id
    ORDER BY timestamp
    RANGE BETWEEN INTERVAL '5 minutes' PRECEDING AND CURRENT ROW
  ) AS tx_in_last_5min
FROM transactions
QUALIFY tx_in_last_5min >= 5
ORDER BY cardholder_id, timestamp;

QUALIFY works in Snowflake, BigQuery, Databricks, Teradata. For Postgres you wrap the whole thing in a CTE and fil­ter on the out­side. Slight pain, same re­sult.
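For Postgres, that wrapped version might look like this sketch — same window, with the filter moved outside a CTE:

```sql
WITH windowed AS (
  SELECT
    cardholder_id,
    timestamp,
    count(*) OVER (
      PARTITION BY cardholder_id
      ORDER BY timestamp
      RANGE BETWEEN INTERVAL '5 minutes' PRECEDING AND CURRENT ROW
    ) AS tx_in_last_5min
  FROM transactions
)
SELECT *
FROM windowed
WHERE tx_in_last_5min >= 5  -- the filter QUALIFY would have applied
ORDER BY cardholder_id, timestamp;
```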

2. Impossible travel

If a card swipes in Chicago and seven min­utes later swipes in Los Angeles, one of those swipes is fake. The card is cloned. This is the most un­con­tro­ver­sial fraud sig­nal you’ll find — there’s al­most no le­git­i­mate rea­son a sin­gle card is in two dis­tant places in seven min­utes.

WITH ordered_tx AS (
  SELECT
    cardholder_id,
    timestamp,
    location,
    LAG(timestamp) OVER (PARTITION BY cardholder_id ORDER BY timestamp) AS prev_ts,
    LAG(location)  OVER (PARTITION BY cardholder_id ORDER BY timestamp) AS prev_loc
  FROM transactions
)
SELECT
  cardholder_id,
  prev_ts   AS first_tx,
  timestamp AS second_tx,
  prev_loc  AS first_location,
  location  AS second_location,
  EXTRACT(EPOCH FROM (timestamp - prev_ts)) / 60 AS minutes_apart,
  haversine(prev_loc, location) AS miles_apart
FROM ordered_tx
WHERE prev_ts IS NOT NULL
  AND prev_loc <> location
  AND haversine(prev_loc, location)
      / nullif(EXTRACT(EPOCH FROM (timestamp - prev_ts)), 0) * 3600 > 600;

haver­sine is the great-cir­cle dis­tance func­tion. Most ware­houses ship one. If yours does­n’t, it’s about ten lines to write your own.
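If you do have to roll your own, a sketch as a Postgres SQL function — this assumes locations are stored as latitude/longitude in degrees (the queries here pass location values directly, so you’d adapt the signature to however your table stores them; 3959 is the Earth’s radius in miles):

```sql
-- Hypothetical great-circle distance, in miles, between two
-- lat/lon points, via the haversine formula.
CREATE FUNCTION haversine(lat1 float8, lon1 float8,
                          lat2 float8, lon2 float8)
RETURNS float8 AS $$
  SELECT 2 * 3959 * asin(sqrt(
    pow(sin(radians(lat2 - lat1) / 2), 2)
    + cos(radians(lat1)) * cos(radians(lat2))
      * pow(sin(radians(lon2 - lon1) / 2), 2)
  ))
$$ LANGUAGE SQL IMMUTABLE;
```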

The 600 mph threshold is rough — commercial jet cruise is around 575, so this is “faster than a plane could possibly do it.” You can tighten it to 100 mph if you want to catch suspiciously-fast ground travel too, but at that threshold you start picking up real airline travelers, kids with parents driving them home from camp, that kind of thing.

A few other shapes in the same fam­ily are worth run­ning:

Two dis­tant cities, same state, in­side 5 min­utes. Local cloning rings.

Multiple ZIP codes in­side an hour. Skimmer rings work­ing a re­gion.

Border cross­ings in­side 10 min­utes. International rings.

3. Amount anom­alies

There are a cou­ple of amounts that show up dis­pro­por­tion­ately in fraud and al­most never in nor­mal use.

SELECT cardholder_id, timestamp, amount, merchant_id
FROM transactions
WHERE (amount >= 99.50 AND amount < 100.00)
   OR (amount >= 499.50 AND amount < 500.00)
   OR amount IN (1.00, 5.00, 10.00)
ORDER BY cardholder_id, timestamp;

What’s hap­pen­ing:

Round dol­lar amounts at small val­ues — $1.00, $5.00, $10.00 — are al­most al­ways card tests. Someone got a card num­ber from a dump and they’re check­ing if it works be­fore re­selling it. Real card­hold­ers al­most never buy some­thing for ex­actly $1.00. Coffee is $4.73, gas is $52.81. The round­ness is the sig­nal.

Amounts just be­low a thresh­old are dif­fer­ent. $99.99 is in­ter­est­ing be­cause at a lot of places, $100 is the line where the cashier is sup­posed to check ID. $499.99 is in­ter­est­ing be­cause $500 is of­ten a daily ATM cap. Whoever’s do­ing the trans­ac­tion knows the rules and is stay­ing un­der them.

(For ben­e­fits trans­ac­tions specif­i­cally, the round-num­ber pat­tern does­n’t help much. Benefits don’t get card-tested the same way. There the sig­nal is usu­ally du­pli­cate re­cip­i­ents, which is a dif­fer­ent post.)

4. Suspicious mer­chants

When a skim­mer com­pro­mises a card reader at, say, a gas pump, you don’t get one fraud case. You get dozens. Every card swiped at that pump for the next few weeks is now in some­one’s data­base. So the symp­tom from the mer­chant side is: an un­usual num­ber of un­re­lated cards spend­ing more than usual, in a short win­dow.

SELECT
  merchant_id,
  date_trunc('hour', timestamp) AS hour_bucket,
  count(DISTINCT cardholder_id) AS unique_cards,
  count(*) AS total_tx,
  sum(amount) AS total_amount
FROM transactions
WHERE timestamp >= current_date - INTERVAL '7 days'
GROUP BY 1, 2
HAVING count(DISTINCT cardholder_id) > 20
   AND sum(amount) > 5000
ORDER BY total_amount DESC;

The prob­lem with sta­tic thresh­olds (20 unique cards, $5000) is they don’t ac­count for size. A Costco does that in 90 sec­onds. A used book­shop, never. So the bet­ter ver­sion com­pares each mer­chant against it­self:

WITH merchant_hourly AS (
  SELECT
    merchant_id,
    date_trunc('hour', timestamp) AS hour_bucket,
    count(DISTINCT cardholder_id) AS unique_cards
  FROM transactions
  WHERE timestamp >= current_date - INTERVAL '60 days'
  GROUP BY 1, 2
),
with_baseline AS (
  SELECT
    *,
    avg(unique_cards) OVER (
      PARTITION BY merchant_id
      ORDER BY hour_bucket
      ROWS BETWEEN 168 PRECEDING AND 1 PRECEDING
    ) AS rolling_avg_cards
  FROM merchant_hourly
)
SELECT
  *,
  unique_cards / nullif(rolling_avg_cards, 0) AS spike_ratio
FROM with_baseline
WHERE unique_cards > rolling_avg_cards * 3
ORDER BY spike_ratio DESC;

The 168 is the trail­ing seven days of hourly buck­ets. I use a week be­cause daily and weekly sea­son­al­ity mat­ters — Tuesday 2pm at a cof­fee shop is not the same base­line as Saturday 9am at the same shop. A week catches both cy­cles.

Three times nor­mal is where I start. It’s loose enough not to drown you in alerts but tight enough to flag the ac­tu­ally weird hours.

5. Off-hours

Most peo­ple are crea­tures of habit when they spend money. A nine-to-fiver does­n’t sud­denly start buy­ing gas at 3am. If their card does, it’s ei­ther be­ing used by some­one else or they’re trav­el­ing — and travel pro­duces other sig­nals you can check.

WITH cardholder_hour_pattern AS (
  SELECT
    cardholder_id,
    EXTRACT(HOUR FROM timestamp) AS hour_of_day,
    count(*) AS tx_count
  FROM transactions
  WHERE timestamp >= current_date - INTERVAL '90 days'
  GROUP BY 1, 2
),
cardholder_normal AS (
  SELECT
    cardholder_id,
    min(hour_of_day) FILTER (WHERE tx_count >= 2) AS earliest_hour,
    max(hour_of_day) FILTER (WHERE tx_count >= 2) AS latest_hour
  FROM cardholder_hour_pattern
  GROUP BY 1
)
SELECT t.cardholder_id, t.timestamp, t.amount, t.merchant_id
FROM transactions t
JOIN cardholder_normal cn USING (cardholder_id)
WHERE EXTRACT(HOUR FROM t.timestamp) NOT BETWEEN cn.earliest_hour AND cn.latest_hour
ORDER BY t.timestamp DESC;

The “two or more in that hour” filter on the inner query is doing important work. Without it, one stray late-night gas station purchase three months ago becomes part of the cardholder’s “normal” hours, and you never flag them again. Requiring at least two purchases in a given hour, in 90 days, sets the bar at “actually a habit” instead of “happened once.”

Drawback: this does­n’t work un­til you have his­tory. New ac­counts have no base­line. For those, you fall back to global hour pat­terns or just skip this pat­tern en­tirely un­til they’ve been around for a cou­ple months.

6. Window func­tions for chained sig­nals

This one is­n’t re­ally a pat­tern. It’s a setup that makes the other five pat­terns com­pos­able.

SELECT
  cardholder_id,
  timestamp,
  amount,
  merchant_id,
  timestamp - LAG(timestamp) OVER w AS time_since_last,
  CASE WHEN merchant_id <> LAG(merchant_id) OVER w
       THEN 'changed' ELSE 'same' END AS merchant_change,
  sum(amount) OVER (
    PARTITION BY cardholder_id
    ORDER BY timestamp
    RANGE BETWEEN INTERVAL '24 hours' PRECEDING AND CURRENT ROW
  ) AS running_24h_total,
  ROW_NUMBER() OVER (
    PARTITION BY cardholder_id, date(timestamp)
    ORDER BY timestamp
  ) AS tx_of_day
FROM transactions
WINDOW w AS (PARTITION BY cardholder_id ORDER BY timestamp)
ORDER BY cardholder_id, timestamp;

Once you’ve materialized those columns, fraud rules collapse to filter expressions. Say you’re hunting card-testing rings, where the tell is “lots of small charges, all at different merchants, within minutes of each other.” The rule becomes:

SELECT *
FROM tx_with_windows
WHERE tx_of_day >= 5
  AND time_since_last < INTERVAL '60 seconds'
  AND merchant_change = 'changed';

Three fil­ters. That’s it.

The rea­son this mat­ters is that the mo­ment your an­a­lysts can ex­press new fraud hy­pothe­ses as SQL fil­ters in­stead of en­gi­neer­ing tick­ets, your it­er­a­tion loop drops from weeks to hours. You catch more fraud, faster.

Putting it to­gether

None of these alone is enough. Velocity has false pos­i­tives (vending op­er­a­tors). Geographic im­pos­si­bil­ity misses any­thing in­side a sin­gle metro. Amount anom­alies don’t ap­ply out­side of card-test con­texts. The off-hours rule needs his­tory.

What works is run­ning them all and scor­ing each trans­ac­tion across the sig­nals. A trans­ac­tion that fails on three or four of them is al­most al­ways fraud. A trans­ac­tion that fails on one might be your grandma be­ing weird with her debit card on va­ca­tion.
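The scoring step above can be sketched in a few lines. Signal names and thresholds here are made up for illustration; a real system would weight signals and tune cutoffs against reviewed outcomes.

```python
def fraud_score(flags):
    """flags: dict mapping signal name -> whether it fired for this transaction."""
    return sum(1 for fired in flags.values() if fired)

def triage(flags):
    score = fraud_score(flags)
    if score >= 3:   # several independent signals firing: almost always fraud
        return "block-and-review"
    if score >= 1:   # a single signal alone is weak evidence
        return "human-review"
    return "pass"

# Three of four signals fired: route straight to the block queue.
tx = {"velocity": True, "geo_impossible": True,
      "amount_anomaly": True, "off_hours": False}
decision = triage(tx)
```

The point of the structure is that adding a fifth signal is one more key in the dict, not a rewrite of the triage logic.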

If you’re brand new to fraud de­tec­tion, start with pat­tern 1. It alone will sur­face a use­ful amount of fraud and very lit­tle le­git­i­mate ac­tiv­ity, and it’s cheap to run.

If you’ve al­ready got 1 through 5, the place to in­vest is pat­tern 6 — those win­dow-func­tion prim­i­tives. Every an­a­lyst on your team will use them once they ex­ist, and adding the next fraud pat­tern stops be­ing a pro­ject.

Things I left out

A few things this post does­n’t cover that come up con­stantly:

NULL handling. Real transaction tables don’t use NULL the way intro SQL books do. A lot of legacy systems use sentinel values like 9999-12-31 for “no end date” or 0001-01-01 for “no start date.” Filtering with IS NULL will silently miss those rows. Always check what the convention is in your specific table before writing WHERE clauses that assume NULL.

False pos­i­tives. Every rule above will flag real card­hold­ers do­ing weird-but-le­git­i­mate things. Your fraud work­flow needs hu­man re­view of flagged cases, with a feed­back loop that lets you tune thresh­olds based on what’s ac­tu­ally fraud and what is­n’t. Auto-blocking on a sin­gle rule is how you lose cus­tomers.

Privacy. If the data has PII, your queries need to com­ply with your ap­plic­a­ble data-use poli­cies. De-identified or sam­pled data first, pro­duc­tion data with au­tho­riza­tion sec­ond.

Cost. Window func­tions with big par­ti­tions are not cheap. Filter your date range first, then ap­ply the win­dow, not the other way around. I’ve watched a ju­nior an­a­lyst burn through a ware­house credit bud­get by run­ning a LAG() across two years of trans­ac­tions on the en­tire dataset be­fore adding the WHERE.
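A small sqlite3 demonstration of the filter-first shape (hypothetical table and dates, chosen only to show the mechanics): the date predicate lives in the subquery, so the partition that LAG() walks contains only recent rows, and an old transaction never becomes anyone's "previous" row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (cardholder_id INTEGER, ts TEXT, amount REAL);
    INSERT INTO transactions VALUES
      (1, '2023-01-01 10:00', 5.00),  -- old row, outside the range of interest
      (1, '2025-06-01 10:00', 9.00),
      (1, '2025-06-01 10:01', 9.50);
""")

# Date filter applied BEFORE the window function: LAG() only ever
# scans rows that survived the subquery.
rows = conn.execute("""
    SELECT cardholder_id, ts,
           LAG(ts) OVER (PARTITION BY cardholder_id ORDER BY ts) AS prev_ts
    FROM (SELECT * FROM transactions WHERE ts >= '2025-05-01')
    ORDER BY ts
""").fetchall()
# The 2023 row never enters the partition, so the first recent
# row has prev_ts = NULL instead of pointing two years back.
```

Swap the subquery and the window and the database has to materialize the window over the full history before discarding most of it, which is exactly the warehouse-credit bonfire described above.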

Things I want to write about next, de­pend­ing on what peo­ple ask for:

Eight win­dow-func­tion tricks be­yond LAG and ROW_NUMBER

Detecting fraud rings, which is the so­cial-graph prob­lem in dis­guise

What goes on a fraud team’s dash­board, and what does­n’t

Why your fraud alerts are noisy, and how to ac­tu­ally fix that in­stead of just rais­ing thresh­olds

If there’s some­thing spe­cific you want cov­ered, mes­sage through fix­el­smith.com.

Fixel Smith is an ex­pe­ri­enced Program Integrity Analyst work­ing in pub­lic-sec­tor data.

Efficient Minute-Scale World Modeling

nvlabs.github.io

You don’t know HTML Lists

blog.frankmtaylor.com


This second installment in the “You don’t know HTML” series is going to be all about the ways that we put collections of things together. We’re skipping over the MDN and W3Schools introductory pages and instead we’re going into the kind of stuff you discover after accidentally taking your cousin’s Ritalin right before you open up the W3C specs. Let’s dive deep into lists.

This is­n’t an in­tro­duc­tion!

I’m assuming you’ve got real-world experience writing HTML and this isn’t your first time searching “How to make a list.” What I’m going to cover are all of the ways you can put collections of content together. So I’m talking about these kinds of lists:

Ordered

Unordered

Description

Menu

Control

And if you did­n’t know there were five dif­fer­ent kinds of lists in HTML, per­fect. That must mean you don’t know HTML!

How Do we Decide Which to Use?

No need to ask AI for a sum­mary; I’ll just give you the end­ing up front. Here’s how you’ll de­cide which kind of list to use:

If the items in the list are for a sin­gle con­trol field where you’re get­ting data from a user, you ei­ther want a <select> + <option> mashup or an <input> + <datalist> combo

If chang­ing the or­der of the items would change the mean­ing of the list, then use an or­dered list (<ol>)

If the items are key-value pairs, or keys-to-value pairs, use a de­scrip­tion list (<dl>)

If the items are con­trols that will per­form ac­tions in the user in­ter­face, use a menu (<menu>)

Otherwise, use an unordered list (<ul>)

Control Lists with <select> and <option> or <input> and <datalist>

When we think of lists, we don’t usu­ally throw user con­trol fields into the mix. And that’s weird, be­cause we con­struct our nav­i­ga­tions us­ing lists, and those are lists of links that the user…uh…can con­trol. So we tend to have a bias with what we think lists are.

But I’m here to bring that to the fore­front of your mind: when we’re build­ing forms, some­times we’re build­ing lists that our users will in­ter­act with.

If it’s a fixed list, use <select> and <option>

When I say “fixed”, I mean that the user can only choose the items from that list. If that’s the case, let’s use select and option.

Suppose we want a list of lan­guages to talk in:

<select name="languages">
  <option value="">Select a Language</option>
  <option value="en">English</option>
  <option value="fr">French</option>
  <option value="es">Spanish</option>
  <option value="pt">Portuguese</option>
</select>

This gives the user ex­actly one choice to make.

But if the user were also mul­ti­lin­gual, maybe they’d like to choose more than one. Easy enough with the mul­ti­ple at­tribute! The list will dis­play dif­fer­ently. Now all the op­tions will be vis­i­ble so we can shift or cmd + click the ones we want:

<select name="languages" multiple>
  <option value="">Select a Language</option>
  <option value="en">English</option>
  <option value="fr">French</option>
  <option value="es">Spanish</option>
  <option value="pt">Portuguese</option>
  <option value="ga">Irish</option>
  <option value="cy">Welsh</option>
</select>

So long as you’re doing this with an actual select element and an option, you don’t have to use the aria-multiselectable attribute on a list element with the role="listbox" attribute. Native browser semantics bake that in for you.

Put re­lated op­tions to­gether with <optgroup>

What if we wanted to group lan­guages by lan­guage-fam­i­lies? We can do that with opt­group which lets us group a list of op­tions to­gether:

<select name="languages">
  <optgroup label="Germanic">
    <option value="en">English</option>
  </optgroup>
  <optgroup label="Romance">
    <option value="fr">French</option>
    <option value="es">Spanish</option>
    <option value="pt">Portuguese</option>
  </optgroup>
  <optgroup label="Celtic">
    <option value="ga">Irish</option>
    <option value="cy">Welsh</option>
  </optgroup>
</select>

What if there’s a bunch of op­tions, but for [reasons] we don’t want a user to be able to se­lect a sub­set of them? Let’s add the dis­abled at­tribute to an opt­group:

<select name="languages">
  <optgroup label="Germanic">
    <option value="en">English</option>
  </optgroup>
  <optgroup label="Romance">
    <option value="fr">French</option>
    <option value="es">Spanish</option>
    <option value="pt">Portuguese</option>
  </optgroup>
  <optgroup label="Celtic" disabled>
    <option value="ga">Irish</option>
    <option value="cy">Welsh</option>
  </optgroup>
</select>

Use na­tive HTML op­tions first for im­prov­ing the list

Sometimes we may want a vi­sual break be­tween your groups. If we don’t want to fid­dle with CSS, we’re in luck! An <hr> is an ap­proved item in a se­lect. Not only does that make our se­lect look a lit­tle sharper, we can also use the size at­tribute to con­trol how many items will be dis­played at once — mak­ing this use­ful for es­pe­cially long lists.

We just gotta watch out with size if we’re also us­ing opt­group be­cause those group la­bels will take up some of that space we were prob­a­bly hop­ing for:

<select name="languages" size="4" multiple>
  <optgroup label="Germanic">
    <option value="en">English</option>
  </optgroup>
  <hr />
  <optgroup label="Romance">
    <option value="fr">French</option>
    <option value="es">Spanish</option>
    <option value="pt">Portuguese</option>
  </optgroup>
  <hr />
  <optgroup label="Celtic">
    <option value="ga">Irish</option>
    <option value="cy">Welsh</option>
  </optgroup>
  <hr />
  <optgroup label="Afroasiatic">
    <option value="he">Hebrew</option>
    <option value="ar">Arabic</option>
  </optgroup>
</select>

If it’s a sug­gested list, use <datalist>

Let’s suppose we have a control where we want to suggest a list of options to a user. This is where we get the datalist involved.

Using a datal­ist is a two-step process be­cause we have to tell the in­put to use a datal­ist.

Create a datal­ist and give it an id.

Put the value of that id in the list attribute of a corresponding input.

<datalist id="languages">
  <option>English</option>
  <option>French</option>
  <option>Spanish</option>
  <option>Portuguese</option>
  <option>Irish</option>
  <option>Welsh</option>
  <option>Hebrew</option>
  <option>Arabic</option>
</datalist>

<input name="language" list="languages">


We need to watch out for us­ing a value at­tribute on the <option> of a <datalist>!

This is­n’t a datal­ist prob­lem but an op­tion prob­lem: The de­fault value for an op­tion is the text it wraps. A value at­tribute over­rides that and then the text acts like a la­bel. This is no big deal for a se­lect list be­cause the user only sees the text.

But if we put a value on an <option> in a datalist the user will see the “label” in the list, but when they select it they’ll see the value in the input. It’s a confusing experience.

Start typing w in this input and then select “Welsh” to see what I mean:

<datalist id="languages">
  <option value="en">English</option>
  <option value="fr">French</option>
  <option value="es">Spanish</option>
  <option value="pt">Portuguese</option>
  <option value="ga">Irish</option>
  <option value="cy">Welsh</option>
  <option value="he">Hebrew</option>
  <option value="ar">Arabic</option>
</datalist>

<input name="language" list="languages">


So if we’re go­ing to use a datal­ist, we need to work with the un­der­stand­ing that the value is what gets in­serted — not the la­bel.

We can use a datal­ist for any kind of in­put

We tend to think of the datal­ist as be­ing use­ful for text op­tions. But that ain’t how it has to work.

Suppose we had a cal­en­dar wid­get and we wanted to gen­tly sug­gest a par­tic­u­lar range of weeks in the year. We could do that with a datal­ist:

<label for="camp-week">Choose a week</label>

<input type="week" name="week" id="camp-week" min="2026-W02" max="2026-W51" list="preferred-weeks" />

<datalist id="preferred-weeks">
  <option>2026-W22</option>
  <option>2026-W23</option>
  <option>2026-W24</option>
  <option>2026-W25</option>
</datalist>


<datalist> and <input type="range"> can work together

<datalist> is­n’t lim­ited to stringy val­ues; it works with num­bers. Which means we could pair it with a range in­put and cre­ate la­beled stops along the range.

The only thing we have to watch out for in this ap­proach is that not all browsers are guar­an­teed to work the same way. In Chrome and friends, we could dis­play these stops with very pro­gram­matic and sim­ple CSS. In Firefox…shenanigans are in­volved. But it starts with the big idea that you can dis­play a datal­ist:

<div class="rangeField">
  <label for="tips">Tip Percentage</label>

  <input type="range" name="tips" id="tips" min="0" max="50" step="1" list="recommended-tips" />

  <datalist id="recommended-tips">
    <option value="10" label="10%"></option>
    <option value="18" label="18%"></option>
    <option value="30" label="30%"></option>
    <option value="45" label="45%"></option>
  </datalist>

  <style>
    /* container for the two things; ch is the width of the 0 in the computed font.
       Very precise for numbers */
    .rangeField {
      width: 50ch;
    }

    /* same width for input and datalist */
    #recommended-tips, #tips {
      width: 100%;
      margin: 0;
      padding: 0;
    }

    /* set the datalist to display, and be a positioning root;
       add a vertical writing mode */
    #recommended-tips {
      position: relative;
      display: block;
      writing-mode: vertical-lr;
    }
  </style>
</div>


Our pro­gram­matic styles which will work in Chrome and friends will in­volve us­ing the attr() func­tion, cast­ing it to a per­cent, and some math.

@supports (x: attr(x type(percentage))) {
  /* For browsers that let you set a type on an attr():
     1. get the value from the label with attr()
     2. use the type() function to declare the value as a percentage
     3. make it absolute
     4. the max of the input is 50, not 100, so set left to be close-ish
        to left × 2, and subtract based on the character width */
  #recommended-tips option {
    --percent: attr(label type(<percentage>));
    position: absolute;
    left: calc((var(--percent) * 1.9) - .1ch);
  }
}

For this to work in Firefox, we have to go in a dif­fer­ent and more an­noy­ing di­rec­tion. We will need to man­u­ally set these as sep­a­rate rule­sets. And we will tar­get a pseudo-el­e­ment in­stead. And our math gets weirder. This is not guar­an­teed to dis­play well on your screen:

@supports not (x: attr(x type(percentage))) {
  /* In Firefox, the values display as a ::before.
     Explicitly set the height of the option, otherwise it will be too big,
     and set the ::before to position: absolute.
     Also, don't set length with percent, as it's wildly off;
     instead, use the same unit set on the container (ch). */
  #recommended-tips option {
    height: 1ch;
    margin: 0;
    padding: 0;
  }
  #recommended-tips option::before {
    position: absolute;
    top: .5ex;
  }
  #recommended-tips option[value="10"]::before { left: calc(5ch + 2ex); }
  #recommended-tips option[value="18"]::before { left: calc(9ch + 2.5ex); }
  #recommended-tips option[value="30"]::before { left: calc(15ch + 4ex); }
  #recommended-tips option[value="45"]::before { left: calc(22.5ch + 6.5ex); }
}

Ordered Lists with <ol>

Any time we have a col­lec­tion of items that must be read in a par­tic­u­lar or­der, we should use an or­dered list. We should not let vi­sual pre­sen­ta­tion dic­tate this choice. It’s not about whether the items should have num­bers next to them. It’s about whether their se­quence mat­ters.

These are the kinds of col­lec­tions that should be an or­dered list:

An al­go­rithm

Series of events

Items that have an in­cre­men­tal con­tin­uum

A recipe (which is a se­ries of events and also an al­go­rithm)

An al­pha­betic list (which is let­ters arranged along their con­tin­uum)

And the rea­son these should be an or­dered list is be­cause chang­ing the or­der of the items would change the mean­ing of the list.

Our bread will bake dif­fer­ently if it’s not in an or­dered list!

<ol>
  <li>Pre-heat oven to 350 degrees and grease a 9x5 pan.</li>
  <li>Combine flour, baking soda, and salt in large bowl with beaten brown sugar, butter, eggs, and mashed bananas</li>
  <li>If oven is pre-heated, pour batter into pan</li>
  <li>Bake for 60 minutes or until a toothpick inserted into the center comes out clean.</li>
  <li>Let cool on a wire rack</li>
</ol>


And if we say some­thing is al­pha­bet­i­cal, it’d be weird to sug­gest it could be or­dered dif­fer­ently!

<h3>Ingredients (alphabetical)</h3>
<ol>
  <li>baking soda (1 teaspoon)</li>
  <li>bananas (2) (mashed)</li>
  <li>brown sugar (¾ cup)</li>
  <li>butter (½ cup)</li>
  <li>eggs (2)</li>
  <li>flour (2 cups)</li>
  <li>salt (¼ teaspoon)</li>
</ol>

Fecal transplants for autism deliver success in clinical trials

refractor.io

Scientific research continues to uncover interesting connections between the gut microbiome and human health, including everything from depression to PTSD to autoimmune disease. Another example of this is the emerging tie between gut health and autism. Exciting new research, now moving to Phase 3 human trials, has found boosting microbial diversity via fecal transplants can dramatically reduce autism symptoms in the long term.

Editor’s note: Readers of­ten ask us for fol­low-ups on mem­o­rable sto­ries. What has hap­pened to this story over the years? This ar­ti­cle was orig­i­nally pub­lished in 2019 but it has been re-edited and up­dated with new in­for­ma­tion cur­rent as of April 7, 2025. Enjoy!

One in every 59 chil­dren born in the US is di­ag­nosed with autism, ac­cord­ing to the Centers for Disease Control and Prevention, and un­for­tu­nately for many of them, chronic gas­troin­testi­nal is­sues are a harsh re­al­ity of their con­di­tion. According to sci­en­tists at Arizona State University (ASU), who con­ducted the cur­rent study, around 30 to 50% of peo­ple with autism ex­pe­ri­ence se­ri­ous gut prob­lems like con­sti­pa­tion, di­ar­rhea and stom­ach pain.

“Many kids with autism have gastrointestinal problems, and some studies, including ours, have found that those children also have worse autism-related symptoms,” ASU’s Rosa Krajmalnik-Brown said back in 2019 during the early stages of the work. “In many cases, when you are able to treat those gastrointestinal problems, their behavior improves.”

A key study in 2019 built on ear­lier re­search from 2017 that found in­tro­duc­ing new bac­te­ria via fe­cal trans­plants in 18 autis­tic chil­dren brought about marked im­prove­ments in their be­hav­ior, as mea­sured through ques­tion­naires as­sess­ing their so­cial skills, hy­per­ac­tiv­ity, com­mu­ni­ca­tion and other fac­tors.

These im­prove­ments held for eight weeks, an im­pres­sive out­come to be sure. But the Arizona State University re­searchers then set out to in­ves­ti­gate the en­dur­ing ef­fects of the treat­ment, which in­volved a bowel cleanse and daily trans­plants of fe­cal mi­cro­biota over a pe­riod of seven to eight weeks. Prior to the treat­ment, these chil­dren all had far lower di­ver­sity of gut mi­crobes than those with­out autism.

“Kids with autism are lacking important beneficial bacteria, and have fewer options in the bacterial menu of important functions that bacteria provide to the gut than typically developing kids,” Krajmalnik-Brown said in 2019.


Two years after the treatment, the researchers found that not only did the benefits persist, they seemed to improve. Doctors’ observations at the eight-week mark found that psychological autism symptoms of the patients had decreased by 24%. But two years later those symptoms had almost been cut in half, with a professional evaluator finding a decrease of 45% in autism symptoms compared to baseline.

Prior to the study, 83% of participants had “severe” autism. Two years later, only 17% were rated as severe, 39% as mild or moderate, and incredibly, 44% were below the cut-off for mild ASD.

“We are finding a very strong connection between the microbes that live in our intestines and signals that travel to the brain,” Krajmalnik-Brown said in 2019. “Two years later, the children are doing even better, which is amazing.”

The next step was a larger placebo-controlled clinical trial designed to verify the results, with a view to gaining FDA approval for the therapy.

In early 2022 Krajmalnik-Brown and colleagues patented a specific bacterial formulation and spun off a commercial company called Gut-Brain Axis Therapeutics. The treatment, dubbed Microbiota Transplant Therapy (MTT), moved through a Phase 2 placebo-controlled human trial over the following years, and the initial data has been incredibly promising.

“Our phase 2 study for adults with autism found that the treatment group improved more than placebo on the primary outcome (autism symptoms) and on a secondary outcome (daily stool record),” the researchers explain. “Evaluation of symptoms on the Parent Global Impressions found that the treatment group at the end of part 2 improved more than the placebo group in part 1 on nearly all symptoms, with statistically significant improvements in GI, receptive language, and average of all symptoms. There were also marginally significant improvements in tantrums, stimming/perseveration, and cognition.”

Now, the team is look­ing to raise funds to move through the large-scale Phase 3 tri­als nec­es­sary for fi­nal FDA ap­proval.

The team’s key 2019 study ap­pears in the jour­nal Scientific Reports, and you can hear from the re­searchers about their most re­cent find­ings in the video be­low.

Source: Arizona State University

Microbiota Transplant for Adults with Autism by Prof. James Adams

An ear­lier ver­sion of this ar­ti­cle writ­ten by Nick Lavars was pub­lished in 2019.

Editor’s note: A prior version of this article used the term ‘autism sufferers’ in reference to people with autism. We understand that terminology is inappropriate and can be deemed offensive. The article has been edited to remove the reference.

Where to buy a non-Apple, non-Google smartphone

www.theregister.com

As both Apple and Google in­tro­duce un­wel­come changes in their phone OSes, here’s a quick re­minder that you do have al­ter­na­tives to the Gruesome Twosome.

The Keep Android Open cam­paign is gath­er­ing at­ten­tion and sup­port as the big red num­bers on its page count down. The good news is that you do al­ready have al­ter­na­tives, and The Register has been re­port­ing on them. But if you are not the sort of per­son who reads phone re­views, or write­ups of al­ter­na­tive phone OSes, and just wants to buy a new hand­set and re­tain con­trol of it and its con­tents, we thought it might be a good time to re­mind you of where to go and who to talk to.

At the time of writ­ing, the cam­paign says it’s 123 days un­til Google’s new mea­sures pre­vent­ing you from side-load­ing your own soft­ware will kick in. The cam­paign frames it in in­ten­tion­ally alarmist lan­guage:


Your phone is about to stop be­ing yours.

123 days un­til lock­down

Starting September 2026, a silent up­date, non­con­sen­su­ally pushed by Google, will block every Android app whose de­vel­oper has­n’t reg­is­tered with Google, signed their con­tract, paid up, and handed over gov­ern­ment ID.

Every app and every de­vice, world­wide, with no opt-out.

The Register has of course been cov­er­ing both the loom­ing Google changes as well as the cam­paign it­self.


It’s worth noting, too, that the Mountain View massive is also taking steps to make life harder for the organizations creating these de-Googled Android variants — such as the changes to the Android Open Source Project that reduce how often the source code will be made available. As The Reg noted at the time, Google dropped its old “Don’t Be Evil” motto when it turned 20… and commemorated it by firing staff who stuck to the motto. It has changed its position on being evil, but it’s not stupid.

Fancy a free FOSS fondleslab?

Even so, there are al­ter­na­tives. The Reg FOSS desk has writ­ten about sev­eral of them over the last four years, and we have more to come in the near fu­ture, as well.

For now, mul­ti­ple com­pa­nies will sell you a brand new smart­phone with a Google-free OS on it — ei­ther a de-Googled ver­sion of Android, or a Linux OS that is­n’t based on Android in the first place.

It’s somewhat easier to start with the FOSS Android Open Source Project, and then systematically remove all the Google integration. That still leaves a feature-complete mobile OS, and it can still install and run many Android apps.

Murena

Murena is one of the big names in this area. It sponsors the development of /e/OS, which you can run on multiple off-the-shelf handsets, but it also offers its own range of phones and tablets so you don’t need to mess around trying to “root” an old phone. We looked at the Murena One phone in 2022 and then at /e/OS 3 on a Pixel Tablet last year.

One of the models that Murena sells with /e/OS is from Fairphone. We reported on the Fairphone 6’s 10/10 repairability score last July.


Punkt

Swiss de­signer-kit ven­dor Punkt of­fers a va­ri­ety of sleek black gad­gets in­clud­ing an alarm clock. It’s been mak­ing phones for years, and The Reg in­spected its min­i­mal­ist MP02 phone back in 2018. More re­cently, this vul­ture tried out the MC02 ul­tra-pri­vate smart­phone in 2024. That’s now been re­placed with a newer faster model, the MC03, and we are cur­rently in the process of re­view­ing one of the hand­sets.

Volla

German fondleslab-flinger Volla offers three smartphones and a tablet. We have yet to get our claws on one, but all of them are available with a choice of OSes: either the company’s own de-Googled Android, Volla OS, or alternatively, with Ubuntu Touch, the community-led continuation of Canonical’s phone OS. We’ve reported on the OTA-24 update and later the newer Ubuntu 20.04 release.

Jolla

The German Volla is not to be con­fused with the Finnish Jolla. We took a look at its Sailfish 5 OS and new C2 hand­set in December last year. The first two batches of the phone have sold out, but the com­pany is cur­rently tak­ing or­ders for the third batch.

Furilabs

If you want Debian in your pocket, then Furilabs can help. We re­ported on the launch of their first hand­set, the FLX1, from Devconf.cz in 2024, and the com­pany pro­vided us with a hand­set which we re­viewed last year.


Since then, the company has launched its second model, the FLX1s. The old one was a bit of a brick, which is to be honest how this vulture likes his phones: it was 18 cm long, 9 cm wide, and 2.8 cm thick, and weighed just over a third of a kilo. (For our readers in Liberia, Myanmar and elsewhere, that’s 7 × 3.5 × 1.1 inches, and ¾ lb.) The new model is under a third of the thickness, and just over 200 g (7 oz.)

Purism

Purism has a range of Free Software-powered phones, tablets, and lap­tops, but the one that’s rel­e­vant here is the Librem 5. It’s a low-end hand­set by mod­ern stan­dards, and it’s ex­pen­sive at that, but then free­dom does cost.

PinePhone and post­mar­ke­tOS

Pine64 of­fers a va­ri­ety of hacker-friendly gad­gets which can run open-source firmware. The one that’s most rel­e­vant here is the orig­i­nal PinePhone. Do be­ware, though, it’s a very low-end, low-spec de­vice. We re­ported last year that the higher-end PinePhone Pro was be­ing dis­con­tin­ued, but the older model is still avail­able to or­der. It’s on the com­pa­ny’s global store al­though the EU store is cur­rently out of stock.

We have re­ported on both Mobian Linux and post­mar­ke­tOS, and this de­vice can run both.

Honorable men­tion: FXtec

FXtec also of­fers an in­trigu­ing hand­set, the Pro1. The Reg re­ported on its launch in 2020 and it’s still listed on the com­pa­ny’s web store, al­beit out of stock — but you may be able to find one.

But can I run my apps?

Well, prob­a­bly, yes.

Several of these OSes, including Sailfish, FuriOS, Mobian, and postmarketOS, are pure Linux OSes. They’re not derived from Android, but they can all run an Android VM or container, and so allow you to install and use Android apps.

This is not an ex­haus­tive list. They are just ones I know of or have tried. If we missed any sig­nif­i­cant play­ers out, please do let us know. ®

Bootnote

When we re­ferred to Apple users find­ing them­selves with un­wel­come fea­tures, we were pri­mar­ily think­ing of the com­pa­ny’s new Liquid Glass user in­ter­face. However, on dis­cus­sion with sev­eral iDe­vice own­ers, it seems that there are more ag­gres­sive fea­tures. As the Reg men­tioned in pass­ing, the iOS 26.4 up­date in­tro­duces age ver­i­fi­ca­tion mea­sures to the OS. (That’s as well as chang­ing the pass­code key­pad.) For UK users, Apple’s age ver­i­fi­ca­tion wants to scan a UK pass­port or dri­ver’s li­cence. We know of a num­ber of adult cit­i­zens who own nei­ther of these doc­u­ments. (The au­thor does not: he has Manx ones. This lack re­cently cost him his Nationwide build­ing so­ci­ety ac­count.) Without of­fi­cial ID, they are now locked out of con­trol­ling their own phones, stuck in child mode with ac­cess con­trols they can’t change.

GitHub - chiennv2000/orthrus: Fast, lossless LLM inference via dual-view diffusion decoding.

github.com

Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion

Official im­ple­men­ta­tion and model check­points for Orthrus, a dual-ar­chi­tec­ture frame­work that uni­fies the ex­act gen­er­a­tion fi­delity of au­tore­gres­sive Large Language Models (LLMs) with the high-speed par­al­lel to­ken gen­er­a­tion of dif­fu­sion mod­els.

Model Zoo

All mod­els use a Qwen3 back­bone and guar­an­tee strictly loss­less gen­er­a­tion.

Installation

uv pip install -e .
uv pip install ninja packaging
uv pip install flash-attn --no-build-isolation
# or: pip install "flash-attn-4[cu13]" if your device supports it

We rec­om­mend uv for fast de­pen­dency res­o­lu­tion.


Quickstart

⚡ Try in­stantly: Run Orthrus di­rectly in Colab:


import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model = AutoModelForCausalLM.from_pretrained(
    "chiennv/Orthrus-Qwen3-8B",
    dtype=torch.bfloat16,
    device_map="cuda",
    attn_implementation="flash_attention_2",  # options: sdpa | eager | flash_attention_4
    trust_remote_code=True,
).eval()
tokenizer = AutoTokenizer.from_pretrained("chiennv/Orthrus-Qwen3-8B")

prompt = "Write a program to count the frequency of each word in a paragraph."
messages = [{"role": "system", "content": ""}, {"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True, enable_thinking=False
).input_ids

output_ids = model.generate(
    input_ids=input_ids.to(model.device),
    max_new_tokens=2048,
    use_diffusion_mode=True,
    streamer=TextStreamer(tokenizer, skip_prompt=True),  # enable streaming generation
)

Coming soon: Native integration with vLLM and SGLang. Stay tuned!

Key Advantages

Significant Inference Acceleration: Breaks the sequential bottleneck of standard autoregressive decoding, delivering up to a 7.8× speedup on generation tasks.

Strictly Lossless Generation: Employs an exact intra-model consensus mechanism to guarantee that the output matches the original base model’s exact predictive distribution.

Zero Redundant Memory Overhead: Both the autoregressive and diffusion views attend to the exact same high-fidelity Key-Value (KV) cache natively, resulting in only an O(1) memory cache overhead.

Parameter Efficient: Parallel generation capabilities are injected by fine-tuning only 16% of the total model parameters while keeping the base LLM strictly frozen.
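The README doesn’t spell out the consensus mechanism here, but “strictly lossless” parallel decoding is typically built on a speculative-sampling-style accept/reject test: a drafted token is kept with probability min(1, p_base/p_draft), and on rejection a replacement is drawn from the normalized residual distribution. A minimal sketch of that standard verification step (illustrative only; not Orthrus’s actual code, and all names here are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def verify_token(token, p_base, p_draft, rng):
    """Speculative-sampling acceptance test: the verified output follows the
    base model's exact distribution regardless of the draft distribution."""
    if rng.random() < min(1.0, p_base[token] / p_draft[token]):
        return token  # accepted: keep the drafted token
    # Rejected: resample from the normalized residual max(p_base - p_draft, 0)
    residual = np.maximum(p_base - p_draft, 0.0)
    residual /= residual.sum()
    return rng.choice(len(p_base), p=residual)

# e.g. a token drafted from p_draft, verified against p_base:
p_base = np.array([0.1, 0.2, 0.3, 0.4])
p_draft = np.array([0.4, 0.3, 0.2, 0.1])
token = verify_token(rng.choice(4, p=p_draft), p_base, p_draft, rng)
```

Run over many drafted tokens, the accepted/resampled outputs are distributed exactly as p_base, which is what “lossless” means in this setting.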

Performance Comparison: Orthrus vs. Speculative Decoding

Orthrus outperforms speculative decoding methods such as EAGLE-3 and DFlash. By natively sharing the exact same KV cache across dual views, Orthrus avoids the redundant memory overhead of draft models, resulting in significantly higher token acceptance rates and faster inference times, especially as context length scales.

Left: Average verified tokens per forward pass compared to EAGLE-3 and DFlash. Right: Simulated generation time across scaling context lengths compared to DFlash.

Comparison with State-of-the-Art Diffusion Models

While recent diffusion language models (dLLMs) offer parallel decoding, they often suffer from significant conditional drift and severe accuracy degradation on complex reasoning tasks. Orthrus resolves this by decoupling parallel generation from sequential constraints, establishing a new state-of-the-art for parallel generation fidelity.

Throughput vs. Accuracy on MATH-500. Orthrus delivers a ~6x speedup over the Qwen3-8B baseline with strictly lossless performance, whereas adaptations like Fast-dLLM-v2 suffer significant accuracy drops.

Citation

If you find this model or architecture useful in your work, please cite our paper:

@misc{vannguyen2026orthrusmemoryefficientparalleltoken,
  title={Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion},
  author={Chien Van Nguyen and Chaitra Hegde and Van Cuong Pham and Ryan A. Rossi and Franck Dernoncourt and Thien Huu Nguyen},
  year={2026},
  eprint={2605.12825},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2605.12825},
}

DeepSeek-V4-Flash means LLM steering is interesting again

www.seangoedecke.com

Ever since Golden Gate Claude I’ve been fascinated with “steering”: the idea that you can guide LLM outputs by directly manipulating the activations of the model mid-flight.

DeepSeek V4 Flash

I was inspired to write this post by antirez’s recent project DwarfStar 4, which is a version of llama.cpp that’s been stripped down to run only DeepSeek-V4-Flash. What’s so special about this model? It might be what many engineers have been waiting for: a local model good enough to compete with at least the low end of frontier-model agentic coding.

Since steering requires a local model, it’s now practical for many engineers to try it out for the first time. And indeed, antirez has baked steering into DwarfStar 4 as a first-class citizen. Right now it’s very rudimentary (basically just the toy “verbosity” example you can replicate via prompting), but the initial release was only eight days ago. I plan to follow this project closely.

How steering works

The basic idea behind steering is extracting a concept (like “respond tersely”) from the model’s internal brain state, then reaching in during inference and boosting the numerical activations that form that concept.

One way you might do this is to feed your model the same set of a hundred prompts twice, once with the normal prompts and once with the words “respond tersely” appended. Then measure the difference in the model’s activations1 for each prompt pair (by subtracting one activation matrix from the other). That’s your “steering vector”. In theory, you can go and add that to the same activation layer for any prompt and get the same effect (of the model responding tersely).
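The arithmetic of this naive recipe is simple enough to show with synthetic data. In the sketch below the “activations” are random stand-ins rather than real model captures (in practice you’d grab them with a forward hook on a chosen layer), but the subtract-and-average step is exactly as described:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for captured activations: 100 prompt pairs, hidden size 64.
# base[i] is the layer activation for prompt i; steered[i] is the activation
# for the same prompt with "respond tersely" appended (synthetic here).
hidden = 64
base = rng.standard_normal((100, hidden))
concept = rng.standard_normal(hidden)  # the hidden "respond tersely" direction
steered = base + concept + 0.1 * rng.standard_normal((100, hidden))

# The steering vector is the mean per-pair activation difference.
steering_vector = (steered - base).mean(axis=0)

# At inference time, add the (scaled) vector back into that layer's output:
def apply_steering(activation, vector, strength=1.0):
    return activation + strength * vector

# Averaging over 100 pairs washes out the per-prompt noise, so the recovered
# vector points almost exactly along the underlying concept direction:
cosine = steering_vector @ concept / (
    np.linalg.norm(steering_vector) * np.linalg.norm(concept)
)
```

The `strength` knob matters in practice: too small and nothing changes, too large and the model’s outputs degrade into incoherence.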

Another, more sophisticated way you might do this is to train a second model to extract “features” from your model’s activations: patterns of behavior that seem to show up together. Then you can try to map those features back to individual concepts, and boost them in the same way. This is more or less what Anthropic is doing with sparse autoencoders2. It’s the same principle as the naive approach, but it lets you capture deeper patterns (at the cost of being much more expensive in time, compute and expertise).
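Mechanically, the feature route looks like an autoencoder pass over an activation: encode into an overcomplete, mostly-inactive feature vector, boost one feature, decode back. This is only a shape-level sketch with untrained random weights (a real SAE is trained with a reconstruction loss plus an L1 sparsity penalty, and this is not Anthropic’s actual setup):

```python
import numpy as np

rng = np.random.default_rng(1)

d_model, d_features = 64, 512  # overcomplete dictionary: more features than dims
W_enc = rng.standard_normal((d_model, d_features)) * 0.1
W_dec = rng.standard_normal((d_features, d_model)) * 0.1

def encode(activation):
    # ReLU zeroes out negative pre-activations; training with an L1 penalty
    # is what makes the surviving feature vector genuinely sparse.
    return np.maximum(activation @ W_enc, 0.0)

def decode(features):
    return features @ W_dec

def boost_feature(activation, idx, strength=5.0):
    """Steer by amplifying one learned feature, then reconstructing."""
    f = encode(activation)
    f[idx] += strength
    return decode(f)

act = rng.standard_normal(d_model)
steered = boost_feature(act, idx=3)
```

The hard (and expensive) part is the step this sketch skips: training the dictionary and figuring out which feature index corresponds to a human-legible concept.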

Why steering is interesting

Steering sounds like a cheat code. Instead of painstakingly assembling a training set that tries to push the model towards the “smart” end of the distribution in its training data, why not simply go uncover the “smart” dial in the model’s brain and turn it all the way to the right?

It also seems like a more elegant way to adjust the way models talk. Instead of fiddling with the prompt (adding or removing qualifiers like “you MUST”), couldn’t we just have a control panel of sliders like “succinctness/verbosity” or “conscientiousness/speed” and move them around directly?

Finally, it’s just cool. Watching Golden Gate Claude unwillingly drag every sentence back to the Golden Gate Bridge is as fascinating and unsettling as Oliver Sacks’ neurological anecdotes. What if your own mind was tweaked in a similar way? Would it still be you?

Why steering hasn’t been used

Why don’t we steer more, then? Why don’t ChatGPT and Claude Code already have a steering panel where you can adjust the model’s brain in real time? One reason is that steering is kind of an unfortunately “middle class” idea in AI research.

It’s beneath the big AI labs, who can manipulate their models directly without having to do awkward brain surgery mid-inference. Anthropic is working on this stuff, but largely from an interpretability and safety perspective (as far as I know). When they want a model to behave in a certain way, they don’t mess around with steering, they just train the model.

Steering is also out of reach for regular AI users like you and me3, who use LLMs via an API and thus don’t have access to the model weights or activations needed to steer the model. Only OpenAI can identify or expose steering vectors for GPT-5.5, for instance. We could do this for open-weights models, but until very recently (more on that later) there haven’t been any open models strong enough to be worth doing this for.

On top of that, most basic applications of steering are outcompeted by just prompting the model. It sounds pretty impressive to be able to manipulate the model’s brain directly. But you know what else manipulates the model’s brain directly? Prompt tokens. You can exercise fairly fine-grained control over activations with steering, but you can already exercise extremely fine-grained control by tweaking the language of your prompt. In other words, there’s not much point going to the trouble to steer a model to be more verbose when you could simply ask.

Steering the unpromptable

One way for steering to be really useful is if we could identify a concept that can’t be prompted for. What about “intelligence”? You used to be able to prompt for intelligence - this is why 4o-era prompting always began with “you are an expert” - but current-generation models have that baked into their personalities, so prompting for it does nothing. Maybe steering for it would still work?

Ultimately this is an empirical question, but I’m skeptical that we’ll be able to find an “intelligence” steering vector. Put another way, the steering vector that makes up a concept as difficult as “intelligence” might be almost coextensive with the entire set of weights of the model, and thus identifying it reduces to the problem of “training a smart model”.

A sufficiently sophisticated steering approach ends up just replacing the actual model. If I take GPT-2, and at each layer I swap out the activations with the activations from a much stronger model with the same architecture, I will get a much better result. But at that point you’re not making GPT-2 more intelligent, you’re just talking to the stronger model instead. The intelligence is in the steering, not in the model. For much more on this, see my post AI interpretability has the same problems as philosophy of mind.

Steering as data compression

Another way for steering to be useful is if we could somehow steer for a concept that requires a ton of tokens to express. Steering would thus save us a big chunk of the model’s context window. Intuitively, we might think of this as a way to shift a concept from the model’s working memory into its implicit memory.

For instance, what if we could identify a “knowledge of my particular codebase” concept? When GPT-5.5 speed-reads my codebase, some of the knowledge it gains has to be buried in the activations, right? Maybe we could drag that out into a very large steering vector.

I would be surprised if this could work. I think we’ll run into the same problem as with extracting “intelligence”: the “knows my codebase” concept is probably sophisticated enough to require a full fine-tune of the model4. But it at least seems possible.

Conclusion

I’m fascinated with steering, but I’m not particularly optimistic about it. I think most of the gains can be more efficiently reproduced with prompts, and that the truly ambitious steering goals can be more efficiently reproduced by training or fine-tuning the model.

However, the open-source community hasn’t done a lot of work on steering yet, and that might be just starting to change now. If I’m wrong and it does have practical applications, we should find that out in the next six months.

It’ll be interesting to see if bespoke per-model tools like DwarfStar 4 end up including a “library” of boostable features. When a popular open-weights model is released, the community always rushes to release a suite of wrappers and quantized versions. Could we also see a rush to extract boostable features from the model?

edit: this post got some comments on Hacker News. Several commenters (including antirez himself) pointed out that steering can change some “trained in” behavior in ways that prompting can’t: most notably to remove refusal from the model. Another commenter says that this is how uncensoring/abliteration is already done for open models. I didn’t know that - I thought the uncensored models were typically LoRA fine-tunes. On this point, antirez noted that modifying the weights can damage model capabilities more than the more lightweight runtime-steering approach (which can only be applied when needed). Makes sense to me.

Models have lots of different activations you might measure (after attention, between each layer, etc). You can basically pick any one you want, or try multiple and see what works best.

I recently read a really good deep dive into doing this with an open LLaMA model (and I tried it myself a few months ago, with mixed results.)

Apologies to my readers from the big AI labs. Please email me if you have tried steering internally to boost capabilities and it hasn’t worked. I promise I won’t tell anyone.

And even then, industry results from “fine-tune a model on your codebase” efforts have largely been unsuccessful.

Here’s a preview of a related post that shares tags with this one.

We've made the world too complicated

user8.bearblog.dev

16 May, 2026

We’ve made the world too complicated. I’m writing this with technology I will never fully understand in a building with rooms I can never enter, living in a country dictated by laws I can’t control. We spend the majority of our waking hours and lives in an abstract world of compressed life. The moment I walk through my door I’m in a zoning area on a city-owned sidewalk, flanked by ugly metallic monsters, floating through a sea of strangers.

Our world is an explosion of environmental harm, manipulation, corruption, and damage to everything around us.

This puts us all under a stress we can’t consciously notice. Manifesting in the slight clenching of our jaws, thinning of our breath, steady incline of our blood pressure. There’s a spirit of silent confusion in our mind at all times. The world doesn’t make sense. It’s always been this way, so we don’t even know another way to exist.

In the documentary The Thinking Game about Demis Hassabis and Google Deepmind, we are presented with the worldview that AGI offers the best solution to humanity’s biggest problems. The ultimate savior from technology.

I think we do a very good job at convincing ourselves that we are doing good things, working towards honest goals. Participating in society, discovering new truths, implementing new plans and projects. Seeing how easy it is to manipulate others, it makes sense that we are the masters of constructing realities around ourselves as well.

Honestly, I’ve wanted to snap my laptop right at the hinge so many times. To throw my phone into the sea. I’ve wanted to walk out of my school or office and never return. I want to never pay with money or read a written word again. But to do so would leave you alone and a lunatic.

These thoughts are bad. These thoughts are aggrandizing “primitive” ways. No. We are primitive now.

The more we learn, the more destruction seems to follow. The sick irony is that we would never have understood this without tools that help us look back, or so we are led to believe. Our internal intuition about right and wrong seems to leave us at an early age.

I used to want to do many things. Make great art, build great machines, solve important issues. Maybe our greatest gift to the world is to do as little as possible. To look at the birds, feel the wind and the water in our own hands, and … nothing more. Eat when we are hungry, laugh when we are happy, cry when we are empty. And maybe that is the greatest gift to ourselves as well.

Discussion on Hacker News


If you visit 10HN only rarely, check out the best articles from the past week.

Visit pancik.com for more.