10 interesting stories served every morning and every evening.




1 1,325 shares, 59 trendiness

Kagi Translate


...

Read the original on translate.kagi.com »

2 844 shares, 44 trendiness

Reddit User Uncovers Who Is Behind Meta’s $2B Lobbying for Invasive Age Verification Tech

Reddit researcher exposes Meta's $2B campaign to force Apple and Google into building surveillance systems while exempting its own platforms

A Reddit re­searcher just ex­posed how Meta fun­neled over $2 bil­lion through shad­owy non­prof­its to push age ver­i­fi­ca­tion laws that would force Apple and Google to build sur­veil­lance in­fra­struc­ture into every de­vice—while con­ve­niently ex­empt­ing Meta’s own plat­forms from the same re­quire­ments.

The investigation by GitHub user “upper-up” traces funding through organizations like the Digital Childhood Alliance (DCA), which launched December 18, 2024, and testified for Utah's SB-142 just days later. Bloomberg and Deseret News reported Meta's backing of DCA, part of a $70 million fragmented super PAC strategy designed to evade FEC tracking. Traditional election spending disclosure requirements don't apply to this fragmented approach.

The tech­ni­cal re­al­ity hits harder than pol­icy ab­strac­tions. These bills man­date OS-level APIs that apps can query for age data—cre­at­ing a per­ma­nent iden­tity layer baked into your phone’s core func­tions. Meta’s Horizon OS for Quest VR al­ready im­ple­ments this in­fra­struc­ture through Family Center con­trols. Now they want Apple and Google to build sim­i­lar sys­tems that every app can ac­cess, turn­ing age ver­i­fi­ca­tion into per­sis­tent de­vice fin­ger­print­ing.

Here's where the lobbying gets surgical. The proposed laws hammer Apple's App Store and Google Play with compliance requirements but reportedly spare social media platforms—Meta's core business. It's like Spotify lobbying for streaming regulations that only apply to Apple Music. The “child safety” rhetoric masks a competitive strategy that shifts liability from platforms to operating system makers.

The European Union’s Digital Identity Wallet takes a rad­i­cally dif­fer­ent ap­proach. Zero-knowledge proofs let you ver­ify age with­out re­veal­ing per­sonal data—like show­ing you’re over 18 with­out dis­clos­ing your birth­date or iden­tity de­tails. It’s open-source, self-hostable, and only ap­plies to large plat­forms while ex­empt­ing FOSS and small en­ti­ties. Meanwhile, US law­mak­ers seem ready to let Meta bam­boo­zle them into com­plete pri­vacy an­ni­hi­la­tion.

Your de­vice’s trust­wor­thi­ness hangs in the bal­ance. These laws could force every Linux dis­tri­b­u­tion and pri­vacy-fo­cused Android fork to im­ple­ment iden­tity ver­i­fi­ca­tion or face le­gal li­a­bil­ity. The choice be­tween sur­veil­lance-free com­put­ing and reg­u­la­tory com­pli­ance is com­ing faster than you think.

...

Read the original on www.gadgetreview.com »

3 563 shares, 34 trendiness

Microsoft’s ‘unhackable’ Xbox One has been hacked by 'Bliss'

A groundbreaking hack for Microsoft's ‘unhackable’ Xbox One was revealed at the recent RE//verse 2026 conference. This console has remained a fortress since its launch in 2013, but now Markus ‘Doom’ Gaasedelen has showcased the ‘Bliss’ double glitch. Just as the Xbox 360 famously fell to the Reset Glitch Hack (RGH), the Xbox One has now fallen to Voltage Glitch Hacking (VGH).

“In 2013 some kind of iron curtain came down on security, of the Xbox ecosystem, and the Xbox One never got hacked,” noted Gaasedelen in his introduction. The same is true of the Xbox One's successors, and Microsoft was rightly proud. Seven years after its launch, Microsoft engineers would still assert that the Xbox One was “the most secure product Microsoft has ever produced.”

What made the Xbox One so se­cure, so spe­cial? Gaasedelen ref­er­enced prior work and pre­sen­ta­tions to con­vey this in­for­ma­tion. I’ve shared a sum­mary slide about this, too, but let’s fast for­ward to the demo of the new Bliss hack, which takes place from about 46 min­utes into the pre­sen­ta­tion.

Since reset glitching wasn't possible, Gaasedelen thought some voltage glitching could do the trick. So, instead of tinkering with the system reset pin(s), the hacker targeted the momentary collapse of the CPU voltage rail. This was quite a feat, as Gaasedelen couldn't ‘see’ into the Xbox One, so he had to develop new hardware introspection tools.

Eventually, the Bliss exploit was formulated, where two precise voltage glitches were made to land in succession. One skipped the loop where the ARM Cortex memory protection was set up. Then the memcpy operation was targeted during the header read, allowing him to jump to the attacker-controlled data.

As a hardware attack against the boot ROM in silicon, Gaasedelen says the attack is unpatchable. Thus it is a complete compromise of the console, allowing unsigned code to be loaded at every level, including the Hypervisor and OS. Moreover, Bliss allows access to the security processor, so games, firmware, and so on can be decrypted.

What happens next with this technique remains to be seen. Digital archivists should enjoy new levels of access to Xbox One firmware, OS, and games. There could be subsequent emulation breakthroughs thanks to this effort. We also now have a route to making a Bliss-a-like mod chip to automate the precise electrical glitching required.

Whether PC users, our core readership, will be interested in actually emulating the Xbox One seems unlikely. The 2013 system's game library largely overlaps with what is already available, in better quality, on the PC platform.

Follow Tom’s Hardware on Google News, or add us as a pre­ferred source, to get our lat­est news, analy­sis, & re­views in your feeds.

...

Read the original on www.tomshardware.com »

4 501 shares, 21 trendiness

Every layer of review makes you 10x slower

Every layer of re­view makes you 10x slower

We've all heard of those network effect laws: the value of a network goes up with the square of the number of members. Or the cost of communication goes up with the square of the number of members, or maybe it was n log n, or something like that, depending on how you arrange the members. Anyway, doubling a team doesn't double its speed; there's coordination overhead. Exactly how much overhead depends on how badly you botch the org design.

But there’s one rule of thumb that some­one showed me decades ago, that has stuck with me ever since, be­cause of how an­noy­ingly true it is. The rule is an­noy­ing be­cause it does­n’t seem like it should be true. There’s no the­o­ret­i­cal ba­sis for this claim that I’ve ever heard. And yet, every time I look for it, there it is.

Here we go:

Every layer of review makes you 10x slower.

I know what you’re think­ing. Come on, 10x? That’s a lot. It’s un­fath­omable. Surely we’re ex­ag­ger­at­ing.

Just to be clear, we’re count­ing wall clock time” here rather than ef­fort. Almost all the ex­tra time is spent sit­ting and wait­ing.

* Get it code reviewed by the peer next to you: 300 minutes → 5 hours → half a day

* Get a design doc approved by your architects team first: 50 hours → about a week

* Get it on some other team's calendar to do all that (for example, if a customer requests a feature): 500 hours → 12 weeks → one fiscal quarter

I wish I could tell you that the next step up — 10 quar­ters or about 2.5 years — was too crazy to con­tem­plate, but no. That’s the life of an ex­ec­u­tive sit­ting above a medium-sized team; I bump into it all the time even at a rel­a­tively small com­pany like Tailscale if I want to change prod­uct di­rec­tion. (And ex­ecs sit­ting above large teams can’t ac­tu­ally do work of their own at all. That’s an­other story.)

First of all, this isn't a post about AI, because AI's direct impact on this problem is minimal. Okay, so Claude can code it in 3 minutes instead of 30? That's super, Claude, great work.

Now you ei­ther get to spend 27 min­utes re­view­ing the code your­self in a back-and-forth loop with the AI (this is ac­tu­ally kinda fun); or you save 27 min­utes and sub­mit un­ver­i­fied code to the code re­viewer, who will still take 5 hours like be­fore, but who will now be mad that you’re mak­ing them read the slop that you were too lazy to read your­self. Little of value was gained.

Now now, you say, that’s not the value of agen­tic cod­ing. You don’t use an agent on a 30-minute fix. You use it on a mon­stros­ity week-long pro­ject that you and Claude can now do in a cou­ple of hours! Now we’re talk­ing. Except no, be­cause the mon­stros­ity is so big that your re­viewer will be ex­tra mad that you did­n’t read it your­self, and it’s too big to re­view in one chunk so you have to slice it into new bite-sized chunks, each with a 5-hour re­view cy­cle. And there’s no de­sign doc so there’s no in­ten­tional ar­chi­tec­ture, so even­tu­ally some­one’s go­ing to push back on that and here we go with the de­sign doc re­view meet­ing, and now your mon­stros­ity week-long pro­ject that you did in two hours is… oh. A week, again.

I guess I could have called this post Systems Design 4 (or 5, or what­ever I’m up to now, who knows, I’m writ­ing this on a plane with no wifi) be­cause yeah, you guessed it. It’s Systems Design time again.

The only way to sus­tain­ably go faster is fewer re­views

It’s funny, every­one has been pre­dict­ing the Singularity for decades now. The premise is we build sys­tems that are so smart that they them­selves can build the next sys­tem that is even smarter, that builds the next smarter one, and so on, and once we get that started, if they keep get­ting smarter faster enough, then the in­cre­men­tal time (t) to achieve a unit (u) of im­prove­ment goes to zero, so (u/t) goes to in­fin­ity and foom.

Anyway, I have never be­lieved in this the­ory for the sim­ple rea­son we out­lined above: the ma­jor­ity of time needed to get any­thing done is not ac­tu­ally the time do­ing it. It’s wall clock time. Waiting. Latency.

And you can’t over­come la­tency with brute force.

I know you want to. I know many of you now work at com­pa­nies where the busi­ness model kinda de­pends on do­ing ex­actly that.

But you can’t just not re­view things!

Ah, well, no, ac­tu­ally yeah. You re­ally can’t.

There are now many peo­ple who have seen the symp­tom: the start of the pipeline (AI gen­er­ated code) is so much faster, but all the sub­se­quent stages (reviews) are too slow! And so they in­tuit the ob­vi­ous so­lu­tion: stop re­view­ing then!

The re­sult might be slop, but if the slop is 100x cheaper, then it only needs to de­liver 1% of the value per unit and it’s still a fair trade. And if your value per unit is even a mere 2% of what it used to be, you’ve dou­bled your re­turns! Amazing.

There are some pretty dumb as­sump­tions un­der­ly­ing that the­ory; you can imag­ine them for your­self. Suffice it to say that this pro­duces what I will call the AI Developer’s Descent Into Madness:

* Whoa, I produced this prototype so fast! I have super powers!

* This prototype is getting buggy. I'll tell the AI to fix the bugs.

* Hmm, every change now causes as many new bugs as it fixes.

* Aha! But if I have an AI agent also review the code, it can find its own bugs!

* Wait, why am I personally passing data back and forth between agents?

* I can have my agent write an agent framework!

It's actually alarming how many friends and respected peers I've lost to this cycle already. Claude Code only got good maybe a few months ago, so this only recently started happening, so I assume they will emerge from the spiral eventually. I mean, I hope they will. We have no way of knowing.

Anyway we know our symp­tom: the pipeline gets jammed up be­cause of too much new code spewed into it at step 1. But what’s the root cause of the clog? Why does­n’t the pipeline go faster?

I said above that this is­n’t an ar­ti­cle about AI. Clearly I’m fail­ing at that so far, but let’s bring it back to hu­mans. It goes back to the an­noy­ingly true ob­ser­va­tion I started with: every layer of re­view is 10x slower. As a so­ci­ety, we know this. Maybe you haven’t seen it be­fore now. But trust me: peo­ple who do org de­sign for a liv­ing know that lay­ers are ex­pen­sive… and they still do it.

As com­pa­nies grow, they all end up with more and more lay­ers of col­lab­o­ra­tion, re­view, and man­age­ment. Why? Because oth­er­wise mis­takes get made, and mis­takes are in­creas­ingly ex­pen­sive at scale. The av­er­age value added by a new fea­ture even­tu­ally be­comes lower than the av­er­age value lost through the new bugs it causes. So, lack­ing a way to make fea­tures pro­duce more value (wouldn’t that be nice!), we try to at least re­duce the dam­age.

The more checks and con­trols we put in place, the slower we go, but the more mo­not­o­n­i­cally the qual­ity in­creases. And is­n’t that the ba­sis of con­tin­u­ous im­prove­ment?

Well, sort of. Monotonically increasing quality is on the right track. But “more checks and controls” went off the rails. That's only one way to improve quality, and it's a fraught one.

I wrote a few years ago about W. E. Deming and the “new” philosophy around quality that he popularized in Japanese auto manufacturing. (Eventually U.S. auto manufacturers more or less got the idea. So far the software industry hasn't.)

One of the ef­fects he high­lighted was the prob­lem of a QA pass in a fac­tory: build wid­gets, have an in­spec­tion/​QA phase, re­ject wid­gets that fail QA. Of course, your in­spec­tors prob­a­bly miss some of the fail­ures, so when in doubt, add a sec­ond QA phase af­ter the first to catch the re­main­ing ones, and so on.

In a sim­plis­tic math­e­mat­i­cal model this seems to make sense. (For ex­am­ple, if every QA pass catches 90% of de­fects, then af­ter two QA passes you’ve re­duced the num­ber of de­fects by 100x. How awe­some is that?)
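A quick sanity check of that parenthetical arithmetic, as a minimal Python sketch (the 1,000-defect starting count and the 90% catch rate are assumed, purely for illustration):

```python
# If each QA pass independently catches 90% of the remaining defects,
# the escape rate shrinks by 10x per pass.
catch_rate = 0.90
defects_in = 1000

for passes in range(4):
    escaped = defects_in * (1 - catch_rate) ** passes
    print(f"{passes} QA pass(es): {escaped:g} defects escape")

# 0 -> 1000, 1 -> 100, 2 -> 10, 3 -> 1: two passes cut escaped defects 100x,
# but only if the passes really are independent -- which the incentive
# problems described next tend to destroy.
```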

But in the re­al­ity of agen­tic hu­mans, it’s not so sim­ple. First of all, the in­cen­tives get weird. The sec­ond QA team ba­si­cally serves to eval­u­ate how well the first QA team is do­ing; if the first QA team keeps miss­ing de­fects, fire them. Now, that sec­ond QA team has lit­tle in­cen­tive to pro­duce that out­come for their friends. So maybe they don’t look too hard; af­ter all, the first QA team missed the de­fect, it’s not un­rea­son­able that we might miss it too.

Furthermore, the first QA team knows there is a sec­ond QA team to catch any de­fects; if I don’t work too hard to­day, surely the sec­ond team will pick up the slack. That’s why they’re there!

Also, the team mak­ing the wid­gets in the first place does­n’t check their work too care­fully; that’s what the QA team is for! Why would I slow down the pro­duc­tion of every wid­get by be­ing care­ful, at a cost of say 20% more time, when there are only 10 de­fects in 100 and I can just elim­i­nate them at the next step for only a 10% waste over­head? It only makes sense. Plus they’ll fire me if I go 20% slower.

To say noth­ing of a whole en­gi­neer­ing re­design to im­prove qual­ity, that would be su­per ex­pen­sive and we could be de­sign­ing all new wid­gets in­stead.

Sound like any en­gi­neer­ing de­part­ments you know?

Well, this isn't the right time to rehash Deming, but suffice it to say, he was on to something. And his techniques worked. You get things like the famous Toyota Production System where they eliminated the QA phase entirely, but gave everybody an “oh crap, stop the line, I found a defect!” button.

Famously, US auto manufacturers tried to adopt the same system by installing the same “stop the line” buttons. Of course, nobody pushed those buttons. They were afraid of getting fired.

The ba­sis of the Japanese sys­tem that worked, and the miss­ing part of the American sys­tem that did­n’t, is trust. Trust among in­di­vid­u­als that your boss Really Truly Actually wants to know about every de­fect, and wants you to stop the line when you find one. Trust among man­agers that ex­ec­u­tives were se­ri­ous about qual­ity. Trust among ex­ec­u­tives that in­di­vid­u­als, given a sys­tem that can work and has the right in­cen­tives, will pro­duce qual­ity work and spot their own de­fects, and push the stop but­ton when they need to push it.

But, one more thing: trust that the sys­tem ac­tu­ally does work. So first you need a sys­tem that will work.

AI coders are fal­li­ble; they write bad code, of­ten. In this way, they are just like hu­man pro­gram­mers.

Deming’s ap­proach to man­u­fac­tur­ing did­n’t have any magic bul­lets. Alas, you can’t just fol­low his ten-step process and im­me­di­ately get higher qual­ity en­gi­neer­ing. The se­cret is, you have to get your en­gi­neers to en­gi­neer higher qual­ity into the whole sys­tem, from top to bot­tom, re­peat­edly. Continuously.

Every time something goes wrong, you have to ask, “How did this happen?” and then do a whole post-mortem and the Five Whys (or however many Whys are in fashion nowadays) and fix the underlying Root Causes so that it doesn't happen again. “The coder did it wrong” is never a root cause, only a symptom. Why was it possible for the coder to get it wrong?

The job of a code re­viewer is­n’t to re­view code. It’s to fig­ure out how to ob­so­lete their code re­view com­ment, that whole class of com­ment, in all fu­ture cases, un­til you don’t need their re­views at all any­more.

By the time your re­view catches a mis­take, the mis­take has al­ready been made. The root cause hap­pened al­ready. You’re too late.

I wish I could tell you I had all the an­swers. Actually I don’t have much. If I did, I’d be first in line for the Singularity be­cause it sounds kind of awe­some.

I think we’re go­ing to be stuck with these sys­tems pipeline prob­lems for a long time. Review pipelines — lay­ers of QA — don’t work. Instead, they make you slower while hid­ing root causes. Hiding causes makes them harder to fix.

But, the call of AI cod­ing is strong. That first, fast step in the pipeline is so fast! It re­ally does feel like hav­ing su­per pow­ers. I want more su­per pow­ers. What are we go­ing to do about it?

Maybe we fi­nally have a com­pelling enough ex­cuse to fix the 20 years of prob­lems hid­den by code re­view cul­ture, and re­place it with a real cul­ture of qual­ity.

I think the op­ti­mists have half of the right idea. Reducing re­view stages, even to an un­com­fort­able de­gree, is go­ing to be needed. But you can’t just re­duce re­view stages with­out some­thing to re­place them. That way lies the Ford Pinto or any re­cent Boeing air­craft.

The com­plete pack­age, the table flip, was what Deming brought to man­u­fac­tur­ing. You can’t half-adopt a total qual­ity” sys­tem. You need to elim­i­nate the re­views and ob­so­lete them, in one step.

How? You can fully adopt the new system, in small bites. What if some components of your system can be built the new way? Imagine an old-school U.S. auto manufacturer buying parts from Japanese suppliers; wow, these parts are so well made! Now I can start removing QA steps elsewhere because I can just assume the parts are going to work, and my job of “assemble a bigger widget from the parts” has a ton of its complexity removed.

I like this view. I’ve al­ways liked small beau­ti­ful things, that’s my own bias. But, you can as­sem­ble big beau­ti­ful things from small beau­ti­ful things.

It’s a lot eas­ier to build those in­di­vid­ual beau­ti­ful things in small teams that trust each other, that know what qual­ity looks like to them. They de­liver their things to cus­tomer teams who can clearly ex­plain what qual­ity looks like to them. And on we go. Quality starts bot­tom-up, and spreads.

I think small star­tups are go­ing to do re­ally well in this new world, prob­a­bly bet­ter than ever. Startups al­ready have fewer lay­ers of re­view just be­cause they have fewer peo­ple. Some star­tups will fig­ure out how to pro­duce high qual­ity com­po­nents quickly; oth­ers won’t and will fail. Quality by nat­ural se­lec­tion?

Bigger com­pa­nies are gonna have a harder time, be­cause their slow re­view sys­tems are baked in, and delet­ing them would cause com­plete chaos.

But, it’s not just about com­pany size. I think en­gi­neer­ing teams at any com­pany can get smaller, and have bet­ter de­fined in­ter­faces be­tween them.

Maybe you could have mul­ti­ple teams in­side a com­pany com­pet­ing to de­liver the same com­po­nent. Each one is just a few peo­ple and a few cod­ing bots. Try it 100 ways and see who comes up with the best one. Again, qual­ity by evo­lu­tion. Code is cheap but good ideas are not. But now you can try out new ideas faster than ever.

Maybe we'll see a new optimal point on the monoliths-microservices continuum. Microservices got a bad name because they were too micro; in the original terminology, a “micro” service was exactly the right size for a “two pizza team” to build and operate on their own. With AI, maybe it's one pizza and some tokens.

What’s fun is you can also use this new, faster cod­ing to ex­per­i­ment with dif­fer­ent mod­ule bound­aries faster. Features are still hard for lots of rea­sons, but refac­tor­ing and au­to­mated in­te­gra­tion test­ing are things the AIs ex­cel at. Try split­ting out a mod­ule you were afraid to split out be­fore. Maybe it’ll add some lines of code. But sud­denly lines of code are cheap, com­pared to the co­or­di­na­tion over­head of a big­ger team main­tain­ing both parts.

Every team has some mono­liths that are a lit­tle too big, and too many lay­ers of re­views. Maybe we won’t get all the way to Singularity. But, we can en­gi­neer a much bet­ter world. Our prob­lems are solv­able.

...

Read the original on apenwarr.ca »

5 447 shares, 58 trendiness

A Decade of Slug

What is now known as the Slug Algorithm for ren­der­ing fonts di­rectly from Bézier curves on the GPU was de­vel­oped in the Fall of 2016, so this year marks a full decade since its in­cep­tion. I pub­lished a pa­per in JCGT about the tech­nique in the mid­dle of 2017, and my com­pany sold the first li­cense for ver­sion 1.0 of the Slug Library not long af­ter­ward. Since then, Slug has been li­censed widely in the video games in­dus­try as well as by an ar­ray of com­pa­nies spe­cial­iz­ing in ar­eas like sci­en­tific vi­su­al­iza­tion, CAD, video edit­ing, med­ical equip­ment, and even plan­e­tar­i­ums. Our clients in­clude Activision, Blizzard, id Software, 2K Games, Ubisoft, Warner Brothers, Insomniac, Zenimax, and Adobe among many oth­ers. Slug turned out to be the most suc­cess­ful soft­ware prod­uct I’ve ever made.

I orig­i­nally cre­ated Slug in pur­suit of bet­ter text ren­der­ing for the C4 Engine, where fonts needed to look great not only in the GUI, but in­side game lev­els where they could ap­pear very large and be viewed at oblique an­gles. Most re­cently, I used Slug to build the Radical Pie equa­tion ed­i­tor, which of course, needs ex­tremely high-qual­ity font ren­der­ing as well as vec­tor graph­ics for things like brack­ets, rad­i­cals, and purely graph­i­cal items like ar­rows and high­lights at­tached to math­e­mat­i­cal ex­pres­sions. Slug is also used to ren­der the en­tire user in­ter­face in­side the main edit­ing win­dow and all di­a­log boxes.

This post talks about what has changed within the ren­der­ing method since 2017, when the pa­per was pub­lished and the Slug Library was first re­leased. It then con­cludes with an ex­cit­ing an­nounce­ment for those who may want to im­ple­ment the Slug al­go­rithm for their own pro­jects.

Slug ren­ders text and vec­tor graph­ics on the GPU di­rectly from Bézier curve data with­out the use of tex­ture maps con­tain­ing pre­com­puted or cached images of any kind. Doing this ro­bustly, while also be­ing fast and pro­duc­ing high qual­ity re­sults, is a dif­fi­cult prob­lem when we have to deal with float­ing-point round-off er­rors. Robustness re­quires that we never see ar­ti­facts like dropped pix­els, sparkles, or streaks un­der any cir­cum­stances, prov­ably so. Being fast means that the al­go­rithm can ren­der any rea­son­able amount of text on the game con­soles of 2016 with­out im­pact­ing frame rates sig­nif­i­cantly. Producing high-qual­ity results means that we get nicely an­tialiased text with smooth curves and sharp cor­ners when viewed at any scale and from any per­spec­tive. The prin­ci­ples by which the Slug ren­der­ing al­go­rithm achieves all of this are sum­ma­rized in the fol­low­ing di­a­gram. (Click for PDF ver­sion.)

The method that determines root eligibility and calculates the winding number, which is responsible for robustness, is pretty much exactly the same now as it was in 2017 when Slug was first released. Some other parts of the rendering code that were described in the paper have changed over the years, however. I'll briefly describe the smaller changes here before talking about the big addition called “dynamic dilation” in its own section below.

The original paper included a description of a “band split optimization” that could be turned on when it was known that glyphs would be rendered at a large size. It did provide a speed increase for large glyphs, but it also introduced some divergence in the pixel shader that could hurt performance a little for text rendered at a small size. This optimization also required that the list of curves intersecting each band be stored twice, once sorted for rays pointing in one direction and again sorted for rays pointing in the opposite direction. The speed improvement was modest and didn't apply universally, so I decided to remove it. This eliminated some complexity in the pixel shader, and more importantly, it allowed the band data to be cut in half. The texture containing the band data now uses two 16-bit components instead of four.

In the Extensions section at the end of the paper, there was some discussion about supersampling. Though not necessary for rendering text at ordinary sizes, adaptive supersampling was implemented in early versions of Slug to enhance text drawn at very small sizes. If small text was rendered far away in a 3D scene, then supersampling reduced the amount of aliasing significantly as the camera moved, and because it was adaptive, the number of samples taken for larger text was still just one. Supersampling was removed because (a) it made a difference only for text so small that it was barely readable anyway and (b) aliasing for tiny text was mitigated to a high degree by the dilation technique described below. Removing supersampling also simplified the pixel shader considerably. (Conditional compilation already eliminated the supersampling code when it was turned off, so its removal did not mean that the ordinary single-sample shader got any faster.)

The Extensions section also talked about adding a loop to the pixel shader in order to render multi-color emoji, which are essentially a stack of glyphs in which each layer has a different color. This proved to be suboptimal because many of the layers often only covered a small fraction of the total area of the composite glyph, but per-layer rendering calculations were still being performed over the full bounding polygon. It turned out to be better to render a bunch of independent glyphs on top of each other, even though it increased the amount of vertex data, so that each layer could have its own bounding polygon. This was faster, and it again simplified the pixel shader code.

There has been one ma­jor im­prove­ment to the ren­der­ing al­go­rithm since the in­tro­duc­tion of the Slug Library. It’s called dy­namic di­la­tion, and it solves the prob­lem dis­cussed in a pre­vi­ous post from 2019 when it was first added to the code. Before dy­namic di­la­tion, the user had to man­u­ally spec­ify a con­stant dis­tance by which every glyph’s bound­ing poly­gon would be ex­panded to en­sure that all par­tially cov­ered pix­els get ras­ter­ized. This has two dis­ad­van­tages: (a) if you choose a dis­tance that’s too small, then glyphs ren­dered be­low a cer­tain size start to have alias­ing ar­ti­facts along their boundaries, and (b) any cho­sen dis­tance will be too large for glyphs above a cer­tain size, leav­ing empty space that eats up per­for­mance for no rea­son.

Dynamic di­la­tion makes the op­ti­mal choice au­to­matic, and it is re­cal­cu­lated in the ver­tex shader every time a glyph is ren­dered. The tech­nique uses the cur­rent model-view-pro­jec­tion (MVP) ma­trix and view­port di­men­sions to de­ter­mine how far a ver­tex needs to be moved out­ward along its nor­mal di­rec­tion in ob­ject space to ef­fec­tively ex­pand the bound­ing polygon by half a pixel in view­port space. This guar­an­tees that the cen­ters of any par­tially cov­ered pix­els are in­side the bound­ing poly­gon so the ras­ter­izer will pick them up. When text is viewed in per­spec­tive, the di­la­tion dis­tance can be dif­fer­ent for each ver­tex. The code al­ways pro­duces the op­ti­mal value so that there’s never any un­nec­es­sary padding that wastes GPU re­sources.

The dy­namic di­la­tion cal­cu­la­tion done in the ver­tex shader is shown in the di­a­gram above, but I haven’t pro­vided a de­riva­tion of it any­where. So here we go. The goal is to find the dis­tance d we must move an ob­ject-space ver­tex po­si­tion \(\mathbf p = (p_x, p_y, 0, 1)\) along its nor­mal vec­tor \(\mathbf n = (n_x, n_y, 0, 0)\) for it to cor­re­spond to a half-pixel ex­pan­sion of the bound­ing poly­gon in view­port space. The nor­mal does not have unit length, but is in­stead scaled so that it would point to the new ver­tex lo­ca­tion if both ad­ja­cent sides of the bound­ing poly­gon were to be pushed out­ward by one unit of dis­tance, as shown in the di­a­gram. We first cal­cu­late the dis­tance d along the unit nor­mal di­rec­tion \(\hat{\mathbf n} = (\hat n_x, \hat n_y, 0)\) and then ap­ply that to the orig­i­nal nor­mal vec­tor n to ob­tain the new ver­tex po­si­tion \(\mathbf p + d\mathbf n\).

By ap­ply­ing the MVP ma­trix m (which is \(4 \times 4\)), the per­spec­tive di­vide, and the view­port scal­ing by its width w and height h to an object-space po­si­tion p off­set by the dis­tance d in the unit nor­mal di­rec­tion \(\hat{\mathbf n}\), we can ex­press dif­fer­ences \(\Delta x\) and \(\Delta y\) in view­port space as

If we set \((\Delta x)^2 + (\Delta y)^2 = (\frac{1}{2})^2\), then the off­set in view­port space is one-half pixel. We just need to solve this equa­tion for d, but it gets pretty messy. When we mul­ti­ply every­thing out, sim­plify as much as pos­si­ble, and write this as a qua­dratic equa­tion in d, we get

It is con­ve­nient to make the as­sign­ments \(s = m_{30}p_x + m_{31}p_y + m_{33}\) and \(t = m_{30}\hat n_x + m_{31}\hat n_y\), which let us write

fi­nally gives us the sim­pli­fied qua­dratic equa­tion

which has the so­lu­tions

Choosing the plus sign ob­tains the dis­tance out­ward along the unit nor­mal vec­tor that the ver­tex needs to be moved for a half-pixel di­la­tion. To make sure the glyph is still ren­dered at the orig­i­nal size, the em-space sam­pling co­or­di­nates also need to be off­set. A \(2 \times 2\) in­verse Jacobian ma­trix is stored with each ver­tex, and it gives us the in­for­ma­tion we need to trans­form an ob­ject-space dis­place­ment into an em-space off­set vec­tor. The Jacobian ma­trix, be­fore in­vert­ing, is the up­per-left \(2 \times 2\) por­tion of the trans­for­ma­tion ma­trix that con­verts em-space co­or­di­nates to ob­ject-space co­or­di­nates, ac­count­ing for scale, stretch, skew, and pos­si­bly flips of the co­or­di­nate axes.

I was granted a patent for the Slug al­go­rithm in 2019, and I legally have ex­clu­sive rights to it un­til the year 2038. But I think that’s too long. The patent has al­ready served its pur­pose well, and I be­lieve that hold­ing on to it any longer ben­e­fits no­body. Therefore, ef­fec­tive to­day, I am per­ma­nently and ir­rev­o­ca­bly ded­i­cat­ing the Slug patent to the pub­lic do­main. That means any­body can freely im­ple­ment the Slug al­go­rithm from this day for­ward with­out a li­cense for what­ever pur­pose they want, and they don’t need to worry about in­fring­ing upon any in­tel­lec­tual prop­erty rights. (For any le­gal ex­perts read­ing this, my com­pany has filed form SB/43 with the USPTO and paid the fee to dis­claim the ter­mi­nal part of the term for patent #10,373,352, effective March 17, 2026.)

To aid in im­ple­men­ta­tions of the Slug al­go­rithm, ref­er­ence ver­tex and pixel shaders based on the ac­tual code used in the Slug Library have been posted in a new GitHub repos­i­tory and made avail­able un­der the MIT li­cense. The pixel shader is a sig­nif­i­cant up­grade com­pared to the code in­cluded with the JCGT pa­per, and the ver­tex shader in­cludes dy­namic di­la­tion, which had not yet been implemented when the pa­per was pub­lished.

...

Read the original on terathon.com »

6 402 shares, 30 trendiness

Give Django your time and money, not your tokens

Spending your to­kens to sup­port Django by hav­ing an LLM work on tick­ets is not help­ful. You and the com­mu­nity are bet­ter off do­nat­ing that money to the Django Software Foundation in­stead.

We’re in a new era where peo­ple don’t have to type out all of their code. I used an LLM to build a good part of the new func­tion­al­ity in the djang­o­naut.space site. I know I would­n’t have shipped that much in that amount of time with­out us­ing an LLM.

But Django is dif­fer­ent. The level of qual­ity is much, much higher. This is be­cause it has a much larger user base, it changes slowly, and the com­mu­nity ex­pects it to be in use 20 years from now. It’s partly why it’s such an honor to have your name among the list of con­trib­u­tors.

This is­n’t about whether you use an LLM, it’s about whether you still un­der­stand what’s be­ing con­tributed. What I see now is peo­ple who are us­ing LLMs to gen­er­ate the code and write the PR de­scrip­tion and han­dle the feed­back from the PR re­view. It’s to the ex­tent where I can’t tell if there’d be a dif­fer­ence if the re­viewer had just used the LLM them­selves. And that is a big prob­lem.

If you do not un­der­stand the ticket, if you do not un­der­stand the so­lu­tion, or if you do not un­der­stand the feed­back on your PR, then your use of LLM is hurt­ing Django as a whole.

Django con­trib­u­tors want to help oth­ers, they want to cul­ti­vate com­mu­nity, and they want to help you be­come a reg­u­lar con­trib­u­tor. Before LLMs, this was eas­ier to sense be­cause you were lim­ited to com­mu­ni­cat­ing what you un­der­stood. With LLMs, it’s much eas­ier to com­mu­ni­cate a sense of un­der­stand­ing to the re­viewer, but the re­viewer does­n’t know if you ac­tu­ally un­der­stood it.

In this way, an LLM is a fa­cade of your­self. It helps you pro­ject un­der­stand­ing, con­tem­pla­tion, and growth, but it re­moves the trans­parency and vul­ner­a­bil­ity of be­ing a hu­man.

For a re­viewer, it’s de­mor­al­iz­ing to com­mu­ni­cate with a fa­cade of a hu­man.

This is be­cause con­tribut­ing to open source, es­pe­cially Django, is a com­mu­nal en­deavor. Removing your hu­man­ity from that ex­pe­ri­ence makes that en­deavor more dif­fi­cult. If you use an LLM to con­tribute to Django, it needs to be as a com­ple­men­tary tool, not as your ve­hi­cle.

Use an LLM to develop your comprehension. Then communicate the best you can in your own words, then use an LLM to tweak that language. If you're struggling to convey your ideas to someone, use an LLM more aggressively and mention that you used it. This makes it easier for others to see where your understanding is and where there are disconnects.

There needs to be un­der­stand­ing when con­tribut­ing to Django. There’s no way around it. Django has been around for 20 years and ex­pects to be around for an­other 20. Any code be­ing added to a pro­ject with that out­look on longevity must be well un­der­stood.

There is no short­cut to un­der­stand­ing. If you want to con­tribute to Django, you will have to spend time read­ing, ex­per­i­ment­ing, and learn­ing. Contributing to Django will help you grow as a de­vel­oper.

While it is nice to be listed as a con­trib­u­tor to Django, the growth you earn from it is in­cred­i­bly more valu­able.

So please, stop us­ing an LLM to the ex­tent it hides you and your un­der­stand­ing. We want to know you, and we want to col­lab­o­rate with you.

...

Read the original on www.better-simple.com »

7 368 shares, 27 trendiness

FFmpeg

A new minor release, FFmpeg 8.1 “Hoare”, is now available for download. Here are some of the highlights:

This re­lease fea­tures a lot of in­ter­nal changes and bug­fixes. The ground­work for the up­com­ing sws­cale rewrite is pro­gress­ing. The Vulkan com­pute-based codecs, and a few fil­ters, no longer de­pend on run­time GLSL com­pi­la­tion, which speeds up their ini­tial­iza­tion.

A com­pan­ion post about the Vulkan Compute-based codec im­ple­men­ta­tions has been pub­lished on the Khronos blog, fea­tur­ing tech­ni­cal de­tails on the im­ple­men­ta­tions and fu­ture plans.

We rec­om­mend users, dis­trib­u­tors, and sys­tem in­te­gra­tors to up­grade un­less they use cur­rent git mas­ter.

A new major release, FFmpeg 8.0 “Huffman”, is now available for download. Thanks to several delays, and modernization of our entire infrastructure, this release ended up being one of our largest releases to date. In short, its new features are:

A new class of decoders and encoders based on a pure Vulkan compute implementation has been added. Vulkan is a cross-platform, open standard set of APIs that allows programs to use GPU hardware in various ways, from drawing on screen, to doing calculations, to decoding video via custom hardware accelerators. Rather than relying on a custom hardware accelerator being present, these codecs are based on compute shaders, and work on any implementation of Vulkan 1.3.

Decoders use the same hwac­cel API and com­mands, so users do not need to do any­thing spe­cial to en­able them, as en­abling Vulkan de­cod­ing is suf­fi­cient to use them.

Encoders, like our hard­ware ac­cel­er­ated en­coders, re­quire spec­i­fy­ing a new en­coder (ffv1_vulkan). Currently, the only codecs sup­ported are: FFv1 (encoding and de­cod­ing) and ProRes RAW (decode only). ProRes (encode+decode) and VC-2 (encode+decode) im­ple­men­ta­tions are com­plete and cur­rently in re­view, to be merged soon and avail­able with the next mi­nor re­lease.

Only codecs specif­i­cally de­signed for par­al­lelized de­cod­ing can be im­ple­mented in such a way, with more main­stream codecs not be­ing planned for sup­port.

Depending on the hard­ware, these new codecs can pro­vide very sig­nif­i­cant speedups, and open up pos­si­bil­i­ties to work with them for sit­u­a­tions like non-lin­ear video ed­i­tors and loss­less screen record­ing/​stream­ing, so we are ex­cited to learn what our down­stream users can make with them.
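As a rough sketch of what using the new encoder might look like from a script (the encoder name ffv1_vulkan comes from the release notes above; the device and upload options here are assumptions, so check ffmpeg -h encoder=ffv1_vulkan and the Vulkan documentation for the exact options your build expects):

```python
import subprocess

cmd = [
    "ffmpeg",
    "-init_hw_device", "vulkan",   # create a Vulkan device (assumed to be required)
    "-i", "input.mkv",             # any decodable source
    "-vf", "hwupload",             # move decoded frames onto the Vulkan device
    "-c:v", "ffv1_vulkan",         # the Vulkan compute-based FFv1 encoder
    "output.mkv",
]
subprocess.run(cmd, check=True)
```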

The pro­ject has re­cently started to mod­ern­ize its in­fra­struc­ture. Our mail­ing list servers have been fully up­graded, and we have re­cently started to ac­cept con­tri­bu­tions via a new forge, avail­able on code.ffm­peg.org, run­ning a Forgejo in­stance.

As usual, we recommend that users, distributors, and system integrators upgrade unless they use current git master.

FFmpeg 7.1 “Péter”, a new major release, is now available! A full list of changes can be found in the release changelog.

The more im­por­tant high­lights of the re­lease are that the VVC de­coder, merged as ex­per­i­men­tal in ver­sion 7.0, has had enough time to ma­ture and be op­ti­mized enough to be de­clared as sta­ble. The codec is start­ing to gain trac­tion with broad­cast stan­dard­iza­tion bod­ies.

Support has been added for a na­tive AAC USAC (part of the xHE-AAC cod­ing sys­tem) de­coder, with the for­mat start­ing to be adopted by stream­ing web­sites, due to its ex­ten­sive vol­ume nor­mal­iza­tion meta­data.

MV-HEVC decoding is now supported. This is a stereoscopic coding tool that has begun to be shipped and generated by recent phones and VR headsets.

LC-EVC de­cod­ing, an en­hance­ment meta­data layer to at­tempt to im­prove the qual­ity of codecs, is now sup­ported via an ex­ter­nal li­brary.

Support for Vulkan en­cod­ing, with H264 and HEVC was merged. This fi­nally al­lows fully Vulkan-based de­code-fil­ter-en­code pipelines, by hav­ing a sink for Vulkan frames, other than down­load­ing or dis­play­ing them. The en­coders have fea­ture-par­ity with their VAAPI im­ple­men­ta­tion coun­ter­parts. Khronos has an­nounced that sup­port for AV1 en­cod­ing is also com­ing soon to Vulkan, and FFmpeg is aim­ing to have day-one sup­port.

In addition to the above, this release has had a lot of important internal work done. By far, the internal standouts are the improvements made for full-range images. Previously, color range data had two paths, no negotiation, and was unreliably forwarded to filters, encoders, and muxers. Work on cleaning the system up started more than 10 years ago; however, it stalled because of how fragile the system was and because breaking behaviour would be unacceptable. The new system fixes this, so color range is now forwarded correctly and consistently everywhere it is needed, and it also laid the path for more advanced forms of negotiation.

Cropping metadata is now supported with the Matroska and MP4 formats. This metadata is important not only for archival, but also with AV1, as hardware encoders require it to be signalled because the codec does not natively support cropping.

As usual, we recommend that users, distributors, and system integrators upgrade unless they use current git master.

The number of issues FFmpeg has in Coverity (a static analyzer) is now lower than it has been since 2016. Our defect density is less than one-thirtieth of the average in OSS projects with over a million lines of code. All this was possible thanks to a grant from the Sovereign Tech Fund.

FFmpeg now im­ple­ments a na­tive xHE-AAC de­coder. Currently, streams with­out (e)SBR, USAC or MPEG-H Surround are sup­ported, which means the ma­jor­ity of xHE-AAC streams in use should work. Support for USAC and (e)SBR is com­ing soon. Work is also on­go­ing to im­prove its sta­bil­ity and com­pat­i­bil­ity. During the process we found sev­eral spec­i­fi­ca­tion is­sues, which were then sub­mit­ted back to the au­thors for dis­cus­sion and po­ten­tial in­clu­sion in a fu­ture er­rata.

The FFmpeg com­mu­nity is ex­cited to an­nounce that Germany’s Sovereign Tech Fund

has become its first governmental sponsor. Their support will help sustain the maintenance of the FFmpeg project, a critical open-source multimedia component essential to bringing audio and video to billions around the world every day.

A new major release, FFmpeg 7.0 “Dijkstra”, is now available for download. The most noteworthy changes for most users are a native VVC decoder (currently experimental, until more fuzzing is done), IAMF support, and a multi-threaded ffmpeg CLI tool.

This release is not backwards compatible, removing APIs deprecated before 6.0. The biggest change for most library callers will be the removal of the old bitmask-based channel layout API, replaced by the AVChannelLayout API allowing such features as custom channel ordering, or Ambisonics. Certain deprecated ffmpeg CLI options were also removed, and a C11-compliant compiler is now required to build the code.

As usual, there is also a number of new supported formats and codecs, new filters, APIs, and countless smaller features and bugfixes. Compared to 6.1, the git repository contains almost 2000 new commits by around 100 authors, touching more than 100000 lines in about 2000 files — thanks to everyone who contributed. See the Changelog, APIchanges, and the git log for more comprehensive lists of changes.

The libav­codec li­brary now con­tains a na­tive VVC (Versatile Video Coding) de­coder, sup­port­ing a large sub­set of the codec’s fea­tures. Further op­ti­miza­tions and sup­port for more fea­tures are com­ing soon. The code was writ­ten by Nuo Mi, Xu Mu, Frank Plowman, Shaun Loo, and Wu Jianhua.

The libavformat library can now read and write IAMF (Immersive Audio) files. The ffmpeg CLI tool can configure the IAMF structure with the new -stream_group option. IAMF support was written by James Almer.

Thanks to a major refactoring of the ffmpeg command-line tool, all the major components of the transcoding pipeline (demuxers, decoders, filters, encoders, muxers) now run in parallel. This should improve throughput and CPU utilization, decrease latency, and open the way to other exciting new features.

Note that you should not ex­pect sig­nif­i­cant per­for­mance im­prove­ments in cases where al­most all com­pu­ta­tional time is spent in a sin­gle com­po­nent (typically video en­cod­ing).

FFmpeg 6.1 “Heaviside”, a new major release, is now available! Some of the highlights:

* com­mand sup­port in the setpts and asetpts fil­ters

* Bitstream fil­ter for con­vert­ing VVC from MP4 to Annex B

* sup­port for the P_SKIP hint­ing to speed up libx264 en­cod­ing

* ffmpeg CLI ‘-top’ option deprecated in favor of the setfield filter

* ff­probe XML out­put schema changed to ac­count for mul­ti­ple vari­able-fields el­e­ments within the same par­ent el­e­ment

* ff­probe -output_format op­tion added as an alias of -of

This re­lease had been over­due for at least half a year, but due to con­stant ac­tiv­ity in the repos­i­tory, had to be de­layed, and we were fi­nally able to branch off the re­lease re­cently, be­fore some of the large changes sched­uled for 7.0 were merged.

Internally, we have had a num­ber of changes too. The FFT, MDCT, DCT and DST im­ple­men­ta­tion used for codecs and fil­ters has been fully re­placed with the faster libavu­til/​tx (full ar­ti­cle about it com­ing soon).

This also led to a reduction in the size of the compiled binary, which can be noticeable in small builds.

There was a very large re­duc­tion in the to­tal amount of al­lo­ca­tions be­ing done on each frame through­out video de­coders, re­duc­ing over­head.

RISC-V op­ti­miza­tions for many parts of our DSP code have been merged, with mainly the large de­coders be­ing left.

There was an effort to improve the correctness of timestamps and frame durations of each packet, increasing the accuracy of variable frame rate video.

Next ma­jor re­lease will be ver­sion 7.0, sched­uled to be re­leased in February. We will at­tempt to bet­ter stick to the new re­lease sched­ule we an­nounced at the start of this year.

We strongly rec­om­mend users, dis­trib­u­tors, and sys­tem in­te­gra­tors to up­grade un­less they use cur­rent git mas­ter.

A few days ago, Vulkan-powered de­cod­ing hard­ware ac­cel­er­a­tion code was merged into the code­base. This is the first ven­dor-generic and plat­form-generic de­code ac­cel­er­a­tion API, en­abling the same code to be used on mul­ti­ple plat­forms, with very min­i­mal over­head. This is also the first multi-threaded hard­ware de­cod­ing API, and our code makes full use of this, sat­u­rat­ing all avail­able de­code en­gines the hard­ware ex­poses.

Those wish­ing to test the code can read our doc­u­men­ta­tion page. For those who would like to in­te­grate FFmpeg’s Vulkan code to de­mux, parse, de­code, and re­ceive a VkImage to pre­sent or ma­nip­u­late, doc­u­men­ta­tion and ex­am­ples are avail­able in our source tree. Currently, us­ing the lat­est avail­able git check­out of our repos­i­tory is re­quired. The func­tion­al­ity will be in­cluded in sta­ble branches with the re­lease of ver­sion 6.1, due to be re­leased soon.

As this is also the first practical implementation of the specifications, bugs may be present, particularly in drivers, and, although it passes verification, in the implementation itself. New codecs and encoding support are also being worked on, both by the Khronos organization, which is standardizing them, and by us, who are implementing them and giving feedback on improvements.

A new major release, FFmpeg 6.0 “Von Neumann”, is now available for download. This release has many new encoders and decoders, filters, ffmpeg CLI tool improvements, and also changes the way releases are done. All major releases will now bump the version of the ABI. We plan to have a new major release each year. Another release-specific change is that deprecated APIs will be removed after 3 releases, upon the next major bump. This means that releases will be done more often and will be more organized.

New decoders featured are Bonk, RKA, Radiance, SC-4, APAC, VQC, WavArc and a few ADPCM formats. QSV and NVenc now support AV1 encoding. The FFmpeg CLI (we usually refer to it as ffmpeg.c to avoid confusion) has speed-up improvements due to threading, as well as statistics options, and the ability to pass option values for filters from a file. There are quite a few new audio and video filters, such as adrc, showcwt, backgroundkey and ssim360, with a few hardware ones too. Finally, the release features many behind-the-scenes changes, including a new FFT and MDCT implementation used in codecs (expect a blog post about this soon), numerous bugfixes, better ICC profile handling and colorspace signalling improvements, the introduction of a number of RISC-V vector and scalar assembly optimized routines, and a few new and improved APIs, which can be viewed in the doc/APIchanges file in our tree. A few submitted features, such as the Vulkan improvements and more FFT optimizations, will be in the next minor release, 6.1, which we plan to release soon, in line with our new release schedule. Some highlights are:

* ffm­peg now re­quires thread­ing to be built

* ffm­peg now runs every muxer in a sep­a­rate thread

* Add new mode to cropde­tect fil­ter to de­tect crop-area based on mo­tion vec­tors and edges

* VAAPI de­cod­ing and en­cod­ing for 10/12bit 422, 10/12bit 444 HEVC and VP9

* QSV de­cod­ing and en­cod­ing for 10/12bit 422, 10/12bit 444 HEVC and VP9

* fil­ter­graph syn­tax in ffm­peg CLI now sup­ports pass­ing file con­tents as op­tion val­ues

We strongly rec­om­mend users, dis­trib­u­tors, and sys­tem in­te­gra­tors to up­grade un­less they use cur­rent git mas­ter.

FFmpeg 5.1 “Riemann”, a new major release, is now available! Some of the highlights:

We strongly rec­om­mend users, dis­trib­u­tors, and sys­tem in­te­gra­tors to up­grade un­less they use cur­rent git mas­ter.

FFmpeg 5.0 “Lorentz”, a new major release, is now available! For this long-overdue release, a major effort was undertaken to remove the old encode/decode APIs and replace them with an N:M-based API, the entire libavresample library was removed, libswscale has a new, easier to use AVFrame-based API, the Vulkan code was much improved, many new filters were added, including libplacebo integration, and finally, DoVi support was added, including tonemapping and remuxing. The default AAC encoder settings were also changed to improve quality. Some of the changelog highlights:

We strongly rec­om­mend users, dis­trib­u­tors, and sys­tem in­te­gra­tors to up­grade un­less they use cur­rent git mas­ter.

We have a new IRC home at Libera Chat now! Feel free to join us at #ffmpeg and #ffmpeg-devel. More info at con­tact#IR­C­Cha­n­nels

FFmpeg 4.4 “Rao”, a new major release, is now available! Some of the highlights:

We strongly rec­om­mend users, dis­trib­u­tors, and sys­tem in­te­gra­tors to up­grade un­less they use cur­rent git mas­ter.

FFmpeg 4.3 “4:3”, a new major release, is now available! Some of the highlights:

* switch from AvxSynth to AviSynth+ on Linux

* Support for mux­ing pcm and pgs in m2ts

We strongly rec­om­mend users, dis­trib­u­tors, and sys­tem in­te­gra­tors to up­grade un­less they use cur­rent git mas­ter.

Note that this fil­ter is not FDA ap­proved, nor are we med­ical pro­fes­sion­als. Nor has this fil­ter been tested with any­one who has pho­to­sen­si­tive epilepsy. FFmpeg and its pho­to­sen­si­tiv­ity fil­ter are not mak­ing any med­ical claims.

That said, this is a new video filter that may help photosensitive people watch TV, play video games or even be used with a VR headset to block out epileptic triggers such as filtered sunlight when they are outside. Or you could use it against those annoying white flashes on your TV screen. The filter fails on some input, such as the Incredibles 2 Screen Slaver scene. It is not perfect. If you have other clips that you want this filter to work better on, please report them to us on our trac.

See for your­self. Example was made with -vf pho­to­sen­si­tiv­ity=20:0.8
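For reference, a minimal way to reproduce that example from a script (input and output filenames are placeholders; the filter string is exactly the one given above):

```python
import subprocess

# Apply the photosensitivity filter with the parameters from the example above.
subprocess.run(
    ["ffmpeg", "-i", "input.mp4",
     "-vf", "photosensitivity=20:0.8",
     "output.mp4"],
    check=True,
)
```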

We are not pro­fes­sion­als. Please use this in your med­ical stud­ies to ad­vance epilepsy re­search. If you de­cide to use this in a med­ical set­ting, or make a hard­ware hdmi in­put out­put re­al­time tv fil­ter, or find an­other use for this, please let me know. This fil­ter was a fea­ture re­quest of mine since 2013.

FFmpeg 4.2 “Ada”, a new major release, is now available! Some of the highlights:

* Support de­cod­ing of HEVC 4:4:4 con­tent in nvdec and cu­vid­dec

* mov muxer writes tracks with un­spec­i­fied lan­guage in­stead of English by de­fault

* added sup­port for us­ing clang to com­pile CUDA ker­nels

We strongly rec­om­mend users, dis­trib­u­tors, and sys­tem in­te­gra­tors to up­grade un­less they use cur­rent git mas­ter.

FFmpeg 4.1 “al-Khwarizmi”, a new major release, is now available! Some of the highlights:

* Support for AV1 in MP4 and Matroska/WebM

We strongly rec­om­mend users, dis­trib­u­tors, and sys­tem in­te­gra­tors to up­grade un­less they use cur­rent git mas­ter.

FFmpeg 4.0 “Wu”, a new major release, is now available! Some of the highlights:

* Bitstream fil­ters for edit­ing meta­data in H.264, HEVC and MPEG-2 streams

* Dropped sup­port for build­ing for Windows XP. The min­i­mum sup­ported Windows ver­sion is Windows Vista.

* Removed the ff­menc and ffmdec muxer and de­muxer

We strongly rec­om­mend users, dis­trib­u­tors, and sys­tem in­te­gra­tors to up­grade un­less they use cur­rent git mas­ter.

FFmpeg 3.4 “Cantor”, a new major release, is now available! Some of the highlights:

* sup­port for de­cod­ing through D3D11VA in ffm­peg

We strongly rec­om­mend users, dis­trib­u­tors, and sys­tem in­te­gra­tors to up­grade un­less they use cur­rent git mas­ter.

FFmpeg 3.3 “Hilbert”, a new major release, is now available! Some of the highlights:

* con­fig­ure now fails if au­tode­tect-li­braries are re­quested but not found

We strongly rec­om­mend users, dis­trib­u­tors, and sys­tem in­te­gra­tors to up­grade un­less they use cur­rent git mas­ter.

This has been a long time coming, but we wanted to give proper closure to our participation in this run of the program, and that takes time. Sometimes it's just getting the final report for each project trimmed down; other times it's finalizing whatever was still in progress when the program finished: final patches need to be merged, TODO lists stabilized, future plans agreed; you name it.

Without fur­ther ado, here’s the sil­ver-lin­ing for each one of the pro­jects we sought to com­plete dur­ing this Summer of Code sea­son:

Stanislav Dolganov de­signed and im­ple­mented ex­per­i­men­tal sup­port for mo­tion es­ti­ma­tion and com­pen­sa­tion in the loss­less FFV1 codec. The de­sign and im­ple­men­ta­tion is based on the snow video codec, which uses OBMC. Stanislav’s work proved that sig­nif­i­cant com­pres­sion gains can be achieved with in­ter frame com­pres­sion. FFmpeg wel­comes Stanislav to con­tinue work­ing be­yond this proof of con­cept and bring its ad­vances into the of­fi­cial FFV1 spec­i­fi­ca­tion within the IETF.

Petru Rares Sincraian added several self-tests to FFmpeg and successfully went through the sometimes tedious process of fine-tuning test parameters to avoid known and hard-to-avoid problems, like checksum mismatches due to rounding errors on the myriad of platforms we support. His work has improved the code coverage of our self tests considerably.

...

Read the original on ffmpeg.org »

8 297 shares, 16 trendiness

If you thought the speed of writing code was your problem - you have bigger problems

It’s Tuesday morn­ing. Your VP of Engineering is stand­ing in front of a slide deck, vi­brat­ing with the kind of ex­cite­ment usu­ally re­served for peo­ple who just dis­cov­ered cryp­tocur­rency in 2017. They’ve just come back from a con­fer­ence. Or maybe a ven­dor din­ner. Three glasses of pinot noir and a demo, and now they have news.

The room does that thing where half the peo­ple nod along and the other half de­velop a sud­den in­ter­est in their lap­tops. Your staff en­gi­neer is do­ing that face. You know the face - it’s the one where they’re cal­cu­lat­ing whether to say some­thing or just up­date their LinkedIn later.

Nobody asks the ques­tion that mat­ters, which is: ve­loc­ity to­ward what, ex­actly?

Because here’s what just hap­pened. Your VP looked at your en­tire soft­ware de­liv­ery or­gan­i­sa­tion, iden­ti­fied the one thing that was al­ready pretty fast, and de­cided to make it faster. They found a sta­tion on the as­sem­bly line that was not the bot­tle­neck, and threw money at it.

If you know any­thing about how sys­tems work, you know this does­n’t just fail to help. It makes every­thing ac­tively worse.

In 1984, Eli Goldratt wrote The Goal, a novel about man­u­fac­tur­ing that has no busi­ness be­ing as rel­e­vant to soft­ware as it is. It’s also the most use­ful busi­ness book you’ll ever read that’s tech­ni­cally fic­tion, which is al­most the ex­act op­po­site of most KPI frame­works.

The core idea is the Theory of Constraints, and it goes like this:

Every sys­tem has ex­actly one con­straint. One bot­tle­neck. The through­put of your en­tire sys­tem is de­ter­mined by the through­put of that bot­tle­neck. Nothing else mat­ters un­til you fix the bot­tle­neck.

That’s the part most people get. Here’s the part they don’t, and it’s the part that should scare you: improving anything other than the bottleneck doesn’t just fail to help; it actively makes things worse.

Think about it me­chan­i­cally. If sta­tion A pro­duces wid­gets faster but sta­tion B (the bot­tle­neck) can still only process them at the same rate, all you’ve done is cre­ate a pile of un­fin­ished wid­gets be­tween A and B. Inventory goes up. Lead time goes up. The peo­ple at sta­tion B are now drown­ing. The pile cre­ates con­fu­sion about what to work on next. Quality tanks be­cause every­one’s triag­ing in­stead of think­ing.
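To make that mechanical picture concrete, here is a tiny sketch with made-up numbers (not from The Goal): developers at station A finish more items per day than review at station B, the bottleneck, can absorb, so speeding A up changes nothing but the size of the pile.

    # Toy two-station pipeline: A feeds B, and B is the bottleneck.
    def simulate(rate_a: int, rate_b: int, days: int) -> tuple[int, int]:
        queue = 0    # unfinished work piling up between A and B
        shipped = 0  # work that actually made it through B
        for _ in range(days):
            queue += rate_a
            done = min(queue, rate_b)
            queue -= done
            shipped += done
        return shipped, queue

    # Review (B) can absorb 6 items a day either way.
    print(simulate(rate_a=10, rate_b=6, days=20))  # (120, 80)
    print(simulate(rate_a=14, rate_b=6, days=20))  # (120, 160): same throughput, bigger pile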

I bet some of you are al­ready liv­ing this. I’ve lived it. It sucked.

Your de­vel­op­ers are pro­duc­ing PRs faster than ever. Great. Wonderful. Gold star. Someone get the con­fetti can­non. Now those PRs hit the re­view queue, and your re­view­ers haven’t tripled. Nobody tripled the re­view­ers. Nobody even thought about the re­view­ers, be­cause the re­view­ers weren’t in the ven­dor’s slide deck.

So PRs sit. A day. Two days. A week. The au­thor has con­text-switched to their next AI-assisted fea­ture and can barely re­mem­ber what the first one did by the time re­view com­ments land. Can you ex­plain what this func­tion does?” they ask, star­ing at code they wrote eight days ago, which in de­vel­oper mem­ory is roughly the Jurassic pe­riod.

Reviews start get­ting rub­ber-stamped be­cause there are sim­ply too damn many of them to re­view prop­erly. Someone ap­proves a PR they did­n’t re­ally read. We’ve all done it (don’t look at me like that). It merges. CI takes 45 min­utes, fails on a flaky test, gets re-run, passes on the sec­ond at­tempt (the flaky test is fine, it’s al­ways fine, un­til it is­n’t and you’re de­bug­ging pro­duc­tion at 2am on a Saturday in your un­der­wear won­der­ing where your life went wrong. Ask me how I know… ac­tu­ally, don’t). The de­ploy pipeline re­quires a man­ual ap­proval from some­one who’s in a meet­ing about meet­ings. The fea­ture sits in stag­ing for three days be­cause no­body owns the get it to pro­duc­tion” step with any ur­gency.

Meanwhile, the de­vel­oper has al­ready shipped two more PRs. The queue grows. WIP goes through the roof. Everyone has six things in flight and noth­ing ac­tu­ally done. Cycle time (the thing that ac­tu­ally mea­sures how fast you de­liver value to users) gets worse.

You are pro­duc­ing more code and ship­ping less soft­ware. You have made your sit­u­a­tion mea­sur­ably, demon­stra­bly worse, and you have a dash­board that says pro­duc­tiv­ity is up 40%.

I have seen this ex­act movie play out at three dif­fer­ent com­pa­nies. The dash­board goes up. The ship­ping goes down. And no­body con­nects the two be­cause the dash­board is the thing they’re re­port­ing to the board, and the board does­n’t know what cy­cle time is, and no­body wants to be the per­son who ex­plains it.

And here’s the bit that re­ally keeps me up at night: a lot of this AI-generated code? Nobody fully un­der­stands it. The per­son who wrote” it did­n’t re­ally write it. They prompted it, skimmed it, maybe ran it once. When it breaks in pro­duc­tion at 2am, the per­son on-call did­n’t write it and the per­son who prompted it can’t ex­plain it. You’ve just in­creased the sur­face area for in­ci­dents while de­creas­ing the num­ber of hu­mans who can rea­son about the sys­tem.

If it’s not writ­ing code (and it al­most never is), then where should you be look­ing? Walk the value stream. Follow a fea­ture from someone had an idea” to a user got value from it.” I promise the bot­tle­neck will jump out and wave at you - it might even flip you off be­cause you’ve been ig­nor­ing it.

This is the one no­body wants to talk about be­cause it’s em­bar­rass­ing. Your PM has­n’t talked to a real user in two months. Your re­quire­ments ar­rive as a Jira ticket with three sen­tences and a Figma link to a de­sign that was ap­proved by some­one who’s never used the prod­uct. Your en­gi­neers are mak­ing fifty mi­cro-de­ci­sions a day about be­hav­iour, edge cases, and er­ror han­dling that no­body spec­i­fied, be­cause no­body thought about them.

I once watched a team spend six weeks build­ing a fea­ture based on a Slack mes­sage from a sales rep who para­phrased what a prospect maybe said on a call. Six weeks. The prospect did­n’t even end up buy­ing. The fea­ture got used by eleven peo­ple, and nine of them were in­ter­nal QA. That’s not a de­liv­ery prob­lem. That’s an oh fuck, what are we even do­ing” prob­lem.

When you speed up code out­put in this en­vi­ron­ment, you are speed­ing up the rate at which you build the wrong thing. You have au­to­mated the guess­ing. You will build the wrong fea­ture faster, ship it, watch it fail, and then do a retro where some­one says we need to talk to users more” and every­one nods solemnly and then ab­solutely noth­ing changes.

I put “done” in quotes because in most orgs, code being written is maybe 20% of the journey. The other 80% is your code sitting in various queues, slowly ageing, like a forgotten sandwich in the office fridge.

I’ve watched fea­tures where the code took an af­ter­noon and it took two months to reach pro­duc­tion. Two. Months. The code did­n’t get slower. Everyone around the code got in its way.

PR re­view. CI. Staging. QA. Security re­view. Product sign-off. Deploy win­dow. Canary roll­out. The ac­tual pipeline of get­ting code from a de­vel­op­er’s branch to a user’s screen is a long se­ries of hand­offs, waits, and queues. Most of the time, your code is sit­ting still. Waiting for a hu­man to look at it. Waiting for a pipeline to run. Waiting for some­one to give it per­mis­sion to ex­ist.

If you’ve ever watched a PR approval come through at 4:55pm on a Friday and thought “well, that’s shipping on Monday I guess,” you know exactly what I’m talking about.

If you want to ship faster, look at where things are wait­ing. Count the hours of ac­tual work ver­sus the hours of sit­ting in a queue. I guar­an­tee the ra­tio will make you want to put your head through a wall.

I can’t count the num­ber of teams I’ve worked with that were scared to de­ploy. Tests are flaky, ob­serv­abil­ity is a mess, no­body trusts the ca­nary process, and the last time some­one de­ployed on a Thursday it ru­ined every­one’s week­end. So what do they do? They batch changes into big­ger re­leases. Which are riskier. Which makes de­ploys scarier. Which makes every­one batch more.

Now add faster code out­put to this en­vi­ron­ment. More code, same ter­ri­fied de­ploy cul­ture. The batches get big­ger. The risk gets higher. The re­leases get less fre­quent. You have given a team that was al­ready scared of ship­ping even more rea­sons to not ship. Incredible work.

This one pairs with “you don’t know what to build” because it’s the same disease on the other end of the pipeline. You built the thing. You shipped the thing. And then… nothing. No analytics worth looking at. No user interviews after launch. Nobody circling back to check whether the feature actually solved the problem it was supposed to solve.

So you guess on the next fea­ture too. And the one af­ter that. The en­tire prod­uct roadmap is a se­ries of ed­u­cated guesses with no feed­back be­tween them.

You arrive at “we have no idea if this worked” more often, learn nothing each time, and somehow call that velocity.

Sometimes the bottleneck isn’t technical at all. It’s the meeting you need to get a decision. The three teams who need to agree on an API contract but haven’t talked to each other in a month. The architect who’s a single point of approval on every significant design choice and has a two-week backlog because apparently we built a system where one person’s calendar is a load-bearing wall. Or my personal favourite: the planning process that takes six weeks, runs quarterly, and means you can’t start working on something urgent for another five weeks because it “wasn’t in the plan.”

Not a tech­ni­cal prob­lem. Not a code prob­lem. A cal­en­dar prob­lem. We spent more time talk­ing about the fea­ture than build­ing it. At one point some­one sug­gested we have a meet­ing to dis­cuss the meet­ing. I wish I was jok­ing. Now I need a shower, and some whiskey.

Writing code faster does pre­cisely noth­ing for any of this. Zero. Your bot­tle­neck is the org chart, and no amount of Copilot is go­ing to refac­tor that.

You knew this sec­tion was com­ing. The bor­ing bit. I’m not go­ing to pre­tend this is glam­orous, be­cause it is­n’t. Nobody’s go­ing to write a LinkedIn post about it. Nobody’s go­ing to give a keynote about it at a ven­dor con­fer­ence. There’s no swag.

Map your value stream. Literally fol­low a fea­ture from idea to pro­duc­tion. Write down every step. Write down how long each step takes. Write down how long things sit be­tween steps. The gap be­tween steps is where your cy­cle time lives. This will be de­press­ing. Do it any­way. Bring snacks.

Measure cycle time, not output. If you’re measuring lines of code, PRs merged, or “story points delivered” and not measuring how long it takes from commit to production to users using it, you’re optimising for the wrong thing. You’re counting widgets at station A and ignoring the pile on the floor. Stop it. I mean it.
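As a rough sketch of what that measurement looks like in practice (the stages and timestamps below are invented, not from any real pipeline): record when each stage of a change starts and ends, then compare hands-on time with total elapsed time.

    from datetime import datetime

    # Hypothetical timeline for one change: (stage, start, end).
    stages = [
        ("coding", datetime(2024, 1, 1, 9),  datetime(2024, 1, 1, 15)),
        ("review", datetime(2024, 1, 3, 10), datetime(2024, 1, 3, 11)),
        ("ci",     datetime(2024, 1, 3, 11), datetime(2024, 1, 3, 12)),
        ("deploy", datetime(2024, 1, 8, 16), datetime(2024, 1, 8, 17)),
    ]

    work_hours = sum((end - start).total_seconds() for _, start, end in stages) / 3600
    cycle_hours = (stages[-1][2] - stages[0][1]).total_seconds() / 3600
    wait_hours = cycle_hours - work_hours

    # With these made-up numbers: 9 hours of actual work, 167 hours of waiting.
    print(f"work: {work_hours:.0f}h, wait: {wait_hours:.0f}h, cycle: {cycle_hours:.0f}h")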

Find the wait states and kill them. If PRs wait two days for re­view, fix re­view. Pair pro­gram­ming, smaller PRs, ded­i­cated re­view time, async re­view norms, what­ever works for your team. If de­ploys wait for a man­ual ap­proval, au­to­mate it or at least make it a Slack but­ton in­stead of a cal­en­dar in­vite. If de­ci­sions wait for a meet­ing, make smaller de­ci­sions that don’t need meet­ings.

Stop start­ing and start fin­ish­ing. WIP lim­its ex­ist for a rea­son. It’s bet­ter to have three things done than ten things in progress. Every item in flight is con­text-switch­ing tax, and con­text-switch­ing is where good en­gi­neers go to slowly lose their minds and start writ­ing man­i­festos on in­ter­nal wikis that no­body reads.

Talk to the peo­ple do­ing the work. Your de­vel­op­ers al­ready know where the bot­tle­neck is. They com­plain about it in standups. They’ve been mak­ing memes about it in Slack for months. They just as­sumed no­body was lis­ten­ing, and hon­estly? They were prob­a­bly right.

Go back to that Tuesday morning. Your VP is up there with their slide about 40% more code output. What they should have said, what would have actually been useful, is this: “We did a value stream analysis and found that features spend an average of nine days waiting between steps. We’re going to cut that in half.”

That’s not sexy. It does­n’t fit on a ven­dor’s slide deck. You can’t sell it as a prod­uct. There’s prob­a­bly no con­fer­ence talk in it (actually, this is giv­ing me ideas…). But it’s the thing that would ac­tu­ally make you ship faster.

The speed of writ­ing code was never your prob­lem. If you thought it was, the gap be­tween that be­lief and re­al­ity is where all your ac­tual prob­lems live. The com­pet­i­tive ad­van­tage does­n’t go to the team that writes code fastest. It goes to the team that fig­ured out what to build, built it, and got it into users’ hands while every­one else was still drown­ing in a re­view queue full of AI-generated PRs that no­body has the time or the en­ergy to read.

...

Read the original on debuggingleadership.com »

9 295 shares, 13 trendiness

- YouTube

...

Read the original on www.youtube.com »

10 286 shares, 36 trendiness

Python 3.15’s JIT is now back on track

(JIT per­for­mance as of 17 March (PST). Lower is bet­ter ver­sus in­ter­preter. Image cred­its to https://​doesjit­go­b­rrr.com/).

Great news—we’ve hit our (very mod­est) per­for­mance goals for the CPython JIT over a year early for ma­cOS AArch64, and a few months early for x86_64 Linux. The 3.15 al­pha JIT is about 11-12% faster on ma­cOS AArch64 than the tail call­ing in­ter­preter, and 5-6% faster than the stan­dard in­ter­preter on x86_64 Linux. These num­bers are geo­met­ric means and are pre­lim­i­nary. The ac­tual range is some­thing like a 20% slow­down to over 100% speedup (ignoring the un­pack­_se­quence mi­crobench­mark). We don’t have proper free-thread­ing sup­port yet, but we’re aim­ing for that in 3.15/3.16. The JIT is now back on track.
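For readers unfamiliar with the metric: a geometric mean of per-benchmark speedups is the n-th root of their product, which keeps a single outlier benchmark from dominating the headline number. A tiny illustration with made-up ratios (not the actual benchmark data):

    import math

    # Hypothetical per-benchmark speedups versus the interpreter (1.0 = no change).
    speedups = [0.8, 1.05, 1.10, 1.20, 2.0]

    geo_mean = math.prod(speedups) ** (1 / len(speedups))
    arith_mean = sum(speedups) / len(speedups)

    # The 2.0x outlier lifts the arithmetic mean above the geometric mean.
    print(f"geometric: {geo_mean:.2f}x, arithmetic: {arith_mean:.2f}x")  # ~1.17x vs ~1.23x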

I cannot overstate how tough this was. There was a point where I was seriously wondering if the JIT project would ever produce meaningful speedups. To recap, the original CPython JIT had practically no speedups: 8 months ago I posted a JIT reflections article on how the original CPython JIT in 3.13 and 3.14 was often slower than the interpreter. That was also around the time when the Faster CPython team lost funding from its main sponsor. I’m a volunteer so this didn’t affect me, but more importantly it did affect my friends working there, and at one point it seemed the JIT’s future was uncertain.

So what changed from 3.13 and 3.14? I’m not going to give some heroic tale of how we rescued the JIT from the jaws of failure through our acumen. I honestly attribute a lot of our current success to luck—right time, right place, right people, right bets. I seriously don’t think this would’ve been possible if a single one of the core JIT contributors (Savannah Ostrowski, Mark Shannon, Diego Russo, Brandt Bucher, and me) had not been in the picture. So as not to exclude the other active JIT contributors, I will also name a few more people: Hai Zhu, Zheaoli, Tomas Roun, Reiden Ong, Donghee Na, and I am probably missing a few more.

I’m going to cover a less talked-about part of a JIT: the people, and a bit of luck. If you want the technical details of how we did it, they’re here.

The Faster CPython team lost its main sponsor in 2025. I immediately raised the idea of community stewardship. At the time, I was pretty uncertain this would work. JIT projects are not known to be good for new contributors; they historically require a lot of prior expertise.

At the CPython core sprint in Cambridge, the JIT core team met, and we wrote a plan for a 5% faster JIT by 3.15 and a 10% faster JIT by 3.16, with free-threading support. A side note, less headline-grabbing but vital to the health of the project, was to decrease the bus factor. We wanted 2 active maintainers in all 3 stages of the JIT: frontend (region selector), middle-end (optimizer), backend (code generator).

Previously, the JIT only had 2 active recurring contributors to the middle-end. Today, the JIT has 4 active recurring contributors to the middle-end, and I would consider the 2 non-core developers (Hai Zhu and Reiden) capable and valued members.

What worked in attracting people were the usual software engineering practices: breaking complex problems down into manageable parts. Brandt started this earlier in 3.14, where he opened multiple mega-issues that split optimizing the JIT into simple tasks. For example, we would say “try optimizing a single instruction in the JIT”. I took Brandt’s idea and did this for 3.15. Luckily, I had an easier job, as my issue involved converting the interpreter instructions to an easily optimizable form. To encourage new contributors, I also laid out very detailed instructions that were immediately actionable, and I clearly demarcated units of work. I suspect that did help, as we have 11 contributors (including me) working on that issue, converting nearly the whole of the interpreter to something more JIT-optimizer friendly. The core idea was that the JIT could be broken down from an opaque blob into something that a C programmer with no JIT experience could contribute to.

Other things that worked: en­cour­ag­ing peo­ple, cel­e­brat­ing achieve­ments big or small. Every JIT PR had a clear out­come, which I sus­pect gave peo­ple a sense of di­rec­tion.

The community optimization efforts paid off. The JIT went from 1% faster on x86_64 Linux to 3-4% faster (see the blue line below) over that time period.

Again, I attribute a lot of this to luck, but during the CPython core sprints in Cambridge, Brandt nerd-sniped me into rewriting the JIT frontend into a tracing one. I initially didn’t like the idea, but as a friendly form of spite-driven development, I thought I’d rewrite it just to prove to him it didn’t work.

The initial prototype worked in 3 days; however, it took a month to get it JITting properly without failing the test suite. The initial results were dismal—about 6% slower on x86_64 Linux. I was about to ditch the idea, until a lucky accident happened: I misinterpreted a suggestion given by Mark.

Mark had suggested earlier to thread the dispatch table through the interpreter, thus having two dispatch tables in the interpreter (one for the normal interpreter, and one for tracing). Mark suggested we should have the tracing table contain tracing versions of the normal instructions. However, I misunderstood and came up with an even more extreme version: instead of tracing versions of normal instructions, I had only one instruction responsible for tracing, and all instructions in the second table point to that. Yes, I know this part is confusing; I’ll hopefully try to explain it better one day. This turned out to be a really, really good choice. I found that the initial dual-table approach was so much slower because it doubled the size of the interpreter, causing huge compiled code bloat and, naturally, a slowdown. By using only a single instruction and two tables, we only increase the interpreter by the size of 1 instruction, and also keep the base interpreter ultra fast. I affectionately call this mechanism dual dispatch.
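To show the shape of the idea (this is my own toy illustration in Python, with invented names, not CPython’s actual C code): the tracing table has the same slots as the normal one, but every slot points at a single instruction that records what ran and then dispatches through the normal table.

    # Normal handlers: one per opcode, exactly as in the plain interpreter.
    def load_fast(opcode, frame): pass     # stand-in for a real handler
    def binary_add(opcode, frame): pass    # stand-in for a real handler

    NORMAL_TABLE = {"LOAD_FAST": load_fast, "BINARY_ADD": binary_add}

    trace: list[str] = []

    def record_and_dispatch(opcode, frame):
        trace.append(opcode)                 # remember what executed
        NORMAL_TABLE[opcode](opcode, frame)  # then behave exactly like normal

    # Only one new handler exists; every slot of the tracing table points at it,
    # so the interpreter grows by roughly one instruction instead of doubling.
    TRACING_TABLE = {op: record_and_dispatch for op in NORMAL_TABLE}

    def run(code, frame, tracing=False):
        table = TRACING_TABLE if tracing else NORMAL_TABLE
        for opcode in code:
            table[opcode](opcode, frame)

    run(["LOAD_FAST", "LOAD_FAST", "BINARY_ADD"], frame=None, tracing=True)
    print(trace)  # ['LOAD_FAST', 'LOAD_FAST', 'BINARY_ADD']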

There’s a lot more that went into the design of the trace recording interpreter. I’m tooting my own horn here, but I truly think it’s a mini work of art. It took me 1 week to iterate on the interpreter until it was overall faster. It went from 6% slower to roughly no speedup after using dual dispatch. After that, I stamped out a bunch of slow edge cases in the tracing interpreter to eventually make it 1.x% faster. The tracing interpreter itself is, by my own estimates, only 3-5x slower than the specializing interpreter. Key to this is that it respects all normal behavior of the specializing interpreter and mostly doesn’t interfere with it.

Just to give you an idea of how much trace record­ing mat­tered: it in­creased the JIT code cov­er­age by 50%. This means all fu­ture op­ti­miza­tions would likely have been around 50% less ef­fec­tive (assuming all code ex­e­cutes the same, which of course is­n’t true, just bear with me please :).

So I have to thank Brandt and Mark for lead­ing me to stum­ble upon such a nice so­lu­tion.

The other lucky bet we made early on was to try reference count elimination. This, again, was work originally done by Matt Page in the CPython bytecode optimizer (more details in a previous blog post on optimization). I noticed that there was still a branch left in the JITted code per reference count decrement, even with the bytecode optimizer work. I thought “why not try eliminating the branch”, and had no clue how much it would help. It turns out a single branch is actually quite expensive, and these add up over time. Especially if it’s >=1 branch for every single Python instruction!

The other lucky part is how easy this was to parallelize and how great a tool it was for teaching people about the interpreter and the JIT. This was the main optimization that we directed people to work on in the Python 3.15 JIT. Although it was a mostly manual refactoring process, it taught people the key parts they needed to learn about the JIT without overwhelming them.

We have a great infrastructure team. I say this partly in jest, because it’s one person. In reality, our “team” is currently 4 machines running in Savannah’s closet. Nevertheless, Savannah has done the work equivalent of an entire infrastructure team for the JIT. The JIT could not have progressed so quickly if we had no way to report our performance numbers. Daily JIT runs have been a game changer in the feedback loop. They have helped us catch regressions in JIT performance and let us know that our optimizations actually work.

Mark is tech­ni­cally ex­cel­lent, and I think he knows the Internet gives him too much praise al­ready so I’m not go­ing to say any­thing more here :).

Diego is also great. He’s responsible for the JIT on ARM hardware, and has also recently started work on making the JIT friendly to profilers. I cannot overstate how hard a problem this is.

Brandt laid the orig­i­nal foun­da­tion for our ma­chine code back­end, with­out which we’d have new con­trib­u­tors writ­ing as­sem­bler, which prob­a­bly would’ve put more peo­ple off.

I also want to en­cour­age the idea of talk­ing to peo­ple and shar­ing ideas.

A shoutout to CF Bolz-Tereick, who taught me a lot about PyPy. I spent a few months look­ing at PyPy’s source code, and I be­lieve this made me a bet­ter JIT de­vel­oper over­all. CF was very help­ful when I needed help.

I’m also part of a friendly com­piler chat with Max Bernstein, with­out which I’d likely have lost mo­ti­va­tion for this a long time ago. Max is a pro­lific writer, and a friendly com­piler per­son.

Ideas don’t ex­ist in a silo. I sus­pect I be­came bet­ter at writ­ing JITs thanks to hang­ing out with a bunch of com­piler peo­ple for some time. At the very least, look­ing at PyPy has broad­ened my view!

People are im­por­tant, and with some luck, JIT go brrr.

...

Read the original on fidget-spinner.github.io »
