10 interesting stories served every morning and every evening.




1 589 shares, 48 trendiness

Hormuz Minesweeper

...

Read the original on hormuz.pythonic.ninja »

2 433 shares, 22 trendiness

The Three Pillars of JavaScript Bloat

Over the last couple of years, we’ve seen significant growth of the e18e community and a rise in performance-focused contributions because of it. A large part of this is the “cleanup” initiative, where the community has been pruning packages which are redundant, outdated, or unmaintained.

One of the most common topics that comes up as part of this is “dependency bloat” - the idea that npm dependency trees are getting larger over time, often with long-since redundant code which the platform now provides natively.

In this post, I want to briefly look at what I think are the three main types of bloat in our dependency trees, why they exist, and how we can start to address them.

The graph above is a common sight in many npm dependency trees - a small utility function for something which seems like it should be natively available, followed by many similarly small deep dependencies.

So why is this a thing? Why do we need is-string instead of typeof checks? Why do we need hasown instead of Object.hasOwn (or Object.prototype.hasOwnProperty)? Three things:
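On any engine from the last decade, these utilities can be replaced with one-liners. A minimal sketch (the helper names here are illustrative locals, not real packages):

```javascript
// Native replacements for the tiny utility packages mentioned above.
// Note: these only cover same-realm values; the cross-realm case is
// discussed later in the post.
const isString = (val) => typeof val === "string"; // replaces is-string

// Object.hasOwn is ES2022; fall back to the prototype method on older engines.
const hasOwn =
  Object.hasOwn ??
  ((obj, key) => Object.prototype.hasOwnProperty.call(obj, key));

console.log(isString("hello")); // true
console.log(isString(5)); // false
console.log(hasOwn({ a: 1 }, "a")); // true
```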

Support for very old engines

Somewhere in the world, some people apparently exist who need to support ES3 - think IE6/7, or extremely early versions of Node.js.

For these people, much of what we take for granted today does not exist. For example, they don’t have any of the following:

These are all ES5 features, meaning they simply don’t exist in ES3 engines.

These unfortunate souls who are still running old engines need to reimplement everything themselves, or be provided with polyfills.

Alternatively, what’d be really nice is if they upgraded.

The second reason for some of these packages is “safety”.

Basically, inside Node itself, there is a concept of “primordials”. These are essentially just global objects wrapped at startup and imported by Node from then on, to avoid Node itself being broken by someone mutating the global namespace.

For example, if Node itself uses Map and we re-define what Map is - we can break Node. To avoid this, Node keeps a reference to the original Map which it imports rather than accessing the global.
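A heavily simplified sketch of the idea (Node’s real primordials module is far more thorough - it captures and freezes whole prototypes, not just one constructor):

```javascript
// Capture a reference to the built-in before any user code runs.
const SafeMap = Map;

// Later, user code clobbers the global...
globalThis.Map = function BrokenMap() {
  throw new Error("Map has been replaced");
};

// ...but code holding the original reference keeps working.
const m = new SafeMap([["key", 42]]);
console.log(m.get("key")); // 42
```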

You can read more about this here in the Node repo.

This makes a lot of sense for an engine, since it really shouldn’t fall over if a script messes up the global namespace.

Some maintainers also believe this is the correct way to build packages, too. This is why we have dependencies like math-intrinsics in the graph above, which basically re-exports the various Math.* functions to avoid mutation.

Lastly, we have cross-realm values. These are basically values you have passed from one realm to another - for example, from a web page to a child frame, or vice versa.

In this situation, a new RegExp(pattern) created in an iframe is not the same RegExp class as the one in the parent page. This means window.RegExp !== iframeWindow.RegExp, which of course means val instanceof RegExp would be false if val came from the iframe (another realm).

For example, I am a maintainer of chai, and we have this exact issue. We need to support assertions happening across realms (since a test runner may run tests in a VM or iframe), so we can’t rely on instanceof checks. For that reason, we use Object.prototype.toString.call(val) === ‘[object RegExp]’ to check if something is a regex, which works across realms since it doesn’t rely on the constructor.

In the graph above, is-string is basically doing this same job, in case we passed a new String(val) from one realm to another.
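The same pattern, sketched as standalone helpers (illustrative, not chai’s actual code):

```javascript
// Cross-realm-safe type checks: the internal class tag reported by
// Object.prototype.toString survives realm boundaries, while instanceof
// depends on the realm-local constructor identity.
const isRegExp = (val) =>
  Object.prototype.toString.call(val) === "[object RegExp]";

const isStringObject = (val) =>
  Object.prototype.toString.call(val) === "[object String]";

console.log(isRegExp(/abc/)); // true
console.log(isRegExp("abc")); // false
console.log(isStringObject(new String("abc"))); // true
```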

All of this makes sense for a very small group of people. If you’re supporting very old engines, passing values across realms, or want protection from someone mutating the environment - these packages are exactly what you need.

The problem is that the vast majority of us don’t need any of this. We’re running a version of Node from the last 10 years, or using an evergreen browser. We don’t need to support pre-ES5 environments, we don’t pass values across frames, and we uninstall packages which break the environment.

These layers of niche compatibility somehow made their way into the “hot path” of everyday packages. The tiny group of people who actually need this stuff should be the ones seeking out special packages for it. Instead, it is reversed and we all pay the cost.

Some folks believe that packages should be broken up to an almost atomic level, creating a collection of small building blocks which can later be re-used to build other higher-level things.

This kind of architecture means we end up with graphs like this:

As you can see, the most granular snippets of code have their own packages. For example, shebang-regex is the following at the time of writing this post:

By splitting code up to this atomic level, the theory is that we can then create higher-level packages simply by joining the dots.

Some examples of these atomic packages to give you an idea of the granularity:

* arrify - Converts a value to an array (Array.isArray(val) ? val : [val])

* cli-boxes - A JSON file containing the edges of a box

* path-key - Get the PATH environment variable key for the current platform (PATH on Unix, Path on Windows)

* onetime - Ensure a function is only called once

* is-wsl - Check if process.platform is linux and os.release() contains microsoft

If we wanted to build a new CLI, for example, we could pull a few of these in and not worry about implementation. We don’t need to do env[‘PATH’] || env[‘Path’] ourselves, we can just pull a package for that.
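Several of these are small enough to inline directly. A rough sketch of same-spirit equivalents (simplified - for instance, the real path-key also handles case-insensitive env key lookups on Windows):

```javascript
// Inlined stand-ins for some of the atomic packages listed above.
const arrify = (val) => (Array.isArray(val) ? val : [val]);

// Simplified path-key: the real package scans process.env for the
// correctly-cased key on Windows.
const pathKey = process.platform === "win32" ? "Path" : "PATH";

// Simplified onetime: the wrapped function runs at most once,
// and later calls return the first result.
const onetime = (fn) => {
  let called = false;
  let result;
  return (...args) => {
    if (!called) {
      called = true;
      result = fn(...args);
    }
    return result;
  };
};

console.log(arrify(1)); // [ 1 ]
console.log(arrify([1, 2])); // [ 1, 2 ]
console.log(pathKey); // "PATH" on Unix-like platforms
```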

In reality, most or all of these packages did not end up as the reusable building blocks they were meant to be. They’re either largely duplicated across various versions in a wider tree, or they’re single-use packages which only one other package uses.

Let’s take a look at some of the most granular packages:

* shebang-regex is used almost solely by shebang-command, by the same maintainer

* cli-boxes is used almost solely by boxen and ink, by the same maintainer

* onetime is used almost solely by restore-cursor, by the same maintainer

Each of these having only one consumer means they’re the equivalent of inlined code, but they cost us more to acquire (npm requests, tar extraction, bandwidth, etc.).

Taking a look at nuxt’s dependency tree, we can see a few of these building blocks duplicated:

Inlining them doesn’t mean we no longer duplicate the code, but it does mean we don’t pay the cost of things like version resolution, conflicts, cost of acquisition, etc.

Inlining makes duplication almost free, while packaging makes it expensive.

The more packages we have, the larger our supply chain surface area is. Every package is a potential point of failure for maintenance, security, and so on.

For example, a maintainer of many of these packages was compromised last year. This meant hundreds of tiny building blocks were compromised, which meant the higher-level packages we actually install were also compromised.

Logic as simple as Array.isArray(val) ? val : [val] probably doesn’t need its own package, security, maintenance, and so on. It can just be inlined and we can avoid the risk of it being compromised.

Similar to the first pillar, this philosophy made its way into the “hot path” and probably shouldn’t have. Again, we all pay the cost to no real benefit.

If you’re building an app, you might want to use some “future” features your chosen engine doesn’t support yet. In this situation, a polyfill can come in handy - it provides a fallback implementation where the feature should be, so you can use it as if it were natively supported.

For example, temporal-polyfill polyfills the new Temporal API so we can use Temporal regardless of whether the engine supports it or not.

Now, if you’re building a library instead, what should you do?

In general, no library should load a polyfill, as that is a consumer’s concern and a library shouldn’t be mutating the environment around it. As an alternative, some maintainers choose to use what’s called a ponyfill (sticking to the unicorns, sparkles and rainbows theme).

A ponyfill is basically a polyfill you import rather than one which mutates the environment.

This kinda works since it means a library can use future tech by importing an implementation of it which passes through to the native one if it exists, and uses the fallback otherwise. None of this mutates the environment, so it is safe for libraries to use.
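The difference in miniature, using Array.prototype.at as a hypothetical example feature:

```javascript
// Polyfill: patches the environment so all code gets the feature.
// Fine in an app, but a library doing this mutates globals it doesn't own.
if (!Array.prototype.at) {
  Array.prototype.at = function (i) {
    return this[i < 0 ? this.length + i : i];
  };
}

// Ponyfill: an imported function that defers to the native implementation
// when present and falls back otherwise. No globals are touched.
const at = (arr, i) =>
  typeof arr.at === "function" ? arr.at(i) : arr[i < 0 ? arr.length + i : i];

console.log(at([1, 2, 3], -1)); // 3
console.log(at([1, 2, 3], 0)); // 1
```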

For example, fastly provides @fastly/performance-observer-polyfill, which contains both a polyfill and a ponyfill for PerformanceObserver.

These ponyfills did their job at the time - they allowed the library author to use future tech without mutating the environment and without forcing the consumer to know which polyfills to install.

The problem comes when these ponyfills outstay their welcome. When the feature they fill in for is now supported by all engines we care about, the ponyfill should be removed. However, this often doesn’t happen and the ponyfill remains in place long after it’s needed.

We’re now left with many, many packages which rely on ponyfills for features we’ve all had for a decade now.

Unless these packages are being kept alive because of Pillar 1, they’re usually still used just because nobody ever thought to remove them.

When all long-term support versions of engines have the feature, the ponyfill should be removed.

Much of this bloat is so deeply nested in dependency trees today that it is a fairly hefty task to unravel it all and get to a good place. It will take time, and it will take a lot of effort from maintainers and consumers.

Having said that, I do think we can make significant progress on this front if we all work together.

Start asking yourself, “why do I have this package?” and “do I really need it?”.

If you find something which seems redundant, raise an issue with the maintainer asking if it can be removed.

If you encounter a direct dependency which has many of these issues, have a look for an alternative which doesn’t. A good start for that is the module-replacements project.

knip is a great project which can help you find and remove unused dependencies, dead code, and much more. In this case, it can be a great tool to help you find and remove dependencies you no longer use.

This doesn’t necessarily solve the problems above, but it is a great starting point to help clean up the dependency tree before doing more involved work.

You can read more about how knip deals with unused dependencies in their documentation.

The e18e CLI has a super useful analyze mode to determine which dependencies are no longer needed, or have community-recommended replacements.

For example, if you get something like this:

Using this, we can quickly identify which direct dependencies can be cleaned up. We can then use the migrate command to automatically migrate some of these dependencies:

In this case, it will migrate from chalk to picocolors, a much smaller package which provides the same functionality.

In the future, this CLI will even make recommendations based on your environment - for example, it could suggest the native styleText instead of a colours library if you’re running a new enough Node.

npmgraph is a great tool to visualize your dependency tree and investigate where bloat is coming from.

For example, let’s take a look at the bottom half of ESLint’s dependency graph as of writing this post:

We can see in this graph that the find-up branch is isolated, in that nothing else uses its deep dependencies. For something as simple as an upwards file-system traversal, maybe we don’t need 6 packages. We can then go look for an alternative, such as empathic, which has a much smaller dependency graph and achieves the same thing.

The module-replacements project is being used as a central data set for the wider community to document which packages can be replaced with native functionality, or with more performant alternatives.

If you’re ever in need of an alternative or just want to check your dependencies, this data set is great for that.

Similarly, if you come across packages in your tree which are made redundant by native functionality, or just have better battle-tested alternatives, this project is definitely a great place to contribute that so others can benefit from it.

Paired with the data, there’s also a codemods project which provides codemods to automatically migrate some of these packages to their suggested replacements.

We all pay the cost for an incredibly small group of people to have an unusual architecture they like, or a level of backwards compatibility they need.

This isn’t necessarily a fault of the people who made these packages, as each person should be able to build however they want. Many of them are an older generation of influential JavaScript developers - building packages in a darker time when many of the nice APIs and cross-compatibility we have today didn’t exist. They built the way they did because it was possibly the best way at the time.

The problem is that we never moved on from that. We still download all of this bloat today even though we’ve had these features natively for several years.

I think we can solve this by reversing things. This small group should pay the cost - they should have their own special stack that pretty much only they use. Everyone else gets the modern, lightweight, and widely supported code.

Hopefully things like e18e and npmx can help with that through documentation, tooling, etc. You can also help by taking a closer look at your dependencies and asking “why?”. Raise issues with your dependencies asking if, and why, they still need these packages.

We can fix it.

...

Read the original on 43081j.com »

3 345 shares, 14 trendiness

Video Editor

Professional video editing, right in your browser. A powerful NLE editor with GPU compositing, keyframe animation, and real-time preview. No installs required.

Everything you need to edit, built on WebGPU and Rust/WASM for performance that rivals native apps:

* WebGPU-powered compositing via Rust/WASM delivers near-native performance for real-time previews and exports.

* Canvas-rendered timeline with unlimited video and audio tracks, linked clips, and cross-transitions.

* Animate any property with bezier easing curves. Transform, opacity, effects - everything is keyframeable.

* Apply brightness, contrast, saturation, blur, and hue rotation - all GPU-computed with instant preview.

* Everything runs in the browser. Your media stays local with the File System Access API - nothing leaves your machine.

...

Read the original on tooscut.app »

4 299 shares, 39 trendiness

Knowledge That Never Goes Offline

Wikipedia, AI, maps, and education tools running on your own hardware - completely free. No internet required.

Knowledge That Never Goes Offline

Node for Offline Media, Archives, and Data - a free, open-source offline server you install on any computer. Download the content you want, and it works without internet - forever. Similar products cost hundreds of dollars. Project NOMAD is free.

* Education - Khan Academy, Wikipedia for Schools, and more - complete learning resources for families anywhere, even without connectivity.

* Tech Enthusiasts - Run local LLMs, self-host your knowledge base, own your data. Built for beefy hardware and those who want full control.

* Off-Grid Living - Cabin, RV, or sailboat - bring a complete library, AI assistant, and offline maps wherever you go. True digital independence.

* Emergency Preparedness - When infrastructure fails, NOMAD keeps working. Medical references, survival guides, and encyclopedic knowledge - no internet required.

Whether you’re planning for emergencies or living off-grid, Project NOMAD has you covered.

* Offline Maps (powered by OpenStreetMap) - Full offline mapping with OpenStreetMap data. Navigate, plan routes, and explore terrain without any cell service.

* AI Assistant (powered by Ollama) - Run powerful large language models completely offline. Chat, write, analyze, code - all without sending data anywhere.

* Information Library (powered by Kiwix) - Offline Wikipedia, Project Gutenberg, medical references, repair guides, and more - terabytes of human knowledge at your fingertips.

* Education Platform (powered by Kolibri) - Khan Academy courses, educational videos, interactive lessons - complete K-12 curriculum available offline.

Watch the full walkthrough to see what Project NOMAD can do on your hardware.

Other offline products charge hundreds of dollars and lock you into specific hardware. Project NOMAD runs on any PC you choose - with GPU-accelerated AI - for free.

...

Read the original on www.projectnomad.us »

5 278 shares, 59 trendiness

Manyana

I’m releasing Manyana, a project which I believe presents a coherent vision for the future of version control - and a compelling case for building it.

It’s based on the fundamentally sound approach of using CRDTs for version control, which is long overdue but hasn’t happened yet because of subtle UX issues. A CRDT merge always succeeds by definition, so there are no conflicts in the traditional sense - the key insight is that changes should be flagged as conflicting when they touch each other, giving you informative conflict presentation on top of a system which never actually fails. This project works that out.

One immediate benefit is much more informative conflict markers. Two people branch from a file containing a function. One deletes the function. The other adds a line in the middle of it. A traditional VCS gives you this:

<<<<<<< left
=======
def calculate(x):
    a = x * 2
    logger.debug(f"a={a}")
    b = a + 1
    return b
>>>>>>> right

Two opaque blobs. You have to mentally reconstruct what actually happened.

Manyana gives you this:

<<<<<<< begin deleted left
def calculate(x):
    a = x * 2
======= begin added right
    logger.debug(f"a={a}")
======= begin deleted left
    b = a + 1
    return b
>>>>>>> end conflict

Each section tells you what happened and who did it. Left deleted the function. Right added a line in the middle. You can see the structure of the conflict instead of staring at two blobs trying to figure it out.

CRDTs (Conflict-free Replicated Data Types) give you eventual consistency: merges never fail, and the result is always the same no matter what order branches are merged in - including many branches mashed together by multiple people working independently. That one property turns out to have profound implications for every aspect of version control design.
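That order-independence is easy to see with the simplest CRDT of all, a grow-only set (a toy stand-in here - Manyana’s weave is a far richer structure):

```javascript
// A G-Set's merge is set union: commutative, associative, and idempotent,
// so every merge order converges on the same state.
const merge = (a, b) => new Set([...a, ...b]);

const left = new Set(["line-1", "line-2"]);
const right = new Set(["line-1", "line-3"]);

// Merge in both orders; sorting only for a stable printout.
const leftFirst = [...merge(left, right)].sort();
const rightFirst = [...merge(right, left)].sort();

console.log(leftFirst); // [ 'line-1', 'line-2', 'line-3' ]
console.log(rightFirst); // [ 'line-1', 'line-2', 'line-3' ]
```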

Line ordering becomes permanent. When two branches insert code at the same point, the CRDT picks an ordering and it sticks. This prevents problems when conflicting sections are both kept but resolved in different orders on different branches.

Conflicts are informative, not blocking. The merge always produces a result. Conflicts are surfaced for review when concurrent edits happen too “near” each other, but they never block the merge itself. And because the algorithm tracks what each side did rather than just showing the two outcomes, the conflict presentation is genuinely useful.

History lives in the structure. The state is a weave - a single structure containing every line which has ever existed in the file, with metadata about when it was added and removed. This means merges don’t need to find a common ancestor or traverse the DAG. Two states go in, one state comes out, and it’s always correct.

One idea I’m particularly excited about: rebase doesn’t have to destroy history. Conventional rebase creates a fictional history where your commits happened on top of the latest main. In a CRDT system, you can get the same effect - replaying commits one at a time onto a new base - while keeping the full history. The only addition needed is a “primary ancestor” annotation in the DAG.

This matters because aggressive rebasing quickly produces merge topologies with no single common ancestor, which is exactly where traditional 3-way merge falls apart. CRDTs don’t care - the history is in the weave, not reconstructed from the DAG.

Manyana is a demo, not a full-blown version control system. It’s about 470 lines of Python which operate on individual files. Cherry-picking and local undo aren’t implemented yet, though the README lays out a vision for how those can be done well.

What it is is a proof that CRDT-based version control can handle the hard UX problems and come out with better answers than the tools we’re all using today - and a coherent design for building the real thing.

The code is public domain. The full design document is in the README.

...

Read the original on bramcohen.com »

6 266 shares, 11 trendiness

hectorvent/floci: Light, fluffy, and always free

Named after floccus - the cloud formation that looks exactly like popcorn.

A free, open-source local AWS emulator. No account. No feature gates. No CI restrictions. Just docker compose up.

LocalStack’s community edition sunset in March 2026 - requiring auth tokens, dropping CI support, and freezing security updates. Floci is the no-strings-attached alternative.

# docker-compose.yml
services:
  floci:
    image: hectorvent/floci:latest
    ports:
      - "4566:4566"
    volumes:
      - ./data:/app/data

docker compose up

All services are available at http://localhost:4566. Use any AWS region - credentials can be anything.

export AWS_ENDPOINT_URL=http://localhost:4566
export AWS_DEFAULT_REGION=us-east-1
export AWS_ACCESS_KEY_ID=test
export AWS_SECRET_ACCESS_KEY=test

# Try it
aws s3 mb s3://my-bucket
aws sqs create-queue --queue-name my-queue
aws dynamodb list-tables

Point your existing AWS SDK at http://localhost:4566 - no other changes needed.

// Java (AWS SDK v2)
DynamoDbClient client = DynamoDbClient.builder()
    .endpointOverride(URI.create("http://localhost:4566"))
    .region(Region.US_EAST_1)
    .credentialsProvider(StaticCredentialsProvider.create(
        AwsBasicCredentials.create("test", "test")))
    .build();

# Python (boto3)
import boto3

client = boto3.client("s3",
    endpoint_url="http://localhost:4566",
    region_name="us-east-1",
    aws_access_key_id="test",
    aws_secret_access_key="test")

// Node.js (AWS SDK v3)
import { S3Client } from "@aws-sdk/client-s3";

const client = new S3Client({
  endpoint: "http://localhost:4566",
  region: "us-east-1",
  credentials: { accessKeyId: "test", secretAccessKey: "test" },
  forcePathStyle: true,
});

All settings are overridable via environment variables (FLOCI_ prefix).

MIT - use it however you want.

...

Read the original on github.com »

7 265 shares, 24 trendiness

danveloper/flash-moe: Running a big model on a small laptop

Read the paper - full technical details, 90+ experiments, and the story of how an AI and a human built this in 24 hours.

Pure C/Metal inference engine that runs Qwen3.5-397B-A17B (a 397-billion-parameter Mixture-of-Experts model) on a MacBook Pro with 48GB RAM at 4.4+ tokens/second, with production-quality output including tool calling.

The entire 209GB model streams from SSD through a custom Metal compute pipeline. No Python. No frameworks. Just C, Objective-C, and hand-tuned Metal shaders.

*2-bit quantization produces \"name\" instead of "name" in JSON output, making tool calling unreliable. 4-bit is the production configuration.

The model has 60 transformer layers: 45 GatedDeltaNet (linear attention) + 15 standard full attention. Each layer has 512 experts, of which K=4 are activated per token (plus one shared expert). Hidden dimension is 4096.

SSD Expert Streaming - Expert weights (209GB at 4-bit) are read from NVMe SSD on demand via parallel pread() with GCD dispatch groups. Only the K=4 active experts per layer are loaded (~6.75MB each). The OS page cache manages caching - no custom cache needed (the “Trust the OS” principle). Inspired by Apple’s “LLM in a Flash” paper.

FMA-Optimized Dequant Kernel - The inner loop of the 4-bit dequantized matrix-vector multiply rearranges the math from (nibble * scale + bias) * x to fma(nibble, scale*x, bias*x). Pre-computing scale*x and bias*x lets the GPU’s fused multiply-add unit do dequant+multiply in one instruction. 12% faster than the naive formulation.
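The rearrangement relies on distributing x: (nibble·scale + bias)·x = nibble·(scale·x) + (bias·x). A quick sanity check of the identity (JavaScript has no fused multiply-add, so this only illustrates the equivalence, not the speedup):

```javascript
// Naive: dequantize the nibble, then multiply by the activation x.
const naive = (nibble, scale, bias, x) => (nibble * scale + bias) * x;

// Rearranged: with sx = scale*x and bx = bias*x precomputed once,
// the GPU can evaluate fma(nibble, sx, bx) in a single instruction.
const fused = (nibble, scale, bias, x) => {
  const sx = scale * x;
  const bx = bias * x;
  return nibble * sx + bx;
};

console.log(naive(7, 0.5, -1, 2)); // 5
console.log(fused(7, 0.5, -1, 2)); // 5
```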

Deferred GPU Expert Compute - CMD3 (the expert forward pass) is submitted without waiting. The GPU executes it while the CPU prepares the next layer. The combine + residual + norm are also on GPU, feeding directly into the next layer’s attention projections.

Accelerate BLAS for Linear Attention - The GatedDeltaNet recurrence uses cblas_sscal, cblas_sgemv, and cblas_sger for the 64-head × 128×128 state matrix update. 64% faster than scalar code.

Trust the OS - No custom expert cache. The OS page cache (~35GB) manages expert data caching via standard LRU. Every custom caching approach we tested (Metal LRU, malloc cache, LZ4 compressed cache) was slower due to GPU memory pressure or overhead. The page cache achieves ~71% hit rate naturally.

On Apple Silicon, SSD DMA and GPU compute share the same memory controller and cannot be profitably overlapped. The GPU’s dequant kernels are bandwidth-saturated at ~418 GiB/s. Even small background SSD DMA causes disproportionate GPU latency spikes through memory controller arbitration. The serial pipeline (GPU → SSD → GPU) is hardware-optimal.

cd metal_infer
make

# 4-bit inference (needs packed_experts/ directory)
./infer --prompt "Explain quantum computing" --tokens 100

# 2-bit inference (faster but breaks tool calling)
./infer --prompt "Explain quantum computing" --tokens 100 --2bit

# Interactive chat with tool calling
./chat

# Per-layer timing breakdown
./infer --prompt "Hello" --tokens 20 --timing

This is a primary development machine. The engine explicitly controls memory:

* No OOM risk. Expert data streams from SSD on demand.

...

Read the original on github.com »

8 260 shares, 28 trendiness

Windows Native App Development Is a Mess

I’m a Windows guy; I al­ways have been. One of my first pro­gram­ming books was , which cru­cially came with a trial ver­sion of Visual C++ that my ten-year-old self could in­stall on my par­ents’ com­puter. I re­mem­ber be­ing on a fam­ily va­ca­tion when .NET 1.0 came out, work­ing my way through a C# tome and gear­ing up to rewrite my Neopets cheat­ing pro­grams from MFC into Windows Forms. Even my very first job af­ter uni­ver­sity was at a .NET shop, al­though I worked mostly on the fron­tend.

While I fol­lowed the Windows de­vel­op­ment ecosys­tem from the side­lines, my pro­fes­sional work never in­volved writ­ing na­tive Windows apps. (Chromium is tech­ni­cally a na­tive app, but is more like its own op­er­at­ing sys­tem.) And for my hobby pro­jects, the web was al­ways a bet­ter choice. But, spurred on by fond child­hood mem­o­ries, I thought writ­ing a fun lit­tle Windows util­ity pro­gram might be a good re­tire­ment pro­ject.

Well. I am here to re­port that the scene is a com­plete mess. I to­tally un­der­stand why no­body writes na­tive Windows ap­pli­ca­tions these days, and in­stead peo­ple turn to Electron.

The util­ity I built, Display Blackout, scratched an itch for me: when play­ing games on my three-mon­i­tor setup, I wanted to black out my left and right dis­plays. Turning them off will cause Windows to spasm for sev­eral sec­onds and throw all your cur­rent win­dow po­si­tion­ing out of whack. But for OLED mon­i­tors, throw­ing up a black over­lay will turn off all the pix­els, which is just as good.

To be clear, this is not an original idea. I was originally using an AutoHotkey script, which upon writing this post I found out has since morphed into a full Windows application. Other incarnations of the idea are even available on the Microsoft Store. But, I thought I could create a slightly nicer and more modern UI, and anyway, the point was to learn, not to create a commercial product.

For our pur­poses, what’s in­ter­est­ing about this app is the sort of ca­pa­bil­i­ties it needs:

Enumerating the machine's displays and their bounds, and watching for changes

Placing borderless, titlebar-less, non-activating black windows

Optionally running at startup

Displaying a tray icon with a few menu items

Let’s keep those in mind go­ing for­ward.

Look at this beau­ti­ful UI that I made. Surely you will agree that it is bet­ter than all other soft­ware in this space.

In the be­gin­ning, there was the Win32 API, in C. Unfortunately, this API is still highly rel­e­vant to­day, in­clud­ing for my pro­gram.

Over time, a series of abstractions on top of this emerged. The main pre-.NET one was MFC, a C++ library which used modern-at-the-time language features like classes and templates to add some object-orientation on top of the raw C functions.

The ab­strac­tion train re­ally got go­ing with the in­tro­duc­tion of .NET. .NET was many things, but for our pur­poses the most im­por­tant part was the in­tro­duc­tion of a new pro­gram­ming lan­guage, C#, that ran as JITed byte­code on a new vir­tual ma­chine, in the same style as Java. This brought au­to­matic mem­ory man­age­ment (and thus mem­ory safety) to Windows pro­gram­ming, and gen­er­ally gave Microsoft a more mod­ern foun­da­tion for their ecosys­tem. Additionally, the .NET li­braries in­cluded a whole new set of APIs for in­ter­act­ing with Windows. On the UI side in par­tic­u­lar, .NET 1.0 (2002) started out with Windows Forms. Similar to MFC, it was largely a wrap­per around the Win32 win­dow­ing and con­trol APIs.

With .NET 3.0 (2006), Microsoft introduced WPF. Now, instead of creating all controls as C# objects, there was a separate markup language, XAML: more like the HTML + JavaScript relationship. This also was the first time they redrew controls from scratch, on the GPU, instead of wrapping the Win32 API controls that shipped with the OS. At the time, this felt like a fresh start, and a good foundation for the foreseeable future of Windows apps.

The next big pivot was with the re­lease of Windows 8 (2012) and the in­tro­duc­tion of WinRT. Similar to .NET, it was an at­tempt to cre­ate new APIs for all of the func­tion­al­ity needed to write Windows ap­pli­ca­tions. If de­vel­op­ers stayed in­side the lines of WinRT, their apps would meet the mod­ern stan­dard of sand­boxed apps, such as those on Android and iOS, and be de­ploy­able across Windows desk­tops, tablets, and phones. It was still XAML-based on the UI side, but with every­thing slightly dif­fer­ent than it was in WPF, to sup­port the more con­strained cross-de­vice tar­gets.

This strategy got a do-over in Windows 10 (2015) with UWP, with some sandboxing restrictions lifted to allow for more capable desktop/phone/Xbox/HoloLens apps, but still not quite the same power as full .NET apps with WPF. At the same time, with both WinRT and UWP, certain new OS-level features and integrations (such as push notifications, live tiles, or publication in the Microsoft Store) were only granted to apps that used these frameworks. This led to awkward architectures where applications like Chrome or Microsoft Office would have WinRT/UWP bridge apps around old-school cores, communicating over IPC or similar.

With Windows 11 (2021), Microsoft fi­nally gave up on the at­tempts to move every­one to some more-sand­boxed and more-mod­ern plat­form. The Windows App SDK ex­poses all the for­merly WinRT/UWP-exclusive fea­tures to all Windows apps, whether writ­ten in stan­dard C++ (no more C++/CLI) or writ­ten in .NET. The SDK in­cludes WinUI 3, yet an­other XAML-based, drawn-from-scratch con­trol li­brary.

So did you catch all that? Just looking at the UI framework evolution, we have:

* Win32 controls (C)

* MFC (C++)

* Windows Forms (.NET, wrapping Win32 controls)

* WPF (XAML, drawn from scratch)

* WinRT XAML (Windows 8)

* UWP (Windows 10)

* WinUI 3 (Windows App SDK, Windows 11)

In the spirit of this being a learning project, I knew I wanted to use the latest and greatest first-party foundation. That meant writing a WinUI 3 app, using the Windows App SDK. There ends up being three ways to go about this:

* C++

* C# on .NET, with framework-dependent deployment

* C# on .NET, compiled ahead-of-time (.NET AOT)

This is a painful choice. C++ will produce lean apps, runtime-linked against the Windows App SDK libraries, with easy interop down into any Win32 C APIs that I might need. But, in 2026, writing a greenfield application in a memory-unsafe language like C++ is a crime.

What would be ideal is if I could use the system's .NET, and just distribute the C# bytecode, similar to how all web apps share the same web platform provided by the browser. This is called "framework-dependent deployment". However, for no reason I can understand, Microsoft has decided that even the latest versions of Windows 11 only get .NET 4.8.1 preinstalled. (The current version of .NET is 10.) So distributing an app this way incurs a tragedy of the commons, where the first app to need modern .NET will cause Windows to show a dialog prompting the user to download and install the .NET libraries. This is not the optimal user experience!

That leaves .NET AOT. Yes, I am com­pil­ing the en­tire .NET run­time—in­clud­ing the vir­tual ma­chine, garbage col­lec­tor, stan­dard li­brary, etc.—into my bi­nary. The com­piler tries to trim out un­used code, but the re­sult is still a solid 9 MiB for an app that blacks out some mon­i­tors.

There's a similar painful choice when it comes to distribution. Although Windows is happy to support hand-rolled or third-party-tool-generated setup.exe installers, the Microsoft-recommended path for a modern app with containerized install/uninstall is MSIX. But this format relies heavily on code signing certificates, which seem to cost around $200–300/year for non-US residents. The unsigned sideloading experience is terrible, requiring a cryptic PowerShell command only usable from an admin terminal. I could avoid sideloading if Microsoft would just accept my app into their store, but they rejected it for not offering "unique lasting value".

The tragedy here is that this all seems so unnecessary. .NET could be distributed via Windows Update, so the latest version is always present, making framework-dependent deployment viable. Or at least there could be a MSIX package for .NET available, so that other MSIX packages could declare a dependency on it. Unsigned MSIX sideloads could use the same crowd-sourced reputation system that EXE installers get. Windows code signing certs could cost $100/year, instead of $200+, like the equivalent costs for the Apple ecosystem. But like everything else about modern Windows development, it's all just … half-assed.

It turns out that it’s a lot of work to recre­ate one’s OS and UI APIs every few years. Coupled with the in­ter­mit­tent at­tempts at sand­box­ing and dep­re­cat­ing too pow­er­ful” func­tion­al­ity, the re­sult is that each new layer has gaps, where you can’t do cer­tain things which were pos­si­ble in the pre­vi­ous frame­work.

This is not a new prob­lem. Even back with MFC, you would of­ten find your­self need­ing to drop down to Win32 APIs. And .NET has had P/Invoke since 1.0. So, es­pe­cially now that Microsoft is no longer re­quir­ing that you only use the lat­est frame­work in ex­change for new ca­pa­bil­i­ties, hav­ing to drop down to a pre­vi­ous layer is not the end of the world. But it’s frus­trat­ing: what is the point of us­ing Microsoft’s lat­est and great­est, if half your code is just in­terop goop to get at the old APIs? What’s the point of pro­gram­ming in C#, if you have to wrap a bunch of C APIs?

Let’s re­visit the list of things my app needs to do, and com­pare them to what you can do us­ing the Windows App SDK:

Enumerating the ma­chine’s dis­plays and their bounds: can enu­mer­ate, as long as you use a for loop in­stead of a fore­ach loop. But watch­ing for changes re­quires P/Invoke, be­cause the mod­ern API does­n’t ac­tu­ally work.

Placing bor­der­less, ti­tle­bar-less, non-ac­ti­vat­ing black win­dows: much of this is doable, but non-ac­ti­vat­ing needs P/Invoke.

Optionally run­ning at startup: can do, with a nice sys­tem-set­tings-in­te­grated off-by-de­fault API.

Displaying a tray icon with a few menu items: not avail­able. Not only does the tray icon it­self need P/Invoke, the con­cept of menus for tray icons is not stan­dard­ized, so de­pend­ing on which wrap­per pack­age you pick, you’ll get one of sev­eral dif­fer­ent con­text menu styles.

The Windows IME sys­tem com­po­nent uses a mod­ern frosted-glass style, match­ing a few other sys­tem com­po­nents but no apps (including Microsoft apps) that I can find.

The OneNote first-party app uses a white back­ground, and uses bold to in­di­cate the left-click ac­tion.

The Phone Link bun­dled app is pretty sim­i­lar to OneNote.

Command Palette comes from PowerToys, which is supposed to be a WinUI 3 showcase. Similar to OneNote and Phone Link, but with extra "Left-click" and "Double-click" indicators seen nowhere else.

The Windows Security sys­tem com­po­nent uses dif­fer­ent mar­gins, and in­ex­plic­a­bly, is the only app to po­si­tion the menu on the left.

1Password seems to be try­ing for the same style as the white-back­ground Windows com­po­nents and Microsoft apps, but with dif­fer­ent mar­gins than all of them.

Signal seems roughly the same as 1Password. A shared li­brary?

Discord seems similar to 1Password and Signal, but it inserted an unselectable "branding menu item".

Steam is too cool to fit into the host OS, and just draws some­thing com­pletely cus­tom.

For Display Blackout, I used the approach provided by WinUIEx. This matches the system IME menu, although not its vertical offset or horizontal centering.

But these are just the head­line fea­tures. Even some­thing as sim­ple as au­to­mat­i­cally siz­ing your app win­dow to its con­tents was lost some­where along the way from WPF to WinUI 3.

Given how of­ten you need to call back down to Win32 C APIs, it does­n’t help that the in­terop tech­nol­ogy is it­self un­der­go­ing a tran­si­tion. The mod­ern way ap­pears to be some­thing called CsWin32, which is sup­posed to take some of the pain out of P/Invoke. But it can’t even cor­rectly wrap strings in­side of structs. To my eyes, it ap­pears to be one of those un­der­funded, per­pet­u­ally pre-1.0 pro­jects with unin­spir­ing changel­ogs, on track to get aban­doned af­ter a cou­ple years.

And CsWin32’s prob­lems aren’t just im­ple­men­ta­tion gaps: some of them trace back to miss­ing fea­tures in C# it­self. The doc­u­men­ta­tion con­tains this darkly hi­lar­i­ous pas­sage:

Some pa­ra­me­ters in win32 are [optional, out] or [optional, in, out]. C# does not have an id­iomatic way to rep­re­sent this con­cept, so for any method that has such pa­ra­me­ters, CsWin32 will gen­er­ate two ver­sions: one with all ref or out pa­ra­me­ters in­cluded, and one with all such pa­ra­me­ters omit­ted.

The C# lan­guage does­n’t have a way to spec­ify a foun­da­tional pa­ra­me­ter type of the Win32 API? One which is a lin­ear com­bi­na­tion of two ex­ist­ing sup­ported pa­ra­me­ter types? One might think that an ad­van­tage of con­trol­ling C# would be that Microsoft has care­fully shaped and co­e­volved it to be the per­fect pro­gram­ming lan­guage for Windows APIs. This does not ap­pear to be the case.

Indeed, it’s not just in in­terop with old Win32 APIs where C# falls short of its tar­get plat­for­m’s needs. When WPF first came out in 2006, with its em­pha­sis on two-way data bind­ing, every­one quickly re­al­ized that the boil­er­plate in­volved in cre­at­ing classes that could bind to UI was un­sus­tain­able. Essentially, every prop­erty needs to be­come a get­ter/​set­ter pair, with the set­ter hav­ing a same-value guard and a call to fire an event. (And fir­ing an event is full of cer­e­mony in C#.) People tried var­i­ous so­lu­tions to pa­per over this, from base classes to code gen­er­a­tors. But the real so­lu­tion here is to put some­thing in the lan­guage, like JavaScript has done with dec­o­ra­tors and prox­ies.

So when I went to work on my app, I was as­ton­ished to find that twenty years af­ter the re­lease of WPF, the boil­er­plate had barely changed. (The sole im­prove­ment is that C# got a fea­ture that lets you omit the name of the prop­erty when fir­ing the event.) What has the C# lan­guage team been do­ing for twenty years, that cre­at­ing na­tive ob­serv­able classes never be­came a pri­or­ity?

Honestly, the whole pro­ject of na­tive Windows app de­vel­op­ment feels like it’s not a pri­or­ity for Microsoft. The rel­e­vant is­sue track­ers are full of de­vel­op­ers en­coun­ter­ing painful bugs and gaps, and get­ting lit­tle-to-no re­sponse from Microsoft en­gi­neers. The Windows App SDK changelog is mostly about them adding new ma­chine learn­ing APIs. And fa­mously, many first-party apps, from Visual Studio Code to Outlook to the Start menu it­self, are writ­ten us­ing web tech­nolo­gies.

This is prob­a­bly why large parts of the com­mu­nity have de­cided to go their own way, in­vest­ing in third-party UI frame­works like Avalonia and Uno Platform. From what I can tell brows­ing their land­ing pages and GitHub repos­i­to­ries, these are bet­ter-main­tained, and writ­ten by peo­ple who loved WPF and wished WinUI were as ca­pa­ble. They also em­brace cross-plat­form de­vel­op­ment, which cer­tainly is im­por­tant for some use cases.

But at that point: why not Electron? Seriously. C# and XAML are not that amaz­ing, com­pared to, say, TypeScript/React/CSS. As we saw from my list above, to do most any­thing be­yond the ba­sics, you’re go­ing to need to reach down into Win32 in­terop any­way. If you use some­thing like Tauri, you don’t even need to bun­dle a whole Chromium bi­nary: you can use the sys­tem we­b­view. Ironically, the sys­tem we­b­view re­ceives up­dates every 4 weeks (soon to be 2?), whereas the sys­tem .NET is per­pet­u­ally stuck at ver­sion 4.8.1!

It’s still pos­si­ble for Microsoft to turn this around. The Windows App SDK ap­proach does seem like an im­prove­ment over the long di­gres­sion into WinRT and UWP. I’ve iden­ti­fied some low-hang­ing fruit around pack­ag­ing and de­ploy­ment above, which I’d love for them to act on. And their re­cent an­nounce­ment of a fo­cus on Windows qual­ity in­cludes a line about us­ing WinUI 3 more through­out the OS, which could in the­ory trickle back into im­prov­ing WinUI it­self.

I’m not hold­ing my breath. And from what I can tell, nei­ther are most de­vel­op­ers. The Hacker News com­men­tariat loves to be­moan the death of na­tive apps. But given what a mess the Windows app plat­form is, I’ll pick the web stack any day, with Electron or Tauri to bridge down to the rel­e­vant Win32 APIs for OS in­te­gra­tion.

...

Read the original on domenic.me »

9 223 shares, 20 trendiness

25 Years of Eggs

Everyone needs a re­ward­ing hobby. I’ve been scan­ning all of my re­ceipts since 2001. I never typed in a sin­gle price - just kept the im­ages. I fig­ured some­day the tech­nol­ogy to read them would catch up, and the data would be in­ter­est­ing.

This year I tested it. Two AI cod­ing agents, 11,345 re­ceipts. I started with eggs. If you can track one item across 25 years of gar­bled ther­mal prints, OCR fail­ures, and folder ty­pos, you can track any­thing.

14 days. 1.6 bil­lion to­kens. 589 egg re­ceipts found. Here’s what the data says.

Ok so let’s make a pro­ject plan. In the ~/Records/ we have a ton of re­ceipts. Many are pdf/​im­age/​etc. I want to go through and ex­tract the ac­tual con­tent of the re­ceipts to find how much we spend on eggs. Receipts are no­to­ri­ously ter­ri­ble to OCR, so we might need to do some­thing more ad­vanced.

Codex explored my file system, found two existing SQLite databases I'd forgotten about, discovered 11,345 receipts across PDFs, emails, and images, and came back with a project plan. I said "write this out to a plan.md please." It did. We were building within the hour.

The whole thing took 14 days. Maybe 15 hours of me ac­tu­ally at the key­board - short bursts of di­rec­tion-giv­ing sep­a­rated by long stretches of the agents just run­ning. Codex ran 15 in­ter­ac­tive ses­sions. Claude han­dled 10.

The oldest receipts were flatbed scans - multiple receipts per page, random orientations, white paper on a white scanner bed. Codex and I tried seven classical CV approaches to find receipt boundaries. Edge detection. Adaptive thresholding. Contour analysis. Morphological operations. Watershed segmentation. Template matching. A grid-based decomposition I pitched as "a classic HackerRank problem."

None of them worked. The core issue: receipts are white and so is the scanner bed. I started calling it the "shades of white" problem. The cleverest attempt was inspired by removing tourists from landmark photos - stack all scans, compute the median pixel at each position, subtract to reveal edges. I thought that one was going to work. Best F1: 0.302.

We also threw ma­cOS Vision OCR at it (via a Swift script Codex wrote on the fly), Tesseract, sev­eral other tools. I was start­ing to think the flatbed scans might just be a loss. Then I tried Meta’s SAM3.

One API call with text="receipt". 0.92-0.98 confidence on every boundary. Four seconds per scan. 1,873 receipts from 760 multi-receipt pages. Seven approaches in hours; SAM3 in an afternoon.

Receipts land at ran­dom an­gles, and OCR needs them up­right. We tried Tesseract’s ori­en­ta­tion de­tec­tion, ma­cOS Vision OCR, Moondream 2 and 3 - each one bet­ter than the last but none re­li­able enough. Then I re­al­ized that every time I pasted a re­ceipt into our Claude con­ver­sa­tion to de­bug some­thing, it was al­ready read­ing the text per­fectly. Rotated, faded, did­n’t mat­ter.

Why am I build­ing a ro­ta­tion pipeline when the tool I’m talk­ing to al­ready solves this? So we sent all 11,345 re­ceipts through Sonnet and Codex. Sometimes the an­swer is star­ing you right in the face.

Halfway through the project, Tesseract was the weak link. It read OAT MILK as "OATH ILK." It dropped decimals - $4.37 became $437. On old thermal prints it produced nothing at all. Codex opened 20 of the worst ones by hand and found that some weren't even receipts. A family photo. A postcard. A greeting card. All filed under "Receipts."

I found PaddleOCR-VL - a 0.9B pa­ra­me­ter vi­sion-lan­guage model that runs lo­cally on Apple Silicon. First test on a sam­ple bank state­ment: clean, ac­cu­rate text in 2.1 sec­onds. Tesseract was faster but dra­mat­i­cally nois­ier. Second test on a tall Fred Meyer re­ceipt: dis­as­ter. The model en­tered a rep­e­ti­tion loop, hal­lu­ci­nat­ing TILL YGRT end­lessly.

The fix turned out to be sim­ple - split tall re­ceipts into slices. Dynamic slic­ing based on as­pect ra­tio: num_s­lices = max(2, round(as­pec­t_ra­tio / 1.5)). Five par­al­lel shards ran overnight. GPU pegged at 100% for 10.8 hours. In the morn­ing: 11,345 re­ceipts OCR’d suc­cess­fully. Cleaner text for every re­ceipt in the archive.

PaddleOCR-VL is­n’t a Codex re­place­ment - it can’t do struc­tured ex­trac­tion or fol­low in­struc­tions. It’s a bet­ter Tesseract. The real pipeline: re­ceipt im­age → PaddleOCR-VL (local, clean text) → Codex/Claude (structured ex­trac­tion).

Once re­ceipts were seg­mented, ori­ented, and OCR’d, they needed struc­tured ex­trac­tion - find the egg line items, pull prices and quan­ti­ties.

It started with regex. The models love regex. Keyword matching for "egg," money patterns for prices. Heuristics found eggs in 25/25 positive samples with 0 false positives. Not bad. But on the full corpus, false negatives piled up - Fred Meyer abbreviated codes like STO LRG BRUNN, Whole Foods truncated to EDGS, OCR mangled EGGS into LG EGO 12 CT. No regex catches these.

So I told Codex "we have unlimited tokens, let's use them all," and we pivoted to sending every receipt through Codex for structured extraction. From that one sentence, Codex came back with a parallel worker architecture - sharding, health management, checkpointing, retry logic. The whole thing. When I ran out of tokens on Codex mid-run, it auto-switched to Claude and kept going. I didn't ask it to do that. I didn't know it had happened until I read the logs.

But the runs kept crashing. Long CLI jobs died when sessions timed out. The script committed results at end-of-run, so early deaths lost everything. I watched it happen three times. On the fourth attempt I said "I would have expected we start a new process per batch." That was the fix - one fresh process per batch, hard call cap, exit cleanly, resume from cache. Codex patched it, launched it in a tmux session, and the ETA dropped from 12 hours to 3. Not a hard fix. Just the kind of thing you know after you've watched enough overnight jobs die at 3 AM.

11,345 re­ceipts processed. The thing that was sup­posed to take all night fin­ished be­fore I went to bed.

First I needed ground truth. I asked Claude to build me a la­bel­ing tool - key­board-first, re­ceipt im­age on the left, clas­si­fi­ca­tion data on the right, ar­row keys to nav­i­gate, sin­gle key­press to ver­dict. It built the whole Flask app in 22 min­utes. I sat down and hand-la­beled 375 re­ceipts.

Regex found 650 receipts mentioning "egg." Against those 375 labels: 88% recall. The misses told the story - abbreviated codes, OCR garble, truncated descriptions. No keyword search catches STO LRG BRUNN.

The fix: use those hand-labeled edge cases as few-shot examples in an LLM classifier. Twenty examples of what "eggs" looks like on a garbled thermal print from 2003. Batch 10 receipts per call. Eight parallel workers. Two hours. 11,345 receipts classified.

Final accuracy: 99%+. Every supposed "miss" by the LLM turned out to be a mislabel in the ground truth. A bicycle shop receipt the old heuristic had flagged. A barcode-only scan. Egg noodles. The classifier was more correct than my labels.

Then more QA. A sec­ond tool for eye­balling 497 weak im­ages: Space for no-eggs, X for has-eggs. A third for data en­try on 95 re­ceipts with miss­ing fields - numpad-op­ti­mized, auto-ad­vanc­ing. Four tools to­tal, each built in min­utes, each one I ground through by hand.

So how good is the data? I pulled 372 ran­dom sam­ples and checked them by hand. Initially: 96% cor­rect. The er­rors were mostly gar­bled OCR on old scans. One was a hal­lu­ci­na­tion - the pipeline fab­ri­cated egg data for a re­ceipt that con­tained no eggs at all.

* Email re­ceipts silently pre­fer­ring text/​plain over text/​html, drop­ping pric­ing lines that only ex­isted in the HTML part

Here's what made the quality good: every time I caught something, I could show the agents what to look for and they'd go fix it everywhere. I caught a store address hiding in OCR noise: "915 Ny 45th St" was 915 NW 45th St, Seattle. I showed them the pattern, they ran a recovery pass on 40 missing-location receipts - all 40 resolved.

Codex and Claude are ex­cel­lent at build­ing tools and ex­tract­ing struc­tured data, but they could­n’t seg­ment an im­age or re­place an OCR en­gine. The right an­swer was a stack of spe­cial­ized mod­els - SAM3 for seg­men­ta­tion, PaddleOCR for text, Codex and Claude for every­thing else. I ex­pected this, but it was worth try­ing the sim­ple path first.

These are the days of mir­a­cle and won­der. I can’t wait to see what 30 years of eggs looks like.

...

Read the original on john-rush.com »

10 203 shares, 14 trendiness

my first patch to the linux kernel

How a sign-ex­ten­sion bug in C made me pull my hair out for days but be­came my first patch to the Linux ker­nel!

A while ago, I started dip­ping my toe into vir­tu­al­iza­tion. It’s a topic that many peo­ple have heard of or are us­ing on a daily ba­sis but a few know and think about how it works un­der the hood.

I like to learn by rein­vent­ing the wheel, and nat­u­rally, to learn vir­tu­al­iza­tion I started by try­ing to build a Type-2 hy­per­vi­sor. This ap­proach is sim­i­lar to how KVM (Linux) or bhyve (FreeBSD) are built.

My ex­per­i­men­tal hy­per­vi­sor (and VMM) is still a work-in-progress and is avail­able on my Github: poolad­khay/​evmm.

Since virtualization is hardware-assisted these days, the hypervisor needs to communicate directly with the CPU by running certain privileged instructions, which means a Type-2 hypervisor is essentially a kernel module that exposes an API to the user-space where a Virtual Machine Monitor (VMM) like QEMU or Firecracker is running and orchestrating VMs by utilizing that API.

In this post, I want to de­scribe ex­actly how I found that bug. But to make it a bit more ed­u­ca­tional, I’m go­ing to set the stage first and talk about a few core con­cepts so you can see ex­actly where the bug emerges.

The x86 architecture in protected mode (32-bit mode) envisions a task switching mechanism that is facilitated by the hardware. The architecture defines a Task State Segment (TSS), which is a region in memory that holds information about a task (general-purpose registers, segment registers, etc.). The idea was that any given task or thread would have its own TSS, and when the switch happens, a specific register (Task Register, or TR) would get updated to point to the new task.

This was aban­doned in fa­vor of soft­ware-de­fined task switch­ing which gives more gran­u­lar con­trol and porta­bil­ity to the op­er­at­ing sys­tem ker­nel.

But the TSS was not entirely abandoned. In modern days (64-bit systems), the kernel uses a TSS-per-core approach where the main job of the TSS is to hold a few stack pointers that are very critical for the kernel's and the CPU's normal operation. More specifically, it holds the kernel stack of the current thread, which is used when the system wants to switch from user-space to kernel-space.

It also holds a few known good stacks for crit­i­cal events like Non-Maskable Interrupts (NMIs) and Double Faults. These are events that if not han­dled cor­rectly, can cause a triple fault and crash a CPU core or cause an im­me­di­ate sys­tem re­boot.

We know that memory access is generally considered expensive, and caching values somewhere on the CPU die is the preferred approach when possible. This is where the TR register comes into the picture. It has a visible part, the 16-bit selector (an offset into the GDT) that we have already discussed, as well as a hidden part that holds direct information about the TSS (base address, limit, and access rights). This saves the CPU the trouble of indexing into the GDT to eventually find the TSS every time it's needed.

A hy­per­vi­sor is es­sen­tially a task switcher where tasks are op­er­at­ing sys­tems. In or­der for mul­ti­ple op­er­at­ing sys­tems to run on the same sil­i­con chip, the hy­per­vi­sor must swap the en­tire state of the CPU which in­cludes up­dat­ing the hid­den part of the TR reg­is­ter as well.

In a pre­vi­ous blog post I de­scribed how Intel im­ple­mented their vir­tu­al­iza­tion ex­ten­sion (VT-x) and how each vCPU (vCore) is given its own VMCS (Virtual Machine Control Structure) block where its state is saved to or re­stored from by the hard­ware when switch­ing be­tween host and guest OSes.

I sug­gest read­ing that post if you’re in­ter­ested in the topic but VMCS con­sists of four main ar­eas:

Host-state area has two fields, which correspond to the visible part and one of the hidden parts (base address) of the TR register: HOST_TR_SELECTOR and HOST_TR_BASE.

While guest-state area has four (one visible plus all three hidden parts): GUEST_TR_SELECTOR, GUEST_TR_BASE, GUEST_TR_LIMIT, and GUEST_TR_AR_BYTES.

The rea­son is that the hard­ware as­sumes the host OS to be a mod­ern 64-bit op­er­at­ing sys­tem where TR limit and Access Rights are fixed known val­ues (0x67 and 0x11 re­spec­tively). But the guest OS can be vir­tu­ally any op­er­at­ing sys­tem with any con­straints.

Naturally, it is the hy­per­vi­sor’s job to set these val­ues on ini­tial run and to up­date them when needed (e.g. when the ker­nel thread that is run­ning a vCPU is mi­grated to an­other phys­i­cal CPU core, the hy­per­vi­sor must up­date the host state to match the new core).

To set these values, I "borrowed" some code from the Linux kernel tree (KVM selftests):

vmwrite(HOST_TR_BASE,
        get_desc64_base((struct desc64 *)(get_gdt().address + get_tr())));

This piece of code does the fol­low­ing:

* Gets the ad­dress of GDT.

* Indexes into it us­ing the value of TR reg­is­ter.

* Parses the TSS seg­ment de­scrip­tor and ex­tracts the mem­ory ad­dress of TSS.

* Writes the address into the HOST_TR_BASE section of the VMCS using the special VMWRITE instruction.

So far, so good!

If for any rea­son this op­er­a­tion fails to ex­tract and write the cor­rect ad­dress, upon the next con­text switch from user-space to ker­nel-space (or next NMI or next Double fault), when the CPU hard­ware tries to read the ker­nel stack from the TSS to up­date the Stack Pointer reg­is­ter, it ei­ther re­ceives garbage or an un­mapped ad­dress. Either way, the CPU will even­tu­ally face a dou­ble fault (a fault that hap­pens when try­ing to han­dle an­other fault like a page fault) and when try­ing to use one of the known good stacks for han­dling the dou­ble fault, it will fail again which will make it a triple fault and BOOM! The core dies or we get a sud­den re­boot.

Now let's talk about the issue I was facing.

I started developing my hypervisor on a virtualized instance of Fedora, to avoid crashing my machine in case something went wrong. By the time I realized something was indeed wrong, I had already developed the ability to put the CPU in VMX operation, run a hardcoded loop in VMX non-root mode that would use the VMCALL instruction to trap into the hypervisor (VMX root) and ask it to print a message, then resume the loop (VMRESUME).

Additionally, the VMCS was programmed to trap external interrupts (e.g. timer ticks). Upon an exit, the hypervisor would check whether we (the current kernel thread) needed to be rescheduled, keeping the kernel scheduler happy.

I was using the preempt notifier API, which lets threads provide two custom functions (sched_in and sched_out) that are called by the scheduler when it's about to deschedule the thread and right before rescheduling it. These functions are then responsible for the cleanup and initialization work that is required.

In my case, sched_out would un­load the VMCS from the cur­rent core, and sched_in would load it on the new core while reini­tial­iz­ing it us­ing a se­ries of VMWRITEs to match the new core’s state.

On my vir­tu­al­ized dev en­vi­ron­ment with only three vC­PUs, every­thing was work­ing just fine. Until I de­cided to give it a try on my main ma­chine where the hy­per­vi­sor would talk to an ac­tual phys­i­cal CPU.

Seconds after running the loop, the system crashed in a very unpredictable way. I was logging the core switches and didn't find any meaningful correlation between the last core number and the crash. Sometimes the system lasted longer, and sometimes the crash was immediate. After investigating kernel logs a few times, I saw a pattern in the sequence of events that eventually hung the system:

* The Fatal VM-Exit: An NMI trig­gered a VM-Exit on CPU 5 and nat­u­rally the hard­ware tried to lo­cate a valid ker­nel stack from TSS to han­dle the priv­i­lege tran­si­tion.

* Core Death: CPU 5 hit a fa­tal Page Fault at­tempt­ing to read an un­mapped mem­ory ad­dress, re­sult­ing in a Kernel Oops. CPU 5 was left com­pletely par­a­lyzed with in­ter­rupts dis­abled.

* IPI Lockup: CPU 6 at­tempted a rou­tine sys­tem-wide up­date (kernel text patch­ing) re­quir­ing an Inter-Processor Interrupt (IPI) ac­knowl­edg­ment from all cores. CPU 6 be­came per­ma­nently stuck in an in­fi­nite loop wait­ing for the dead CPU 5 to re­spond.

* Cascading Paralysis: As other cores (3, 8, 11, etc.) at­tempted stan­dard cross-core com­mu­ni­ca­tions (like mem­ory map TLB flushes and RCU syn­chro­niza­tions), they too fell into the IPI trap, wait­ing in­def­i­nitely for CPU 5.

* Terminal State: The RCU sub­sys­tem starved, pe­riph­eral dri­vers (like Wi-Fi) crashed from time­outs, and the sys­tem en­tered a to­tal, un­re­cov­er­able dead­lock.

So why no triple faults?!

The Kernel Oops killed the active task and halted operations on CPU 5. However, it left CPU 5 in a “zombie” state: alive enough to keep the motherboard powered on, but with its interrupts disabled, making it entirely unresponsive to the rest of the system.

Soon I realized that the hypervisor worked absolutely fine when pinned to one core (e.g. via the taskset command), so something had to be going wrong while moving between cores. Additionally, I didn't dare question the code I had stolen from the Linux kernel source, and I was trying hard to find an issue in the code I had written myself. This eventually led to rewriting portions of the hypervisor with alternative methods that achieved the same goals.

For example, from reading Intel's Software Developer Manual (SDM), I knew that when moving from core A to core B, core A must run the VMCLEAR instruction to unload the VMCS, and only then can core B load the VMCS with VMPTRLD and execute guest code. For that, I had been using smp_call_function_single, which relies on IPIs to run a piece of code on another CPU; I replaced it with the preempt notifiers.

Eventually (while pulling my hair out), I realized I had eliminated every part of the hypervisor that played a role in moving between cores.

Then there was an­other clue!

While running the hypervisor in my virtual dev environment (QEMU + Fedora), I observed that by increasing the number of vCPUs I could reproduce the issue, along with a new behavior: sometimes the VM rebooted immediately (instead of freezing), and after the reboot there was no trace of any logs from the previous session. I concluded that a triple fault had happened.

This turned my attention to the TR and the TSS. I started looking for alternative ways of setting HOST_TR_BASE and realized that KVM itself (not the KVM selftests) uses a different method:

```c
/*
 * Linux uses per-cpu TSS and GDT, so set these when switching
 * processors. See 22.2.4.
 */
vmcs_writel(HOST_TR_BASE, (unsigned long)&get_cpu_entry_area(cpu)->tss.x86_tss);
```

And that was it! Using this method to set HOST_TR_BASE fixed my hy­per­vi­sor and helped me keep what­ever san­ity I had left.

Remember that piece of code I took from the kernel source? It used the get_desc64_base function to extract the address of the TSS and write it into HOST_TR_BASE. The function has this definition:

```c
static inline uint64_t get_desc64_base(const struct desc64 *desc)
{
	return ((uint64_t)desc->base3 << 32) |
	       (desc->base0 | ((desc->base1) << 16) | ((desc->base2) << 24));
}
```

The TSS segment descriptor has four base fields (base0 through base3) that must be stitched together to form the address of the TSS.

The C standard dictates integer promotion: whenever a type smaller than an int is used in an expression, the compiler automatically promotes it to an int (a 32-bit signed integer on modern x86-64) before performing the operation.

If an int can rep­re­sent all val­ues of the orig­i­nal type (as re­stricted by the width, for a bit-field), the value is con­verted to an int; oth­er­wise, it is con­verted to an un­signed int. These are called the in­te­ger pro­mo­tions. All other types are un­changed by the in­te­ger pro­mo­tions.

This promotion has a consequence: if the promoted value has a 1 in its most significant bit (bit 31), the compiler considers it negative, and when it is then converted to a larger type, uint64_t in our case, sign extension happens.

Let's see an example:

We have an 8-bit unsigned integer (uint8_t) with the bit pattern 11001100. If we left-shift it by 24, the result still fits in the 32 bits of an int, so the compiler produces the value 11001100000000000000000000000000 and treats it as an int, which is a signed type.

Now, any operation on this value follows the rules for signed values. In our case, we are ORing it with a uint64_t, so the compiler converts our int (a 32-bit signed value) to uint64_t (a 64-bit unsigned value). That conversion is where the sign extension happens, turning our value into 11111111111111111111111111111111_11001100000000000000000000000000 before the OR takes place.

See the problem?

Because the up­per 32 bits are sign-ex­tended to all 1s (Hex: 0xFFFFFFFF), the bit­wise OR op­er­a­tion com­pletely de­stroys base3 (In a bit­wise OR, 1 | X equals 1). Therefore, what­ever data was in base3 is per­ma­nently over­writ­ten by the 1s from the sign ex­ten­sion.

Here is an actual example with “real” addresses:

```
base0 = 0x5000
base1 = 0xd6
base2 = 0xf8
base3 = 0xfffffe7c

Expected return: 0xfffffe7cf8d65000
Actual return:   0xfffffffff8d65000
```

This also explains when the problem occurs: if and only if base2 has a 1 as its most significant bit. Any other value would not corrupt the resulting address.

The fix is ac­tu­ally very sim­ple. We must cast val­ues to un­signed types be­fore the bit-shift op­er­a­tion:

```c
static inline uint64_t get_desc64_base(const struct desc64 *desc)
{
	return (uint64_t)desc->base3 << 32 |
	       (uint64_t)desc->base2 << 24 |
	       (uint64_t)desc->base1 << 16 |
	       (uint64_t)desc->base0;
}
```

This will pre­vent the sign-ex­ten­sion from hap­pen­ing.

Finally, this is the patch I sent, which was ap­proved and merged:

https://lore.kernel.org/kvm/20251222174207.107331-1-mj@pooladkhay.com/

I can’t fin­ish this post with­out talk­ing about AI!

You may wonder whether I tried asking an LLM for help. Well, I did. In fact, it was very helpful for some tasks, like summarizing kernel logs and extracting the gist of them. But when it came to debugging based on all the available clues, it concluded that my code didn't have any bugs and that the CPU hardware was faulty.

...

Read the original on pooladkhay.com »
