10 interesting stories served every morning and every evening.

Steam Machine

store.steampowered.com

© Valve Corporation. All rights re­served. All trade­marks are prop­erty of their re­spec­tive own­ers in the US and other coun­tries. Privacy Policy |  Legal |  Accessibility |  Steam Subscriber Agreement |  Refunds |  Cookies

Steam Hardware - Steam Machine launches today! - Steam News

store.steampowered.com


Pledging Another $400,000 to the Zig Software Foundation

mitchellh.com

My family is pledging another $400,000 [1] to the Zig Software Foundation (ZSF). This brings our total pledged support for ZSF to $700,000, after our initial donation in 2024.

Zig con­tin­ues to earn my re­spect as a tech­ni­cal pro­ject and as a com­mu­nity. The 2026 de­vlog shows steady progress on the hard prob­lems of build­ing an ex­cel­lent lan­guage and com­piler. I also deeply re­spect the pro­jec­t’s ap­proach to main­tain­er­ship and com­mu­nity, re­flected in ini­tia­tives like Loris Cro’s Contributor Poker and Zig’s AI Ban. That phi­los­o­phy con­tin­ues to at­tract and de­velop some of the most tal­ented peo­ple in open source.

Recently, Zig’s strict no-LLM con­tri­bu­tion pol­icy be­came a pub­lic topic of dis­cus­sion again, es­pe­cially in the con­text of Bun’s Zig fork and Rust rewrite. I have no prob­lem with what Bun did, I think Bun is a great pro­ject, and I’m not in­ter­ested in turn­ing this into a Bun post. Instead, what stood out to me was how quickly peo­ple vil­lainized one an­other. Too much of the con­ver­sa­tion lacked em­pa­thy and re­spect for view­points dif­fer­ent from our own.

I use AI heav­ily. I’ve writ­ten about my AI adop­tion jour­ney and ship­ping real fea­tures with AI as­sis­tance. I’m also quite vo­cal about re­main­ing ra­tio­nal about its ca­pa­bil­i­ties and frus­trated with its neg­a­tive im­pacts on open source.

The point is that I have opinions. Those opinions don’t fully align with ZSF’s approach. And yet, I have nothing but respect for ZSF: the people, the policies, and the project. Part of what makes the internet and open source great is that projects can be weird and different. They can set unusual boundaries, build their own culture, and pursue quality in ways that won’t make sense to everyone.

Zig is ex­cep­tional soft­ware: am­bi­tious, prac­ti­cal, in­de­pen­dent, and un­usu­ally se­ri­ous about qual­ity. Ghostty ex­ists in large part be­cause Zig made it pos­si­ble for me to build the kind of soft­ware I wanted to build. This is why I sup­port Zig.

I’m proud to sup­port Zig and the Zig Software Foundation again. Please con­sider do­nat­ing if you can.

Footnotes

$200,000 per year split over two years, the same struc­ture as our 2024 do­na­tion. ↩


Never Give Them Your Face

nevergivethemyourface.com

They want your face. It will be called safety. Verification. Age as­sur­ance. A small step to pro­tect chil­dren. But strip the lan­guage away and the de­mand is plain: be­fore you may speak, post, or read, you must first prove who you are. And the only way they’ve fig­ured out how to do it is with your gov­ern­ment ID, or with your face held up to a cam­era that de­cides whether you are old enough to be trusted. This is the deal now be­ing writ­ten into law on three con­ti­nents, and you are meant to ac­cept it qui­etly. Don’t.

It’s always “won’t someone think of the children?!” But this affects everyone.

No one dis­putes that the in­ter­net can hurt kids. That grief is real, but it’s be­ing ex­ploited. Here is the trick: to con­firm that a child is not pre­sent, a ser­vice has to check every­body. Every adult passes through the check­point. A law writ­ten about six­teen-year-olds qui­etly be­comes an iden­tity re­quire­ment for the en­tire in­ter­net. You are not carded be­cause you are sus­pected of any­thing. You are carded be­cause card­ing has be­come the price of ad­mis­sion to life on the web.

We run back­ground checks on peo­ple who want to buy a gun, but we do not back­ground check every­one at all times just in case. Yet that is ex­actly the de­sign here. It’s a per­mit check at the door of every con­ver­sa­tion, ap­plied to all, jus­ti­fied by the few.

It is not age ver­i­fi­ca­tion. It is iden­tity ver­i­fi­ca­tion.

Watch the words drift. This whole system was sold as age assurance, which is a yes-or-no question: are you over eighteen? But almost none of these systems are built to answer only that. They are built to know who you are: your name, your date of birth, your document number, your face. This is not age verification at all. It is forced identity tracking. Your real-world identity is not only captured by Meta, Facebook, Twitter, Instagram, etc., but shared broadly with every creepy agency you already worry about “having all your data”.

Name the places now demanding “age verification,” and see how many will accept a plain government document that says only that you are over eighteen, and nothing else. Almost none will. Because age was never the point.


We spent a gen­er­a­tion teach­ing peo­ple the first rule of the in­ter­net: never give out your real iden­tity to strangers. We have a word, doxxing, for in­flict­ing that ex­po­sure on some­one against their will. And now the same gov­ern­ments and plat­forms are ask­ing every cit­i­zen to do it to them­selves, vol­un­tar­ily, as a con­di­tion of log­ging in.

You can change a pass­word. You can­not change your face.

A leaked password is an inconvenience. You reset it and move on. Your face, your driver’s license, the unique geometry a scanner reduces to a number cannot be reset. A face scan is not a photograph. It is a three-dimensional map of you, a biometric template precise enough to be matched later against a surveillance camera on a street corner. When you hand it over, it lives on someone else’s server, often a third-party vendor you never chose, cannot name, and cannot hold to account.

Every one of those databases is a honeypot. The verifier promises your documents are deleted the moment they are checked. They are not always deleted, and the promise is worthless the day the company is breached. Remember the last twenty years of worthless $17.99 Equifax IDentityGuard+ credits from all those data breaches? It has happened, and it will happen again, except this time it’s not your email, hashed password, or even your SSN. It’s your face and passport that are for sale on the dark web.

It does not work — and it makes the dan­ger worse.

Here is the insult beneath the injury: it fails at the one thing it promises. Determined teenagers route around age gates like breathing: a borrowed login, a VPN, a checkbox, a verified account bought for the price of a coffee. Within hours of one platform rolling out age brackets, pre-verified accounts for every age were for sale on eBay. Teenagers machete their way through technologies designed to “protect” or limit them in the same way that water finds the cracks in the wall. They have all the time in the world, all of the incentives, and all of the social structure and obfuscated chat channels to do it.

Worse, the architecture built to “protect” children can endanger them. Sort users into age-labeled pens and you have not only failed to stop a predator, you have created an index of children, a phonebook, a way to filter directly for children. Teenagers pushed off mainstream platforms do not stop going online (see the point about water above). They move to smaller, darker, unmoderated corners, away from the very oversight that was supposed to keep them safe. The children are not saved. The surveillance is the only thing that survives intact.

Safe now, ??? later

The data­base you are help­ing build for a trust­wor­thy gov­ern­ment does not stay in trust­wor­thy hands. Administrations change. A reg­istry that merely cat­a­logs who you are to­day be­comes, un­der a fu­ture gov­ern­ment, a map of who to find. We al­ready know that US fed­eral agen­cies spy on cit­i­zens whole­sale: who at­tended which protest, who read which fo­rum, who be­longs to which group. People are right to be afraid of what a hos­tile regime would do with a ready-made list. The data does not for­get, and it does not take sides. It sim­ply waits for who­ever holds it next.

The whole in­ter­net starts to feel like the of­fice: every­one too fright­ened to say any­thing but the safe thing, lest a real name at­tached to a real opin­ion cost them a real job.


A prin­ci­pled stance

Most people are fine with this, based on the same debunked “nothing to hide” fallacies that are always trotted out in these conversations. Surveys find overwhelming majorities want children protected online, and large majorities say they support age verification in the abstract.

This is not a pop­u­lar­ity con­test, and re­fusal is not a vote you are try­ing to win. A ver­i­fi­ca­tion regime does not need your ap­proval — it needs your par­tic­i­pa­tion. It only works if nearly every­one com­plies. The point of re­fusal is not to per­suade a ma­jor­ity be­fore act­ing; it is to deny the sys­tem the uni­ver­sal co­op­er­a­tion it re­quires to func­tion at all. You do not need to win the poll. Just don’t up­load the photo. Never give them your face.

If Starbucks asked to scan your ID and put it in a na­tional data­base to sell you a latte, would you give it to them? No, be­cause you value your iden­tity more than your latte. Do you not value your iden­tity more than your abil­ity to see some ran­dom cousin post about their re­pug­nant po­lit­i­cal opin­ions or a pic­ture of some­one’s dog?

I am but one

In theory, we normal internet users can stop this whole system by opting out, by boycotting the process. Imagine a “National Month of Identity Choice”, where no one used any platform demanding your face, no one logged on, no one saw any ads, no one bought any sponsored projects. The platforms would see massive revenue drops, and there would be intense lobbying to reverse these awful laws. We can do it.

The only word they can­not route around is no.

These sys­tems run on com­pli­ance. They as­sume you will sigh, up­load the photo, and move on. Their en­tire busi­ness model de­pends on it. Which is also their weak­ness. A ver­i­fi­ca­tion wall that no one ver­i­fies for is a wall with no one stand­ing at it.

So refuse. Refuse the scan. Refuse the upload. Close the accounts that demand it and tell them, in writing, exactly why you are leaving. The platforms need you far more than you need them. You can live without the feed; they cannot live without the crowd. Do not comply in advance. The face on your ID is the most permanent thing you own.

Never give them your face.


GLM-5.2 - How to Run Locally | Unsloth Documentation

unsloth.ai

Models

GLM-5.2 - How to Run Locally

Run the new GLM-5.2 model by Z.ai on lo­cal hard­ware!

GLM-5.2 is Z.ai’s new open model, de­liv­er­ing SOTA per­for­mance across long-hori­zon cod­ing, rea­son­ing, and agen­tic tasks. With 744B pa­ra­me­ters, 40B ac­tive pa­ra­me­ters, and a 1M con­text win­dow, it can now be run lo­cally us­ing Unsloth Dynamic GGUFs. GLM-5.2 is the strongest open model to date, per­form­ing on par with Claude 4.8 Opus, GPT-5.5, and Gemini 3.1 Pro across Artificial Analysis and many other bench­marks.

Dynamic 1-bit reaches ~76.2% top-1 accuracy while being 86% smaller. Dynamic 2-bit reaches ~82% accuracy while being 84% smaller. In other words, being 86% smaller does not make the model 86% worse: it is only ~24% less accurate than the full 1.5TB model. Thanks to Z.ai for giving Unsloth day-zero access. GGUF: GLM-5.2-GGUF


⚙️ Usage Guide

The 2-bit dynamic quant UD-IQ2_M uses 239GB of disk space. It fits directly on a 256GB unified memory Mac and also works well with a single 24GB GPU plus 256GB of RAM using MoE offloading. The 1-bit quant fits in 223GB of RAM, and 8-bit requires 810GB of RAM.

Table: Inference hard­ware re­quire­ments (units = to­tal mem­ory: RAM + VRAM, or uni­fied mem­ory)

1-bit: 223 GB

2-bit (UD-IQ2_M): 245 GB

290 – 360 GB

372 – 475 GB

570 GB

8-bit: 810 GB

For best per­for­mance, make sure your to­tal avail­able mem­ory, in­clud­ing VRAM and sys­tem RAM, ex­ceeds the quan­tized model file size by a com­fort­able mar­gin.
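As a rough illustration of that rule of thumb, here is a minimal Python sketch. The function name `fits_in_memory` and the 5% margin are assumptions made for the example, not values from the Unsloth documentation; only the 239GB file size and the hardware sizes come from the text above.

```python
def fits_in_memory(model_file_gb, ram_gb, vram_gb=0.0, margin=0.05):
    """Return True if RAM + VRAM exceeds the model file size plus a margin.

    `margin` (5% here) is an illustrative assumption, not a documented value.
    """
    total = ram_gb + vram_gb
    return total >= model_file_gb * (1.0 + margin)


# 239 GB UD-IQ2_M file on one 24 GB GPU plus 256 GB of system RAM.
print(fits_in_memory(239, ram_gb=256, vram_gb=24))
# Same file on a machine with 128 GB of RAM and one 24 GB GPU.
print(fits_in_memory(239, ram_gb=128, vram_gb=24))
```

A larger margin leaves more room for the KV cache and the operating system; the right value depends on context length and workload.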

GLM-5.2 has three thinking modes: non-thinking, plus two thinking levels, High and Max. Use Max Thinking for complicated tasks. In Unsloth Studio you can toggle between High, Max, and non-thinking in the UI.

Use these set­tings for most use cases:

temperature = 1.0, top_p = 0.95, maximum context window: 1,048,576.

temperature = 1.0, top_p = 1.0, maximum context window: 1,048,576.

GLM-5.2 uses reasoning by default. It also supports reasoning effort levels: reasoning_effort can be “high”, “max”, or disabled.

To disable thinking, use --chat-template-kwargs '{"enable_thinking":false}'. If you’re on Windows PowerShell, use: --chat-template-kwargs "{\"enable_thinking\":false}"

You can now also use --reasoning on or --reasoning off in llama.cpp!

For reasoning effort customization and/or to disable reasoning, use the examples below:
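One way to avoid the shell quoting differences described above is to build the argument list in code. The following is a minimal Python sketch, not an official Unsloth or llama.cpp helper: the function name `thinking_args` is hypothetical, the `--chat-template-kwargs` flag and the `enable_thinking` key come from the text above, and passing `reasoning_effort` through the same JSON object is an assumption about the chat template.

```python
import json
import shlex


def thinking_args(enable_thinking=True, reasoning_effort=None):
    """Build the --chat-template-kwargs argument pair for llama.cpp.

    Hypothetical helper. `reasoning_effort` ("high" or "max") is passed as
    a template kwarg, which is an assumption about the chat template.
    """
    kwargs = {"enable_thinking": enable_thinking}
    if reasoning_effort is not None:
        if reasoning_effort not in ("high", "max"):
            raise ValueError("reasoning_effort must be 'high' or 'max'")
        kwargs["reasoning_effort"] = reasoning_effort
    # Compact separators reproduce the {"enable_thinking":false} form.
    payload = json.dumps(kwargs, separators=(",", ":"))
    return ["--chat-template-kwargs", payload]


# Passing a list to subprocess avoids shell quoting problems entirely;
# shlex.join shows the equivalent POSIX shell command line.
print(shlex.join(["llama-cli"] + thinking_args(enable_thinking=False)))
```

The returned list can be appended to whatever other llama.cpp arguments you already pass, for example via `subprocess.run`.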

We also ran KLD (KL Divergence) bench­marks to gauge the ac­cu­racy of our quan­ti­za­tions of GLM-5.2-GGUF. Dynamic 4-bit UD-Q4_K_XL and dy­namic 5-bit UD-Q5_K_XL are mostly loss­less, and smaller quants also work great by dy­nam­i­cally leav­ing im­por­tant lay­ers in higher pre­ci­sion, and unim­por­tant ones to low bits.

On pure top-1 accuracy, dynamic 1-bit reaches around 76.2% while being 86% smaller, and dynamic 2-bit reaches around 82% while being 84% smaller. Because some layers are dynamically kept at higher precision, a model that is 86% smaller is not 86% worse: it is only about 24% less accurate than the full 1.5TB model.

But what does “76% accuracy” actually describe?

76% top-1 does NOT mean that “The capital of France is” becomes 76% Paris and 24% Sydney. The model is NOT “dumber” by 24%. For this prompt, Paris is always 100% and Sydney is 0%. The 76% number includes filler words and stop words across the entire corpus. For example, asking:

“Create a novel” can, due to LLM sampling, produce any of:

I will now create a novel…

The novel is below:

What genre would you like it to be?

Each ex­am­ple is cor­rect, but the [I, The, What] dis­tri­b­u­tion is what changes - the base­line might use [I] 100% of the time, but now [I] is 76% and [The] is 24%.

It does NOT mean that you get incorrect outputs, such as gibberish, 24% of the time.

99.9% KLD is also gen­er­ally good - there is a larger up­lift from 4bit on­wards though, so for mas­sive out of dis­tri­b­u­tion tasks, dy­namic 4-bit is prob­a­bly best.

Top-1 accuracy is a “forced” binomial distribution of the KLD itself. KLD is the “distance” between the probabilities of the baseline (BF16 or Q8_0) and the quantized version. The goal of quantization is to minimize the objective below:
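Written out from the definitions that follow, the objective would look roughly like this. It is a reconstruction under assumptions, not the guide's own formula: the name KLD follows the text, and comparing f(W) with f(q(W)) is inferred from the description of the baseline and quantized outputs.

```latex
\min_{q} \; \mathrm{KLD}\big( f(W) \,\|\, f(q(W)) \big)
```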

Where f is the language model’s forward pass, q is the quantization operation, and W is the parameters or weights of the model. The goal is to make the “distance” between the logits output of the baseline f(W) and the quantized model’s output as small as possible. If you can reach 0 KLD, then you have perfectly reconstructed the model!

We use mean KLD like be­low since it’s ex­pen­sive to run KLD across the full train­ing cor­pus (15T to­kens for ex­am­ple) - in­stead we do sam­pling, and get a small rep­re­sen­ta­tive sub­set of the train­ing cor­pus / down­stream task, and op­ti­mize that. Mean KLD gen­er­ally fol­lows a mo­not­o­nic trend vs disk space, and shows even at 1-bit GLM 5.2 works well!

Top-1 accuracy is simply a greedy decoding operator where we assume the argmax item will be picked. For 1-bit, the quantized model picks the same argmax token as the baseline 76% of the time.
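To make the two metrics concrete, here is a small self-contained Python sketch that computes mean KLD and top-1 agreement between a baseline and a quantized model. It is an illustration only, not Unsloth's benchmark code: the function names and the toy logits are made up, and real evaluations run over many thousands of token positions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def compare(baseline_logits, quantized_logits):
    """Return (mean KLD, top-1 agreement) over a batch of token positions."""
    klds, matches = [], 0
    for base, quant in zip(baseline_logits, quantized_logits):
        klds.append(kl_divergence(softmax(base), softmax(quant)))
        # Greedy decoding: does the quantized argmax match the baseline argmax?
        matches += base.index(max(base)) == quant.index(max(quant))
    n = len(klds)
    return sum(klds) / n, matches / n


# Toy logits for 4 token positions over a 3-token vocabulary (made-up numbers).
baseline = [[4.0, 1.0, 0.0], [0.5, 3.0, 0.2], [2.0, 1.9, 0.0], [0.0, 0.1, 5.0]]
quantized = [[3.8, 1.2, 0.1], [0.4, 2.7, 0.3], [1.9, 2.0, 0.0], [0.1, 0.0, 4.9]]
mean_kld, top1 = compare(baseline, quantized)
print(f"mean KLD = {mean_kld:.4f} nats, top-1 agreement = {top1:.0%}")
```

In the toy data, the third position is a near tie in the baseline, so a tiny perturbation flips the argmax. That is the kind of disagreement the top-1 number counts, even though both continuations may be acceptable.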

You can now run GLM-5.2 in llama.cpp and Unsloth Studio. We will be utilizing the 239GB UD-IQ2_M quant for best results in terms of accessibility and accuracy.

GLM-5.2 can run in Unsloth Studio, an open-source web UI for lo­cal AI. Unsloth Studio au­to­mat­i­cally of­floads to RAM and de­tects multi­GPU se­tups. With Unsloth Studio, you can run mod­els lo­cally on MacOS, Windows, Linux and:

Search, down­load, run GGUFs and safeten­sor mod­els


Fast CPU + GPU in­fer­ence via llama.cpp


Install and Launch Unsloth

To in­stall, run in your ter­mi­nal:

MacOS, Linux, WSL:

Windows PowerShell:

Launch Unsloth

MacOS, Linux, WSL and Windows:

Then open http://​127.0.0.1:8888 (or your spe­cific URL) in your browser.

Launch Unsloth se­curely with HTTPS and Cloudflare

NEW! Unsloth now pro­vides a se­cure way to launch Studio over HTTPS through a free Cloudflare tun­nel. Use the be­low (works in Windows, Mac & Linux):

Search and down­load GLM-5.2

On first launch you will need to create a password to secure your account; you will use it to sign in again later.

Then go to the Studio Chat tab, search for GLM-5.2 in the search bar, and download your desired model and quant. Ensure you have enough compute to run the model.

Run GLM-5.2

Inference parameters should be auto-set when using Unsloth Studio; however, you can still change them manually. You can also edit the context length, chat template and other settings.

For more in­for­ma­tion, you can view our Unsloth Studio in­fer­ence guide.

For this guide we’ll be running the UD-IQ2_M quant, which requires at least 245GB of RAM. Feel free to change the quantization type. For these tutorials, we will be using llama.cpp for fast local inference. GGUF: GLM-5.2-GGUF

Obtain the lat­est llama.cpp on GitHub here. You can fol­low the build in­struc­tions be­low as well. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don’t have a GPU or just want CPU in­fer­ence. For Apple Mac / Metal de­vices, set -DGGML_CUDA=OFF then con­tinue as usual - Metal sup­port is on by de­fault.

You can now use llama.cpp directly to download and load models, just like ollama run. First, select the quantization type you want, such as UD-IQ2_M. Also use export LLAMA_CACHE="unsloth/GLM-5.2-GGUF" to force llama.cpp to save to a specific location. Note that this download process might be very slow, so it’s probably best to use the manual download process in the next section.

If you want to download the model manually (much faster!), you can do so via the code below (after running pip install huggingface_hub). If downloads get stuck, see: Hugging Face Hub, XET debugging
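A manual download along these lines can be sketched with `huggingface_hub.snapshot_download`. Treat this as a sketch under assumptions rather than the guide's exact snippet: the helper names `quant_pattern` and `download_quant` are hypothetical, the target folder is an assumption, and the glob pattern assumes the shard filenames contain the quant name (as the paths in this guide suggest).

```python
def quant_pattern(quant="UD-IQ2_M"):
    """Glob pattern selecting only the shards of one quantization."""
    return f"*{quant}*"


def download_quant(quant="UD-IQ2_M", local_dir="unsloth/GLM-5.2-GGUF"):
    """Download one quantization of the GGUF repo (hundreds of GB)."""
    # Imported lazily so the helper above works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="unsloth/GLM-5.2-GGUF",  # repo name as given in this guide
        local_dir=local_dir,             # assumed target folder
        allow_patterns=[quant_pattern(quant)],
    )


# Example calls (commented out because they start a very large download):
# download_quant("UD-IQ2_M")   # dynamic 2-bit
# download_quant("UD-IQ1_S")   # dynamic 1-bit
```

Restricting `allow_patterns` to one quant avoids pulling every quantization in the repository.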

If you want to use the dy­namic 1bit, then do:

Then run the model in conversation mode. Use unsloth/GLM-5.2-GGUF/UD-IQ2_M/GLM-5.2-UD-IQ2_M-00001-of-00006.gguf for 2-bit or unsloth/GLM-5.2-GGUF/UD-IQ1_S/GLM-5.2-UD-IQ1_S-00001-of-00006.gguf for 1-bit.

When you launch llama-cli, you will see:

Then af­ter prompt­ing it to make a short Flappy Bird game, we get:

With the full con­ver­sa­tion and game be­low:

And the game has sound and works won­der­fully! Reminder this was a 1-bit quan­ti­za­tion and it worked well!

📐Long con­text via KV Cache quan­ti­za­tion

To uti­lize long con­text in llama.cpp, we need to em­ploy KV cache quan­ti­za­tion to re­duce mem­ory us­age. Recently llama.cpp added higher ac­cu­racy tricks to KV cache quan­ti­za­tion - see and other PRs!

Currently, these KV cache dtypes are sup­ported:

By default f16 is used. If you use q4_0, which is around 4.5 bits per weight, you can extend context lengths by around 16 / 4.5 ≈ 3.5x! So if your model used to support 10K, 35K can be in reach! q4_1 is probably better since you also get a shifting parameter; it is 5 bits per weight, so contexts are 3.2x longer.
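The arithmetic above is easy to check for other cache types. A minimal Python sketch follows; the function name `context_multiplier` is made up for this example, and the bits-per-value figures are the ones quoted in the paragraph above.

```python
def context_multiplier(bits_per_value, baseline_bits=16.0):
    """How many times more tokens fit in the same KV cache memory."""
    return baseline_bits / bits_per_value


for name, bits in [("q4_0", 4.5), ("q4_1", 5.0)]:
    mult = context_multiplier(bits)
    tokens = 10_000 * mult
    print(f"{name}: {mult:.2f}x, so a 10K-token f16 budget holds about {tokens:,.0f} tokens")
```

This only accounts for KV cache memory; model weights and activation buffers are unaffected by the cache dtype.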

Use it like be­low:

You can view fur­ther be­low for GLM-5.2 bench­marks in table for­mat:

Reasoning

HLE: 40.5, 49.8*, 41.4*, 45, 31, 41.4, 37, 37.7

HLE (w/ Tools): 54.7, 57.9*, 52.2*, 51.4*, 52.3, 53.5

The text in Claude Code’s “Extended Thinking” output is not authentic. – blog

patrickmccanna.net

Claude Code records each session to disk. Those logs include “thinking blocks”: the model’s own reasoning as it works.

I went to in­spect that rea­son­ing this week­end and found a sig­na­ture (600 char­ac­ters long) and no text.
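If you want to check your own logs for the same pattern, something like the following Python sketch would do it. It rests on assumptions: that each log line is a JSON object with a `message.content` list, and that thinking blocks carry `thinking` and `signature` fields as in the API's content-block format. The log layout is not something I have seen documented, and the function name is made up.

```python
import json


def summarize_thinking(jsonl_lines):
    """Yield (text_length, signature_length) for each thinking block.

    Assumes each line is a JSON object whose message.content is a list of
    content blocks; that log layout is an assumption, not a documented format.
    """
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        message = record.get("message")
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue  # plain-string content or records without a message
        for block in content:
            if isinstance(block, dict) and block.get("type") == "thinking":
                text = block.get("thinking") or ""
                signature = block.get("signature") or ""
                yield len(text), len(signature)


# Synthetic log lines standing in for a real session file.
sample = [
    json.dumps({"message": {"content": [
        {"type": "thinking", "thinking": "", "signature": "x" * 600},
        {"type": "text", "text": "Done."},
    ]}}),
    json.dumps({"message": {"content": "plain user prompt"}}),
]
print(list(summarize_thinking(sample)))
```

To run it against a real session, pass an open file handle instead of `sample`. A block with zero text length and a long signature matches what I saw.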

So I read the docs: https://platform.claude.com/docs/en/build-with-claude/extended-thinking

Some de­tails worth be­ing aware of:

Claude en­crypts its rea­son­ing into that sig­na­ture.

Anthropic holds the key. Your ma­chine does­n’t re­ceive it.

The API hands back a SUMMARY of rea­son­ing, NOT the rea­son­ing it­self.

Getting the full think­ing out­put re­quires an en­ter­prise agree­ment.

Matt Green looked into this and has some more de­tailed ob­ser­va­tions on the sig­na­ture blocks.

This is worth knowing before you promise anyone an audit trail. Also, BEWARE: the “extended-thinking” output from ctrl+o is a summary of Fable/Opus’ thinking. It isn’t the actual thinking that drove the model’s actions in a session, but a summary of the thinking logic. This is like saving a .bmp as a .jpeg, then editing the .jpeg and saving it back as a .bmp. The conversion produces data loss. [edit: I originally had the order inverted, which triggered some HN readers. Apologies!]

I’m underwhelmed by how Anthropic is presenting the behavior of their application. If you ever need a record of the logic used by YOUR AGENT during a session:

You can’t produce the logic using the local files. The reasoning logs on your system are not accessible to you.

You can log the inputs, the outputs, and the actions of a running Claude Code with some scrappy scraping, but even then, it’s not the actual reasoning that drove the agent’s behavior.

And the language in the docs is awfully indirect. If you haven’t had your coffee, you might miss that “extended thinking returns a summary of Claude’s full thinking process”.

Performance im­prove­ments in Open Source mod­els need to come faster.

Jobs and Software is Fucked

urflow.bearblog.dev

19 Jun, 2026

Or how I de­cided to scream into the void about how aw­ful the job mar­ket is. I’m still in a very sour mood af­ter the lat­est rounds of in­ter­views but ul­ti­mately this is my blog and a way to let off some of that sour­ness.

For ref­er­ence: I’m a soft­ware en­gi­neer of about 10ish years of ex­pe­ri­ence. I’ve worked for a small con­trac­tor be­fore, and my pre­vi­ous time spent was about seven years work­ing at Blizzard. I was laid off in June 2025 along with the rest of my team and I’ve been look­ing for a job since. This is prob­a­bly the worst job mar­ket I’ve seen in a while.

Over the past six months I’ve had a few interviews that got to the final stages, a couple that got filtered out early on and a bunch more that ended up completely silent despite my skills fitting the job to a tee.

The ones that sting the most are the ones where I was in the fi­nal rounds and got passed over for some­one else. In par­tic­u­lar I’ve had a few cases where every­thing seemed pos­i­tive and there was just an­other can­di­date or in­ter­nal trans­fer that ended up get­ting ahead. I would reach back out to the re­cruiter af­ter­wards (since this had been go­ing on for weeks now) pok­ing at them for other po­si­tions that fit my skills, only for them to go silent. I would promptly re­move them from my LinkedIn con­nec­tions af­ter­wards be­cause I’m ex­hausted of point­less con­nec­tions and re­cruiters.

But the ones that an­noy me the most are all the ini­tial fil­ters. Coderpad, hack­er­rank, ex­ams with AI proc­tors. All of these are in­tensely bull­shit be­cause com­pa­nies use them as fil­ters for dif­fer­ent mech­a­nisms, all of which are de­feated by some ass­hole on their phone with what­ever AI of the week they’re us­ing. So I’m in­her­ently at a dis­ad­van­tage be­cause I try to play by the rules as these shitty apps full screen them­selves and lock you out of any sort of API ref­er­ence or help­ful pages for when I need to re­call off the top of my head the proper way to in­stan­ti­ate a list or heap in X lan­guage. As much as I want to strongly refuse these fil­ters the only choice I have is to com­ply, be­cause many of these fil­ters are made by peo­ple so far away from any sort of dev work that the only thing they know is more code is bet­terer.

Trying to nav­i­gate all of this is a Sisyphean task amidst all of the com­pa­nies heav­ily push­ing AI garbage every­where. It’s not enough that jobs want to force you to de­grade your­self and your skill to ful­fill some quota on to­kens burnt to Claude but they’ve quickly built up a sys­tem that also makes it even more im­pos­si­ble to ac­tu­ally get past lay­ers of key­worded re­sume screen­ing and busy­work. Every time you ap­ply for a job it feels like com­pa­nies are scat­ter­ing le­gos in front of you and telling you to dance bare­footed on top of them to prove your worth.

And over time you start to in­ter­nal­ize how shitty things are; how you’re of­ten the least best can­di­date out of a pool of other more des­per­ate can­di­dates. I’m one of the lucky ones too be­cause I’m ac­tu­ally get­ting in­ter­views even if they go nowhere. People just out of col­lege get to ex­pe­ri­ence com­pa­nies pulling up the lad­der com­pletely hop­ing that Anthropic will re­move the need for ju­niors en­tirely.

It’s not like the job mar­ket was that much bet­ter be­fore AI in­fested every sin­gle cor­ner of the mar­ket, but it su­per­charged all of the worst as­pects of every­thing. I’ve seen peo­ple sup­pos­edly smarter than I ad­vo­cate for just giv­ing in, con­ced­ing to AI cod­ing as it’s the fu­ture. But do­ing so means toss­ing out my friends who make art or the peo­ple who work their asses off to prop­erly test and re­view code or the writ­ers pour­ing all of their en­ergy into even mun­dane di­a­logue. It means throw­ing out my dig­nity as a soft­ware en­gi­neer, as some­one that truly gives a shit about se­cu­rity and code.

I won’t let it take away the joy I have in be­ing a weird com­puter nerd. I can’t. But it’s all so tire­some.

Moebius Project Page

hustvl.github.io

Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance

(*) Equal Contribution, (†) Project Leader, (📧) Corresponding Author.

1 Huazhong University of Science and Technology, 2 VIVO AI Lab

Abstract

While 10B-level industrial foundation models have pushed the boundaries of image inpainting, their prohibitive computational costs severely hinder practical deployment. Constructing a highly optimized task-specific specialist offers a promising solution; however, extreme structural compression inevitably triggers a severe representation bottleneck. To conquer this, we propose Moebius, a highly efficient lightweight inpainting framework. We systematically reconstruct the diffusion backbone by introducing the Local-λ Mix Interaction (LλMI) block. Comprising Local-λ and Interactive-λ modules, it elegantly summarizes spatial contexts and global semantic priors into fixed-size linear matrices, preserving complex latent interactions while drastically shedding parameters. Furthermore, to unlock the full representational capacity of this highly compact architecture, we synergistically pair it with an adaptive multi-granularity distillation strategy. Operating strictly within the latent space to avoid expensive pixel-space decoding, this strategy dynamically balances multiple gradient-based losses to achieve high-fidelity alignment. Extensive experiments across natural and portrait benchmarks demonstrate that this optimal synergy enables Moebius to rival or even surpass the generation quality of the 10B-level industrial generalist FLUX.1-Fill-Dev. Remarkably, Moebius achieves this using less than 2% of the parameters (0.22B vs. 11.9B) while delivering a >15× acceleration in total inference time, setting a new efficiency standard for high-fidelity inpainting.

Method

Overall pipeline of Moebius. We adopt the Latent Diffusion Model (LDM) framework equipped with Latent Categories Guidance (LCG). To achieve extreme architectural efficiency, the denoising U-Net is systematically restructured using our proposed LλMI blocks (detailed in Sec. 3.2). Furthermore, an adaptive multi-granularity distillation strategy (Sec. 3.3) is applied during training to align our lightweight specialist with the high-capacity teacher, successfully mitigating the capacity drop caused by extreme structural compression.

Highlights

📉 Extreme Parametric Efficiency (< 2%): Moebius op­er­ates with a mere 0.22B (226M) pa­ra­me­ters, which rep­re­sents less than 2% of the size of the colos­sal in­dus­trial gi­ant FLUX.1-Fill-Dev (11.9B). It shat­ters the heavy-com­pute nar­ra­tive, mak­ing high-qual­ity in­paint­ing ac­ces­si­ble on con­sumer-grade and edge de­vices.

15× Inference Speedup (26ms/step): Achieves a blis­ter­ing in­fer­ence la­tency of only 26.01 ms per step on a sin­gle GPU. Combined with op­ti­mized sam­pling steps, Moebius de­liv­ers an over­all >15× to­tal run­time ac­cel­er­a­tion com­pared to 10B-level mod­els.

🏆 10B-Level Inpainting Quality (on-par-with/surpass FLUX.1-Fill-Dev across 6 bench­marks): Size con­trac­tion does not mean rep­re­sen­ta­tion degra­da­tion. Through the syn­er­gis­tic op­ti­miza­tion of ar­chi­tec­ture and dis­til­la­tion, Moebius per­forms on par with, and in cer­tain sce­nar­ios (such as com­plex tex­tures and fa­cial plau­si­bil­ity), sur­passes 10B-level state-of-the-art (SOTA) gen­er­al­ist mod­els (FLUX.1-Fill-Dev, SD3.5 Large-Inpainting) across 6 com­pre­hen­sive bench­marks span­ning both nat­ural scenes (Places2) and por­trait scenes (CelebA-HQ, FFHQ).

💡 Synergistic Core Innovations:


Architecture Design (LλMI Block): Reformulates both self- and cross-at­ten­tion by con­dens­ing spa­tial con­text and global se­man­tic pri­ors into fixed-size lin­ear ma­tri­ces, by­pass­ing qua­dratic com­pu­ta­tional over­head.

Adaptive Multi-Granularity Distillation Strategy: Transfers the rep­re­sen­ta­tional ca­pac­ity from our PixelHacker (teacher) strictly within the la­tent space (avoiding ex­pen­sive pixel-space de­cod­ing). It bridges the gi­ant ca­pac­ity gap by align­ing multi-gran­u­lar­ity su­per­vi­sion—rang­ing from mi­cro­scopic in­ter­me­di­ate fea­tures to macro­scopic dif­fu­sion tra­jec­to­ries—while dy­nam­i­cally bal­anc­ing train­ing via a gra­di­ent norm adap­tive loss weight­ing mech­a­nism.

Optimal Synergistic Balancing: Systematically ex­plores the mu­tual con­straint and up­per bound be­tween com­pact struc­ture and dis­til­la­tion. By map­ping this ar­chi­tec­ture-dis­til­la­tion syn­ergy fron­tier, we en­sure our 0.22B Moebius (student) ab­sorbs the max­i­mum se­man­tic rea­son­ing of PixelHacker (teacher) with­out trig­ger­ing rep­re­sen­ta­tion sat­u­ra­tion.

🚀 Task-Specific Specialist over Bloated Generalists: Rather than blindly scal­ing up, Moebius an­swers a fun­da­men­tal ques­tion: Can a model be smarter, lighter, and faster when the task is ex­plic­itly de­fined? It serves as a highly op­ti­mized spe­cial­ist that lib­er­ates real-world im­age in­paint­ing and AI ob­ject re­moval from pa­ra­me­ter bloat.
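To make the "fixed-size linear matrices" idea from the Architecture Design item concrete, here is a minimal sketch of generic linear attention in plain Python. It is not the LλMI block itself, whose exact formulation is not given here; the elu(x) + 1 feature map and the function names are assumptions borrowed from common linear-attention formulations. The point is only that all keys and values are first condensed into one d × d_v summary matrix, so each query reads from that summary instead of from every token, and the cost grows linearly with the number of tokens.

```python
import math

def feature_map(x):
    # elu(x) + 1, a positive feature map often used in linear attention (assumed here)
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def linear_attention(queries, keys, values):
    """Generic linear attention over lists of vectors (illustrative sketch only)."""
    d = len(keys[0])
    d_v = len(values[0])
    # Condense every key/value pair into a fixed-size (d x d_v) summary matrix.
    kv = [[0.0] * d_v for _ in range(d)]
    k_sum = [0.0] * d
    for k, v in zip(keys, values):
        fk = feature_map(k)
        for i in range(d):
            k_sum[i] += fk[i]
            for j in range(d_v):
                kv[i][j] += fk[i] * v[j]
    # Each query touches only the summary, so no n x n attention map is ever built.
    out = []
    for q in queries:
        fq = feature_map(q)
        norm = sum(fq[i] * k_sum[i] for i in range(d))
        out.append([sum(fq[i] * kv[i][j] for i in range(d)) / norm
                    for j in range(d_v)])
    return out
```

Because the attention weights are normalized, each output row is a weighted average of the value vectors, which gives a quick sanity check: if every value vector is identical, every output equals that vector.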

Visualizations

[Figures: qualitative results on natural scenes and portrait scenes; comparison on natural scenes (Places2); comparison on portrait scenes (CelebA-HQ, FFHQ).]

BibTeX

@misc{DuanAndXu2026Moebius,
  title={Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance},
  author={Kangsheng Duan and Ziyang Xu and Wenyu Liu and Xiaohu Ruan and Xiaoxin Chen and Xinggang Wang},
  year={2026},
  eprint={2606.19195},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2606.19195},
}

My Mathematical Regression

blog.dahl.dev

I came across my 10-year-old repo of Project Euler solutions. (N.B.! Euler spoilers onwards.)

Naturally it’s full of Python files. One file stood out. It was just called problem15.txt.

I pulled up the prob­lem.

I imagined getting this at work. I think I would reach for Python. Maybe start with a naive brute force. Throw a bunch of loops together. If that didn’t solve it, reach for memoization. Dynamic programming, let’s go! (This is just me fantasizing. At work I would just give it to an AI and continue with my day.)

And let’s see how I solved it when I was still an engineering student:

problem15.txt

doesnt even need to program anything for this problem
there are 6 solutions to the 2x2 grid
there are 2 solutions to 1x1 grid
there are 20 solutions to a 3x3 grid
this follows the pattern of (2n) choose n
so (2*20) choose 20 = 137846528820

(If you aren’t fa­mil­iar with dis­crete math, see the bi­no­mial co­ef­fi­cient for syn­tax)
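Why (2n) choose n works, assuming the problem is the usual Project Euler 15 setup (routes from one corner of an n×n grid to the opposite corner, moving only right or down): every route is a sequence of 2n moves, exactly n of which go right, so counting routes means choosing which n of the 2n moves are the right-moves. Below is a small Python sketch that checks the note's numbers against the dynamic-programming approach imagined earlier. The function names are mine, not from the original repo.

```python
import math

def lattice_paths_dp(n):
    """Count right/down routes through an n x n grid with dynamic programming."""
    # paths[j] = number of routes reaching grid point j in the current row of points
    paths = [1] * (n + 1)
    for _ in range(n):
        for j in range(1, n + 1):
            # A point is reached either from above (old paths[j]) or from the left.
            paths[j] += paths[j - 1]
    return paths[n]

def lattice_paths_closed_form(n):
    """Same count via the binomial coefficient (2n choose n)."""
    return math.comb(2 * n, n)

for n in (1, 2, 3, 20):
    print(n, lattice_paths_dp(n), lattice_paths_closed_form(n))
```

Both functions agree on the small grids from the note (2, 6, 20) and on the 20×20 answer, 137846528820.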

I’m impressed by my past self! And present me became sad. It feels like an Asimov book where the main character finds past knowledge, codified by the ancients. But it’s just me when I was in school.

I remixed this pic­ture to cope.
