10 interesting stories served every morning and every evening.




1 796 shares, 33 trendiness

Google AI Edge Gallery app

AI Edge Gallery is the premier destination for running the world’s most powerful open-source Large Language Models (LLMs) on your mobile device. Experience high-performance Generative AI directly on your hardware—fully offline, private, and lightning-fast.

Now Featuring: Gemma 4

This update brings official support for the newly released Gemma 4 family. As the centerpiece of this release, Gemma 4 allows you to test the cutting edge of on-device AI. Experience advanced reasoning, logic, and creative capabilities without ever sending your data to a server.

Core Features

- Agent Skills: Transform your LLM from a conversationalist into a proactive assistant. Use the Agent Skills tile to augment model capabilities with tools like Wikipedia for fact-grounding, interactive maps, and rich visual summary cards. You can even load modular skills from a URL or browse community contributions on GitHub Discussions.

- AI Chat with Thinking Mode: Engage in fluid, multi-turn conversations and toggle the new Thinking Mode to “peek under the hood.” This feature allows you to see the model’s step-by-step reasoning process, which is perfect for understanding complex problem-solving. Note: Thinking Mode currently works with supported models, starting with the Gemma 4 family.

- Ask Image: Use multimodal power to identify objects, solve visual puzzles, or get detailed descriptions using your device’s camera or photo gallery.

- Audio Scribe: Transcribe and translate voice recordings into text in real-time using high-efficiency on-device language models.

- Prompt Lab: A dedicated workspace to test different prompts and single-turn use cases with granular control over model parameters like temperature and top-k.

- Mobile Actions: Unlock offline device controls and automated tasks powered entirely by a finetune of FunctionGemma 270m.

- Tiny Garden: A fun, experimental mini-game that uses natural language to plant and harvest a virtual garden using a finetune of FunctionGemma 270m.

- Model Management & Benchmark: Gallery is a flexible sandbox for a wide variety of open-source models. Easily download models from the list or load your own custom models. Manage your model library effortlessly and run benchmark tests to understand exactly how each model performs on your specific hardware.

- 100% On-Device Privacy: All model inferences happen directly on your device hardware. No internet is required, ensuring total privacy for your prompts, images, and sensitive data.

Built for the Community

AI Edge Gallery is an open-source project designed for the developer community and AI enthusiasts alike. Explore our example features, contribute your own skills, and help shape the future of the on-device agent ecosystem.

Check out the source code on GitHub:

https://github.com/google-ai-edge/gallery

Note: This app is in active development. Performance is dependent on your device’s hardware (CPU/GPU). For support or feedback, contact us at google-ai-edge-gallery-android-feedback@google.com.

...

Read the original on apps.apple.com »

2 772 shares, 47 trendiness

arman-bd/guppylm: A ~9M parameter LLM that talks like a small fish.

This project exists to show that training your own language model is not magic.

No PhD required. No massive GPU cluster. One Colab notebook, 5 minutes, and you have a working LLM that you built from scratch — data generation, tokenizer, model architecture, training loop, and inference. If you can run a notebook, you can train a language model.

It won’t produce a billion-parameter model that writes essays. But it will show you exactly how every piece works — from raw text to trained weights to generated output — so the big models stop feeling like black boxes.

GuppyLM is a tiny language model that pretends to be a fish named Guppy. It speaks in short, lowercase sentences about water, food, light, and tank life. It doesn’t understand human abstractions like money, phones, or politics — and it’s not trying to.

It’s trained from scratch on 60K synthetic conversations across 60 topics, runs on a single GPU in ~5 minutes, and produces a model small enough to run in a browser.

Vanilla transformer. No GQA, no RoPE, no SwiGLU, no early exit. As simple as it gets.

* Experiences the world through water, temperature, light, vibrations, and food

* Is friendly, curious, and a little dumb

60 topics: greetings, feelings, temperature, food, light, water, tank, noise, night, loneliness, bubbles, glass, reflection, breathing, swimming, colors, taste, plants, filter, algae, snails, scared, excited, bored, curious, happy, tired, outside, cats, rain, seasons, music, visitors, children, meaning of life, time, memory, dreams, size, future, past, name, weather, sleep, friends, jokes, fear, love, age, intelligence, health, singing, TV, and more.

Downloads the pre-trained model from HuggingFace and lets you chat. Just run all cells.

pip install torch tokenizers
python -m guppylm chat

from datasets import load_dataset
ds = load_dataset("arman-bd/guppylm-60k-generic")
print(ds["train"][0])
# {'input': 'hi guppy', 'output': 'hello. the water is nice today.', 'category': 'greeting'}

Why no system prompt? Every training sample had the same one. A 9M model can’t conditionally follow instructions — the personality is baked into the weights. Removing it saves ~60 tokens per inference.

Why single-turn only? Multi-turn degraded at turn 3-4 due to the 128-token context window. A fish that forgets is on-brand, but garbled output isn’t. Single-turn is reliable.

Why vanilla transformer? GQA, SwiGLU, RoPE, and early exit add complexity that doesn’t help at 9M params. Standard attention + ReLU FFN + LayerNorm produces the same quality with simpler code.
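To make “as simple as it gets” concrete, a block in that style looks roughly like the following PyTorch sketch (standard multi-head attention, ReLU feed-forward, LayerNorm). The dimensions and names are illustrative assumptions, not GuppyLM’s actual source.

# Minimal sketch of a vanilla transformer block: standard attention + ReLU FFN
# + LayerNorm. Sizes are assumptions for illustration, not the real model.
import torch.nn as nn

class VanillaBlock(nn.Module):
    def __init__(self, d_model=256, n_heads=4, d_ff=1024):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, x, causal_mask=None):
        # Pre-norm self-attention (optionally masked), then the ReLU feed-forward.
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal_mask)
        x = x + attn_out
        x = x + self.ffn(self.ln2(x))
        return x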

Why synthetic data? A fish character with consistent personality needs consistent training data. Template composition with randomized components (30 tank objects, 17 food types, 25 activities) generates ~16K unique outputs from ~60 templates.
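A hypothetical sketch of that composition approach (the templates and word lists below are invented for illustration, not taken from the real dataset):

# Template composition: a few templates crossed with randomized component lists
# multiply into many unique samples. Everything here is made up for illustration.
import random

TANK_OBJECTS = ["the little castle", "a smooth pebble", "the plastic plant"]
FOODS = ["flakes", "a bloodworm", "a pea"]
ACTIVITIES = ["swimming in circles", "hiding behind the filter", "chasing bubbles"]

TEMPLATES = [
    ("what are you doing?", "i am {activity}. the water feels nice."),
    ("are you hungry?", "yes. i would like {food} please."),
    ("what do you see?", "i see {object}. it does not move."),
]

def make_sample():
    prompt, reply = random.choice(TEMPLATES)
    reply = reply.format(
        activity=random.choice(ACTIVITIES),
        food=random.choice(FOODS),
        object=random.choice(TANK_OBJECTS),
    )
    return {"input": prompt, "output": reply, "category": "demo"}

print(make_sample())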

...

Read the original on github.com »

3 716 shares, 33 trendiness

Microsoft Hasn’t Had a Coherent GUI Strategy Since Petzold

A few years ago I was in a meeting with developers and someone asked a simple question: “What’s the right framework for a new Windows desktop app?”

Dead silence. One person suggested WPF. Another said WinUI 3. A third asked if they should just use Electron. The meeting went sideways and we never did answer the question.

That silence is the story. And the story goes back thirty-plus years.

When a platform can’t answer “how should I build a UI?” in under ten seconds, it has failed its developers. Full stop.

In 1988, Charles Petzold published Programming Windows. 852 pages. Win16 API in C. And for all its bulk, it represented something remarkable: a single, coherent, authoritative answer to how you write a Windows application. In the business, we call that a ‘strategy’.

Win32 that followed was bigger but still coherent. Message loops. Window procedures. GDI. The mental model was a bit whacky, but it was one mental model. Petzold explained it. It was the F=MA of Windows. Simple. Powerful. You learned it. You used it. You were successful.

Clarity is your friend! One OS, one API, one language, one book. There was no committee debating managed-code alternatives. There was just Win32 and Petzold, and it worked. This was Physics, not Chemistry (this works but only for this slice of the periodic table. And only under these pressures. And only within this temperature range. And only if the Moon is in the 7th house of Jupiter).

What happened next is a masterclass in how a company with brilliant people and enormous resources can produce a thirty-year boof-a-rama by optimizing for the wrong things. AKA brilliant people doing stupid things.

Win32 had real limitations, so Microsoft did what Microsoft does: it shipped something new for the developer conference. Several somethings.

MFC (1992) wrapped Win32 in C++. If Win32 was inelegant, MFC was Win32 wearing a tuxedo made of other tuxedos. Then came OLE. COM. ActiveX. None of these were really GUI frameworks — they were component architectures — but they infected every corner of Windows development and introduced a level of cognitive complexity that makes Kierkegaard read like Hemingway.

I sat through a conference session in the late nineties trying to understand the difference between an OLE document, a COM object, and an ActiveX control. I looked at the presenter like they had a rat’s tail hanging out of their mouth for the entire hour.

Microsoft wasn’t selling a coherent story. It was selling technology primitives and telling developers to figure out the story themselves. That’s the Conference Keynote Cluster***k — Microsoft optimized for an executive impressing people with their keynote and not the success of the users or developers.

At PDC 2003, Microsoft unveiled Longhorn — genuinely one of the most compelling technical visions the company had ever put in front of developers. Three pillars: WinFS (a relational file system), Indigo (unified communications), and Avalon — later WPF — a GPU-accelerated, vector-based UI subsystem driven by a declarative XML language called XAML. Developers saw the Avalon demos and went nuts. It was the right vision.

It was also, in the words of Jim Allchin’s internal memo from January 2004, “a pig.”

By August 2004, Microsoft announced a complete development reset. Scrapped. Start over from the Server 2003 codebase. And after the reset, leadership issued a quiet directive: no f***ing managed code in Windows. All new code in C++. WPF would ship alongside Vista, but the shell itself would not use it.

The Windows team’s bitterness toward .NET never healed. From their perspective, gambling on a new managed-code framework had produced the most embarrassing failure in the company’s history. That bitterness created a thirteen-year institutional civil war between the Windows team and the .NET team that would ultimately orphan WPF, kill Silverlight, doom UWP, and give us the GUI ecosystem boof-a-rama we have today.

WPF shipped in late 2006. It was remarkable — XAML, hardware-accelerated rendering, real data binding. If Microsoft had made it the definitive answer and invested relentlessly, the story might have ended differently. Instead, in 2007, they launched Silverlight: a stripped-down browser plugin to compete with Flash, cross-platform, elegant, and the foundation for Windows Phone. Around 2010 it looked like the rich client future.

Then at MIX 2010, a Microsoft executive said in a Q&A that Silverlight was not a cross-platform strategy — it was about Windows Phone. HTML5 was now policy. The Silverlight team was not told this was coming. Developers who had bet their LOB applications on Silverlight found out from a conference Q&A.

Silverlight wasn’t killed by technical failure. The technology was fine. It was killed by a business strategy decision, and developers were the last to know.

Remember that pattern. We’ll see it again.

Apple had sold 200 million iPhones. The iPad was eating into PC sales. Microsoft’s answer was Windows 8 and Metro — a touch-first runtime called WinRT that was deliberately not built on .NET. Remember the Windows team’s bitterness? Here it manifests. WinRT was a native C++ runtime. Clean break from WPF, WinForms, and a decade of developer investment in .NET.

There were actually two stories being told simultaneously inside Microsoft. The Windows team was building WinRT. The .NET team was still evangelizing WPF. Different buildings, different VPs, different road maps.

What developers heard at //Build 2012: the future is WinRT, and also HTML+JS is first-class, and also .NET still works, and also C++ is back, and also you should write Metro apps, and also your WPF code still runs fine. That is not a strategy. That is a Hunger Games stage where six teams are fighting for your attention.

Enterprise developers took one look at UWP’s sandboxing, its Store deployment requirement, and its missing Win32 APIs, and walked away. The framework designed to win them into the modern era had been optimized for a tablet app store that never materialized.

Windows 10 brought Universal Windows Platform — write once, run on PC, phone, Xbox, HoloLens. Compelling on paper. The problem: Windows Phone was dying, and Microsoft’s own flagship apps — Office, Visual Studio, the shell itself — weren’t using UWP. The message was clear even if no one said it out loud.

When UWP stalled, the official answer became “it depends.” Use UWP for new apps, keep WPF for existing ones, add modern APIs via XAML Islands, wait for WinUI 3, but also WinUI 2 exists for UWP specifically, and Project Reunion will fix everything, except we’re renaming it Windows App SDK and it still doesn’t fully replace UWP and…

Project Reunion / WinUI 3 represents genuine progress. But ask yourself why the problem existed at all. UWP’s controls were tied to the OS because the Windows team owned them. The .NET team didn’t. The developer tools team didn’t. Project Reunion was an organizational workaround dressed up as a technical solution.

One developer’s summary, written in 2024: “I’ve been following Microsoft’s constant changes: UAP, UWP, C++/CX replaced by C++/WinRT without tool support, XAML Islands, XAML Direct, Project Reunion, the restart of WinAppSDK, the chaotic switch between WinUI 2.0 and 3.0…” Fourteen years. Fourteen pivots. That person deserves a medal and an apology, in that order.

Here is every GUI technology actually shipping on Windows today:

* Win32 (1985) — Still here. Still used. Petzold’s book still applies.

* MFC (1992) — C++ wrapper on Win32. Maintenance mode. Lives in enterprise and CAD.

* WinForms (2002) — .NET wrapper on Win32. “Available but discouraged.” Still fastest for data-entry forms.

* Electron — Chromium + Node.js. VS Code, Slack, Discord. The most widely deployed desktop GUI technology on Windows right now — and Microsoft had nothing to do with it.

* Avalonia — Open source WPF spiritual successor. Used by JetBrains, GitHub, Unity — developers who stopped waiting for Microsoft.

* Uno Platform — WinUI APIs on every platform. More committed to WinUI than Microsoft is.

* Delphi / RAD Studio — Still alive. Still fast. Still in vertical market software.

* Java Swing / JavaFX — Yes, still in production. The enterprise never forgets.

Seventeen approaches. Five programming languages. Three rendering philosophies. That is not a platform. I might not have a dictionary definition for the term boof-a-rama, but I know one when I see it.

Every failed GUI initiative traces back to one of three causes: internal team politics (Windows vs. .NET), a developer conference announcement driving a premature platform bet (Metro, UWP), or a business strategy pivot that orphaned developers without warning (Silverlight). None of these are technical failures. The technology was often genuinely good — WPF was good, Silverlight was good, XAML is good. The organizational failure was the product.

You either have a Plausible Theory of Success that covers the full lifecycle — adoption, investment, maintenance, and migration — or you have a developer conference keynote.

One is a strategy. The other is a thirty-year boof-a-rama.

Charles Petzold wrote six editions of Programming Windows trying to keep up with each new thing Microsoft announced. He stopped after the sixth, which covered WinRT for Windows 8. That was 2012.

...

Read the original on www.jsnover.com »

4 713 shares, 33 trendiness

Why Switzerland Has 25 Gbit Internet and America Doesn't


You may have heard about 25 Gbit symmetrical internet in Switzerland. This is often cited as the fastest dedicated (non-shared) residential connection in the world. However, did you ever wonder why Switzerland has such fast internet at a reasonable price while the United States and other countries like Switzerland’s neighbor Germany are falling behind?

What is the fundamental difference between the countries that leads to such a stark difference in internet speeds and prices?

Free markets, regulation, technology, or all three?

Let’s take a closer look at the situation in Switzerland, Germany, and the United States.

This article is written by me and spell-checked with AI. Many of the images are generated by AI and are mostly to break up the wall of text.

This article is also available as a video (my first):

As mentioned, in Switzerland, you can get 25 Gigabit per second fiber internet to your home, symmetric and dedicated. If you don’t need such extreme speed, you can get 1 or 10 Gigabit from multiple competing providers for very little money. All over a connection that isn’t shared with your neighbors. In fact, someone could offer 100 Gigabit or more today; there is nothing preventing this other than the cost of endpoint equipment.

In the United States, if you’re lucky enough to have fiber, you might get 1 Gigabit. But often it’s shared with your neighbors. And you usually have exactly one choice of provider. Maybe two, if you count the cable company that offers slower speeds for the same price.

In Germany, you are in a somewhat similar situation to the United States. Fiber service is limited to one provider and is often shared with your neighbors.

The United States prides itself on free markets. On competition. On letting businesses fight it out. A deregulated market with no brakes.

Germany, on the other hand, is famous for over-regulation, making it difficult for businesses to operate, yet it is in a similar situation to the United States.

Switzerland has a highly regulated telecom sector with strong oversight and government-backed infrastructure projects, but regulations in Switzerland differ from those in Germany.

So why is the country that worships free markets producing stagnation, monopolies, and inferior internet, while the country with heavy regulation is producing hyper-competition, world-leading speeds, and consumer choice?

And at the same time, the country with the most regulation is suffering the same problems as the country with the least.

The answer reveals a fundamental truth about capitalism and regulation that most people get wrong.

To understand the failure, you have to understand what economists call a “natural monopoly.”

A natural monopoly is an industry where the cost of building the infrastructure is so high, and the cost of serving an additional customer is so low, that competition actually destroys value.

Think about water pipes. It would be insane to have three different water companies each digging up your street to lay their own pipes. You’d have three times the construction, three times the disruption, three times the cost. And at the end of it, you’d still only use one of them.

The rational solution is to build the infrastructure once, as a shared, neutral asset, and let different companies compete to provide the service over that infrastructure.

That’s how water works. That’s how electricity works in most places. And in Switzerland, that’s how fiber optic internet works.

But in the United States and Germany, they did the opposite.

In Germany, the “free market” approach meant letting any company dig up the street to lay their own fiber. The result is called “overbuild.” Multiple networks running in parallel trenches, often just meters apart.

Billions of euros spent on redundant concrete and asphalt. Money that could have been spent on faster equipment, lower prices, or connecting rural areas, instead wasted on digging the same hole twice, literally.

But isn’t Germany heavily regulated? Yes, but the regulations focus heavily on infrastructure competition rather than duct sharing enforcement.

Germany champions infrastructure competition, meaning it prefers multiple companies laying their own cables rather than sharing a single network. At the same time, the regulatory system wastes enormous amounts of time on waiting for digging permits and on courtroom battles just to obtain basic information about existing ducts.

Germany also has a large incumbent, Deutsche Telekom, which uses existing regulations to its competitive advantage against smaller ISPs. While Germany does have laws requiring Deutsche Telekom to share its ducts with competitors, in practice smaller ISPs face unreasonable hurdles such as high fees, procedural delays, and legal double burdens that undermine effective access.

Sharing ducts is not as bad as digging two trenches, but it is still a waste of resources.

The United States took a different path, but the result is equally bad. Instead of overbuild, they got territorial monopolies, in some places paid for by the federal government.

In most American cities, you don’t have a choice of fiber providers. You have whatever incumbent happens to serve your neighborhood. Comcast has one area. Spectrum has another. AT&T has a third.

This is marketed as competition. But it’s not. It’s a cartel. Each company gets its own protected territory, and consumers get no choice. If you don’t like your provider, your only alternative is often DSL from the 1990s or a cellular hotspot.

This is what happens when you let natural monopolies operate without oversight. They don’t compete on price or quality. They extract rent.

And because these networks are built on the cheap using P2MP, or shared architecture, your “gigabit” connection is shared with your entire neighborhood. At 8 PM, when everyone streams Netflix, that gigabit becomes 200 megabits. Or 100. Or less.

The provider still charges you for “gigabit.” They just don’t tell you that you’re sharing it with 31 other households.

And it gets worse. In the United States, even if a competitor wanted to challenge the incumbent, they often can’t. Because the Point of Presence, the central hub where all the fiber lines from homes converge, is private. It belongs to Comcast or AT&T. Your fiber terminates in their building. A competitor can’t just install equipment there. They would have to build their own network from scratch, digging up the same streets, to reach you.

Now look at Switzerland. Here, the physical infrastructure, the fiber in the ground, is treated as a neutral, shared asset. It’s built once, often by a public or semi-public entity.

Every home gets a dedicated 4-strand fiber line. Point-to-Point. Not shared. Not split 32 ways.

That dedicated fiber terminates in a neutral, open hub. And any internet service provider can connect to that hub.

Init7, Swisscom, Salt, or a tiny local ISP, they all have equal access to the physical line that goes into your home.

This means you, the consumer, have genuine choice. When you sign up with a provider, you simply give them your OTO (Optical Termination Outlet) number, the unique identifier printed on the fiber optic plate in your home. It tells the provider exactly which fiber connection is yours. That’s it. No technician needs to visit. No one needs to dig up your street. You just call, give them the number, and within days (not always the case…), your new service is active.

And because your home has four separate fiber strands, you’re not locked into a single provider. You can have Init7 on one strand, Swisscom on another, and a local utility on a third. You can switch providers with a phone call. You can try a new provider without canceling your old one first. The competition happens on price, speed, and customer service but not on who happens to own the cable in front of your house.

In Switzerland, you can get 25 Gigabit per second fiber to your home. Today. Symmetric. Dedicated. Not shared with your neighbors.

In Switzerland, you have a choice of a dozen or more providers in most cities. Prices are competitive. Customer service matters because you can leave at any time.

In the United States, the majority of households have only one choice for high-speed internet. Speeds are lower. Prices are higher. And the technology is often a decade behind.

The “free market” promised innovation. It delivered rent-seeking. The incumbents have no incentive to upgrade because you have nowhere else to go.

American broadband prices have risen faster than inflation for decades. Speeds have increased only when a competitor, usually a municipal utility, forces the incumbent to respond.

Without competition, there is no innovation. There is only profit extraction.

Switzerland didn’t arrive at this model by accident, nor did it happen because telecom companies were feeling generous. It happened because regulators forced it to happen.

Back in 2008, when the industry sat down at the Round Table organized by the Federal Communications Commission, it was Swisscom, the incumbent itself, that pushed for the four-fiber Point-to-Point model. The company argued that a single fiber would create a monopoly and that regulation would be necessary.

So the standard was set. Four fibers per home. Point-to-Point. Open access for competitors on Layer 1 - the physical fiber itself.

Then, in 2020, Swisscom changed course. The company announced a new network expansion strategy, this time using P2MP, the shared model with splitters. On paper, they argued it was cheaper and faster to deploy.

But the effect was clear. Under the P2MP design, competitors would no longer have direct access to the physical fiber. Instead of plugging into their own dedicated fiber strand, they would have to rent access from Swisscom at a higher network layer - effectively becoming resellers of Swisscom’s infrastructure. The open, competitive matrix that had been carefully built over years would disappear.

The small ISP Init7 filed a complaint with Switzerland’s competition authority, COMCO, which later opened an investigation. In December 2020, they issued a precautionary measure: Swisscom could not continue its P2MP rollout unless it guaranteed the same Layer 1 access that the original standard provided.

Swisscom fought this all the way to the Federal Court. They lost. In 2021, the Federal Administrative Court confirmed COMCO’s measures, stating that Swisscom had failed to demonstrate “sufficient technological or economic grounds” to deviate from the established fiber standard. In April 2024, COMCO finalized its ruling, fining Swisscom 18 million francs for violating antitrust law.

Swisscom is 51% owned by the Swiss Confederation. So, in simple terms, 51% state-owned and 49% privately/institutionally owned. Whether this makes the fine “symbolic” is a matter of opinion.

The result? Swisscom was forced to return to the four-fiber, Point-to-Point architecture it had originally championed. Competitors retained their direct, physical access to the fiber network. The walled garden was prevented.

Whether intended or not, the effect of Swisscom’s P2MP shift was clear: competitors would have been locked out of the physical infrastructure.

Swisscom is a bit of a walking contradiction. Being majority state-owned, it’s supposed to be a public service. But it’s also a private company, and maximizing profit benefits the state coffers. But that is something for another blog post.

This is the paradox that confuses so many people.

The American and German approach of letting incumbents build monopolies, allowing wasteful overbuild, and refusing to regulate natural monopolies is often called a “free market.”

But it’s not free. And it’s not a market.

True capitalism requires competition. But infrastructure is a natural monopoly. If you treat it like a regular consumer product, you don’t get competition. You get waste, or you get a monopoly.

The Swiss model understands this. They built the infrastructure once, as a shared, neutral asset, and then let the market compete on the services that run over it.

That’s not anti-capitalist. It’s actually better capitalism. It directs competition to where it adds value, not to where it destroys it.

The free market doesn’t mean letting powerful incumbents do whatever they want. It means creating the conditions where genuine competition can thrive.

What Can Be Done

So what can other countries learn from Switzerland? Here are the key policy changes that would help:

* Mandate open access to physical infrastructure - require incumbents to share fiber ducts and dark fiber with competitors at cost-based prices. This is not “socialism” - it is how electricity and water work.

* Enforce Point-to-Point architecture - require that every home gets dedicated fiber strands, not shared splitters. This ensures competitors can access the physical layer, not just resell bandwidth.

* Create a neutral fiber standard - establish national standards that require multi-fiber deployment to every home, as Switzerland did in 2008.

* Empower competition authorities - give regulators like COMCO real teeth to enforce these rules. Fines must be large enough to matter.

* Support municipal fiber - allow cities and towns to build their own fiber networks when incumbents fail to serve residents adequately.

If you care about faster internet and lower prices, push your representatives to support these policies. The technology exists. The money exists. What is missing is the political will to demand real competition.

...

Read the original on sschueller.github.io »

5 609 shares, 156 trendiness

No, I Won't Download Your App. The Web Version is A-OK.

No, I Won’t Download Your App. The Web Version is A-OK.

As someone who prefers using services via their websites, I’ve gotten terribly jaded lately. Almost everyone wants me, and by extension, you, to use their darn apps to consume content instead of their web versions.

Whether it’s the obvious social media apps or something as basic as parking, the app is the priority and the site the red-headed stepchild. And they aren’t too subtle in the push either. It might be a modal covering half the web version with links to the App Store, an immediate popup after a bit of scrolling, or a header screaming “the app is 10x better,” but it’s always there and it’s always grating.

Let’s not even go into the cases where the app is the only option to access the service. A minor annoyance for ordering food, but a major hassle when it’s a public service or utility.

On principle, I like control over what I see and how I see it. Apps are super limited; while in a browser, I can do a lot of very nifty things to improve usability.

A service lacks a dark mode? I can use any number of user scripts. Reddit introduced a gaming section in the sidebar? Two-second fix that I bundled into my extension [1]. Between userscripts, ad-blockers, and custom extensions, I’m basically a god, swaggering through my realm.

This control, or lack thereof, also explains the app maker’s adversarial stance towards users. They are often a black hole of dark patterns, and they’d like nothing getting in their way. Apps make it easier for them to push notifications, collect intrusive telemetry, and keep you inside their walled garden. A better user experience is the pitch but securing better user retention is the end goal.

Most apps are just that. Text and media in a never-ending, all-consuming feed or a multi-page form, cleverly disguised by the user interface.

Excluding heavy 3D gaming or utilities that genuinely require deep integration with your phone’s hardware (like accessing the LiDAR scanner for AR), what are we actually left with? A thin client whose main job is to fetch data from an API and render it onto native views.

Why do I need to download a 100+ MB app, give it permission to track my location, and let it run background processes just to browse through a restaurant menu, buy a ticket, or scroll through a list of posts? At the end of the day, it is almost always just JSON being parsed and rendered. Yet, companies insist on rebuilding their basic content as native shells just to claim a permanent square of real estate on my home screen.

If a service is going to pull you out of the browser, it should at least offer a polished, native experience. But more often than not, the app you just downloaded is a compromise.

Anyone who endured the iOS-specific shader compilation jank in early Flutter apps [2] knows exactly how grating this can be (this specific bug was fixed 2023ish fwiw). Before they swapped Skia out for the Impeller engine, I had to capture and ship precompiled shaders with my apps just to stop the UI from stuttering the first time an animation ran.

The result is often the uncanny valley of user interfaces. It’s not broken, but it is subtly different, sometimes janky. The scroll velocity doesn’t quite match the rest of the OS. The swipe back gesture hesitates for a few milliseconds.

Human brains are remarkably good at detecting when a system’s timing is off. This is how the XZ backdoor was caught: an engineer noticed their SSH logins taking a fraction of a second longer than usual. It’s not that unique — my old FPS buddies could tell our server region just by firing a shot and feeling the lag. [3]

These micro interactions matter, because without that final layer of polish, the entire facade of a native experience falls apart. Not every app is like this, obviously, but enough of them are this way that it sours the entire experience.

When that full-screen modal pops up demanding you download the app to read the rest of a thread, users choose the path of least resistance. They download and they move on.

To a PM staring at an analytics dashboard, I’m an acceptable casualty, an inconsequential minority. If degrading the web version successfully funnels 80% of users into the App Store, that PM gets a promotion and a big pay bump. As always, actions follow the incentive. Our demographic is simply too small to factor into their quarterly metrics.

This is the enshittification loop in its full glory, working exactly as intended. A service builds its initial audience on the open web because it’s frictionless and indexable. Once the user base is sufficiently locked in, the web version is deliberately hobbled to force everyone into the native app. Once you’re inside the app, the walls close in: you are now a captive audience for a feed full of ads that your ad-blocker can no longer touch.

There is no financial incentive to maintain a stellar web experience anymore. The browser, once the great universal platform, is increasingly being reduced to a top-of-funnel marketing channel for the App Store. The depressing part of it is that the numbers prove it works.

...

Read the original on www.0xsid.com »

6 381 shares, 16 trendiness

love2d/love: LÖVE is an awesome 2D game framework for Lua.

LÖVE is an awesome framework you can use to make 2D games in Lua. It’s free, open-source, and works on Windows, macOS, Linux, Android, and iOS.

We use our wiki for documentation. If you need further help, feel free to ask on our forums, our Discord server, or our subreddit.

We use the ‘main’ branch for development of the next major release, and therefore it should not be considered stable.

There are also branches for currently released major versions, which may have fixes and changes meant for upcoming patch releases within that major version.

We tag all our releases (since we started using mercurial and git), and have binary downloads available for them.

Experimental changes are sometimes developed in a separate love-experiments repository.

Files for releases are in the releases section on GitHub. The site has links to files and additional platform content for the latest release.

There are also unstable/nightly builds:

* Builds for some platforms are automatically created after each commit and are available through GitHub’s CI interfaces.

* For ubuntu linux they are in ppa:bartbes/​love-un­sta­ble

* For arch linux there’s love-git in the AUR.

The test suite in test­ing/ cov­ers all the LÖVE APIs, and tests them the same way de­vel­op­ers use them. You can view cur­rent test cov­er­age from any ac­tion.

You can run the suite lo­cally like you would run a nor­mal LÖVE pro­ject, e.g.:

love test­ing

See the readme in the test­ing folder for more info.

The best places to contribute are through the issue tracker and the official Discord server or IRC channel.

For code contributions, pull requests and patches are welcome. Be sure to read the source code style guide. Changes and new features typically get discussed in the issue tracker or on Discord or the forums before a pull request is made.

Follow the instructions at the megasource repository page.

Because in-tree builds are not allowed, the Makefiles need to be generated in a separate build directory. In this example, a folder named build is used:

Download or clone this repository and copy, move, or symlink the macOS/Frameworks subfolder into love’s platform/xcode/macosx folder and the shared subfolder into love’s platform/xcode folder.

Then use the Xcode project found at platform/xcode/love.xcodeproj to build the love-macosx target.

Download the love-apple-dependencies zip file corresponding to the LÖVE version being used from the Releases page, unzip it, and place the iOS/libraries subfolder into love’s platform/xcode/ios folder and the shared subfolder into love’s platform/xcode folder.

Or, download or clone this repository and copy, move, or symlink the iOS/libraries subfolder into love’s platform/xcode/ios folder and the shared subfolder into love’s platform/xcode folder.

Then use the Xcode project found at platform/xcode/love.xcodeproj to build the love-ios target.

See readme-iOS.rtf for more information.

...

Read the original on github.com »

7 369 shares, 15 trendiness

Running Google Gemma 4 Locally With LM Studio’s New Headless CLI & Claude Code

Cloud AI APIs are great until they are not. Rate limits, usage costs, privacy concerns, and network latency all add up. For quick tasks like code review, drafting, or testing prompts, a local model that runs entirely on your hardware has real advantages: zero API costs, no data leaving your machine, and consistent availability.

Google’s Gemma 4 is interesting for local use because of its mixture-of-experts architecture. The 26B parameter model only activates 4B parameters per forward pass, which means it runs well on hardware that could never handle a dense 26B model. On my 14” MacBook Pro M4 Pro with 48 GB of unified memory, it fits comfortably and generates at 51 tokens per second. Though there are significant slowdowns when used within Claude Code, in my experience.

Google released Gemma 4 as a family of four models, not just one. The lineup spans a wide range of hardware targets:

The “E” models (E2B, E4B) use Per-Layer Embeddings to optimize for on-device deployment and are the only variants that support audio input (speech recognition and translation). The 31B dense model is the most capable, scoring 85.2% on MMLU Pro and 89.2% on AIME 2026.

Why I picked the 26B-A4B. The mixture-of-experts architecture is the key. It has 128 experts plus 1 shared expert, but only activates 8 experts (3.8B parameters) per token. A common rule of thumb estimates MoE dense-equivalent quality as roughly sqrt(total x active parameters), which puts this model around 10B effective. In practice, it delivers inference cost comparable to a 4B dense model with quality that punches well above that weight class. On benchmarks, it scores 82.6% on MMLU Pro and 88.3% on AIME 2026, close to the dense 31B (85.2% and 89.2%) while running dramatically faster.
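A quick back-of-the-envelope check of that rule of thumb, using the 25.2B total and ~3.8B active figures quoted in this post:

# sqrt(total x active) rule of thumb with the parameter counts quoted above.
total_params = 25.2e9   # total parameters
active_params = 3.8e9   # active parameters per token
effective = (total_params * active_params) ** 0.5
print(f"~{effective / 1e9:.1f}B dense-equivalent")  # prints ~9.8B, i.e. "around 10B"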

The chart below tells the story. It plots Elo score against total model size on a log scale for recent open-weight models with thinking enabled. The blue-highlighted region in the upper left is where you want to be: high performance, small footprint.

Gemma 4 26B-A4B (Elo ~1441) sits firmly in that zone, punching well above its 25.2B parameter weight. The 31B dense variant scores slightly higher (~1451) but is still remarkably compact. For context, models like Qwen 3.5 397B-A17B (~1450 Elo) and GLM-5 (~1457 Elo) need 100-600B total parameters to reach similar scores. Kimi-K2.5 (~1457 Elo) requires over 1,000B. The 26B-A4B achieves competitive Elo with a fraction of the parameters, which translates directly into lower memory requirements and faster local inference.

This is what makes MoE models transformative for local use. You do not need a cluster or a high-end GPU rig to run a model that competes with 400B+ parameter behemoths. A laptop with 48 GB of unified memory is enough.

For local inference on a 48 GB Mac, this is the sweet spot. The dense 31B would consume more memory and generate tokens slower because every parameter participates in every forward pass. The E4B is lighter but noticeably less capable. The 26B-A4B gives you 256K max context, vision support (useful for analyzing screenshots and diagrams), native function/tool calling, and reasoning with configurable thinking modes, all at 51 tokens/second on my hardware.

LM Studio has been a popular desktop app for running local models for a while. Version 0.4.0 changed the architecture fundamentally by introducing llmster, the core inference engine extracted from the desktop app and packaged as a standalone server.

The practical result: you can now run LM Studio entirely from the command line using the lms CLI. No GUI required. This makes it usable on headless servers, in CI/CD pipelines, SSH sessions, or just for developers who prefer staying in the terminal.

* llmster daemon: a background service that manages model loading and inference without the desktop app

* Parallel request processing: continuous batching instead of sequential queuing, so multiple requests to the same model run concurrently

* Stateful REST API: a new /v1/chat endpoint that maintains conversation history across requests (a minimal client sketch follows below)
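Beyond the new stateful endpoint, the daemon keeps serving LM Studio’s familiar OpenAI-compatible API, which is all most scripts need. A minimal client sketch, assuming the server is listening on LM Studio’s default port (1234) and the Gemma model is already loaded via the commands below; adjust the URL and model name for your setup:

# Minimal sketch: call the local OpenAI-compatible chat completions endpoint.
# Assumes the daemon is serving on the default port 1234 with the model loaded.
import json
import urllib.request

payload = {
    "model": "google/gemma-4-26b-a4b",
    "messages": [{"role": "user", "content": "Explain mixture-of-experts in two sentences."}],
    "temperature": 0.7,
}
req = urllib.request.Request(
    "http://localhost:1234/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
print(body["choices"][0]["message"]["content"])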

# Linux/Mac
curl -fsSL https://lmstudio.ai/install.sh | bash

# Windows
irm https://lmstudio.ai/install.ps1 | iex

lms daemon up
lms runtime update llama.cpp
lms runtime update mlx
lms get google/gemma-4-26b-a4b

The CLI shows you the variant it will download (Q4_K_M quantization by default, 17.99 GB) and asks for confirmation:

↓ To download: model google/gemma-4-26b-a4b - 64.75 KB
└─ ↓ To download: Gemma 4 26B A4B Instruct Q4_K_M [GGUF] - 17.99 GB

About to download 17.99 GB.
? Start download?
❯ Yes
  No
  Change variant selection

If you already have the model, the CLI tells you and shows the load command:

✔ Start download? yes
Model already downloaded. To use, run: lms load google/gemma-4-26b-a4b

lms ls

You have 10 models, taking up 118.17 GB of disk space.

LLM PARAMS ARCH SIZE DEVICE
gemma-3-270m-it-mlx 270m gemma3_text 497.80 MB Local
google/gemma-4-26b-a4b (1 variant) 26B-A4B gemma4 17.99 GB Local
gpt-oss-20b-mlx 20B gpt_oss 22.26 GB Local
llama-3.2-1b-instruct 1B Llama 712.58 MB Local
nvidia/nemotron-3-nano (1 variant) 30B nemotron_h 17.79 GB Local
openai/gpt-oss-20b (1 variant) 20B gpt-oss 12.11 GB Local
qwen/qwen3.5-35b-a3b (1 variant) 35B-A3B qwen35moe 22.07 GB Local
qwen2.5-0.5b-instruct-mlx 0.5B Qwen2 293.99 MB Local
zai-org/glm-4.7-flash (1 variant) 30B glm4_moe_lite 24.36 GB Local

EMBEDDING PARAMS ARCH SIZE DEVICE
text-embedding-nomic-embed-text-v1.5 Nomic BERT 84.11 MB Local

Worth noting: several of these models use mixture-of-experts architectures (Gemma 4, Qwen 3.5, GLM 4.7 Flash). MoE models punch above their weight for local inference because only a fraction of parameters activate per token.

Start a chat session with stats enabled to see performance numbers:

lms chat google/gemma-4-26b-a4b --stats

╭─────────────────────────────────────────────────╮

│ 👾 lms chat │

│ Type exit or Ctrl+C to quit │

│ Chatting with google/​gemma-4-26b-a4b │

│ Try one of the fol­low­ing com­mands: │

│ /model - Load a model (type /model to see list) │

│ /download - Download a model │

│ /clear - Clear the chat his­tory │

│ /help - Show help in­for­ma­tion │

With --stats, you get prediction metrics after each response:

Prediction Stats:

Stop Reason: eosFound

Tokens/Second: 51.35

Time to First Token: 1.551s

Prompt Tokens: 39

Predicted Tokens: 176

Total Tokens: 215

51 tokens/second on a 14” MacBook Pro M4 Pro (48 GB) with a 26B model is solid. Time to first token at 1.5 seconds is responsive enough for interactive use.

See what is currently loaded:

lms ps

IDENTIFIER MODEL STATUS SIZE CONTEXT PARALLEL DEVICE TTL
google/gemma-4-26b-a4b google/gemma-4-26b-a4b IDLE 17.99 GB 48000 2 Local 60m / 1h

The model occupies 17.99 GB in memory with a 48K context window and supports 2 parallel requests. The TTL (time-to-live) auto-unloads the model after 1 hour of idle time, freeing memory without manual intervention.

lms ps --json | jq

* "vision": true and "trainedForToolUse": true - Gemma 4 supports both image input and tool calling

* "maxContextLength": 262144 - the model supports up to 256K context, though the default load is 48K

Before loading a model, you can estimate memory requirements at different context lengths using --estimate-only. I wrote a small script to test across the full range:

The base model takes about 17.6 GiB regardless of context. Each doubling of context length adds roughly 3-4 GiB. At the default 48K context, you need about 21 GiB. On my 48 GB MacBook Pro, I can push to the full 256K context at 37.48 GiB and still have about 10 GB free for the OS and other apps. A 36 GB Mac could comfortably run 200K context with headroom.

lms load google/gemma-4-26b-a4b --estimate-only --context-length 48000

Model: google/gemma-4-26b-a4b
Context Length: 48,000
Estimated GPU Memory: 21.05 GiB
Estimated Total Memory: 21.05 GiB
Estimate: This model may be loaded based on your resource guardrails settings.

This is useful for capacity planning. If you want to run Gemma 4 alongside other applications, check the estimate at your target context length first.

Here is the full script I used to generate the table above. You can swap in any model name and context length list to profile a different model:

#!/usr/bin/env bash

model="google/gemma-4-26b-a4b"
contexts=(4096 8000 16000 24000 32000 48000 64000 96000 128000 200000 256000)

table_contexts=()
table_gpu=()
table_total=()

for ctx in "${contexts[@]}"; do
  output="$(lms load "$model" --estimate-only --context-length "$ctx" 2>&1)"
  parsed_context="$(printf '%s\n' "$output" | awk -F': ' '/^Context Length:/ {print $2; exit}')"
  parsed_gpu="$(printf '%s\n' "$output" | awk -F': +' '/^Estimated GPU Memory:/ {print $2; exit}')"
  parsed_total="$(printf '%s\n' "$output" | awk -F': +' '/^Estimated Total Memory:/ {print $2; exit}')"
  table_contexts+=("${parsed_context:-$ctx}")
  table_gpu+=("${parsed_gpu:-N/A}")
  table_total+=("${parsed_total:-N/A}")
done

printf '| Model | Context Length | GPU Memory | Total Memory |\n'
printf '|---|---:|---:|---:|\n'
for i in "${!table_contexts[@]}"; do
  printf '| %s | %s | %s | %s |\n' \
    "$model" "${table_contexts[$i]}" "${table_gpu[$i]}" "${table_total[$i]}"
done

...

Read the original on ai.georgeliu.com »

8 304 shares, 14 trendiness

...

Read the original on musicforprogramming.net »

9 300 shares, 128 trendiness

[MODEL] Claude Code is unusable for complex engineering tasks with the Feb updates · Issue #42796 · anthropics/claude-code

* This report does NOT contain sensitive information (API keys, passwords, etc.)

Claude has regressed to the point it cannot be trusted to perform complex engineering.

Does the opposite of requested activities

Claude should behave like it did in January.

Accept Edits was ON (auto-accepting changes)

Yes, every time with the same prompt

Produced by claude based on my extensive data - if there are any issues, it’s because anthropic doesn’t let claude think anymore ;) Unfortunately claude deleted my January logs containing a bulk of my work so only summary analysis is available - January was what I expect, February started sliding, and March was a complete and utter loss.

Quantitative analysis of 17,871 thinking blocks and 234,760 tool calls across 6,852 Claude Code session files reveals that the rollout of thinking content redaction (redact-thinking-2026-02-12) correlates precisely with a measured quality regression in complex, long-session engineering workflows.

The data suggests that extended thinking tokens are not a “nice to have” but are structurally required for the model to perform multi-step research, convention adherence, and careful code modification. When thinking depth is reduced, the model’s tool usage patterns shift measurably from research-first to edit-first behavior, producing the quality issues users have reported.

This report provides data to help Anthropic understand which workflows are most affected and why, with the goal of informing decisions about thinking token allocation for power users.

The quality regression was independently reported on March 8 — the exact date redacted thinking blocks crossed 50%. The rollout pattern (1.5% → 25% → 58% → 100% over one week) is consistent with a staged deployment.

The signature field on thinking blocks has a 0.971 Pearson correlation with thinking content length (measured from 7,146 paired samples where both are present). This allows estimation of thinking depth even after redaction.
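For reference, this kind of correlation is straightforward to reproduce from paired samples. A hypothetical sketch (the export file and field names are assumptions, not the actual analysis code):

# Compute Pearson r between signature length and thinking length from paired
# samples. The JSONL export and its field names are assumptions.
import json
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

sig_lens, think_lens = [], []
with open("thinking_blocks.jsonl") as f:  # hypothetical export of thinking blocks
    for line in f:
        block = json.loads(line)
        if block.get("signature") and block.get("thinking"):
            sig_lens.append(len(block["signature"]))
            think_lens.append(len(block["thinking"]))

print(f"n={len(sig_lens)}  r={pearson(sig_lens, think_lens):.3f}")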

Thinking depth had already dropped ~67% by late February, before redaction began. The redaction rollout in early March made this invisible to users.

These metrics were computed independently from 18,000+ user prompts before the thinking analysis was performed.

A stop hook (stop-phrase-guard.sh) was built to programmatically catch ownership-dodging, premature stopping, and permission-seeking behavior. It fired 173 times in 17 days after March 8. It fired zero times before.

Analysis of 234,760 tool invocations shows the model stopped reading code before modifying it.

The model went from 6.6 reads per edit to 2.0 reads per edit — a 70% reduction in research before making changes.

In the good period, the model’s workflow was: read the target file, read related files, grep for usages across the codebase, read headers and tests, then make a precise edit. In the degraded period, it reads the immediate file and edits, often without checking context.
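A reads-per-edit metric like this can be recomputed from exported tool-call logs. A hypothetical sketch, assuming one JSON object per tool call with "timestamp" and "tool" fields (the export format is an assumption, not the actual analysis code):

# Group tool calls by month and report the read-to-edit ratio per period.
# The JSONL export format and field names are assumptions.
import json
from collections import defaultdict

READ_TOOLS = {"Read", "Grep", "Glob"}
EDIT_TOOLS = {"Edit", "Write"}

counts = defaultdict(lambda: {"reads": 0, "edits": 0})
with open("tool_calls.jsonl") as f:  # hypothetical export of tool calls
    for line in f:
        call = json.loads(line)
        month = call["timestamp"][:7]  # e.g. "2026-01"
        if call["tool"] in READ_TOOLS:
            counts[month]["reads"] += 1
        elif call["tool"] in EDIT_TOOLS:
            counts[month]["edits"] += 1

for month, c in sorted(counts.items()):
    ratio = c["reads"] / max(c["edits"], 1)
    print(f"{month}: {ratio:.1f} reads per edit")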

The decline in research effort begins in mid-February — the same period when estimated thinking depth dropped 67%.

Full-file Write usage doubled — the model increasingly chose to rewrite entire files rather than make surgical edits, which is faster but loses precision and context awareness.

* 191,000 lines merged across two PRs in a weekend during the good period

Extended thinking is the mechanism by which the model:

* Plans multi-step approaches before acting (which files to read, what order)

* Catches its own mistakes before outputting them

* Decides whether to continue working or stop (session management)

When thinking is shallow, the model defaults to the cheapest action available: edit without reading, stop without finishing, dodge responsibility for failures, take the simplest fix rather than the correct one. These are exactly the symptoms observed.

Transparency about thinking allocation: If thinking tokens are being reduced or capped, users who depend on deep reasoning need to know. The redact-thinking header makes it impossible to verify externally.

A “max thinking” tier: Users running complex engineering workflows would pay significantly more for guaranteed deep thinking. The current subscription model doesn’t distinguish between users who need 200 thinking tokens per response and users who need 20,000.

Thinking token metrics in API responses: Even if thinking content is redacted, exposing thinking_tokens in the usage response would let users monitor whether their requests are getting the reasoning depth they need.

Canary metrics from power users: The stop hook violation rate (0 → 10/day) is a machine-readable signal that could be monitored across the user base as a leading indicator of quality regressions.

The following behavioral patterns were measured across 234,760 tool calls and 18,000+ user prompts. Each is a predictable consequence of reduced reasoning depth: the model takes shortcuts because it lacks the thinking budget to evaluate alternatives, check context, or plan ahead.

When the model has sufficient thinking budget, it reads related files, greps for usages, checks headers, and reads tests before making changes. When thinking is shallow, it skips research and edits directly.

One in three edits in the degraded period was made to a file the model had not read in its recent tool history. The practical consequence: edits that break surrounding code, violate file-level conventions, splice new code into the middle of existing comment blocks, or duplicate logic that already exists elsewhere in the file.

Spliced comments are a particularly visible symptom. When the model edits a file it hasn’t read, it doesn’t know where comment blocks end and code begins. It inserts new declarations between a documentation comment and the function it documents, breaking the semantic association. This never happened in the good period because the model always read the file first.

When thinking is deep, the model resolves contradictions internally before producing output. When thinking is shallow, contradictions surface in the output as visible self-corrections: “oh wait”, “actually,”, “let me reconsider”, “hmm, actually”, “no wait.”

The rate more than tripled. In the worst sessions, the model produced 20+ reasoning reversals in a single response — generating a plan, contradicting it, revising, contradicting the revision, and ultimately producing output that could not be trusted because the reasoning path was visibly incoherent.

The word “simplest” in the model’s output is a signal that it is optimizing for the least effort rather than evaluating the correct approach. With deep thinking, the model evaluates multiple approaches and chooses the right one. With shallow thinking, it gravitates toward whatever requires the least reasoning to justify.

In one observed 2-hour window, the model used “simplest” 6 times while producing code that its own later self-corrections described as “lazy and wrong”, “rushed”, and “sloppy.” Each time, the model had chosen an approach

...

Read the original on github.com »

10 293 shares, 17 trendiness

Advanced Search for YouTube


...

Read the original on playlists.at »
