10 interesting stories served every morning and every evening.
AI Edge Gallery is the premier destination for running the world’s most powerful open-source Large Language Models (LLMs) on your mobile device. Experience high-performance Generative AI directly on your hardware—fully offline, private, and lightning-fast.
Now Featuring: Gemma 4
This update brings official support for the newly released Gemma 4 family. As the centerpiece of this release, Gemma 4 allows you to test the cutting edge of on-device AI. Experience advanced reasoning, logic, and creative capabilities without ever sending your data to a server.
Core Features
- Agent Skills: Transform your LLM from a conversationalist into a proactive assistant. Use the Agent Skills tile to augment model capabilities with tools like Wikipedia for fact-grounding, interactive maps, and rich visual summary cards. You can even load modular skills from a URL or browse community contributions on GitHub Discussions.
- AI Chat with Thinking Mode: Engage in fluid, multi-turn conversations and toggle the new Thinking Mode to peek “under the hood.” This feature allows you to see the model’s step-by-step reasoning process, which is perfect for understanding complex problem-solving. Note: Thinking Mode currently works with supported models, starting with the Gemma 4 family.
- Ask Image: Use multimodal power to identify objects, solve visual puzzles, or get detailed descriptions using your device’s camera or photo gallery.
- Audio Scribe: Transcribe and translate voice recordings into text in real-time using high-efficiency on-device language models.
- Prompt Lab: A dedicated workspace to test different prompts and single-turn use cases with granular control over model parameters like temperature and top-k.
- Mobile Actions: Unlock offline device controls and automated tasks powered entirely by a finetune of FuntionGemma 270m.
- Tiny Garden: A fun, experimental mini-game that uses natural language to plant and harvest a virtual garden using a finetune of FunctionGemma 270m.
- Model Management & Benchmark: Gallery is a flexible sandbox for a wide variety of open-source models. Easily download models from the list or load your own custom models. Manage your model library effortlessly and run benchmark tests to understand exactly how each model performs on your specific hardware.
- 100% On-Device Privacy: All model inferences happen directly on your device hardware. No internet is required, ensuring total privacy for your prompts, images, and sensitive data.
Built for the Community
AI Edge Gallery is an open-source project designed for the developer community and AI enthusiasts alike. Explore our example features, contribute your own skills, and help shape the future of the on-device agent ecosystem.
Check out the source code on GitHub:
https://github.com/google-ai-edge/gallery
Note: This app is in active development. Performance is dependent on your device’s hardware (CPU/GPU). For support or feedback, contact us at google-ai-edge-gallery-android-feedback@google.com.
...
Read the original on apps.apple.com »
This project exists to show that training your own language model is not magic.
No PhD required. No massive GPU cluster. One Colab notebook, 5 minutes, and you have a working LLM that you built from scratch — data generation, tokenizer, model architecture, training loop, and inference. If you can run a notebook, you can train a language model.
It won’t produce a billion-parameter model that writes essays. But it will show you exactly how every piece works — from raw text to trained weights to generated output — so the big models stop feeling like black boxes.
GuppyLM is a tiny language model that pretends to be a fish named Guppy. It speaks in short, lowercase sentences about water, food, light, and tank life. It doesn’t understand human abstractions like money, phones, or politics — and it’s not trying to.
It’s trained from scratch on 60K synthetic conversations across 60 topics, runs on a single GPU in ~5 minutes, and produces a model small enough to run in a browser.
Vanilla transformer. No GQA, no RoPE, no SwiGLU, no early exit. As simple as it gets.
* Experiences the world through water, temperature, light, vibrations, and food
* Is friendly, curious, and a little dumb
60 topics: greetings, feelings, temperature, food, light, water, tank, noise, night, loneliness, bubbles, glass, reflection, breathing, swimming, colors, taste, plants, filter, algae, snails, scared, excited, bored, curious, happy, tired, outside, cats, rain, seasons, music, visitors, children, meaning of life, time, memory, dreams, size, future, past, name, weather, sleep, friends, jokes, fear, love, age, intelligence, health, singing, TV, and more.
Downloads the pre-trained model from HuggingFace and lets you chat. Just run all cells.
pip install torch tokenizers
python -m guppylm chat
from datasets import load_dataset
ds = load_dataset(“arman-bd/guppylm-60k-generic”)
print(ds[“train”][0])
# {‘input’: ‘hi guppy’, ‘output’: ‘hello. the water is nice today.’, ‘category’: ‘greeting’}
Why no system prompt? Every training sample had the same one. A 9M model can’t conditionally follow instructions — the personality is baked into the weights. Removing it saves ~60 tokens per inference.
Why single-turn only? Multi-turn degraded at turn 3-4 due to the 128-token context window. A fish that forgets is on-brand, but garbled output isn’t. Single-turn is reliable.
Why vanilla transformer? GQA, SwiGLU, RoPE, and early exit add complexity that doesn’t help at 9M params. Standard attention + ReLU FFN + LayerNorm produces the same quality with simpler code.
Why synthetic data? A fish character with consistent personality needs consistent training data. Template composition with randomized components (30 tank objects, 17 food types, 25 activities) generates ~16K unique outputs from ~60 templates.
...
Read the original on github.com »
A few years ago I was in a meeting with developers and someone asked a simple question: “What’s the right framework for a new Windows desktop app?”
Dead silence. One person suggested WPF. Another said WinUI 3. A third asked if they should just use Electron. The meeting went sideways and we never did answer the question.
That silence is the story. And the story goes back thirty-plus years.
When a platform can’t answer “how should I build a UI?” in under ten seconds, it has failed its developers. Full stop.
In 1988, Charles Petzold published Programming Windows. 852 pages. Win16 API in C. And for all its bulk, it represented something remarkable: a single, coherent, authoritative answer to how you write a Windows application. In the business, we call that a ‘strategy’.
Win32 that followed was bigger but still coherent. Message loops. Window procedures. GDI. The mental model was a bit whacky, but it was one mental model. Petzold explained it. It was the F=MA of Windows. Simple. Powerful. You learned it. You used it. You were successful.
Clarity is your friend! One OS, one API, one language, one book. There was no committee debating managed-code alternatives. There was just Win32 and Petzold, and it worked. This was Physics not Chemistry (this works but only for this slice of the period table. And only under these pressures. And only within this temperature. And only if the Moon is in the 7th house of Jupiter).
What happened next is a masterclass in how a company with brilliant people and enormous resources can produce a thirty-year boof-a-rama by optimizing for the wrong things. AKA Brillant people doing stupid things.
Win32 had real limitations, so Microsoft did what Microsoft does: it shipped something new for the developer conference. Several somethings.
MFC (1992) wrapped Win32 in C++. If Win32 was inelegant, MFC was Win32 wearing a tuxedo made of other tuxedos. Then came OLE. COM. ActiveX. None of these were really GUI frameworks — they were component architectures — but they infected every corner of Windows development and introduced a level of cognitive complexity that makes Kierkegaard read like Hemingway.
I sat through a conference session in the late nineties trying to understand the difference between an OLE document, a COM object, and an ActiveX control. I looked at the presenter like they had a rat’s tail hanging out of his mouth for the entire hour.
Microsoft wasn’t selling a coherent story. It was selling technology primitives and telling developers to figure out the story themselves. That’s the Conference Keynote Cluster***k — Microsoft optimized for an executive impressing people with their keynote and not the success of the users or developers.
At PDC 2003, Microsoft unveiled Longhorn — genuinely one of the most compelling technical visions the company had ever put in front of developers. Three pillars: WinFS (a relational file system), Indigo (unified communications), and Avalon — later WPF — a GPU-accelerated, vector-based UI subsystem driven by a declarative XML language called XAML. Developers saw the Avalon demos and went nuts. It was the right vision.
It was also, in the words of Jim Allchin’s internal memo from January 2004, “a pig.”
By August 2004, Microsoft announced a complete development reset. Scrapped. Start over from the Server 2003 codebase. And after the reset, leadership issued a quiet directive: no f***ing managed code in Windows. All new code in C++. WPF would ship alongside Vista, but the shell itself would not use it.
The Windows team’s bitterness toward .NET never healed. From their perspective, gambling on a new managed-code framework had produced the most embarrassing failure in the company’s history. That bitterness created a thirteen-year institutional civil war between the Windows team and the .NET team that would ultimately orphan WPF, kill Silverlight, doom UWP, and give us the GUI ecosystem boof-a-rama we have today.
WPF shipped in late 2006. It was remarkable — XAML, hardware-accelerated rendering, real data binding. If Microsoft had made it the definitive answer and invested relentlessly, the story might have ended differently. Instead, in 2007, they launched Silverlight: a stripped-down browser plugin to compete with Flash, cross-platform, elegant, and the foundation for Windows Phone. Around 2010 it looked like the rich client future.
Then at MIX 2010, a Microsoft executive said in a Q&A that Silverlight was not a cross-platform strategy — it was about Windows Phone. HTML5 was now policy. The Silverlight team was not told this was coming. Developers who had bet their LOB applications on Silverlight found out from a conference Q&A.
Silverlight wasn’t killed by technical failure. The technology was fine. It was killed by a business strategy decision, and developers were the last to know.
Remember that pattern. We’ll see it again.
Apple had sold 200 million iPhones. The iPad was eating into PC sales. Microsoft’s answer was Windows 8 and Metro — a touch-first runtime called WinRT that was deliberately not built on .NET. Remember the Windows team’s bitterness? Here it manifests. WinRT was a native C++ runtime. Clean break from WPF, WinForms, and a decade of developer investment in .NET.
There were actually two stories being told simultaneously inside Microsoft. The Windows team was building WinRT. The .NET team was still evangelizing WPF. Different buildings, different VPs, different road maps.
What developers heard at //Build 2012: the future is WinRT, and also HTML+JS is first-class, and also .NET still works, and also C++ is back, and also you should write Metro apps, and also your WPF code still runs fine. That is not a strategy. That is a Hunger Games stage where six teams are fighting for your attention.
Enterprise developers took one look at UWP’s sandboxing, its Store deployment requirement, and its missing Win32 APIs, and walked away. The framework designed to win them into the modern era had been optimized for a tablet app store that never materialized.
Windows 10 brought Universal Windows Platform — write once, run on PC, phone, Xbox, HoloLens. Compelling on paper. The problem: Windows Phone was dying, and Microsoft’s own flagship apps — Office, Visual Studio, the shell itself — weren’t using UWP. The message was clear even if no one said it out loud.
When UWP stalled, the official answer became it depends. Use UWP for new apps, keep WPF for existing ones, add modern APIs via XAML Islands, wait for WinUI 3, but also WinUI 2 exists for UWP specifically, and Project Reunion will fix everything, except we’re renaming it Windows App SDK and it still doesn’t fully replace UWP and…
Project Reunion / WinUI 3 represents genuine progress. But ask yourself why the problem existed at all. UWP’s controls were tied to the OS because the Windows team owned them. The .NET team didn’t. The developer tools team didn’t. Project Reunion was an organizational workaround dressed up as a technical solution.
One developer’s summary, written in 2024: “I’ve been following Microsoft’s constant changes: UAP, UWP, C++/CX replaced by C++/WinRT without tool support, XAML Islands, XAML Direct, Project Reunion, the restart of WinAppSDK, the chaotic switch between WinUI 2.0 and 3.0…” Fourteen years. Fourteen pivots. That person deserves a medal and an apology, in that order.
Here is every GUI technology actually shipping on Windows today:
* Win32 (1985) — Still here. Still used. Petzold’s book still applies.
* MFC (1992) — C++ wrapper on Win32. Maintenance mode. Lives in enterprise and CAD.
* WinForms (2002) — .NET wrapper on Win32. “Available but discouraged.” Still fastest for data-entry forms.
* Electron — Chromium + Node.js. VS Code, Slack, Discord. The most widely deployed desktop GUI technology on Windows right now — and Microsoft had nothing to do with it.
* Avalonia — Open source WPF spiritual successor. Used by JetBrains, GitHub, Unity — developers who stopped waiting for Microsoft.
* Uno Platform — WinUI APIs on every platform. More committed to WinUI than Microsoft is.
* Delphi / RAD Studio — Still alive. Still fast. Still in vertical market software.
* Java Swing / JavaFX — Yes, still in production. The enterprise never forgets.
Seventeen approaches. Five programming languages. Three rendering philosophies. That is not a platform. I might not have a dictionary definition for the term boof-a-rama but I know one when I see it.
Every failed GUI initiative traces back to one of three causes: internal team politics (Windows vs. .NET), a developer conference announcement driving a premature platform bet (Metro, UWP), or a business strategy pivot that orphaned developers without warning (Silverlight). None of these are technical failures. The technology was often genuinely good — WPF was good, Silverlight was good, XAML is good. The organizational failure was the product.
You either have a Plausible Theory of Success that covers the full lifecycle — adoption, investment, maintenance, and migration — or you have a developer conference keynote.
One is a strategy. The other is a thirty-year boof-a-rama.
Charles Petzold wrote six editions of Programming Windows trying to keep up with each new thing Microsoft announced. He stopped after the sixth, which covered WinRT for Windows 8. That was 2012.
...
Read the original on www.jsnover.com »
What Can Be Done
You may have heard about 25 Gbit symmetrical internet in Switzerland. This is often cited as the fastest dedicated (non-shared) residential connection in the world. However, did you ever wonder why Switzerland has such fast internet at a reasonable price while the United States and other countries like Switzerland’s neighbor Germany are falling behind?
What is the fundamental difference between the countries that leads to such a stark difference in internet speeds and prices?
Free markets, regulation, technology, or all three?
Let’s take a closer look at the situation in Switzerland, Germany, and the United States.
This article is written by me and spell checked with AI. Many of the images are generated by AI and are mostly to break up the wall of text.
This Article is also available as a video (My first):
As mentioned, in Switzerland, you can get 25 Gigabit per second fiber internet to your home, symmetric and dedicated. If you don’t need such extreme speed, you can get 1 or 10 Gigabit from multiple competing providers for very little money. All over a connection that isn’t shared with your neighbors. In fact, someone could offer 100 Gigabit or more today; there is nothing preventing this other than the cost of endpoint equipment.
In the United States, if you’re lucky enough to have fiber, you might get 1 Gigabit. But often it’s shared with your neighbors. And you usually have exactly one choice of provider. Maybe two, if you count the cable company that offers slower speeds for the same price.
In Germany, you are in a somewhat similar situation to the United States. Fiber service is limited to one provider and is often shared with your neighbors.
The United States prides itself on free markets. On competition. On letting businesses fight it out. A deregulated market with no brakes.
Germany, on the other hand, is famous for over-regulation, making it difficult for businesses to operate, yet it is in a similar situation to the United States.
Switzerland has a highly regulated telecom sector with strong oversight and government-backed infrastructure projects, but regulations in Switzerland differ from those in Germany.
So why is the country that worships free markets producing stagnation, monopolies, and inferior internet, while the country with heavy regulation is producing hyper-competition, world-leading speeds, and consumer choice?
And at the same time, the country with the most regulation is suffering the same problems as the country with the least.
The answer reveals a fundamental truth about capitalism and regulation that most people get wrong.
To understand the failure, you have to understand what economists call a “natural monopoly.”
A natural monopoly is an industry where the cost of building the infrastructure is so high, and the cost of serving an additional customer is so low, that competition actually destroys value.
Think about water pipes. It would be insane to have three different water companies each digging up your street to lay their own pipes. You’d have three times the construction, three times the disruption, three times the cost. And at the end of it, you’d still only use one of them.
The rational solution is to build the infrastructure once, as a shared, neutral asset, and let different companies compete to provide the service over that infrastructure.
That’s how water works. That’s how electricity works in most places. And in Switzerland, that’s how fiber optic internet works.
But in the United States and Germany, they did the opposite.
In Germany, the “free market” approach meant letting any company dig up the street to lay their own fiber. The result is called “overbuild.” Multiple networks running in parallel trenches, often just meters apart.
Billions of euros spent on redundant concrete and asphalt. Money that could have been spent on faster equipment, lower prices, or connecting rural areas, instead wasted on digging the same hole twice, literally.
But isn’t Germany heavily regulated? Yes, but the regulations focus heavily on infrastructure competition rather than duct sharing enforcement.
Germany champions infrastructure competition, meaning it prefers multiple companies laying their own cables rather than sharing a single network. At the same time, the regulatory system wastes enormous amounts of time on waiting for digging permits and on courtroom battles just to obtain basic information about existing ducts.
Germany also has a large incumbent, Deutsche Telekom, which uses existing regulations to its competitive advantage against smaller ISPs. While Germany does have laws requiring Deutsche Telekom to share its ducts with competitors, in practice smaller ISPs face unreasonable hurdles such as high fees, procedural delays, and legal double burdens that undermine effective access.
Sharing ducts is not as bad as digging two trenches but it is still a waste of resources.
The United States took a different path, but the result is equally bad. Instead of overbuild, they got territorial monopolies, in some places paid for by the federal government.
In most American cities, you don’t have a choice of fiber providers. You have whatever incumbent happens to serve your neighborhood. Comcast has one area. Spectrum has another. AT&T has a third.
This is marketed as competition. But it’s not. It’s a cartel. Each company gets its own protected territory, and consumers get no choice. If you don’t like your provider, your only alternative is often DSL from the 1990s or a cellular hotspot.
This is what happens when you let natural monopolies operate without oversight. They don’t compete on price or quality. They extract rent.
And because these networks are built on the cheap using P2MP, or shared architecture, your “gigabit” connection is shared with your entire neighborhood. At 8 PM, when everyone streams Netflix, that gigabit becomes 200 megabits. Or 100. Or less.
The provider still charges you for “gigabit.” They just don’t tell you that you’re sharing it with 31 other households.
And it gets worse. In the United States, even if a competitor wanted to challenge the incumbent, they often can’t. Because the Point of Presence, the central hub where all the fiber lines from homes converge, is private. It belongs to Comcast or AT&T. Your fiber terminates in their building. A competitor can’t just install equipment there. They would have to build their own network from scratch, digging up the same streets, to reach you.
Now look at Switzerland. Here, the physical infrastructure, the fiber in the ground, is treated as a neutral, shared asset. It’s built once, often by a public or semi-public entity.
Every home gets a dedicated 4-strand fiber line. Point-to-Point. Not shared. Not split 32 ways.
That dedicated fiber terminates in a neutral, open hub. And any internet service provider can connect to that hub.
Init7, Swisscom, Salt, or a tiny local ISP, they all have equal access to the physical line that goes into your home.
This means you, the consumer, have genuine choice. When you sign up with a provider, you simply give them your OTO (Optical Termination Outlet) number, the unique identifier printed on the fiber optic plate in your home. It tells the provider exactly which fiber connection is yours. That’s it. No technician needs to visit. No one needs to dig up your street. You just call, give them the number, and within days (not always the case…), your new service is active.
And because your home has four separate fiber strands, you’re not locked into a single provider. You can have Init7 on one strand, Swisscom on another, and a local utility on a third. You can switch providers with a phone call. You can try a new provider without canceling your old one first. The competition happens on price, speed, and customer service but not on who happens to own the cable in front of your house.
In Switzerland, you can get 25 Gigabit per second fiber to your home. Today. Symmetric. Dedicated. Not shared with your neighbors.
In Switzerland, you have a choice of a dozen or more providers in most cities. Prices are competitive. Customer service matters because you can leave at any time.
In the United States, the majority of households have only one choice for high-speed internet. Speeds are lower. Prices are higher. And the technology is often a decade behind.
The “free market” promised innovation. It delivered rent-seeking. The incumbents have no incentive to upgrade because you have nowhere else to go.
American broadband prices have risen faster than inflation for decades. Speeds have increased only when a competitor, usually a municipal utility, forces the incumbent to respond.
Without competition, there is no innovation. There is only profit extraction.
Switzerland didn’t arrive at this model by accident nor did it happen because telecom companies were feeling generous. It happened because regulators forced it to happen.
Back in 2008, when the industry sat down at the Round Table organized by the Federal Communications Commission, it was Swisscom, the incumbent itself, that pushed for the four-fiber Point-to-Point model. The company argued that a single fiber would create a monopoly and that regulation would be necessary.
So the standard was set. Four fibers per home. Point-to-Point. Open access for competitors on Layer 1 - the physical fiber itself.
Then, in 2020, Swisscom changed course. The company announced a new network expansion strategy, this time using P2MP, the shared model with splitters. On paper, they argued it was cheaper and faster to deploy.
But the effect was clear. Under the P2MP design, competitors would no longer have direct access to the physical fiber. Instead of plugging into their own dedicated fiber strand, they would have to rent access from Swisscom at a higher network layer - effectively becoming resellers of Swisscom’s infrastructure. The open, competitive matrix that had been carefully built over years would disappear.
The small ISP Init7 filed a complaint with Switzerland’s competition authority, COMCO, which later opened an investigation. In December 2020, they issued a precautionary measure: Swisscom could not continue its P2MP rollout unless it guaranteed the same Layer 1 access that the original standard provided.
Swisscom fought this all the way to the Federal Court. They lost. In 2021, the Federal Administrative Court confirmed COMCO’s measures, stating that Swisscom had failed to demonstrate “sufficient technological or economic grounds” to deviate from the established fiber standard. In April 2024, COMCO finalized its ruling, fining Swisscom 18 million francs for violating antitrust law.
Swisscom is 51% owned by the Swiss Confederation. So, in simple terms, 51% state-owned and 49% privately/institutionally owned. Whether this makes the fine “symbolic” is a matter of opinion.
The result? Swisscom was forced to return to the four-fiber, Point-to-Point architecture it had originally championed. Competitors retained their direct, physical access to the fiber network. The walled garden was prevented.
Whether intended or not, the effect of Swisscom’s P2MP shift was clear: competitors would have been locked out of the physical infrastructure.
Swisscom is a bit of a walking contradiction. Being majority state-owned, it’s supposed to be a public service. But it’s also a private company, and maximizing profit benefits the state coffers. But that is something for another blog post.
This is the paradox that confuses so many people.
The American and German approach of letting incumbents build monopolies, allowing wasteful overbuild, and refusing to regulate natural monopolies is often called a ‘free market.’
But it’s not free. And it’s not a market.
True capitalism requires competition. But infrastructure is a natural monopoly. If you treat it like a regular consumer product, you don’t get competition. You get waste, or you get a monopoly.
The Swiss model understands this. They built the infrastructure once, as a shared, neutral asset, and then let the market compete on the services that run over it.
That’s not anti-capitalist. It’s actually better capitalism. It directs competition to where it adds value, not to where it destroys it.
The free market doesn’t mean letting powerful incumbents do whatever they want. It means creating the conditions where genuine competition can thrive.
What Can Be Done
So what can other countries learn from Switzerland? Here are the key policy changes that would help:
Mandate open access to physical infrastructure - require incumbents to share fiber ducts and dark fiber with competitors at cost-based prices. This is not “socialism” - it is how electricity and water work. Enforce Point-to-Point architecture - require that every home gets dedicated fiber strands, not shared splitters. This ensures competitors can access the physical layer, not just resell bandwidth.Create a neutral fiber standard - establish national standards that require multi-fiber deployment to every home, as Switzerland did in 2008.Empower competition authorities - give regulators like COMCO real teeth to enforce these rules. Fines must be large enough to matter.Support municipal fiber - allow cities and towns to build their own fiber networks when incumbents fail to serve residents adequately.
If you care about faster internet and lower prices, push your representatives to support these policies. The technology exists. The money exists. What is missing is the political will to demand real competition.
...
Read the original on sschueller.github.io »
No, I Won’t Download Your App. The Web Version is A-OK.
As someone who prefers using services via their websites, I’ve gotten terribly jaded lately. Almost everyone wants me, and by extension, you, to use their darn apps to consume content and off their web versions.
Whether it’s the obvious social media apps or something as basic as parking, the app is the priority and the site the red-headed stepchild. And they aren’t too subtle in the push either. It might be a modal covering half the web version with links to the App Store, an immediate popup after a bit of scrolling, or a header screaming “the app is 10x better,” but it’s always there and it’s always grating.
Let’s not even go into the cases where the app is the only option to access the service. A minor annoyance for ordering food, but a major hassle when it’s a public service or utility.
On principle, I like control over what I see and how I see it. Apps are super limited; while in a browser, I can do a lot of very nifty things to improve usability.
A service lacks a dark mode? I can use any number of user scripts. Reddit introduced a gaming section in the sidebar? Two-second fix that I bundled into my extension [1]. Between userscripts, ad-blockers, and custom extensions, I’m basically a god, swaggering through my realm.
This control, or lack thereof, also explains the app maker’s adversarial stance towards users. They are often a black hole of dark patterns, and they’d like nothing getting in their way. Apps make it easier for them to push notifications, collect intrusive telemetry, and keep you inside their walled garden. A better user experience is the pitch but securing better user retention is the end goal.
Most apps are just that. Text and media in a never-ending, all-consuming feed or a multi-page form, cleverly disguised by the user interface.
Excluding heavy 3D gaming or utilities that genuinely require deep integration with your phone’s hardware (like accessing the LiDAR scanner for AR), what are we actually left with? A thin client whose main job is to fetch data from an API and render it onto native views.
Why do I need to download a 100+ MB app, give it permission to track my location, and let it run background processes just to browse through a restaurant menu, buy a ticket, or scroll through a list of posts? At the end of the day, it is almost always just JSON being parsed and rendered. Yet, companies insist on rebuilding their basic content as native shells just to claim a permanent square of real estate on my home screen.
If a service is going to pull you out of the browser, it should at least offer a polished, native experience. But more often than not, the app you just downloaded is a compromise.
Anyone who endured the iOS-specific shader compilation jank in early Flutter apps [2] knows exactly how grating this can be (this specific bug was fixed 2023ish fwiw). Before they swapped Skia out for the Impeller engine, I had to capture and ship precompiled shaders with my apps just to stop the UI from stuttering the first time an animation ran.
The result is often the uncanny valley of user interfaces. It’s not broken, but it is subtly different, sometimes janky. The scroll velocity doesn’t quite match the rest of the OS. The swipe back gesture hesitates for a few milliseconds.
Human brains are remarkably good at detecting when a system’s timing is off. This is how the XZ backdoor was caught: an engineer noticed their SSH logins taking a fraction of a second longer than usual. It’s not that unique — my old FPS buddies could tell our server region just by firing a shot and feeling the lag. [3]
These micro interactions matter, because without that final layer of polish, the entire facade of a native experience falls apart. Not every app is like this, obviously, but enough of them are this way that it sours the entire experience.
When that full-screen modal pops up demanding you download the app to read the rest of a thread, users choose the path of least resistance. They download and they move on.
To a PM staring at an analytics dashboard, I’m an acceptable casualty, an inconsequential minority. If degrading the web version successfully funnels 80% of users into the App Store, that PM gets a promotion and a big pay bump. As always, actions follow the incentive. Our demographic is simply too small to factor into their quarterly metrics.
This is the enshittification loop in its full glory, working exactly as intended. A service builds its initial audience on the open web because it’s frictionless and indexable. Once the user base is sufficiently locked in, the web version is deliberately hobbled to force everyone into the native app. Once you’re inside the app, the walls close in: you are now a captive audience for a feed full of ads that your ad-blocker can no longer touch.
There is no financial incentive to maintain a stellar web experience anymore. The browser, once the great universal platform, is increasingly being reduced to a top-of-funnel marketing channel for the App Store. The depressing part of it is that the numbers prove it works.
...
Read the original on www.0xsid.com »
LÖVE is an awesome framework you can use to make 2D games in Lua. It’s free, open-source, and works on Windows, macOS, Linux, Android, and iOS.
We use our wiki for documentation. If you need further help, feel free to ask on our forums, our Discord server, or our subreddit.
We use the ‘main’ branch for development of the next major release, and therefore it should not be considered stable.
There are also branches for currently released major versions, which may have fixes and changes meant for upcoming patch releases within that major version.
We tag all our releases (since we started using mercurial and git), and have binary downloads available for them.
Experimental changes are sometimes developed in a separate love-experiments repository.
Files for releases are in the releases section on GitHub. The site has links to files and additional platform content for the latest release.
There are also unstable/nightly builds:
* Builds for some platforms are automatically created after each commit and are available through GitHub’s CI interfaces.
* For ubuntu linux they are in ppa:bartbes/love-unstable
* For arch linux there’s love-git in the AUR.
The test suite in testing/ covers all the LÖVE APIs, and tests them the same way developers use them. You can view current test coverage from any action.
You can run the suite locally like you would run a normal LÖVE project, e.g.:
love testing
See the readme in the testing folder for more info.
The best places to contribute are through the issue tracker and the official Discord server or IRC channel.
For code contributions, pull requests and patches are welcome. Be sure to read the source code style guide. Changes and new features typically get discussed in the issue tracker or on Discord or the forums before a pull request is made.
Follow the instructions at the megasource repository page.
Because in-tree builds are not allowed, the Makefiles needs to be generated in a separate build directory. In this example, folder named build is used:
Download or clone this repository and copy, move, or symlink the macOS/Frameworks subfolder into love’s platform/xcode/macosx folder and the shared subfolder into love’s platform/xcode folder.
Then use the Xcode project found at platform/xcode/love.xcodeproj to build the love-macosx target.
Download the love-apple-dependencies zip file corresponding to the LÖVE version being used from the Releases page, unzip it, and place the iOS/libraries subfolder into love’s platform/xcode/ios folder and the shared subfolder into love’s platform/xcode folder.
Or, download or clone this repository and copy, move, or symlink the iOS/libraries subfolder into love’s platform/xcode/ios folder and the shared subfolder into love’s platform/xcode folder.
Then use the Xcode project found at platform/xcode/love.xcodeproj to build the love-ios target.
See readme-iOS.rtf for more information.
...
Read the original on github.com »
Cloud AI APIs are great until they are not. Rate limits, usage costs, privacy concerns, and network latency all add up. For quick tasks like code review, drafting, or testing prompts, a local model that runs entirely on your hardware has real advantages: zero API costs, no data leaving your machine, and consistent availability.
Google’s Gemma 4 is interesting for local use because of its mixture-of-experts architecture. The 26B parameter model only activates 4B parameters per forward pass, which means it runs well on hardware that could never handle a dense 26B model. On my 14” MacBook Pro M4 Pro with 48 GB of unified memory, it fits comfortably and generates at 51 tokens per second. Though there’s significant slowdowns when used within Claude Code from my experience.
Google released Gemma 4 as a family of four models, not just one. The lineup spans a wide range of hardware targets:
The “E” models (E2B, E4B) use Per-Layer Embeddings to optimize for on-device deployment and are the only variants that support audio input (speech recognition and translation). The 31B dense model is the most capable, scoring 85.2% on MMLU Pro and 89.2% on AIME 2026.
Why I picked the 26B-A4B. The mixture-of-experts architecture is the key. It has 128 experts plus 1 shared expert, but only activates 8 experts (3.8B parameters) per token. A common rule of thumb estimates MoE dense - equivalent quality as roughly sqrt(total x active parameters), which puts this model around 10B effective. In practice, it delivers inference cost comparable to a 4B dense model with quality that punches well above that weight class. On benchmarks, it scores 82.6% on MMLU Pro and 88.3% on AIME 2026, close to the dense 31B (85.2% and 89.2%) while running dramatically faster.
The chart below tells the story. It plots Elo score against total model size on a log scale for recent open-weight models with thinking enabled. The blue-highlighted region in the upper left is where you want to be: high performance, small footprint.
Gemma 4 26B-A4B (Elo ~1441) sits firmly in that zone, punching well above its 25.2B parameter weight. The 31B dense variant scores slightly higher (~1451) but is still remarkably compact. For context, models like Qwen 3.5 397B-A17B (~1450 Elo) and GLM-5 (~1457 Elo) need 100-600B total parameters to reach similar scores. Kimi-K2.5 (~1457 Elo) requires over 1,000B. The 26B-A4B achieves competitive Elo with a fraction of the parameters, which translates directly into lower memory requirements and faster local inference.
This is what makes MoE models transformative for local use. You do not need a cluster or a high-end GPU rig to run a model that competes with 400B+ parameter behemoths. A laptop with 48 GB of unified memory is enough.
For local inference on a 48 GB Mac, this is the sweet spot. The dense 31B would consume more memory and generate tokens slower because every parameter participates in every forward pass. The E4B is lighter but noticeably less capable. The 26B-A4B gives you 256K max context, vision support (useful for analyzing screenshots and diagrams), native function/tool calling, and reasoning with configurable thinking modes, all at 51 tokens/second on my hardware.
LM Studio has been a popular desktop app for running local models for a while. Version 0.4.0 changed the architecture fundamentally by introducing llmster, the core inference engine extracted from the desktop app and packaged as a standalone server.
The practical result: you can now run LM Studio entirely from the command line using the lms CLI. No GUI required. This makes it usable on headless servers, in CI/CD pipelines, SSH sessions, or just for developers who prefer staying in the terminal.
* llmster daemon: a background service that manages model loading and inference without the desktop app
* Parallel request processing: continuous batching instead of sequential queuing, so multiple requests to the same model run concurrently
* Stateful REST API: a new /v1/chat endpoint that maintains conversation history across requests
# Linux/Mac
curl -fsSL https://lmstudio.ai/install.sh | bash
# Windows
irm https://lmstudio.ai/install.ps1 | iexlms daemon uplms runtime update llama.cpp
lms runtime update mlxlms get google/gemma-4-26b-a4b
The CLI shows you the variant it will download (Q4_K_M quantization by default, 17.99 GB) and asks for confirmation:
↓ To download: model google/gemma-4-26b-a4b - 64.75 KB
└─ ↓ To download: Gemma 4 26B A4B Instruct Q4_K_M [GGUF] - 17.99 GB
About to download 17.99 GB.
? Start download?
❯ Yes
No
Change variant selection
If you already have the model, the CLI tells you and shows the load command:
✔ Start download? yes
Model already downloaded. To use, run: lms load google/gemma-4-26b-a4blms lsYou have 10 models, taking up 118.17 GB of disk space.
LLM PARAMS ARCH SIZE DEVICE
gemma-3-270m-it-mlx 270m gemma3_text 497.80 MB Local
google/gemma-4-26b-a4b (1 variant) 26B-A4B gemma4 17.99 GB Local
gpt-oss-20b-mlx 20B gpt_oss 22.26 GB Local
llama-3.2-1b-instruct 1B Llama 712.58 MB Local
nvidia/nemotron-3-nano (1 variant) 30B nemotron_h 17.79 GB Local
openai/gpt-oss-20b (1 variant) 20B gpt-oss 12.11 GB Local
qwen/qwen3.5-35b-a3b (1 variant) 35B-A3B qwen35moe 22.07 GB Local
qwen2.5-0.5b-instruct-mlx 0.5B Qwen2 293.99 MB Local
zai-org/glm-4.7-flash (1 variant) 30B glm4_moe_lite 24.36 GB Local
EMBEDDING PARAMS ARCH SIZE DEVICE
text-embedding-nomic-embed-text-v1.5 Nomic BERT 84.11 MB Local
Worth noting: several of these models use mixture-of-experts architectures (Gemma 4, Qwen 3.5, GLM 4.7 Flash). MoE models punch above their weight for local inference because only a fraction of parameters activate per token.
Start a chat session with stats enabled to see performance numbers:
lms chat google/gemma-4-26b-a4b –stats ╭─────────────────────────────────────────────────╮
│ 👾 lms chat │
│ Type exit or Ctrl+C to quit │
│ Chatting with google/gemma-4-26b-a4b │
│ Try one of the following commands: │
│ /model - Load a model (type /model to see list) │
│ /download - Download a model │
│ /clear - Clear the chat history │
│ /help - Show help information │
With –stats, you get prediction metrics after each response:
Prediction Stats:
Stop Reason: eosFound
Tokens/Second: 51.35
Time to First Token: 1.551s
Prompt Tokens: 39
Predicted Tokens: 176
Total Tokens: 215
51 tokens/second on a 14” MacBook Pro M4 Pro (48 GB) with a 26B model is solid. Time to first token at 1.5 seconds is responsive enough for interactive use.
See what is currently loaded:
lms psIDENTIFIER MODEL STATUS SIZE CONTEXT PARALLEL DEVICE TTL
google/gemma-4-26b-a4b google/gemma-4-26b-a4b IDLE 17.99 GB 48000 2 Local 60m / 1h
The model occupies 17.99 GB in memory with a 48K context window and supports 2 parallel requests. The TTL (time-to-live) auto-unloads the model after 1 hour of idle time, freeing memory without manual intervention.
lms ps –json | jq
* “vision”: true and “trainedForToolUse”: true - Gemma 4 supports both image input and tool calling
* “maxContextLength”: 262144 - the model supports up to 256K context, though the default load is 48K
Before loading a model, you can estimate memory requirements at different context lengths using –estimate-only. I wrote a small script to test across the full range:
The base model takes about 17.6 GiB regardless of context. Each doubling of context length adds roughly 3-4 GiB. At the default 48K context, you need about 21 GiB. On my 48 GB MacBook Pro, I can push to the full 256K context at 37.48 GiB and still have about 10 GB free for the OS and other apps. A 36 GB Mac could comfortably run 200K context with headroom.
lms load google/gemma-4-26b-a4b –estimate-only –context-length 48000
Model: google/gemma-4-26b-a4b
Context Length: 48,000
Estimated GPU Memory: 21.05 GiB
Estimated Total Memory: 21.05 GiB
Estimate: This model may be loaded based on your resource guardrails settings.
This is useful for capacity planning. If you want to run Gemma 4 alongside other applications, check the estimate at your target context length first.
Here is the full script I used to generate the table above. You can swap in any model name and context length list to profile a different model:
#!/usr/bin/env bash
model=“google/gemma-4-26b-a4b”
contexts=(4096 8000 16000 24000 32000 48000 64000 96000 128000 200000 256000)
table_contexts=()
table_gpu=()
table_total=()
for ctx in “${contexts[@]}”; do
output=“$(lms load “$model” –estimate-only –context-length “$ctx” 2>&1)”
parsed_context=“$(printf ‘%s\n’ “$output” | awk -F’: ′ ‘/^Context Length:/ {print $2; exit}’)”
parsed_gpu=“$(printf ‘%s\n’ “$output” | awk -F’: +′ ‘/^Estimated GPU Memory:/ {print $2; exit}’)”
parsed_total=“$(printf ‘%s\n’ “$output” | awk -F’: +′ ‘/^Estimated Total Memory:/ {print $2; exit}’)”
table_contexts+=(“${parsed_context:-$ctx}“)
table_gpu+=(“${parsed_gpu:-N/A}“)
table_total+=(“${parsed_total:-N/A}“)
done
printf ‘| Model | Context Length | GPU Memory | Total Memory |\n’
printf ‘|–-|–-:|–-:|–-:|\n’
for i in “${!table_contexts[@]}”; do
printf ‘| %s | %s | %s | %s |\n’ \
“$model” “${table_contexts[$i]}” “${table_gpu[$i]}” “${table_total[$i]}”
done
...
Read the original on ai.georgeliu.com »
* This report does NOT contain sensitive information (API keys, passwords, etc.)
Claude has regressed to the point it cannot be trusted to perform complex engineering.
Does the opposite of requested activities
Claude should behave like it did in January.
Accept Edits was ON (auto-accepting changes)
Yes, every time with the same prompt
Produced by claude based on my extensive data - if there’s any issues, it’s because anthropic doesn’t let claude think anymore ;) Unfortunately claude deleted my January logs containing a bulk of my work so only summary analysis is available - January was what I expect, Febuary started sliding, and March was a complete and utter loss.
Quantitative analysis of 17,871 thinking blocks and 234,760 tool calls across
6,852 Claude Code session files reveals that the rollout of thinking content
redaction (redact-thinking-2026-02-12) correlates precisely with a measured
quality regression in complex, long-session engineering workflows.
The data suggests that extended thinking tokens are not a “nice to have” but
are structurally required for the model to perform multi-step research,
convention adherence, and careful code modification. When thinking depth is
reduced, the model’s tool usage patterns shift measurably from research-first
to edit-first behavior, producing the quality issues users have reported.
This report provides data to help Anthropic understand which workflows are
most affected and why, with the goal of informing decisions about thinking
token allocation for power users.
The quality regression was independently reported on March 8 — the exact date
redacted thinking blocks crossed 50%. The rollout pattern (1.5% → 25% → 58% →
100% over one week) is consistent with a staged deployment.
The signature field on thinking blocks has a 0.971 Pearson correlation
with thinking content length (measured from 7,146 paired samples where both
are present). This allows estimation of thinking depth even after redaction.
Thinking depth had already dropped ~67% by late February, before redaction
began. The redaction rollout in early March made this invisible to users.
These metrics were computed independently from 18,000+ user prompts before
the thinking analysis was performed.
A stop hook (stop-phrase-guard.sh) was built to programmatically catch
ownership-dodging, premature stopping, and permission-seeking behavior.
It fired 173 times in 17 days after March 8. It fired zero times before.
Analysis of 234,760 tool invocations shows the model stopped reading code
before modifying it.
The model went from 6.6 reads per edit to 2.0 reads per edit — a 70%
reduction in research before making changes.
In the good period, the model’s workflow was: read the target file, read
related files, grep for usages across the codebase, read headers and tests,
then make a precise edit. In the degraded period, it reads the immediate
file and edits, often without checking context.
The decline in research effort begins in mid-February — the same period when
estimated thinking depth dropped 67%.
Full-file Write usage doubled — the model increasingly chose to rewrite
entire files rather than make surgical edits, which is faster but loses
precision and context awareness.
* 191,000 lines merged across two PRs in a weekend during the good period
Extended thinking is the mechanism by which the model:
* Plans multi-step approaches before acting (which files to read, what order)
* Catches its own mistakes before outputting them
* Decides whether to continue working or stop (session management)
When thinking is shallow, the model defaults to the cheapest action available:
edit without reading, stop without finishing, dodge responsibility for failures,
take the simplest fix rather than the correct one. These are exactly the
symptoms observed.
Transparency about thinking allocation: If thinking tokens are being
reduced or capped, users who depend on deep reasoning need to know. The
redact-thinking header makes it impossible to verify externally.
A “max thinking” tier: Users running complex engineering workflows
would pay significantly more for guaranteed deep thinking. The current
subscription model doesn’t distinguish between users who need 200 thinking
tokens per response and users who need 20,000.
Thinking token metrics in API responses: Even if thinking content is
redacted, exposing thinking_tokens in the usage response would let users
monitor whether their requests are getting the reasoning depth they need.
Canary metrics from power users: The stop hook violation rate
(0 → 10/day) is a machine-readable signal that could be monitored across
the user base as a leading indicator of quality regressions.
The following behavioral patterns were measured across 234,760 tool calls and
18,000+ user prompts. Each is a predictable consequence of reduced reasoning
depth: the model takes shortcuts because it lacks the thinking budget to
evaluate alternatives, check context, or plan ahead.
When the model has sufficient thinking budget, it reads related files, greps
for usages, checks headers, and reads tests before making changes. When
thinking is shallow, it skips research and edits directly.
One in three edits in the degraded period was made to a file the model had
not read in its recent tool history. The practical consequence: edits that
break surrounding code, violate file-level conventions, splice new code into
the middle of existing comment blocks, or duplicate logic that already exists
elsewhere in the file.
Spliced comments are a particularly visible symptom. When the model edits
a file it hasn’t read, it doesn’t know where comment blocks end and code
begins. It inserts new declarations between a documentation comment and the
function it documents, breaking the semantic association. This never happened
in the good period because the model always read the file first.
When thinking is deep, the model resolves contradictions internally before
producing output. When thinking is shallow, contradictions surface in the
output as visible self-corrections: “oh wait”, “actually,”, “let me
reconsider”, “hmm, actually”, “no wait.”
The rate more than tripled. In the worst sessions, the model produced 20+
reasoning reversals in a single response — generating a plan, contradicting
it, revising, contradicting the revision, and ultimately producing output
that could not be trusted because the reasoning path was visibly incoherent.
The word “simplest” in the model’s output is a signal that it is optimizing
for the least effort rather than evaluating the correct approach. With deep
thinking, the model evaluates multiple approaches and chooses the right one.
With shallow thinking, it gravitates toward whatever requires the least
reasoning to justify.
In one observed 2-hour window, the model used “simplest” 6 times while
producing code that its own later self-corrections described as “lazy and
wrong”, “rushed”, and “sloppy.” Each time, the model had chosen an approach
...
Read the original on github.com »
Try adjusting your search terms or filters
...
Read the original on playlists.at »
To add this web app to your iOS home screen tap the share button and select "Add to the Home Screen".
10HN is also available as an iOS App
If you visit 10HN only rarely, check out the the best articles from the past week.
If you like 10HN please leave feedback and share
Visit pancik.com for more.