10 interesting stories served every morning and every evening.

Local Qwen isn't a worse Opus, it's a different tool

blog.alexellis.io

We’ve all heard people say that local Qwen 27B or 35-A3B is “near-Opus level”, but I have receipts from a software business and open source projects, and am here to be transparent with you.

This post is long-form for a rea­son. It’s not a cur­sory glance, an un­sub­stan­ti­ated claim on X about can­celling Claude Max, or a hob­by­ist re­port from a model run­ning at sin­gle-digit to­kens per sec­ond with a 32K con­text win­dow. It is­n’t writ­ten by a fa­mous CEO tweet­ing about cod­ing from an air­plane. It’s my jour­ney as a founder in a small soft­ware busi­ness, where lo­cal mod­els have pro­duced real, caveated value. I have skin in the game, but no in­cen­tive to push ei­ther cloud or lo­cal mod­els, and a strong de­sire for lo­cal mod­els to be­come ca­pa­ble and re­li­able.

I’ll cover how the card paid for it­self in the first two or three months, how it keeps serv­ing our spe­cific busi­ness use case, why I still can’t trust it un­su­per­vised, and Qwen’s worst trait: the in­fi­nite loops and hal­lu­ci­na­tion risk. These show up most when you quan­tize it down to fit a con­sumer GPU.

Figuring out the power con­nec­tors for the RTX 6000 Pro

On my use case for AI

My journey as a maintainer and founder started with OpenFaaS - built completely by hand, as was all software in 2016 up until recently. That meant laying down the core of the project on my own, then inviting others to participate through community - not because I couldn’t do it on my own, but because my goal was to build a successful open source project. Around 2017 I tried to fund my time by joining VMware, and in 2019, after changes in the market, I needed a way to fund the work myself, so I moved towards open-core and built a bootstrapped company. Today our small team maintains OpenFaaS, SlicerVM - “AI sandboxes and the missing API for Linux”, Actuated.com - self-hosted CI runners for GitHub/GitLab, and Inlets.com - self-hosted HTTP/TCP tunnels.

These prod­ucts are built around low-level in­fra­struc­ture and Linux prim­i­tives: con­tain­ers, Firecracker mi­croVMs, net­work pro­to­cols, tun­nels, CLIs, and Kubernetes. If you squint, they’re all opin­ion­ated in­fra­struc­ture prod­ucts fo­cused on: ef­fi­ciency, user-ex­pe­ri­ence, con­trol and au­ton­omy. They’re writ­ten in Go, and some have React-based UI com­po­nents, land­ing pages, docs, agent skills, and CLIs. Along with the code, we also pro­vide the best-in-class sup­port, be­cause we are lean and will­ing to do things that don’t scale to help cus­tomers.

I’ve been using AI tools for as long as they’ve been available - from tab completion in VS Code in the early days, through to getting ChatGPT to generate chunks of code, or find bugs, to living in tmux 12 hours per day. I found myself in tmux so much of the time that I wrote a free tool, Superterm.dev, to keep track of my sessions and notes, and to get visual feedback from coding agents. Over that time, I’ve seen the capabilities go from “reduce boilerplate” to “design, architect, and test end to end”. It’s Claude or Codex that do the majority of my work, and whilst I insist on doing my own writing, I rarely write code by hand - as much as it pains me to say that.

A turn­ing point for fron­tier in­tel­li­gence

I’d say it was roughly be­tween November 2025 and January 2026 that we saw a turn­ing point. Many de­vel­op­ers on X started to es­pouse Claude Opus as hav­ing changed and how it was now ca­pa­ble of do­ing all of their work. Manual cod­ing turned bad as quickly as milk sours left out the fridge. The costs of the top-end cod­ing plans set­tled at roughly 200 USD / mo for in­di­vid­u­als. A real num­ber, but tol­er­a­ble for the value they gen­er­ated. Even to­day, if you avoid too much un­at­tended work, you can make it last through the 5 hour limit, and weekly limit if you’re care­ful.

What makes lo­cal mod­els in­ter­est­ing

There’s an argument that says: “Why use anything less than the best you can afford?”

The year of 2026 cer­tainly is a new fron­tier: we find our­selves in a place where any idea can be cloned overnight by some­one you’ve never heard of with a sub­scrip­tion in a de­vel­op­ing na­tion. I’ve seen it hap­pen to our SlicerVM prod­uct (originally writ­ten by hand in 2022) and Superterm (new in 2026, 100% writ­ten by cod­ing agents). It’s not to say that a vibecoded clone is a 100% equiv­a­lent of a well en­gi­neered and ar­chi­tected so­lu­tion with an ex­pe­ri­enced team sup­port­ing it, but a mar­ket where the cost of soft­ware went to nil - free and good enough can be all that mat­ters.

So in such a com­pet­i­tive land­scape, why limit your­self to some­thing that’s worse? Isn’t that an op­por­tu­nity cost? Isn’t that risk­ing your liveli­hood?

There are estimates that the leading models contain between 0.5 and 2T parameters. That’s not just “marginally more” or “a few times more” than the best in class for local hardware - that’s on a different level. The parameter count is a rough proxy for capacity, knowledge, and reasoning ability. Yet somehow, even a tiny dense model like Qwen 3.6 27B is able to post a reputable score of 77.2% on SWE-Bench Verified vs 88.6% from Claude Opus 4.8.

So you could be forgiven for taking to X and shouting loudly that “local is only 12% behind SOTA”, and many have, often alongside engaging one-shotted demos of Space Invaders. You may go as far as claiming that a single 6-year-old GPU can replace your 200 USD / mo ChatGPT Pro subscription, and indeed many have made that claim.

Benchmaxxing

Benchmarks are a mov­ing tar­get, and since they’re widely avail­able, it’s pos­si­ble to ed­u­cate and tune a model to ob­tain a higher score than they would oth­er­wise on these tests. The clas­sic SWE-Bench Verified bench­mark is based upon a set of Python is­sues across a num­ber of Open Source pro­jects. Python has threads, and async, how­ever most code you run into is sin­gle-threaded and syn­chro­nous. In con­trast, we write dis­trib­uted sys­tems in Go, where chan­nels, con­texts, and structs span across a large ex­e­cu­tion do­main.

Cost

There’s a very popular take, “local models aren’t about cost”, and that comes from a position of privilege. Individuals can use coding plans that provide high amounts of usage through a working day for 200 USD / mo. On that basis, you are getting SOTA level intelligence, the best chance of something working and being of quality, of finding that bug, or generating that landing page.

Coding plans are clearly sub­sidised, just look at what hap­pened to GitHub Copilot plans. They started off by giv­ing away 1500 re­quests for 39 USD / mo and you could make that last a very long time for pen­nies. Something that was undis­closed changed at GitHub/Microsoft/Azure, and they moved every­one over to to­ken-based pric­ing and the back­lash was huge. The true cost had been hid­den for so long, we’d be­come ac­cus­tomed to it.

Now, if you’re paying for tokens on API rates, the breaking point comes sooner than many of us realise. Recently, Uber capped spend to 1500 USD / mo per developer per tool. The median salary at Uber is 330k USD annually, so if a developer used two tools to the maximum extent, that’s 36k USD per year, or roughly 11% of their annual compensation.
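A quick check of that arithmetic, as a sketch using only the figures quoted above. The one assumption added here is that the developer hits the cap on both tools in every month of the year.

```python
# Worked arithmetic for the Uber example, using only the figures quoted above.
# Assumption: a developer hits the cap on both tools in all 12 months.
cap_per_tool_usd = 1500      # monthly cap per developer, per tool
tools = 2
median_salary_usd = 330_000  # annual

annual_spend_usd = cap_per_tool_usd * tools * 12
share = annual_spend_usd / median_salary_usd

print(f"annual spend: {annual_spend_usd} USD")
print(f"share of salary: {share:.1%}")
```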

So for heavy use, loops, agen­tic analy­sis, in-prod­uct ca­pa­bil­i­ties de­ployed through SaaS sys­tems, open weight, or lo­cal mod­els can pro­vide se­ri­ous value. It’s not fair to rule out cost, but for many it’s not about that.

Sovereignty and pri­vacy

We work with various enterprise customers that take data controls very seriously. If you squint at our product line, we’re all about privacy and sovereignty. OpenFaaS runs functions on your infrastructure, with your limits and preferred languages, and events. SlicerVM runs microVMs not on some abstracted cloud-based bare-metal, but on your own kit, even your MacBook. Inlets runs tunnels where you can control the tunnel client and server with 100% privacy. Actuated takes the arduous parts of GitHub Actions away and says “install an agent on your machines and forget about it”.

So naturally, we are drawn to local models, both through our core values and beliefs about how the Internet should be, and through our obligations to those customers.

You may not hold these be­liefs, you may not han­dle any cus­tomer data, but if you live out­side of the US, the re­moval of Anthropic’s Fable 5 model overnight might have come as a shock. In other words, there is se­ri­ous ven­dor risk, and many of us are ad­dicted to the source.

Local models are the solution to “What if the frontier labs do X?”

Tempering the blade

I said that lo­cal mod­els are not the same tool as SOTA. What did I mean by that?

I build fur­ni­ture us­ing hand tools, and oc­ca­sion­ally just like I’ll re­lease an open source pro­ject to scratch an itch, I’ll make an edge tool like a chisel, a groov­ing plane blade, a scratch awl, a Sloyd knife for carv­ing.

Tempering a Japanese style mark­ing knife on the back of a heated file, un­til it hits straw colour.

There are two ways to work with steel depending on how much you can invest. Forging is taking a raw piece of steel, heating it up and smashing it with a hammer into the form you need. It’s seen as the most pure and honourable way to work - “the real way”. Then for smaller items, “stock removal” is much more approachable. It involves taking sheet steel, cutting out a shape and grinding in a bevel or a point.

But that’s just the shap­ing. You then have to heat the steel up, and quench it in oil or wa­ter. This makes the steel be­come ex­tremely hard, so hard that if you dropped it - it would shat­ter into pieces. So we have to scrub off the black scum, and heat it up again, watch­ing for a rain­bow of colours. If we go one shade past where we need, we have to start the heat treat­ing all over again.

Our team’s ex­pe­ri­ence of lo­cal mod­els is ex­actly like miss­ing the tem­per colours. The model is run­ning so hot, that it shoots past the goal and starts loop­ing. Nothing can fix it, other than clos­ing down the har­ness and hop­ing the cleared con­text will give a dif­fer­ent re­sult.

I’d never leave a blade tem­per­ing un­at­tended, just like I’d never leave Qwen 3.6 27B work­ing on a long hori­zon task. For steel the workaround is us­ing a kiln, or tem­per­a­ture con­trolled oven to re­move vari­abil­ity.

That Sloyd knife we forged could be used to knock in nails, but you’re likely to cut your hands and ruin the edge at the same time. Let’s go back to the start, if it’s a dif­fer­ent tool, what is it good for?

What I was look­ing for

I was look­ing for all of the things we cov­ered in the pre­vi­ous sec­tion: pri­vacy, fixed costs and pro­tec­tion against ven­dor risk. Where I got and con­tinue to get let down is where I treat a lo­cal model in­side open­code in the same way I treat Claude or Codex. It’s al­most creepy how long they can work fully un­at­tended whilst mak­ing real progress to­wards a goal.

I can paste in something like: “Eoin told me he has been running Slicer VMs in a loop and ran out of FDs. He suspects VSock” and then after a couple of minutes Claude replies “Now I see the full picture: You’re doing X, you need to do Y”. I say “do it, and test it end to end on my mini PC” and after any period of time - 5 or 15 minutes - I can raise a PR, have it code reviewed automatically, and then tell Claude to read it and iterate again.

It’s a won­der­fully ef­fi­cient loop for a small team like us that man­ages mul­ti­ple prod­ucts and works very closely with en­ter­prise and com­mu­nity users.

Sharp lessons from a 3090

I started off with a sin­gle 3090 card in 2023, and quickly re­alised I needed an­other to be able to load mod­els and have suf­fi­cient con­text. Nothing about lo­cal mod­els from 2023 is worth cov­er­ing here, other than they were so hard to use that I gave up on them. Qwen 3.5 was the first time I saw real work be­ing done by agents.

I could load a model into either card in Q4 quantization with 200k context (also quantized) and get it to do small tasks, when guided. I still remember how quickly that went south. I told the model “Explore this machine from every angle, complete a forensic report on the machine and how it’s used” - Claude would have shrugged that off. Qwen started reading every single file on my machine one by one, filled its context, then hallucinated the filenames and even tool calls: ~/faas-netes became ~/faaned. Stepping back, I was able to get a really lucid report by scoping the task to “Take a quick look around this machine, tell me who uses it and what for”, and that ran at roughly 40 to 50 tokens per second (generation).

A 27B model sim­ply does­n’t fit at full fi­delity into 1x 3090 card, so the knobs and di­als are: com­pres­sion level of the mod­el’s weights (quantization), length of the con­text, and com­pres­sion level of the keys and val­ues of the con­text.

There’s a well known rule of thumb that bad things start hap­pen­ing at Q4_0 on the keys part of the KV cache. The most ag­gres­sive I’ve ever been is Q8_0 for keys and Q4_0 for val­ues.
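To make the three dials concrete, here is a rough estimator for weights plus KV cache. It is a sketch under stated assumptions: the layer count, KV head count and head dimension are hypothetical placeholders rather than the real Qwen architecture, and it treats every tensor as using one uniform type. The bits-per-value figures reflect the Q8_0 and Q4_0 block layouts, which store a 16-bit scale for every 32 values.

```python
# Rough VRAM estimate for weights plus KV cache.
# The layer/head numbers below are HYPOTHETICAL placeholders, not the real
# Qwen architecture; read the true values from the model's own config.

# Effective bits per stored value. Q8_0 and Q4_0 add a 16-bit scale per
# block of 32 values, hence 8.5 and 4.5 rather than 8 and 4.
BITS = {"f16": 16.0, "q8_0": 8.5, "q4_0": 4.5}

GIB = 1024 ** 3


def weights_gib(params: float, weight_type: str) -> float:
    """Size of the model weights at one uniform quantization level."""
    return params * BITS[weight_type] / 8 / GIB


def kv_cache_gib(ctx: int, layers: int, kv_heads: int, head_dim: int,
                 k_type: str, v_type: str) -> float:
    """Size of the KV cache: one K and one V vector per layer per token."""
    per_token_values = layers * kv_heads * head_dim
    k_bytes = ctx * per_token_values * BITS[k_type] / 8
    v_bytes = ctx * per_token_values * BITS[v_type] / 8
    return (k_bytes + v_bytes) / GIB


if __name__ == "__main__":
    arch = dict(layers=64, kv_heads=8, head_dim=128)  # hypothetical values
    w = weights_gib(27e9, "q4_0")
    for k_type, v_type in [("f16", "f16"), ("q8_0", "q4_0")]:
        kv = kv_cache_gib(200_000, k_type=k_type, v_type=v_type, **arch)
        print(f"K={k_type} V={v_type}: weights {w:.1f} GiB + KV {kv:.1f} GiB")
```

Real GGUF files mix quantization types per tensor, and some architectures do not keep a full KV cache for every layer, so treat the output as an order-of-magnitude guide rather than a prediction.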

The 3090s were a con­stant source of headaches - I had to quan­tize well be­low where I was com­fort­able. One of the cards would only show up if I crossed my fin­gers when turn­ing it on. Even re­boots would­n’t cure it - I had to A/C power off and re­move the power ca­ble each time for 30 sec­onds.

My lat­est ex­per­i­ment was set­ting up vLLM (the gold stan­dard for pro­duc­tion and con­cur­rent serv­ing) and even with an NVLink (175GBP) and ten­sor par­al­lelism turned on, it was 3 to­kens/​sec­ond slower than llama.cpp dur­ing gen­er­a­tion for an equiv­a­lent setup. With vLLM, we still saw loop­ing, and load­ing the weights took a few min­utes rather than sin­gle-digit sec­onds.

vLLM is the right choice for pro­duc­tion-scale serv­ing with con­tin­u­ous batch­ing and many con­cur­rent users. But in a pro­sumer setup like ours, the trade-off is more nu­anced. We’re not try­ing to re­place Claude Max sub­scrip­tions for a team of five; we’re try­ing to get fast, re­li­able in­fer­ence for a small num­ber of known work­flows, where startup time, sim­plic­ity, and sin­gle-user la­tency mat­ter more than ag­gre­gate through­put.

I was spending more time making the cards work than I was getting back in results.

Big spender

We of­fer sup­port con­tracts to en­ter­prise com­pa­nies us­ing our prod­ucts, and when a ticket comes in we are in­cen­tivised to re­solve it as soon as rea­son­ably pos­si­ble. I thought that get­ting a card that would make all the nig­gles go away would fix lo­cal mod­els, and cus­tomer sup­port was worth the risk.

We dropped around 12000 USD on an RTX 6000 Pro Blackwell edition with 96GB of VRAM. Even a couple of months on, the price has increased to around 15400 USD, so adding a second becomes much harder to justify. You can’t “just slot another card in” to a consumer machine. There are many concerns from PCI lanes, to bandwidth, to card spacing, and the draw on the PSU.

It was a cal­cu­lated bet, and it has paid off, but not be­cause it re­places our Claude sub­scrip­tions - it can’t do that.

Painless cus­tomer sup­port, with­out leak­ing cus­tomer data

Many op­er­a­tors at en­ter­prise com­pa­nies are highly ca­pa­ble and skilled, but they’re held back by man­ual pro­ce­dures and prac­tices. Sometimes you’re lucky and some­one will work through every point in a trou­bleshoot­ing guide and tell you what they got wrong. Other times, you’re 150 replies deep into an email chain and they’ve still not run that one com­mand that would an­swer it all.

So we wrote “diag”, a CLI tool that is easy for operators to run and that captures a complete snapshot of an OpenFaaS installation on Kubernetes. They can then email this dump to us and we can run it through an airgapped local model, in an ephemeral VM created by Slicer. You can read more about the issues we found in “Introducing: Painless support and hands-off architecture reviews” over on the OpenFaaS blog.

Revenue re­cov­ery

A re­newal came up re­cently, and only be­cause I fed the teleme­try data­base into a lo­cal model, did we find out they’d been un­der-re­port­ing li­censes and un­der-pay­ing by about 4 – 5x for over 12 months. That rev­enue re­cov­ery alone paid for the card.

There’s no way I would have in good conscience run the telemetry dump or a customer’s diag output through any cloud plan, regardless of their stance on data retention. This is a good time for me to cover near- and far-east coding plans - caveat emptor - I’m yet to find one that doesn’t take a privileged position on your IP - training and ownership rights for inputs and outputs. ChatGPT Pro and Claude Max can be configured for a 30 day retention period, but even that level likely invalidates your contracts with customers.

Sometimes I’ve given GPT or Opus the schema for the teleme­try table and had it write an AGENTS.md that the lo­cal model is most likely to fol­low. Our data is re­ported sev­eral times per day, from mul­ti­ple high-avail­abil­ity repli­cas, so it can’t just be summed up across a 24 hour pe­riod. With ear­lier it­er­a­tions of the model, I saw it fail at arith­metic - 27.3K counted as 273,000. It was only be­cause I was thor­oughly check­ing its work that I caught it out.
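The two pitfalls in that paragraph can be sketched in code. This is an illustration only: the record layout (customer, replica, timestamp, function count) is hypothetical and not our telemetry schema, and taking the daily peak is one possible way to avoid double counting, not necessarily the rule our AGENTS.md uses.

```python
# Sketch: turn overlapping telemetry reports into one daily figure per customer.
# The record layout (customer, replica, timestamp, functions) is hypothetical.
from collections import defaultdict
from datetime import datetime


def parse_count(text: str) -> int:
    """Parse '27.3K' style figures: K means thousands, M means millions."""
    text = text.strip().upper().replace(",", "")
    multipliers = {"K": 1_000, "M": 1_000_000}
    if text and text[-1] in multipliers:
        return round(float(text[:-1]) * multipliers[text[-1]])
    return int(float(text))


def daily_peak(reports):
    """Every replica reports the same installation several times a day, so
    summing would multiply the true figure. Keep the daily peak instead."""
    peaks = defaultdict(int)
    for customer, _replica, timestamp, functions in reports:
        day = datetime.fromisoformat(timestamp).date()
        key = (customer, day)
        peaks[key] = max(peaks[key], functions)
    return dict(peaks)


reports = [
    ("acme", "gw-0", "2026-06-01T03:00:00", 40),
    ("acme", "gw-1", "2026-06-01T03:00:05", 40),
    ("acme", "gw-0", "2026-06-01T15:00:00", 42),
]
print(daily_peak(reports))
print(parse_count("27.3K"))
```

Doing this kind of aggregation in deterministic code, and handing the model only the cleaned figures, leaves less room for a slip like reading 27.3K as 273,000.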

Another time, the model in­ferred a cus­tomer was likely to churn be­cause they had a small num­ber of func­tions. It com­pletely ig­nored that the cus­tomer ran that smaller num­ber of func­tions many times per day. So of­ten it’s bet­ter to have them fo­cus on analy­sis, not in­ter­pre­ta­tion.

Our cur­rent setup

I’m a big sup­porter of folks like Jack Rong and Kyle Hessling who have worked on fine-tunes of open weight mod­els like Qwen. Qwopus at­tempts to layer Chain of Thought traces on top of Qwen to make it bet­ter at rea­son­ing and cod­ing. They do this to help the com­mu­nity and be­cause of a deep be­lief in lo­cal AI.

In our team we run both the lat­est gen­er­a­tion of Qwopus, and the base 27B Qwen 3.6 model on the RTX 6000 rig. Over time this changes - as new fine­tunes come out, as new point re­leases of Qwen drop and as we land upon new edge-cases and lim­i­ta­tions. Up un­til very re­cently, we ran with think­ing turned off com­pletely, and have only re­cently added it back in which co­in­cided with see­ing more loop­ing.

The models are served by two independent llama.cpp instances, which means they retain full context length. The default answer to “concurrency” is to run --parallel 2, but this halves the available context.

$ nvidia-smi
Wed Jun 17 11:56:03 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 590.48.01              Driver Version: 590.48.01      CUDA Version: 13.1     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX PRO 6000 Blac…      Off |   00000000:01:00.0 Off |                  Off |
| 30%   32C    P8             15W /  600W |   85937MiB /  97887MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            2265      C   …ma.cpp/build/bin/llama-server        31198MiB |
|    0   N/A  N/A            2544      C   …ma.cpp/build/bin/llama-server        54718MiB |
+-----------------------------------------------------------------------------------------+

llama.cpp is built from source and kept up to date weekly, or as re­quired. The build from source is re­quired in or­der to add sup­port for Nvidia GPUs.

Here’s our com­mand for a sin­gle in­stance of Qwen with full con­text length and full qual­ity con­text.

#!/bin/bash
~/llama.cpp/build/bin/llama-server \
  -hf unsloth/Qwen3.6-27B-MTP-GGUF:UD-Q8_K_XL \
  --alias Qwen3.6-27B-Base \
  --host 0.0.0.0 \
  --port 8085 \
  -ngl 99 \
  -c 262144 \
  --cache-type-k f16 \
  --cache-type-v f16 \
  --flash-attn on \
  --parallel 1 \
  --threads 16 \
  -b 4096 \
  -ub 2048 \
  --jinja \
  --reasoning-budget 2048 \
  --temperature 0.6 \
  --top-p 0.95 \
  --top-k 20 \
  --min-p 0.0 \
  --presence-penalty 1.1 \
  --reasoning on \
  --spec-type draft-mtp \
  --spec-draft-n-max 6 \
  --chat-template-kwargs '{"preserve_thinking": true}' \
  --chat-template-file chat_template.jinja \
  --reasoning-budget-message "reasoning budget consumed, time to answer now"

We get about a 93% ac­cep­tance rate on our spec­u­la­tive de­cod­ing from MTP, and the speed in­creases from a sta­ble 67 tok/​s to 130 – 200 tok/​s sus­tained over long pe­ri­ods. It feels faster than us­ing a cloud model.
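A back-of-envelope model helps put that acceptance rate in context. Under the simplifying assumption that each drafted token is accepted independently with probability a, and that up to n tokens are drafted per step, the expected number of tokens emitted per verification pass of the main model is (1 - a^(n+1)) / (1 - a). Whether the acceptance rate reported by llama.cpp maps onto a in exactly this way is an assumption on my part, and the function name below is made up for illustration.

```python
# Idealised speculative decoding model (independent per-token acceptance).
# This is NOT a description of llama.cpp internals, only a sanity check.

def expected_tokens_per_pass(accept_rate: float, draft_max: int) -> float:
    """Expected tokens emitted per verification pass of the main model."""
    if accept_rate >= 1.0:
        return draft_max + 1.0
    return (1 - accept_rate ** (draft_max + 1)) / (1 - accept_rate)


# Figures from the setup above: about 93% acceptance, draft length of 6.
ideal = expected_tokens_per_pass(0.93, 6)
observed_low, observed_high = 130 / 67, 200 / 67

print(f"idealised ceiling: {ideal:.2f}x")
print(f"observed speedup: {observed_low:.2f}x to {observed_high:.2f}x")
```

The observed speedup sits well below the idealised ceiling, which is what you would expect: drafting is not free, and verifying several tokens at once costs more than generating one.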

It’s im­por­tant to fol­low the in­struc­tions from the model card when tun­ing llama.cpp. There are of­ten rea­sons why a cer­tain tem­per­a­ture has been se­lected by the lab. For in­stance, with the Qwopus fine-tune, it works best with think­ing turned off and the tem­per­a­ture re­ally hot at 0.85 – 1.0.

About that loop­ing

Recently I’ve been tuning it to try to avoid looping, which goes back to that tempering analogy. You can’t just leave this model to work on long horizon tasks.

I asked Qwen what com­mands we should add to faas-cli, and it came back with some rea­son­able sug­ges­tions, but got stuck and kept re­peat­ing them over and over, burn­ing 600W of my elec­tric­ity for a good half an hour.

58. faas-cli function import - Import functions from a YAML file or URL.
59. faas-cli function export - Export deployed functions back to a stack.yaml file.
60. faas-cli function scale - Manually scale function replicas without redeploying.
61. faas-cli function rename - Rename a function in-place.
62. faas-cli function diff - Compare local stack.yaml with what’s deployed - show differences.

63. faas-cli function import - Import functions from a YAML file or URL.
64. faas-cli function export - Export deployed functions back to a stack.yaml file.
65. faas-cli function scale - Manually scale function replicas without redeploying.
66. faas-cli function rename - Rename a function in-place.
67. faas-cli function diff - Compare local stack.yaml with what’s deployed - show differences.

68. faas-cli function import - Import functions from a YAML file or URL.
69. faas-cli function export - Export deployed functions back to a stack.yaml file.
70. faas-cli function scale - Manually scale function replicas without redeploying.
71. faas-cli function rename - Rename a function in-place.
72. faas-cli function diff - Compare local stack.yaml with what’s deployed - show differences.

Build · Qwen3.6-27B-Base toilgate

The same thing happened when I asked it to “add --json to all get and list commands” - it was convincing for the first one or two and even wrote tests.

Then because --json is machine readable, faas-cli needed to stop printing warnings about insecure TLS when using an http:// remote endpoint. Qwen couldn’t work out how to do this, so I told it to write a reverse proxy in Python and call that instead. The first version looked plausible but had bad indentation. When it realised the issue, it corrupted the file, kept complaining that it didn’t know how to fix it, and was stuck in a different kind of loop. It just wouldn’t give up, but went progressively off the rails.

Han from my team has re­ported very sim­i­lar loop­ing - mostly the sec­ond kind. The model or agent is stuck, at the edge of its abil­ity and won’t ask for help. For me, I’ve mainly hit the for­mer, which is ar­guably worse and means I rarely trust it be­yond the teleme­try and diag work for cus­tomer sup­port/​re­newals.
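The first kind of loop, where the same block of output repeats verbatim, is at least detectable from the outside. As a minimal sketch, and not something Toilgate or opencode is claimed to do, a wrapper could watch the streamed lines and abort the request once the tail starts repeating. The window size and repeat threshold here are guesses that would need tuning.

```python
# Sketch: flag a response that has started repeating itself.
# Window size and repeat threshold are guesses to tune per workload.
import re


def normalise(line: str) -> str:
    """Drop a leading list number so '58. foo' and '63. foo' compare equal."""
    return re.sub(r"^\s*\d+[.)]\s*", "", line).strip().lower()


def is_looping(lines, window: int = 5, repeats: int = 3) -> bool:
    """Return True if the last `window` lines occur `repeats` times in a row."""
    cleaned = [normalise(line) for line in lines if line.strip()]
    needed = window * repeats
    if len(cleaned) < needed:
        return False
    tail = cleaned[-needed:]
    first = tail[:window]
    return all(tail[i * window:(i + 1) * window] == first
               for i in range(1, repeats))


commands = ["function import", "function export", "function scale",
            "function rename", "function diff"]
output = [f"{58 + i}. faas-cli {commands[i % 5]}" for i in range(15)]
print(is_looping(output))
```

This does nothing for the second kind of loop, where the model keeps trying different fixes at the edge of its ability. That still needs a human, a step limit, or a time budget.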

Measuring and dis­trib­ut­ing ac­cess

To be­gin with, I set up a sin­gle in­lets tun­nel and hoped the agents would­n’t clash. Two agents hit­ting the same llama.cpp in­stance with un­re­lated con­texts means each re­quest in­val­i­dates the oth­er’s cached pre­fix — so the full prompt gets re-processed from scratch every time, a thrash­ing la­tency you don’t want to feel of­ten. We were still do­ing most work on cod­ing plans then, so it was­n’t yet a real prob­lem.

Distributing that setup was sim­ple: edit open­code.json and add the URL and to­ken, then copy that file onto your var­i­ous ma­chines or Slicer VMs.

But as soon as an­other per­son uses the model, it stops be­ing a pro­to­type. Who’s on which llama.cpp in­stance? How much are they us­ing? Which model? What has that cost us in elec­tric­ity? What hap­pens if that per­son leaves the team? How do we add in an­other model for the team?

Toilgate is 100% vibe-coded and too much work to open source. If you like the idea, feel free to make your own.

Rather than man­u­ally edit­ing my open­code.json file, and send­ing that to var­i­ous team mates, I de­cided to write a provider for open­code. It would man­age the avail­able mod­els from the sta­ble base through to more ex­per­i­men­tal Qwopus vari­ants that were quan­tized. Just run open­code - go to the model picker and se­lect toil­gate then what­ever you want to use.

Two Shelly Plus Plugs are mon­i­tor­ing the power con­sump­tion at the wall to give me a bet­ter idea of ac­tual costs. The RTX 6000 Pro will pull 600W dur­ing in­fer­ence and is rel­a­tively quiet, the two 3090s are closer to 750W com­bined and ex­tremely noisy.
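Turning those wall readings into a running cost is simple arithmetic. The sketch below uses a placeholder electricity price, which is an assumption; substitute the rate from your own bill.

```python
# Sketch: convert measured wall power into energy and cost.
# The price per kWh is a placeholder; use the rate from your own bill.

def energy_kwh(watts: float, hours: float) -> float:
    """Energy used at a constant draw, in kilowatt-hours."""
    return watts * hours / 1000


def cost(watts: float, hours: float, price_per_kwh: float) -> float:
    """Cost of that energy at a flat tariff."""
    return energy_kwh(watts, hours) * price_per_kwh


PRICE_PER_KWH = 0.30  # hypothetical tariff, in your local currency

# The half-hour loop described earlier, at a 600W draw.
print(f"{energy_kwh(600, 0.5):.2f} kWh")

# Eight hours of sustained inference on each rig.
print(f"RTX 6000 Pro: {cost(600, 8, PRICE_PER_KWH):.2f}")
print(f"2x 3090:      {cost(750, 8, PRICE_PER_KWH):.2f}")
```

Sustained full draw for eight hours overstates a typical day, since the card idles at a small fraction of its 600W cap between requests.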

The wrong com­par­i­son

The trap once you can mea­sure is com­par­ing the in­put/​out­put costs per mil­lion to­kens to OpenAI’s API pric­ing for GPT-5.5. That’s the wrong com­par­i­son for the cur­rent ca­pa­bil­ity. It’s more about un­der­stand­ing the on­go­ing costs, which I’m bear­ing per­son­ally since the ma­chine is in my house, for work that’s not suit­able for a cloud model.

This is where local AI turns into an op­er­a­tions prob­lem. You need iden­tity, ac­cess con­trol, me­ter­ing, quo­tas, model rout­ing and power mon­i­tor­ing. The harder part we keep com­ing back to is the re­li­a­bil­ity of the agent/​model com­bi­na­tion, keep­ing up with in­no­va­tions like MTP, and en­sur­ing enough up­time for peo­ple who have started to de­pend on the model be­ing avail­able.

Wrapping up

Only 16 percent of Americans think AI will have a positive impact on society, a new study shows

techcrunch.com

Despite the fact that AI in­creas­ingly dom­i­nates our econ­omy (it’s a hot IPO sum­mer and we’re all just along for the ride), most Americans are not par­tic­u­larly op­ti­mistic about the tech­nol­o­gy’s long-term im­pact on the coun­try, a new study from Pew Research re­veals.

In fact, al­though a whole lot of Americans in­creas­ingly use AI in their daily lives, most of them have neu­tral to neg­a­tive views about it, the re­search re­veals.

Only 16% of Americans think that AI’s impact on society during the next 20 years will be positive, Pew says, while around 40% say that it will have a negative impact.

A vast ma­jor­ity of peo­ple (67%) don’t be­lieve that the U.S. gov­ern­ment will do any­thing to mean­ing­fully reg­u­late AI. A sim­i­larly skep­ti­cal co­hort (59%) don’t trust com­pa­nies to de­velop it safely.

Young peo­ple — that is, those peo­ple un­der 30 — are the ones with the most neg­a­tive feel­ings about AI. Pew says that only 14% of this co­hort be­lieve the tech will have a pos­i­tive im­pact on so­ci­ety.

On top of all this, a clear majority of Americans (nearly two-thirds) also think that AI’s development is occurring too quickly.

Despite all of the skep­ti­cism, a whole lot of Americans also re­port us­ing AI in their daily lives on an in­creas­ingly reg­u­lar ba­sis. About a quar­ter of Americans say they use AI chat­bots on a daily ba­sis. Those who do are typ­i­cally us­ing the chat­bots for re­search pur­poses or for work, Pew says.

A vast ma­jor­ity of peo­ple us­ing AI are us­ing ChatGPT. Pew writes that 44% of U.S. adults now say they use OpenAI’s chat­bot, a fig­ure that’s more than dou­bled since 2023.

The next most pop­u­lar chat­bot is Gemini (24%), fol­lowed by Copilot (17%) and Meta AI (14%), with Grok (8%), Claude (6%), and Character.ai (3%) lag­ging be­hind.

There is a bit of a gen­der di­vide. While chat­bot use is grow­ing for both men and women, men still use AI more and are more en­thu­si­as­tic about it, while women are more skep­ti­cal, Pew says. Men are more likely to say they use AI chat­bots in their daily lives (27% ver­sus 20% for women) and while equal shares of men and women re­port us­ing ChatGPT, men more com­monly re­port us­age of other brands, such as Copilot and Grok.

The re­port also high­lights how AI is chang­ing the ways Americans con­sume in­for­ma­tion. Six in 10 sur­vey re­spon­dents told Pew that they rou­tinely read AI-generated in­ter­net sum­maries (indeed, on Google, they’re pretty much un­avoid­able). A much smaller num­ber re­port us­ing AI to get in­for­ma­tion on fit­ness and di­et­ing.

There are also still a whole lot of peo­ple — about half of the coun­try — that say they do not use AI in their daily lives. The peo­ple who do not use AI tend to be older, while those un­der 50 are more likely to say that they use it. Nearly 75% of Americans aged 65 or older say that they never use AI chat­bots.

Those peo­ple who don’t use chat­bots say they don’t be­cause they’re not in­ter­ested in them, and add that they have no in­ten­tion of us­ing them in the fu­ture.

Lucas is a se­nior writer at TechCrunch, where he cov­ers ar­ti­fi­cial in­tel­li­gence, con­sumer tech, and star­tups. He pre­vi­ously cov­ered AI and cy­ber­se­cu­rity at Gizmodo.

AMD silently removes memory encryption from consumer Ryzen CPUs, leaving users unaware that they may be vulnerable — security feature vanishes after newer AGESA firmware, AMD engineers go radio silent when pressed about the change

www.tomshardware.com

According to a re­port by Ars Technica, AMD has qui­etly stripped a crit­i­cal se­cu­rity fea­ture from its lower-end CPUs, leav­ing un­aware users po­ten­tially vul­ner­a­ble to phys­i­cal at­tacks. Following a months-long in­ves­ti­ga­tion tracked on GitHub, Ben Kilpatrick con­firmed that the Transparent Secure Memory Encryption (TSME) fea­ture — which pro­tects CPUs against phys­i­cal ex­ploits that siphon data from con­nected mem­ory chips — was sud­denly no longer avail­able on AMD CPUs out­side the com­pa­ny’s Pro lineup.

As the exhaustive inquiry, which involved conversations with AMD engineers, board vendors, and other CPU users, was coming to a head, an AMD engineer abruptly cut discussions short, stating, “My apologies, but I don’t have any more information to share on this topic.” As of this report, AMD has neither officially acknowledged nor explained the disappearance of the security feature.

TSME is a pro­tec­tion fea­ture that en­crypts the data stored in mem­ory, mak­ing it un­us­able to phys­i­cal at­tack­ers. AMD ini­tially added this fea­ture to its high-end CPUs, then later ex­tended it to lower-end CPUs. Eventually, the fea­ture be­came a given, leav­ing lower-end chip users as­sured in its avail­abil­ity as part of the chip pack­age. However, with­out prior no­tice, AMD ap­pears to have scrapped the se­cu­rity fea­ture in these proces­sors.

According to the Ars report, the company’s only official reaction to the matter (not counting the GitHub discussions) is an email response stating that “TSME is a security feature only applied to PRO CPUs as part of AMD PRO Technologies,” notably the first time the company has publicly stated such a restriction, despite the feature having worked on consumer chips for years. However, it remains unclear whether the disappearance is an intentional policy decision by AMD to reserve TSME for Pro chips or an unintentional regression that was introduced in AGESA 1.2.7.0, a newer firmware release.

Another concerning aspect of the removal is that the feature’s disappearance is completely undetectable on Windows machines and requires significant technical work to identify on Linux. That means the security feature could vanish without users ever noticing that anything had changed.

Kilpatrick, a self-described “privacy-conscious Linux hobbyist” who first reported the change, was installing a new operating system on his machine running a Ryzen 7 9700X from the Zen 5 architecture. To confirm that all his security protections were enabled, he ran Host Security ID (HSI), an auditing feature that evaluates a system’s firmware and hardware security configurations. To his surprise, HSI reported that TSME was no longer supported, even though he had enabled it in his BIOS settings all along. The contradiction sent him searching for answers.

His first instinct was to reach out to MSI, his motherboard’s manufacturer, but the company didn’t initially provide a definitive explanation. He also filed a bug report on AMD’s public engineering GitHub repository, where two AMD engineers eventually responded: Tom Lendacky, an AMD fellow software engineer, and Mario Limonciello, an AMD senior principal software engineer.


Interestingly, nei­ther en­gi­neer ap­peared to have a clear an­swer for why the fea­ture had dis­ap­peared. Their ad­vice was ba­si­cally the same: dis­able and re-en­able the op­tion in the BIOS, and if that did­n’t work, take it up with the moth­er­board man­u­fac­turer, mak­ing it clear that peo­ple di­rectly at AMD were just as in the dark as the user re­port­ing it.

It was only after this that Kilpatrick pressed MSI harder, eventually convincing its engineers to run controlled tests. They found that consumer Ryzen chips had TSME enabled under an older firmware version but showed it as “not supported” under a newer one (AGESA 1.2.7.0), while Pro versions of the CPU supported the feature regardless of the firmware or motherboard used.

This leaves the big ques­tion of whether AMD de­lib­er­ately re­stricted TSME to its Pro chips, or whether the change was an ac­ci­den­tal re­gres­sion — a firmware bug in­tro­duced in that newer AGESA ver­sion. Either way, the sil­i­con ap­pears to have been ca­pa­ble of run­ning the fea­ture. The dif­fer­ence is whether users are look­ing at a bug that AMD should fix or a quiet prod­uct-seg­men­ta­tion de­ci­sion that AMD has not prop­erly ex­plained.

Kilpatrick took these MSI findings back to the AMD engineers and resumed the discussion six weeks later. MSI’s product marketing team, he reported, had been told directly by AMD that TSME is exclusively supported on Pro series processors. He also relayed MSI’s test results: an internal AGESA flag that controls whether TSME activates during boot returned FALSE on consumer chips regardless of the BIOS setting, but TRUE on Pro processors when the feature was enabled.

Kilpatrick then brought up something especially awkward. He reminded Lendacky of a comment that the engineer had made back in 2020, confirming that a Ryzen 3700X, a consumer CPU, “should support TSME.” In a later 2025 comment in the same discussion, Lendacky again recommended using TSME, while noting that the motherboard BIOS provider had to expose the option. So there it was: AMD’s own engineer, years earlier, acknowledging that the feature worked on exactly the kind of lower-end chip now stripped of it, proving that Ryzen support was not some fantasy users invented.

After some more back-and-forth, Kilpatrick asked bluntly whether the flag being set to FALSE on consumer chips was a silicon-level limitation or a firmware policy decision, since one is permanent and the other is potentially reversible. Limonciello’s reply effectively closed the chapter. “My apologies, but I don’t have any more information to share on this topic,” he wrote.

To be fair to AMD, there is no clear in­di­ca­tion that the com­pany ever pub­licly ad­ver­tised TSME as a con­sumer Ryzen fea­ture. AMD has long said that a re­lated mem­ory pro­tec­tion, Secure Memory Encryption (SME), is avail­able only in the Pro and EPYC CPU tiers. SME is OS-managed, us­ing a sin­gle key and al­low­ing the OS to se­lec­tively en­crypt in­di­vid­ual mem­ory pages. TSME, by con­trast, is firmware-man­aged, en­crypt­ing all RAM with no OS in­volve­ment. When ac­tive, it guards against phys­i­cal at­tacks such as cold-boot ex­ploits, DRAM in­ter­face snoop­ing, and mem­ory mod­ule re­moval, and it ac­ti­vates silently once en­abled in the BIOS, mak­ing it the more prac­ti­cally use­ful of the two pro­tec­tions.

For now, AMD has said nothing official. It hasn’t confirmed what happened, why it happened, whether anything actually changed, or what users of its consumer chips should now expect. Given the years of TSME quietly doing its job on these lower-cost processors, and the AMD engineers’ own past comments treating it as supported, users had every reason to regard it as part of the package.

For most con­sumer Ryzen users, the prac­ti­cal im­pact of the change is nar­row. TSME pro­tects against phys­i­cal at­tacks, mean­ing sce­nar­ios in which some­one has phys­i­cal ac­cess to the ma­chine or its mem­ory hard­ware and at­tempts to ex­tract se­crets di­rectly from RAM. The fea­ture is more im­por­tant for peo­ple car­ry­ing sen­si­tive lap­tops, han­dling con­fi­den­tial work, re­ly­ing on full-disk en­cryp­tion, or op­er­at­ing in en­vi­ron­ments where seizure, theft, or hard­ware tam­per­ing is a re­al­is­tic con­cern. Anyone who gen­uinely needs mem­ory en­cryp­tion on AMD hard­ware now ap­pears to need a Ryzen Pro or EPYC sys­tem, un­less AMD clar­i­fies the sit­u­a­tion or re­stores sup­port.


Etiido Uko is a news con­trib­u­tor for Tom’s Hardware cov­er­ing the lat­est up­dates in big tech and the PC in­dus­try. He is a me­chan­i­cal en­gi­neer and se­nior tech­ni­cal writer with over nine years of ex­pe­ri­ence in doc­u­men­ta­tion and re­port­ing. He is deeply pas­sion­ate about all things en­gi­neer­ing and tech­nol­ogy, and is an ex­pert in gad­gets, man­u­fac­tur­ing, ro­bot­ics, au­to­mo­tive, and aero­space.

Tesco moving 40,000 server workloads off VMware amid Broadcom's “abusive conduct”

arstechnica.com

Tesco is also deal­ing with mi­gra­tion chal­lenges re­lated to data se­cu­rity be­cause its new, un­named vir­tu­al­iza­tion soft­ware is in­com­pat­i­ble with the Veeam and Zerto prod­ucts it uses.

“Manifestly unfair and excessive” price hike

Tesco ini­tially re­quested at least 100 mil­lion pounds (about $133.6 mil­lion) in dam­ages each from Broadcom, VMware, and re­seller Computacenter, plus in­ter­est.

In its recent filings, Tesco said it turned down at least four offers from Broadcom to continue using VMware and Broadcom’s mainframe tech. One offer charged $23.5 million (about 17.6 million pounds) for VMware Cloud Foundation 9.0 and mainframe software and support services for a year, The Register reported. Tesco said that was “around 175 percent” more expensive than what it believes it should have had to pay for VMware and a 350 percent price hike for the mainframe offerings. The prices were “manifestly unfair and excessive,” one of Tesco’s filings said, according to The Register.

In an amended de­fense, Broadcom de­nied that the price hike was un­fair, The Register re­ported. Additionally, Broadcom ar­gued that it should­n’t have to pay dam­ages in re­la­tion to Tesco strug­gling to find VMware and Broadcom al­ter­na­tives be­fore Tesco’s sup­port ex­pired, as the re­tail firm has since found re­place­ment prod­ucts.

The case is ex­pected to go to court be­tween November 1, 2027, and February 25, 2028, The Register re­ported. Afterward, it could go to trial.

Although the com­pa­nies will con­tinue their dis­pute in UK courts, the dis­agree­ment mir­rors frus­tra­tions that VMware cus­tomers and part­ners around the world have ex­pressed since Broadcom bought VMware. With users of­ten be­ing heav­ily de­pen­dent on VMware prod­ucts, many have de­layed or avoided mi­gra­tion or are only mov­ing some work­loads, due to com­pli­ca­tions around cost, time, sup­port, and com­pat­i­bil­ity.

Still, vir­tu­al­iza­tion ri­vals, like Hewlett Packard Enterprise and Nutanix, have been mak­ing ag­gres­sive pushes to at­tract dis­grun­tled VMware users.

Simultaneously, Broadcom has stuck to its VMware strategy and has reported financial success, especially among its target of large enterprises. It has also dealt with other public legal disputes with large customers, including AT&T, with which it reached an undisclosed settlement, and Siemens, which Broadcom accused of software piracy in an ongoing case in the US District Court for the District of Delaware.

Microsoft’s new Outlook takes 10 seconds to do what Outlook Classic does instantly on Windows

www.windowslatest.com

Microsoft’s Outlook for Windows has a no­ti­fi­ca­tion prob­lem that is hard to ig­nore. Clicking a Windows 11 no­ti­fi­ca­tion for a new email is sup­posed to take you straight to that mes­sage. Instead, the new Outlook makes you wait, and the num­bers are em­bar­rass­ing.

Windows 11 ships with two ver­sions of Outlook. There is Outlook Classic, the long-run­ning Win32 desk­top app built for power users, and there is the new Outlook, which Microsoft is push­ing as the fu­ture of email on Windows. The newer one is built on WebView2 and is, in essence, a browser win­dow that loads Outlook.com. If you have ever used both side by side, you al­ready know which one feels faster and which one does not.

Outlook has had a com­pli­cated rep­u­ta­tion for years. The orig­i­nal Win32 app be­came in­fa­mous for be­ing bloated and dif­fi­cult to con­fig­ure. Microsoft’s an­swer was to ditch na­tive code and re­build from the web up. The re­sult, called the new Outlook, re­placed the light­weight UWP Mail and Calendar apps that some Windows users had grown used to. Windows Latest re­ported back in 2023 how users protested when Microsoft an­nounced plans to re­tire those UWP apps in fa­vor of a web wrap­per. The com­pany pushed ahead any­way, and by late 2024, the Mail and Calendar apps were of­fi­cially shut down.

Microsoft has also been push­ing the new Outlook at en­ter­prises, though it post­poned the forced opt-out dead­line to March 2027 from the orig­i­nally planned April 2026. A de­lay of a full year shows that even Microsoft knows the app is not fully ready for every work­load. New Outlook has im­proved in real ways since launch, but the per­for­mance story is still a mixed one, and nowhere is that more ap­par­ent than in how it han­dles no­ti­fi­ca­tions.

New Outlook takes 10 sec­onds to go from no­ti­fi­ca­tion to the re­spec­tive mail

Before getting to the frustrating part, credit where it’s due: the new Outlook used to be noticeably slow to launch from scratch, but not anymore.

Outlook (classic) vs New Outlook:

https://​www.win­dowslat­est.com/​wp-con­tent/​up­loads/​2026/​06/​Out­look-clas­sic-vs-New-Out­look-open­ing-speed-com­par­i­son.mp4

New Outlook now opens almost as fast as Outlook Classic, which is still the slightly quicker of the two. But I would say both are neck and neck, at least when it comes to opening speeds.

However, when a new email arrives in Windows 11, a notification banner appears at the bottom right of your screen, and that’s where the problem starts. Clicking that banner, or the same notification in the Notification Center, is supposed to take you directly to the email that triggered it.

With Outlook Classic, it opens that spe­cific email al­most in­stantly.

https://​www.win­dowslat­est.com/​wp-con­tent/​up­loads/​2026/​06/​Out­look-Clas­sic-opens-from-no­ti­fi­ca­tions-in­stantly.mp4

With the new Outlook, click­ing the no­ti­fi­ca­tion opens the app, loads the full in­box, and then takes around 10 sec­onds be­fore the spe­cific email from the no­ti­fi­ca­tion shows up on screen.

https://​www.win­dowslat­est.com/​wp-con­tent/​up­loads/​2026/​06/​Out­look-takes-10-sec­onds-to-show-the-e-mail-from-no­ti­fi­ca­tion.mp4

What makes this even more ab­surd is that if you ig­nore the no­ti­fi­ca­tion ban­ner and in­stead open Outlook di­rectly from the Start menu, you can find and click the new email from within the app and be done with it, all be­fore the no­ti­fi­ca­tion ban­ner even dis­ap­pears from the screen.

https://​www.win­dowslat­est.com/​wp-con­tent/​up­loads/​2026/​06/​Open­ing-new-e-mail-di­rectly-from-Out­look-in­stead-of-no­ti­fi­ca­tion.mp4

Five sec­onds to open Outlook and click an email man­u­ally. Ten sec­onds wait time to see that same email if we click the no­ti­fi­ca­tion di­rectly. This is ridicu­lous, even for Microsoft.

And, as it turns out, this is­n’t a prob­lem Microsoft can eas­ily fix with an app up­date.

Outlook is based on WebView2, and like all web apps, it’s slow

New Outlook is built on Microsoft Edge’s WebView2 run­time, which is a Chromium-based ren­der­ing en­gine. Every time you in­ter­act with the app, in­clud­ing click­ing a no­ti­fi­ca­tion, a browser-like process chain has to do the work. The app has to ini­tial­ize or re­sume its web layer, au­then­ti­cate, load the rel­e­vant mail thread, and ren­der it, all through that web en­gine.

As Windows Latest reported in December 2025, Microsoft acknowledged this slowness and was testing a new API called “Delayed Message Timing” to help diagnose performance issues in WebView2 apps. However, we haven’t seen any use of that API while clicking Outlook notifications.

New Outlook runs as 10 sep­a­rate processes in Task Manager, com­pared to Outlook Classic, which runs as a sin­gle com­pact process. The list in­side the new Outlook in­cludes WebView2 Manager, mul­ti­ple WebView2 Utility processes, a WebView2 GPU Process, a WebView2 Service Worker, and more. Each of those is es­sen­tially a browser com­po­nent. They all con­sume mem­ory in­di­vid­u­ally, and they all take time to re­sume from a sus­pended state when you click a no­ti­fi­ca­tion.

Speaking of mem­ory, the new Outlook uses be­tween 490 MB and 636 MB of RAM while idle, with in­di­vid­ual ses­sions vary­ing based on mail­box size. Outlook Classic, do­ing the same job, uses around 117 MB to 148 MB at idle. A roughly four­fold dif­fer­ence.

As for CPU, new Outlook uses around 4% at idle while Outlook Classic uses less than 1%. These num­bers are from my own mea­sure­ments us­ing Task Manager with both apps open si­mul­ta­ne­ously.

Of course, these is­sues are com­mon to all web apps. As we re­ported, WhatsApp now con­sumes 1.2 GB of RAM do­ing noth­ing af­ter Meta re­placed its na­tive WinUI app with a WebView2 wrap­per.

Microsoft has been aware of the of­fline and per­for­mance lim­i­ta­tions of new Outlook for some time. The com­pany spent much of 2024 try­ing to make the app work prop­erly with­out an in­ter­net con­nec­tion, some­thing Outlook Classic han­dles na­tively by caching mail lo­cally. A web app, by de­sign, is al­ways reach­ing out to a server.

New Outlook is im­prov­ing, but the gap with Classic is­n’t clos­ing any­time soon

In fair­ness, the new Outlook has come a long way since its rocky de­but. The March 2026 up­date added bet­ter folder search op­tions and im­proved shared mail­box ac­cess, two ar­eas where the app lagged be­hind Classic for a long time. The May 2026 up­date brought au­tomapped cal­en­dar sup­port, so switch­ing from Classic to new Outlook no longer drops your shared cal­en­dars. Teammate cal­en­dars now show up au­to­mat­i­cally in the nav­i­ga­tion pane.

More re­cently, Microsoft con­firmed a June 2026 up­date with five no­table ad­di­tions, in­clud­ing an all-ac­counts in­box view (also called Unified Inbox) ar­riv­ing in August 2026, im­proved mail merge, and ex­panded .PST sup­port. The .PST im­port up­date in July 2026 will let users bring in cal­en­dar items and con­tacts from lo­cal archive files, which was a long-stand­ing pain point for any­one switch­ing from Classic.

The push to get peo­ple to switch is get­ting louder too. Microsoft listed 15 pro­duc­tiv­ity fea­tures in early June 2026 as rea­sons to make the move from Classic. The list in­cludes of­fline ac­cess, richer Copilot in­te­gra­tion, faster search, im­proved cal­en­dar con­trols, and more. Many of those fea­tures are things Classic users have had for years, which makes the fram­ing a bit odd, but the di­rec­tion is clear. Microsoft wants new Outlook to be­come the de­fault ex­pe­ri­ence for every­one.

We were also told in late 2025 that a cal­en­dar agenda view in the Notification Center was com­ing, bring­ing back a Windows 10 fea­ture that went miss­ing with Windows 11. The agenda view, when it ar­rives, will also be pow­ered by WebView2. Whether it in­tro­duces sim­i­lar de­lays re­mains to be seen.

A web app can­not fix per­for­mance is­sues

Microsoft cel­e­brated growth for new Outlook in 2024, but a sig­nif­i­cant por­tion of that growth came from forced mi­gra­tion. People did not choose the web app be­cause it was faster. They were moved to it be­cause the apps they pre­vi­ously used (Mail + Calendar) were shut down.

New Outlook open­ing fast from the Start menu is a real im­prove­ment. The work be­ing done on new fea­tures shows that Microsoft is lis­ten­ing to com­plaints. But un­til the no­ti­fi­ca­tion ex­pe­ri­ence matches what Outlook Classic has been do­ing with­out is­sue for years, the new Outlook is still work­ing around a fun­da­men­tal con­straint im­posed by the WebView2 ar­chi­tec­ture.

The only so­lu­tion, as you might’ve guessed, is the move to WinUI. We al­ready re­ported that Microsoft is now fully com­mit­ted to WinUI, with Rudy Huyn prepar­ing a team to make na­tive Windows apps, and so we may see a na­tive Outlook too…

For now, if fast no­ti­fi­ca­tion han­dling is im­por­tant to your work­flow, Outlook Classic is the more re­li­able choice. Classic Outlook is still avail­able to down­load and is sup­ported un­til April 2029. The new Outlook will keep im­prov­ing, but some of its lim­i­ta­tions are baked into how it is built, and those are harder to fix with a fea­ture up­date.

Leaked financial docs show OpenAI is losing billions of dollars a year

arstechnica.com

As OpenAI files SEC pa­per­work ahead of an ex­pected ini­tial pub­lic stock of­fer­ing, newly leaked fi­nan­cial doc­u­ments show a com­pany with quickly grow­ing rev­enues that are cur­rently be­ing over­whelmed by even larger ex­penses.

The au­dited fi­nan­cial state­ments, ob­tained by in­de­pen­dent jour­nal­ist Ed Zitron, show OpenAI’s re­ported rev­enue grow­ing from $3.7 bil­lion in 2024 to $13.07 bil­lion in 2025. The Financial Times, which re­viewed the same doc­u­ments, writes that the com­pa­ny’s monthly rev­enues had grown to nearly $2 bil­lion by the end of 2025, sug­gest­ing that its on­go­ing rev­enue rates con­tin­ued to grow through­out the year.

R&D ex­penses alone still eas­ily out­pace OpenAI’s quickly grow­ing rev­enues.

Credit: Ars Technica


But the com­pa­ny’s fast-grow­ing rev­enues are still dwarfed by its even more sig­nif­i­cant ex­penses. OpenAI’s to­tal rev­enues in both of the last two years were out­paced by re­search and de­vel­op­ment alone, which grew from a $7.81 bil­lion line item in 2024 to a mas­sive $19.18 bil­lion cost in 2025. Those num­bers seem to re­flect the sig­nif­i­cant costs OpenAI in­curred in train­ing new mod­els and in­clude $10.59 bil­lion in R&D costs paid to Microsoft alone in 2025.

On top of that, OpenAI’s “cost of revenue” (i.e., the money spent producing and distributing the product) increased from $2.65 billion in 2024 to $7.5 billion in 2025. This cost line likely reflects the significant compute costs incurred at “inference time” as the company’s models respond to a growing number of user prompts. Costs associated with sales and marketing also grew from $1.11 billion in 2024 to $5.73 billion in 2025.

OpenAI’s op­er­at­ing loss is shrink­ing as a per­cent­age of rev­enue, but there’s a long way to go be­fore it be­comes a profit.

Credit: Ars Technica


All told, OpenAI’s day-to-day “loss from operations” increased from $8.78 billion in 2024 to $20.92 billion in 2025, a concerning direction for a company that is telling investors it hopes to be profitable by 2030. But measured as a percentage of revenues, the company’s operating losses slightly improved year to year, from 237 percent in 2024 to 160 percent in 2025.
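The two percentages follow directly from the revenue and operating-loss figures quoted in the article. A minimal Python sketch of that arithmetic (the dictionary layout and function name are illustrative, not taken from the filings):

```python
# Figures in billions of US dollars, as quoted in the article.
FINANCIALS = {
    2024: {"revenue": 3.7, "operating_loss": 8.78},
    2025: {"revenue": 13.07, "operating_loss": 20.92},
}


def loss_as_percent_of_revenue(revenue: float, operating_loss: float) -> int:
    """Operating loss expressed as a whole-number percentage of revenue."""
    return round(operating_loss / revenue * 100)


for year, row in FINANCIALS.items():
    pct = loss_as_percent_of_revenue(row["revenue"], row["operating_loss"])
    print(f"{year}: operating loss is {pct}% of revenue")
```

The absolute loss more than doubles between the two years, but revenue grows faster, which is why the ratio falls.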

I discovered a large-scale malware distribution campaign on GitHub

orchidfiles.com

18 June 2026

This is the story of how I found 10,000 repos­i­to­ries on GitHub that dis­trib­ute Trojan mal­ware. They are all from dif­fer­ent con­trib­u­tors, have dif­fer­ent names, and are not forks of other repos­i­to­ries. But they share a com­mon pat­tern, which is what al­lowed me to write a script to find such repos­i­to­ries.

Introduction

I have a project on GitHub, and I wanted to check whether search engines had indexed it. I typed the project name into Google, and my repository appeared in the results. I entered the same query into Bing, and someone else’s repository appeared in the results, with the exact same name and description. It was a copy of my repository with all the commits, and I was listed as a contributor. But an hour earlier, another commit had been pushed with a change to the readme: a link to a zip archive had been added to it.

I was choosing appropriate tags for another one of my projects on GitHub. I clicked through those tags to look at similar projects. In the list, I found a repository whose name and description matched exactly those of another repository on that list. It turned out that it also contained copies of all the commits from that repository, and two hours earlier, a link to a zip archive had been added to the readme.

After mon­i­tor­ing these two repos­i­to­ries, I dis­cov­ered that every few hours they delete the pre­vi­ous com­mit and push the ex­act same com­mit again. This com­mit con­tains only one change: adding a link to the archive in the readme file.

I submitted a request to GitHub support asking them to delete these repositories. Two weeks passed and nothing changed; GitHub support didn’t respond. I discussed with an AI what else could be done about this, but it didn’t offer any useful advice. I opened a thread on GitHub, and three people replied with the same AI slop that was of no use at all.

Another month later, GitHub sup­port sent me an email say­ing that they had re­moved these repos­i­to­ries.

You can open other similar repositories, look at the latest commit, and see that a link to a zip archive was added to the readme a few hours ago:

https://github.com/lucasheriq4374/welink

https://github.com/lucioloprey/OcyShield-Framework

https://github.com/luigi1973/AssetRipper-CLI

The zip archive con­tains 4 files:

Application.cmd or Launcher.cmd

loader.exe or lu­a­jit.exe or an­oth­er_­name.exe

ran­dom_­name.cso or ran­dom_­name.txt

lu­a51.dll

If you submit a link to the archive to VirusTotal, it will find 0 viruses. If you submit the zip file itself, it will detect a Trojan inside it.

Continued

It seemed like I had already forgotten about this event, but my subconscious hadn’t. And my subconscious often throws interesting ideas at me when I’m sleeping or waking up. Recently, I woke up and in the very same second realized what I needed to do: come up with a general pattern and then write a script that would analyze all GitHub repositories and find the ones that match that pattern.

Search pat­tern:

Every few hours the pre­vi­ous com­mit is deleted and a new one is pushed

Only the readme file is up­dated in the com­mit

The readme file con­tains a link to a zip archive

The com­mits are copied from an­other repos­i­tory

This is a new repos­i­tory, not a fork

All repos­i­to­ries have dif­fer­ent con­trib­u­tors and dif­fer­ent names

From the last two points, it be­comes clear that even if we find one such repos­i­tory, we won’t be able to find other sim­i­lar repos­i­to­ries us­ing it. But there are 500 mil­lion repos­i­to­ries on GitHub. How can we an­a­lyze all of them? GitHub al­lows 5,000 re­quests per hour with a sin­gle to­ken. For each repos­i­tory, we need to make sev­eral re­quests to get the list of com­mits, mod­i­fied files, and the con­tent of the readme file. I did­n’t want to wait a year for the script to an­a­lyze all the repos­i­to­ries.
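To put the rate limit in perspective, here is a back-of-the-envelope sketch in Python. The 500 million repositories and 5,000 requests per hour come from the text; the figure of three requests per repository (commits, modified files, readme) is an assumption standing in for "several", and the function name is illustrative.

```python
def years_to_scan(total_repos: int, requests_per_repo: int,
                  requests_per_hour: int = 5_000) -> float:
    """Years needed to query every repository with a single API token."""
    hours = total_repos * requests_per_repo / requests_per_hour
    return hours / 24 / 365


# Assumed: 3 requests per repository (commit list, changed files, readme).
print(f"{years_to_scan(500_000_000, 3):.1f} years at 3 requests per repo")
# Lower bound: even a single request per repository is far too slow.
print(f"{years_to_scan(500_000_000, 1):.1f} years at 1 request per repo")
```

Under these assumptions a full scan with one token would take decades rather than a single year, which makes narrowing the candidate set first even more necessary.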

But we don’t need all the repos­i­to­ries, we only need the ones that are up­dated every few hours. I found a ser­vice called gharchive, which lets you down­load all GitHub events for any given day. So we need to down­load the event archives for the last few days, fil­ter them to in­clude only com­mit push events, and iden­tify the repos­i­to­ries that are up­dated be­tween 2 and 10 times every 10 hours.

Over the past 5 days, there had been 16 million commit pushes. Only 3,000 of the repositories behind them are updated every few hours.
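The filtering step can be sketched as follows. This is not the author's script, only a minimal illustration under stated assumptions: GH Archive files are read as JSON lines where each event carries `type`, `repo.name`, and `created_at`, and "updated between 2 and 10 times every 10 hours" is interpreted as "every 10-hour window in the sampled period has between 2 and 10 pushes". The function names and the windowing scheme are hypothetical.

```python
import json
from collections import Counter, defaultdict
from datetime import datetime, timezone


def push_counts_per_window(event_lines, window_hours):
    """Map each repo name to a Counter of {window index: number of pushes}."""
    counts = defaultdict(Counter)
    for line in event_lines:
        event = json.loads(line)
        if event.get("type") != "PushEvent":
            continue  # keep only commit push events
        created = datetime.strptime(
            event["created_at"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        window = int(created.timestamp() // (window_hours * 3600))
        counts[event["repo"]["name"]][window] += 1
    return counts


def periodically_updated(event_lines, window_hours=10,
                         min_pushes=2, max_pushes=10):
    """Repos whose push count stays within bounds in every sampled window."""
    counts = push_counts_per_window(event_lines, window_hours)
    windows = {w for per_repo in counts.values() for w in per_repo}
    if not windows:
        return []
    span = range(min(windows), max(windows) + 1)
    return sorted(
        repo for repo, per_repo in counts.items()
        if all(min_pushes <= per_repo.get(w, 0) <= max_pushes for w in span)
    )
```

Each downloaded archive can be fed in with `gzip.open(path, "rt")`, which yields one JSON event per line. Keeping the window size and bounds as parameters makes it cheap to loosen the filter if it turns out to be too strict.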

However, the events do not in­clude in­for­ma­tion about which spe­cific files were mod­i­fied. This means that for each rel­e­vant repos­i­tory, we need to make ad­di­tional re­quests to the GitHub API.

After run­ning the script, it re­turned a large num­ber of repos­i­to­ries. I added sev­eral pa­ra­me­ters to the fil­ters:

The com­mit must be from a user, not a bot

More than a month has passed be­tween the last com­mit and the one be­fore that

The repos­i­to­ries have more than one con­trib­u­tor

After that, only 14 repos­i­to­ries were found that fully matched the pat­tern. And I could­n’t stop won­der­ing: why were there so few repos­i­to­ries? What are the odds that I stum­bled upon these repos­i­to­ries two months ago and there are only 14 of them on the en­tire GitHub? There should be many more. Imagine what the head­line of this ar­ti­cle would have been if I’d found a mil­lion such repos­i­to­ries or even just a thou­sand.

But I accepted the fact that there were only 14 of them and started writing this article. I decided to double-check them one more time so I wouldn’t accidentally include any unnecessary repositories in the article. Imagine my surprise when I saw that they had all been updated 20 hours ago. So the “updated every few hours” parameter was completely wrong. The filter had discarded all repositories that are updated infrequently.

During my man­ual check, I also no­ticed repos­i­to­ries that con­tained a link to a zip archive and had a re­cent com­mit, but that com­mit had zero changes. The fil­ter, how­ever, only con­sid­ered repos­i­to­ries where a sin­gle readme file had been mod­i­fied in the lat­est com­mit.

I also noticed that the last commit in all of these repositories had the same name: “Update README.md”.

I changed the fil­ter. Now the script searched for repos­i­to­ries that were up­dated be­tween 1 and 24 times every 24 hours. It found 40,000 such repos­i­to­ries.

There were 10,000 repos­i­to­ries that ex­actly matched the pat­tern. That’s 25% of the to­tal.
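The per-repository check can be sketched as a single predicate. This is an illustration of the criteria described above, not the published script: the function name, its parameters, and the zip-link regular expression are assumptions, and the inputs are expected to have been fetched from the GitHub API beforehand. It does not cover the "commits copied from another repository" or contributor criteria.

```python
import re
from pathlib import PurePosixPath

# Assumed heuristic: any http(s) URL that ends in ".zip".
ZIP_LINK = re.compile(r"https?://\S+?\.zip\b", re.IGNORECASE)


def looks_like_campaign_repo(commit_message, changed_files,
                             readme_text, is_fork):
    """True if the latest commit and readme match the described pattern."""
    if is_fork:
        return False  # the campaign uses new repositories, not forks
    if commit_message.strip() != "Update README.md":
        return False  # every malicious commit carried this exact name
    # The latest commit touches only the readme, or has zero changes.
    only_readme = all(
        PurePosixPath(path).name.lower().startswith("readme")
        for path in changed_files
    )
    if not only_readme:
        return False
    return bool(ZIP_LINK.search(readme_text))
```

An empty `changed_files` list passes the readme check on purpose, so that the zero-change commits noticed during the manual review are not discarded.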

Each of these repos­i­to­ries con­tains a zip archive with a Trojan.

These repos­i­to­ries have been around for many months, some even for over a year, and GitHub does not au­to­mat­i­cally de­tect and delete them.

I’ve published a complete list of these repositories on GitHub. A script for finding such repositories: Git Malware Finder

Open Questions

Why do they only clone new repos­i­to­ries, rather than pop­u­lar ones?

Why do they delete a com­mit and push a new one every few hours?

Why does­n’t GitHub au­to­mat­i­cally de­tect such repos­i­to­ries?

What exactly does the executable file from the archive do?

What is the ac­tual scale of this cam­paign?

My Hypotheses

The hackers’ goal is to understand how the system works, find its limitations and vulnerabilities, and exploit that information. If overwriting commits helps bypass GitHub’s security algorithms, then that’s what they did. Perhaps that’s also why every commit is named “Update README.md”.

The sec­ond goal is to spread the virus. How do they get peo­ple to find and down­load it? I think they do this by cloning only new repos­i­to­ries, which im­me­di­ately ap­pear at the top of search en­gine re­sults for low-vol­ume search terms. They also add these repos­i­to­ries to pop­u­lar GitHub tags to in­crease the chances of in­dex­ing and to help peo­ple find them through those tags.

But why do they copy all the com­mits and con­trib­u­tors? After all, they could have just copied the en­tire source code. This is likely done to build trust. When some­one vis­its a repos­i­tory, they see the con­trib­u­tors, can click through to their pro­files, and see that these aren’t one-day ac­counts. And the com­mit his­tory is pre­served so it’s clear that the repos­i­tory did­n’t just ap­pear yes­ter­day. But per­haps this is also done to by­pass GitHub’s al­go­rithms.

These are just my as­sump­tions, but the re­al­ity may be com­pletely dif­fer­ent.

Conclusion

I was sub­ject to GitHub’s API limit of 5,000 re­quests per hour. I op­ti­mized the script to search only for rel­e­vant repos­i­to­ries, and I think that be­cause of the fil­ter, the script found only a small per­cent­age of repos­i­to­ries. The GitHub team does not have such lim­i­ta­tions. They can an­a­lyze all 500 mil­lion repos­i­to­ries, find any archives or ex­e­cutable files within them, and scan them for viruses.

This time, I won’t be send­ing a re­quest to GitHub. There are sim­ply too many repos­i­to­ries. If any of you have di­rect con­tact with GitHub’s se­cu­rity team, please send them a link to this ar­ti­cle.

* Update: I found this article from April 18: “How 109 Fake GitHub Repositories Delivered SmartLoader and StealC”. It explains in detail how this Trojan malware works. At that time, the author had found 109 such repositories.

* Update 2: GitHub has started deleting all the repositories that the script found. Most of these repositories have already been deleted.

