10 interesting stories served every morning and every evening.




1 644 shares, 26 trendiness

I Measured Claude 4.7's New Tokenizer. Here's What It Costs You.

I Measured Claude 4.7's New Tokenizer. Here's What It Costs You.

The docs said 1.0–1.35x more tokens. On real content, I measured 1.47x.

Anthropic's Claude Opus 4.7 migration guide says the new tokenizer uses roughly "1.0 to 1.35x as many tokens" as 4.6. I measured 1.47x on technical docs. 1.45x on a real CLAUDE.md file. The top of Anthropic's range is where most Claude Code content actually sits, not the middle.

Same sticker price. Same quota. More tokens per prompt. Your Max window burns through faster. Your cached prefix costs more per turn. Your rate limit hits sooner.

So Anthropic must be trading this for something. What? And is it worth it?

I ran two experiments. The first measured the cost. The second measured what Anthropic claimed you'd get back. Here's where it nets out.

What does it cost?

To measure the cost, I used POST /v1/messages/count_tokens — Anthropic's free, no-inference token counter. Same content, both models, one number each per model. The difference is purely the tokenizer.

First: seven samples of real content a Claude Code user actually sends — a CLAUDE.md file, a user prompt, a blog post, a git log, terminal output, a stack trace, a code diff. Second: twelve synthetic samples spanning content types — English prose, code, structured data, CJK, emoji, math symbols — to see how the ratio varies by kind.

The core loop is three lines of Python: count the same sample against each model, then divide.

Weighted ratio across all seven real-content samples: 1.325x (8,254 → 10,937 tokens).

What changed in the tokenizer

Three patterns in the data:

CJK, emoji, and symbol content moved 1.005–1.07x. A wholesale new vocabulary would shift these more uniformly. That didn't happen. Consistent with the non-Latin portions of the vocabulary changing less than the Latin.
Token counts don't prove which specific slots were preserved.

English and code moved 1.20–1.47x on natural content. Consistent with 4.7 using shorter or fewer sub-word merges for common English and code patterns than 4.6 did.

Code is hit harder than unique prose (1.29–1.39x vs 1.20x). Code has more repeated high-frequency strings — keywords, imports, identifiers — exactly the patterns a Byte-Pair Encoding trained on code would collapse into long merges.

Chars-per-token on English dropped from 4.33 to 3.60. TypeScript dropped from 3.66 to 2.69. The vocabulary is representing the same text in smaller pieces.

That's a hypothesis, not a proof. Counting tokens doesn't tell you which specific entries in Anthropic's proprietary vocabulary changed.

Why ship a tokenizer that uses more tokens

Anthropic's migration guide promises "more literal instruction following, particularly at lower effort levels": the model will not silently generalize an instruction from one item to another. Smaller tokens force attention over individual words. That's a documented mechanism for tighter instruction following, character-level tasks, and tool-call precision. Partner reports (Notion, Warp, Factory) describe fewer tool errors on long runs.

The tokenizer is one plausible contributor. Weights and post-training also changed. Token counts can't separate them.

Does 4.7 actually follow instructions better?

That's the cost, measured. Now the question: what did Anthropic trade for it? Their pitch is "more literal instruction following." Plausible, but the token-count data doesn't prove it. I ran a direct test.

IFEval (Zhou et al., Google, 2023) is a benchmark of prompts with verifiable constraints. "Respond in exactly N words." "Include the word X twice." "No commas." "All uppercase." Each constraint has a Python grader.
Binary pass/fail.

IFEval ships 541 prompts. I sampled 20 with a fixed seed, ran each through both models, and graded with IFEval's published checker.

The result: a small but directionally consistent improvement on strict instruction following. Loose evaluation is flat. Both models already follow the high-level instructions — the strict-mode gap comes down to 4.6 occasionally mishandling exact formatting where 4.7 doesn't.

Only one instruction type moved materially: change_case:english_capital (0/1 → 1/1). Everything else tied. The one prompt that actually separated the models was a four-constraint chain where 4.6 fumbled one constraint and 4.7 got all four.

Caveats. N=20. IFEval has 541 prompts. A 20-prompt sample is enough to see direction, not enough to be confident about size. A +5pp delta at N=20 is consistent with anything from "no real difference" to a real +10pp improvement.

This measures the net effect of 4.6 → 4.7. Tokenizer, weights, and post-training all changed. I can't isolate which one drove the +5pp. The causal link between "smaller tokens" and "better instruction following" remains a hypothesis.

Single generation per prompt. Multiple runs per prompt would tighten the estimate.

So: 4.7 follows strict instructions a few points better than 4.6 on this subset. Small effect, small sample. Not the "dramatic improvement" framing Anthropic's partners used in launch quotes — at least not on this benchmark.

The extra tokens bought something measurable: +5pp on strict instruction following. Small. Real. So: is that worth 1.3–1.45x more tokens per prompt? Here's the cost, session by session.

Imagine a long Claude Code session — 80 turns of back-and-forth on a bug fix or refactor. The setup (what's in your context each turn): a small static prefix (~6K tokens of CLAUDE.md and tool definitions) plus the growing conversation history.

One thing to explain up front: the average cached prefix across the 80 turns is ~86K tokens, not 6K.
The static 6K is tiny; the average history across all turns (0 at turn 1, 160K at turn 80, average ~80K) dominates. Since most of the cache-read cost happens in late turns where the history is huge, that ~86K average is what actually gets billed per turn.

Every token in the prefix scales by its content ratio. Conversation history (mostly English and code): 1.325x → 160K becomes 212K by turn 80, averaging ~106K across the session. Average cached prefix on 4.7: ~115K tokens (up from 86K). Output tokens are a wildcard — roughly the same as 4.6, up to ~30% higher if Claude Code's new xhigh default produces more thinking tokens.

The per-token price didn't change. The per-session cost did, because the same session packs more tokens.

For Max-plan users hitting rate limits instead of dollars: your 5-hour window ends sooner by roughly the same ratio on English-heavy work. A session that ran the full window on 4.6 probably doesn't on 4.7.

How this hits the prompt cache

Prompt caching is the architecture Claude Code runs on. The 4.7 tokenizer change interacts with caching in three ways:

First 4.7 session starts cold. Anthropic's prompt cache is partitioned per model — switching from 4.6 to 4.7 invalidates every cached prefix, the same way switching between Opus and Sonnet does. The tokenizer change doesn't cause this, but it makes the cold start more expensive: the prefix you're writing to the new cache is 1.3–1.45x larger than the 4.6 equivalent.

Cache volume grows by the token ratio. 1.445x more tokens in the CLAUDE.md portion means 1.445x more tokens paying cache-write once, and 1.445x more paying cache-read every turn after. The mechanism still works. There's just more of it to pay for.

Same transcript, different count. Re-run a 4.6 session on 4.7 and your logs show a different number.
If you baseline billing or observability off historical token counts, expect a step change the day you flip the model ID.

"Input is mostly cache reads. The per-token cost barely changed."

Legitimate. In a session that stays within the 5-minute TTL, 96% of input is cache reads at $0.50/MTok — already 90% off nominal. A 1.325x ratio on the cached portion is a smaller dollar impact than on fresh input.

But Max plans count all tokens toward rate limits, not dollars. And several patterns hit uncached territory: the first session after a TTL expiry, every cache-bust event (CLAUDE.md edits, tool-list changes, model switches), and every compaction event that rewrites the prefix. On those turns you pay the full ratio on the cache-write. The steady state is a bright spot. The edges got noisier.

And to the objection that this all falls within Anthropic's documented range: agreed. The real-world weighted ratio (1.325x) lands near the top of their range. Individual file types exceed it — CLAUDE.md at 1.445x, technical docs at 1.473x. That's the useful finding: the top of the documented range is where most Claude Code content sits, not the middle. Plan around the upper range, not the average.

So: tokens are 1.3–1.45x more expensive on English and code. Anthropic bought you +5pp on strict instruction following. The sticker price didn't change. The effective per-session cost did.

Is it worth it? That depends on what you send. You're paying ~20–30% more per session for a small but real improvement in how literally the model follows your prompt.
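The measurement loop described above is easy to reproduce. A sketch, assuming Anthropic's documented count_tokens endpoint and only the standard library; the model IDs and API key you would pass in are your own, and the helper isn't called here since it needs network access:

```python
import json
import urllib.request

COUNT_URL = "https://api.anthropic.com/v1/messages/count_tokens"

def count_tokens(text: str, model: str, api_key: str) -> int:
    """POST the same content to the free, no-inference token counter
    and return how many input tokens the given model's tokenizer sees."""
    req = urllib.request.Request(
        COUNT_URL,
        data=json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": text}],
        }).encode(),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["input_tokens"]

def weighted_ratio(counts: list[tuple[int, int]]) -> float:
    """Weighted token ratio across samples:
    total new-model tokens / total old-model tokens."""
    old_total = sum(old for old, _ in counts)
    new_total = sum(new for _, new in counts)
    return new_total / old_total
```

Feeding the article's seven-sample totals (8,254 old, 10,937 new) into weighted_ratio reproduces the headline 1.325x.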


...

Read the original on www.claudecodecamp.com »

2 383 shares, 16 trendiness

smol-machines/smolvm: Tool to build & run portable, lightweight, self-contained virtual machines.

Ship and run software with isolation by default.

This is a CLI tool that lets you:

Pack a stateful virtual machine into a single file (.smolmachine) to rehydrate on any supported platform.

# install (macOS + Linux)

curl -sSL https://smolmachines.com/install.sh | bash

# for coding agents — install + discover all commands

curl -sSL https://smolmachines.com/install.sh | bash && smolvm --help

# run a command in an ephemeral VM (cleaned up after exit)

smolvm machine run --net --image alpine -- sh -c "echo 'Hello world from a microVM' && uname -a"

# interactive shell

smolvm machine run --net -it --image alpine -- /bin/sh

# inside the VM: apk add sl && sl && exit

Sandbox untrusted code — run untrusted programs in a hardware-isolated VM. Host filesystem, network, and credentials are separated by a hypervisor boundary.

# network is off by default — untrusted code can't phone home

smolvm machine run --image alpine -- ping -c 1 1.1.1.1

# fails — no network access

# lock down egress — only allow specific hosts

smolvm machine run --net --image alpine --allow-host registry.npmjs.org -- wget -q -O /dev/null https://registry.npmjs.org

# works — allowed host

smolvm machine run --net --image alpine --allow-host registry.npmjs.org -- wget -q -O /dev/null https://google.com

# fails — not in allow list

Pack into portable executables — turn any workload into a self-contained binary. All dependencies are pre-baked — no install step, no runtime downloads, boots in

smolvm pack create --image python:3.12-alpine -o ./python312

./python312 run -- python3 --version

# Python 3.12.x — isolated, no pyenv/venv/conda needed

smolvm machine create --net myvm

smolvm machine start --name myvm

smolvm machine exec --name myvm -- apk add sl

smolvm machine exec --name myvm -it -- /bin/sh

# inside: sl, ls, uname -a — type 'exit' to leave

smolvm machine stop --name myvm

Use git and SSH without exposing keys — forward your host SSH agent into the VM. Private keys never enter the guest — the hypervisor enforces this. Requires an SSH agent running on your host (ssh-add -l to check).

smolvm machine run --ssh-agent --net --image alpine -- sh -c "apk add -q openssh-client && ssh-add -l"

# lists your host keys, but they can't be extracted from inside the VM

smolvm machine exec --name myvm -- git clone git@github.com:org/private-repo.git

image = "python:3.12-alpine"

net = true

[network]

allow_hosts = ["api.stripe.com", "db.example.com"]

[dev]

init = ["pip install -r requirements.txt"]

volumes = ["./src:/app"]

[auth]

ssh_agent = true

smolvm machine create myvm -s Smolfile

smolvm machine start --name myvm

Each workload gets real hardware isolation — its own kernel on Hypervisor.framework (macOS) or KVM (Linux). libkrun VMM with a custom kernel: libkrunfw. Pack it into a .smolmachine and it runs anywhere the host architecture matches, with zero dependencies.

Images use the OCI format — the same open standard Docker uses. Any image on Docker Hub, ghcr.io, or other OCI registries can be pulled and booted as a microVM. No Docker daemon required.

Defaults: 4 vCPUs, 8 GiB RAM. Memory is elastic via virtio balloon — the host only commits what the guest actually uses and reclaims the rest automatically. vCPU threads sleep in the hypervisor when idle, so over-provisioning has near-zero cost. Override with --cpus and --mem.

* Network is opt-in (--net on machine create). TCP/UDP only, no ICMP.

* macOS: binary must be signed with Hypervisor.framework entitlements.

* --ssh-agent requires an SSH agent running on the host (SSH_AUTH_SOCK must be set).

...

Read the original on github.com »

3 297 shares, 12 trendiness

NASA Force


NASA Force is a new hiring initiative — developed in partnership with the U.S. Office of Personnel Management — designed to bring exceptional technical talent into mission-critical roles that support NASA's exploration, research, and advanced technology priorities. Highly skilled early- to mid-career engineers, technologists, and innovators join NASA for focused term appointments, typically 1–2 years with the possibility of extension, to solve complex challenges and help maintain U.S. leadership in air and space.

Through NASA Force, you will contribute to missions that advance human spaceflight, aeronautics, and scientific discovery while helping expand humanity's understanding of the universe. You will take a systems approach to solving problems, working across teams and disciplines from concept to execution. Your work will demand technical excellence, critical thinking, and continuous learning, and every contribution will directly support NASA's mission.

* Work on flight systems, lunar infrastructure, and advanced technologies that go from concept to execution and support real missions beyond Earth.

* Collaborate directly with engineers, scientists, and partners shaping the future of space, aeronautics, and national capability.

* Expand your technical depth by solving complex, real-world problems where the standard is performance, not theory.

* Share knowledge, mentor others, and contribute to a culture that compounds capability across NASA's workforce.

HOW YOU WILL ENTER THE MISSION

You will join a collaborative, mission-driven team where ideas are valued, contributions are recognized, and innovation is part of everyday work.
NASA Force offers an opportunity to grow across projects and disciplines, build your expertise, and take on new challenges while working alongside some of the world's leading minds.

Propulsion systems support across the Commercial Crew Program, Launch Services Program, and Artemis

If You Want Your Work to Operate Beyond Earth, This is Where it Begins.

...

Read the original on nasaforce.gov »

4 256 shares, 13 trendiness

I'm Coding by Hand

ai is here. so i'm spending 3 months coding the old way

I decided to move to Brooklyn for a coding retreat. There were some personal reasons that brought me back to the US. But rather than heading immediately back to work, I wanted to take some time to focus on coding things mostly without AI — at precisely the time when many successful programmers are saying programming is a solved problem. Given that I'm now six weeks through this retreat, I'll also take some time to explain what I've been doing in that time.

For the past two years, I've been building AI agents at Aily Labs in Barcelona alongside some super talented engineers. One of my first projects was building a web search agent we could use internally in early 2024… almost 6 months before Anthropic's Building Effective AI Agents article came out and a year before OpenAI's DeepResearch came out! We were also early on Cursor, early on using LLMs to make knowledge graphs, and constantly testing out new approaches for our use cases.

One of my favorite parts of working at Aily was leading a weekly journal club. I chose to present papers that described how open source LLMs were built, including DeepSeek R1, Ai2's Olmo 3, and Meta's Llama 3 paper. All of these helped us understand the evolving tradeoffs between training models internally or building workflows around SOTA closed models. I was already hooked on LLMs since the first time I tried them in 2023, but I found my curiosity kept bringing me back to learning about how they worked and how to apply them.

At the same time as I was learning about LLMs and agents, I was also using them to code. I learned that when writing code "by hand" I was actually doing two things: writing what I wanted and learning the code base. When I used a coding agent however, I would get exactly what I specified in my prompt, for better or worse.
By this I mean that if I didn't know what I wanted exactly, coding agents would be happy to make many assumptions for me. This almost always meant that I didn't learn as much, and that I wouldn't have a good grasp of the codebase. At the exact same time, coding agents helped me iterate quickly and ship software that worked well (after some dutiful testing, of course). They were also, I found, excellent tutors.

Cal Newport, a computer science professor and writer of Deep Work and other popular productivity books, recently wrote about this tradeoff in a way that resonated with me. In the article, he makes an analogy between the relationship of exercise to health, and the relationship of thinking to craft: "Your writing should be your own. The strain required to craft a clear memo or report is the mental equivalent of a gym workout by an athlete; it's not an annoyance to be eliminated but a key element of your craft."

I think the same applies to writing code. At Aily, the people I worked with who were amazing programmers were in most cases also amazing users of AI. Their deeper knowledge simply gave them more leverage over this tool. In the day to day of shipping agents into production, I didn't stop learning. But I did have a growing list of coding and computer concepts that I was always too busy to learn about. So when I needed to head back to the US, I realized it was the perfect time to focus on this at the Recurse Center.

What is a code retreat anyway? Recurse Center (RC) is a self-directed, full-time programming retreat in Brooklyn. After an application and a coding interview, Recursers arrive with ideas for what they want to program, and then spend 6 or 12 weeks programming. One of the highlights of RC is that it is collaborative: you enter with a cohort of other programmers, many with decades of experience, and with radically different expertises.
Another highlight: it's free!

Coming into RC, my goals were the following:

1. Train an LLM from scratch. This includes pre- and post-training, and I want to do this mostly from scratch; not just fork a premade codebase but write a Transformer myself.

2. Get better at writing Python by hand. I've been working in Python for a few years now but I know there's still so much for me to learn. I want to get to the point where I need to reference documentation or ask LLMs as little as possible, and have good intuition for how to set up various projects.

3. Understand computers better. Admittedly a broad goal. I know that computers are extremely complicated machines that operate at many levels of abstraction. Given that I never had a formal Computer Science education, I want to build a better mental model of these layers and how they work together. I don't have a super concrete plan here, but I think RC will be the perfect place for this.

So how is it going?

1. Training an LLM from Scratch

I've done the first assignment from Stanford's CS336: Language Modeling from Scratch course, without coding help from an LLM. For context, it was a 50-page assignment, but working with another Recurser, we wrote an optimized tokenizer in Python, and then built out an upgraded GPT-2 style architecture in PyTorch. We ran multiple ablations to tune hyperparameters on the Tiny Stories dataset, and then used those hyperparameters on the ~9 billion tokens of the OpenWebText dataset.

Parameter sweep of different learning rates for the 17M parameter model we wrote by hand; high learning rates lead to instability. This was on the Tiny Stories dataset, and took about an hour to train on an A100.

My plan is to do the other assignments in CS336 as well: optimizing our language model, estimating and computing scaling laws, converting raw text data into pre-training data, and finally post-training a model.
I've already started the second assignment, which involves profiling GPUs and implementing FlashAttention2 in Triton. There's a lot to do, but ideally I can run through the meat of these assignments and then post-train my own model.

2. Getting Better at Writing Python from Scratch

I've been writing a lot of small agents and neural networks in Python or PyTorch to practice. But by far the most helpful thing was pair programming with people who have been working in Python for 10+ years, and just watching them work or having them watch me work. For example, a nice thing I picked up from someone I pair programmed with: when this guy was writing code and didn't quite remember the syntax or operations, he would often just quickly open up a terminal and type a super simple example to rapidly iterate. He was usually able to work it out and verify if it worked correctly in less than a minute, and he didn't have to google anything and comb through search results or ask an LLM. This technique might seem obvious to some, but making this process muscle memory has helped me become unstuck much faster.

I want to keep moving in this direction, doing simple projects or even just problems like Advent of Code while pair programming. Working with someone else live was initially a bit nerve-racking, but precisely because of this I've noticed a lot of progress.

3. Understanding Computers Better

Here are a few examples of things I've done which I'd classify as helping me understand computers better:

I wrote the classic programming function fizzbuzz in BASIC on an Apple IIe computer from 1983. It was cool seeing how differently computers worked back then, for example how manual the code editing and execution process was, but also how it was basically the same.

One thing I've always felt a bit self-conscious about are my Unix/terminal skills.
So I joined CTF Fridays, a weekly session devoted to working through Bandit and other "war games." These are Unix and computer security related challenges played through the terminal, with the objective of collecting passwords and leveling up. Now I have a pretty good sense for what Claude Code is trying to run on my computer!

One day I hand-coded a single layer perceptron I saw when flipping through an AI textbook… completely in Vim. It was especially tedious at first, but I got some pro tips from another Recurser and learned a few shortcuts. This has actually been incredibly useful now when I'm running training jobs on cloud GPUs and I need to last-minute edit files.

I joined a Clojure workshop given by someone who has 15+ years of experience using Clojure. The topic itself was interesting because Clojure is a functional programming language and I don't have much experience with functional languages. The teaching methodology was also great: after a brief intro we did a round of mob programming, where we solved a problem collectively, going around the table with each person getting a minute or two to advance the solution.

The weekly technical presentations are great exposure to an incredible array of topics. These are a set of 5-minute talks, so they are short enough that you don't get bored but fast enough that you can learn something meaningful. A sample of titles: "Running Rust Code", "GPUs for Dummies", "Typesafe APIs for Type B Personalities", "Some Useless Agents" (this one was mine!), and more. I've given two so far: one on simple agent architectures, one on scaling MCP tools efficiently; and will give another this week on different ways to optimize GPUs.
Even just hearing from people about their projects and careers has been incredibly valuable in helping me understand the space of problems computers can solve.

Soon I'll be shipping agents to prod and running evals with a whole new bag of tricks and skills. But for now I've got 6 more weeks left at RC, which I'm beginning to worry is not enough time to finish everything on my list. And it won't be. But that's what makes RC so great: it's not as much about crossing everything off my list as about spending time coding.

...

Read the original on miguelconner.substack.com »

5 255 shares, 14 trendiness

Are the Costs of AI Agents Also Rising Exponentially? — Toby Ord

Are the Costs of AI Agents Also Rising Exponentially?

There is an extremely important question about the near-future of AI that almost no-one is asking. We've all seen the graphs from METR showing that the length of tasks AI agents can perform has been growing exponentially over the last 7 years. While GPT-2 could only do software engineering tasks that would take someone a few seconds, the latest models can (50% of the time) do tasks that would take a human a few hours.

As this trend shows no signs of stopping, people have naturally taken to extrapolating it out, to forecast when we might expect AI to be able to do tasks that take an engineer a full work-day; or week; or year. But we are missing a key piece of information — the cost of performing this work. Over those 7 years AI systems have grown exponentially. The size of the models (parameter count) has grown by 4,000x and the number of times they are run in each task (tokens generated) has grown by about 100,000x. AI researchers have also found massive efficiencies, but it is eminently plausible that the cost for the peak performance measured by METR has been growing — and growing exponentially.

This might not be so bad. For example, if the best AI agents are able to complete tasks that are 3x longer each year and the costs to do so are also increasing by 3x each year, then the cost to have an AI agent perform tasks would remain the same multiple of what it costs a human to do those tasks. Or if the costs have a longer doubling time than the time-horizons, then the AI systems would be getting cheaper compared with humans.

But what if the costs are growing more quickly than the time horizons? In that case, these cutting-edge AI systems would be getting less cost-competitive with humans over time. If so, the METR time-horizon trend could be misleading. It would be showing how the state of the art is improving, but part of this progress would be due to more and more lavish expenditure on compute, so it would be diverging from what is economical. It would be becoming more like the Formula 1 of AI performance — showing what is possible, but not what is practical.
So in my view, a key question we need to ask is: how is the 'hourly' cost of AI agents changing over time?

By 'hourly' cost I mean the financial cost of using an LLM to complete a task right at the model's 50% time horizon divided by the length of that time horizon. So as with the METR time horizons themselves, the durations are measured not by how long it takes the model, but how long it typically takes humans to do that task. For example, Claude 4.1 Opus's 50% time horizon is 2 hours: it can succeed in 50% of tasks that take human software engineers 2 hours. So we can look at how much it costs for it to perform such a task and divide by 2, to find its hourly rate for this work.

I've found that very few people are asking this question. And when I ask people what they think is happening to these costs over time, their opinions vary wildly. Some assume the total cost of a task is staying the same, even as the task length increases exponentially. That would imply an exponentially declining hourly rate. Others assume the total cost is also growing exponentially — after all, we've seen dramatic increases in the costs to access cutting-edge models. And most people (myself included) had little idea of how much it currently costs for AI agents to do an hour's software engineering work. Are we talking cents? Dollars? Hundreds of dollars? An AI agent can't cost more per hour than a human to complete these tasks, can it? Can it?

A couple of months ago I asked METR if they could share the cost data for their benchmarking. I figured it would be easy — just take the cost of running their benchmark for each model, plot it against release date and see how it is growing. Or plot the cost of each model vs its time horizon and see the relationship. But they helpfully pointed out that it isn't so easy at all.
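The definition above reduces to a couple of lines of arithmetic. A minimal sketch; the $120/hour default for the human engineer is the figure read off METR's chart later in this piece, not a new measurement:

```python
def hourly_cost(task_cost_usd: float, horizon_hours: float) -> float:
    """Cost of completing a task right at the model's 50% time horizon,
    divided by how long that task takes a *human* (the METR convention),
    not by how long the model itself ran."""
    return task_cost_usd / horizon_hours

def cheaper_than_human(task_cost_usd: float, horizon_hours: float,
                       human_rate_usd: float = 120.0) -> bool:
    """Is the agent's 'hourly' rate below the human engineer's rate?"""
    return hourly_cost(task_cost_usd, horizon_hours) < human_rate_usd
```

For the Claude 4.1 Opus example: whatever it costs to complete a task at its 2-hour horizon, divide by 2 to get the hourly rate.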
Their headline time-horizon numbers are meant to show the best possible performance that can be attained with a model (regardless of cost). So they run their models inside an agent scaffold until the performance has plateaued. Since they really want to make sure it has plateaued, they use a lot of compute on this and don't worry too much about whether they've used too much. After all, if you are just trying to find the eventual height of a plateau, there is no problem in going far into the flat part of the graph.

But if you are trying to find out when the plateau begins, there is a problem with this strategy. Their total spend for each model is sometimes just enough to get onto the plateau and sometimes many times more than is needed. So total spend can't be used as a direct estimate of the costs of achieving that performance.

Fortunately, they released a chart that can be used to shed some light on the key question of how hourly costs of LLM agents are changing over time:

This chart (from METR's page for GPT-5) shows how performance increases with cost. The cost in question is the cost of using more and more tokens to complete the task (and thus more and more compute).

The yellow curve is the best human performance for each task. It steadily marches onwards and upwards, transforming more wages into longer tasks. Since it is human performance that is used to define the vertical axis for METR's time horizon work, it isn't surprising that this curve is fairly linear — it costs about 8 times as much to get a human software engineer to perform an 8-hour task as a 1-hour task.

The other colours are the curves for a selection of LLM-based agents. Unlike the humans, they all show diminishing returns, with the time horizon each one can achieve eventually stalling out and plateauing as more and more compute is added. The short upticks at the end of some of these curves are an artefact of some models not being prepared to give an answer until the last available moment. This suggests that the model must have been still making progress during the apparent flatline before the uptick (just not showing it). Indeed, this chart was originally displayed on METR's page for GPT-5 to show that they may have stopped its run before its performance had truly plateaued. These upticks do make analysis harder, and hopefully future versions of this chart will be able to avoid these glitches.

So what can this chart tell us about our key question concerning the hourly cost of AI agents? To tease out the lessons that lie hidden in the chart, we'll need to add a number of annotations. The first step is to add lines of constant hourly cost. On a log-log plot like this, every constant hourly cost will be a straight line with slope 1. Lower hourly costs will appear as lines that are located further to the left.
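A quick check of that slope-1 claim: a constant hourly rate means cost = rate × hours, so on log axes log(cost) = log(rate) + log(hours), a line of slope 1 whose horizontal position is set by the rate. A sketch:

```python
import math

def loglog_slope(hours_a: float, hours_b: float, rate_usd: float) -> float:
    """Slope, on log-log axes, between two points of a constant-hourly-rate
    line cost = rate * hours: d(log cost) / d(log hours)."""
    cost_a, cost_b = rate_usd * hours_a, rate_usd * hours_b
    return (math.log(cost_b) - math.log(cost_a)) / (math.log(hours_b) - math.log(hours_a))
```

The slope comes out as exactly 1 whatever the rate; changing the rate only slides the whole line left (cheaper) or right (dearer).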

For each curve I've added a line of constant hourly cost that just grazes it. That is the cheapest hourly cost the model achieves. We can call the point where the line touches the curve the sweet spot for that model. Before a model's sweet spot, its time horizon is growing super-linearly in cost: it is getting increasing marginal returns. The sweet spot is exactly the point at which diminishing marginal returns set in (which would correspond to the point of inflection if this was replotted on linear axes). It is thus a key point on any model's performance curve.

We can see that the human software engineer is at best $120 per hour, while the sweet spots for the AI agents range from $40 per hour for o3 all the way down to 40 cents per hour for Grok 4 and Sonnet 3.5. That's quite a range of costs. While the horizon lengths of these models vary by about a factor of 15 (judged at either the end-points or at the sweet spots), their sweet-spot costs vary by a factor of 100.

And these are the best hourly rates for these models. On many task lengths (including those near their plateau) they cost 10 to 100 times as much per hour. For instance, Grok 4 is at $0.40 per hour at its sweet spot, but $13 per hour at the start of its final plateau. GPT-5 is about $13 per hour for tasks that take about 45 minutes, but $120 per hour for tasks that take 2 hours. And o3 actually costs $350 per hour (more than the human price) to achieve tasks at its full 1.5-hour task horizon. This is a lot of money to pay for an agent that fails at the task you've just paid for 50% of the time, especially in cases where failure is much worse than not having tried at all.

However, I do want to note that I'm a bit puzzled by how much higher the costs are here for the reasoning models from OpenAI compared to models from Anthropic and xAI.
The METR page suggests that the price data for those models was still an estimate at that point (based on o1 costs), so I wouldn't be surprised if these curves should really be shifted somewhat to the left, making them several times cheaper. We therefore shouldn't lean too heavily on the fact that they cost as much or more than human labour at their full time-horizon.

As well as the sweet spot, ideally we could add a saturation point for each curve: a point to represent the location where the plateau begins. We can't simply use the end of the curve since some have run longer into the plateau than others. What I'll do is find the point where the slope has diminished to 1/10th that of the sweet spot. This is the point at which it requires a 10% increase in cost just to increase the time horizon by 1%. Or equivalently, the time horizon is only growing as the 1/10th power of compute. Of course the number 1/10 is somewhat arbitrary, but unlike for the sweet spot, any definition of a saturation point will be arbitrary to some degree. As you can see below, this definition of saturation point does roughly correspond with the intuitive location, though it is still not quite clear how best to deal with the final upticks.
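To make the two annotations concrete, here is a small sketch (with made-up numbers, not METR's data) of how a sweet spot and a saturation point could be read off a sampled (cost, horizon) curve:

```python
import numpy as np

# Hypothetical (cost, horizon) samples along one model's performance curve;
# real values would come from the points on METR's published chart.
cost = np.array([0.1, 0.3, 1.0, 3.0, 10.0, 30.0, 100.0])   # dollars
horizon = np.array([0.02, 0.1, 0.5, 1.5, 2.5, 3.0, 3.1])   # hours

# Sweet spot: cheapest hourly cost, i.e. the point where a slope-1 line
# on the log-log plot just grazes the curve -> minimise cost/horizon.
hourly = cost / horizon
sweet = int(np.argmin(hourly))

# Saturation point: where the log-log slope has fallen to 1/10th of the
# slope at the sweet spot.
logc, logh = np.log(cost), np.log(horizon)
slope = np.gradient(logh, logc)
candidates = np.where(slope <= slope[sweet] / 10)[0]
sat = int(candidates[0]) if len(candidates) else len(cost) - 1

print(f"sweet spot: ${hourly[sweet]:.2f}/h at a {horizon[sweet]:.2f} h horizon")
print(f"saturation: {horizon[sat]:.2f} h horizon for about ${cost[sat]:.0f}")
```

On these invented numbers the sweet spot lands at $2/h for a half-hour horizon, and saturation at a 3-hour horizon; the point is only to show how both annotations fall out of the same sampled curve.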

Armed with our sweet spots and sat­u­ra­tion points, we can start to tease out the re­la­tion­ship be­tween time hori­zon and cost.

We can see that there is a weak, but clear, positive correlation between task duration and cost in this dataset. Moreover, we see that higher task durations (at the sweet spot) are associated with higher hourly costs (and recall that these hourly costs at the sweet spot are the best hourly cost achievable with that model).

What if we instead look at the models' saturation points, which are a little arbitrary in their definition, but closer to what METR is measuring in their headline results about time horizons:

Again, there is a correlation between time horizon and cost, and again the hourly costs seem to be increasing with time horizon too. Indeed it suggests we are nearing the point where the models' peak performance comes at an impractically high cost. If this relationship were to continue, then forecasting when certain time horizons will be available from the headline METR trend will be misleading, as the models would be impractically expensive when they first reach those capabilities. We would need to wait some additional period of time for them to come down sufficiently in cost.

That said, there are some significant limitations to the analysis above. Ideally one would want to:

* include curves for a larger and more representative set of models
* find a way of addressing the uptick problem
* check if there is an issue with the costs of the OpenAI models

Fortunately, it should be fairly easy for METR to perform such analysis, and I hope they will follow up on this.

Too few people are asking about how the costs of AI agents are growing. The key question is: how is the "hourly" cost of LLM agents changing over time? We can use METR's chart to shed some light on this, once we add lines of constant hourly cost, sweet spots, and saturation points.

This provides moderate evidence that:

* the costs to achieve the time horizons are growing exponentially,
* even the hourly costs are rising exponentially,
* the hourly costs for some models are now close to human costs.

Thus, there is evidence that:

* the METR trend is partly driven by unsustainably increasing inference compute
* there will be a divergence between what time horizon is possible in-principle and what is economically feasible
* real-world applications of AI agents will lag behind the METR time-horizon trend by increasingly large amounts

METR has a similar graph on their page for GPT-5.1 codex.
It in­cludes more mod­els and com­pares them by to­ken counts rather than dol­lar costs:

* the correlation between time horizon and cost holds for these other models too
* reasoning models with more RL post-training don't always dominate their predecessors (e.g. o1 is better at small token budgets than o3 or GPT-5)
* the horizontal gap between the OpenAI reasoning models and the rest is smaller, supporting the idea that their costs were a bit high in the main chart


February 04, 2026

Hazard Rates for AI Agents Decline as a Task Goes On

...

Read the original on www.tobyord.com »

6 252 shares, 5 trendiness

Israel escalates attacks on medics in Lebanon with deadly ‘quadruple tap’

When they re­ceived the call to re­spond to an Israeli airstrike in the city of Mayfadoun, in south­ern Lebanon, most of the para­medics held back, hav­ing pre­vi­ously seen col­leagues killed by dou­ble-tap at­tacks tar­get­ing res­cuers. But the medics from the Islamic Health Association (IHA) rushed to the scene.

By the time the other emer­gency work­ers ar­rived at the site, they found the IHA medics had in­deed been caught in a sec­ond strike. They started evac­u­at­ing their wounded col­leagues, only for their am­bu­lances to be hit in two fur­ther at­tacks.

One of the para­medics cov­ered his ears and screamed, con­vuls­ing in pain as shrap­nel shat­tered the back win­dow of the am­bu­lance.

The res­cue mis­sion on Wednesday af­ter­noon had turned into a night­mare as Israel car­ried out three con­sec­u­tive strikes on three sets of am­bu­lances and med­ical work­ers.

In to­tal, the at­tacks killed four medics and wounded six more, from three dif­fer­ent am­bu­lance corps, ac­cord­ing to med­ical sources. Three of the medics were from the Hezbollah-affiliated IHA and Amal-affiliated med­ical corps, while one was from the Nabatieh emer­gency ser­vices or­gan­i­sa­tion. Under in­ter­na­tional law, all medics are pro­tected and are con­sid­ered non-com­bat­ants, re­gard­less of po­lit­i­cal af­fil­i­a­tion.

Rescuers in Lebanon have long been wary of the dou­ble-tap at­tack, when Israeli forces tar­get a lo­ca­tion, wait un­til peo­ple gather to help sur­vivors, and then strike again. Wednesday’s three-wave at­tack af­ter the ini­tial one prompted the coin­ing of a fear­some new term: the quadru­ple tap.

In a video taken by one of the paramedics at the site, rescuers are seen loading two wounded people into their ambulances when a bomb lands next to their vehicle. Paramedics rush to extract the driver, who is motionless and limp as they pull him from the ambulance, which is splashed with blood. "Oh God, oh God," the man filming can be heard saying. They carry two more blood-covered medics out of their vehicle and on to stretchers.

Among the para­medics killed was Fadel Sarhan, 43, who is sur­vived by his eight-year-old daugh­ter.

"Fadel was a very loved person. He had a bold personality, but at the same time, he was emotional. He was well liked and responsible," said Ali Nasr al-Deen, the head of the Mayfadoun civil defence centre who grew up with Sarhan.

"He used to feed the cats and dogs. He would bring pet food from Beirut so they wouldn't go hungry. He was that kind of person, caring and attentive. It's a huge loss for us," said Nasr al-Deen.

Medics mourned their col­leagues on Thursday at fu­ner­als in Nabatieh, a city near Mayfadoun. Such events have be­come in­creas­ingly com­mon, with health­care work­ers killed by Israeli bomb­ings on a near daily ba­sis.

Mohammed Suleiman, whose 16-year-old son, Joud, was killed while on duty as a para­medic by an Israeli strike weeks ear­lier, joined his peers in bury­ing an­other of his friends on Thursday. A few hours af­ter the fu­ner­als, Israel car­ried out an­other wave of airstrikes on Nabatieh.

Israel has so far killed 91 health­care work­ers and wounded 214 more in Lebanon since the Israel-Hezbollah war started on 2 March. It has given lit­tle jus­ti­fi­ca­tion for its re­peated at­tacks on med­ical in­fra­struc­ture and work­ers, apart from ac­cus­ing Hezbollah of us­ing am­bu­lances and hos­pi­tals to trans­port fight­ers and weapons, with­out pro­vid­ing ev­i­dence for the claim.

The Lebanese ministry of health accused Israel of deliberately targeting ambulance crews. "Paramedics have become direct targets, pursued relentlessly in a blatant violation that confirms a total disregard for all norms and principles established by international humanitarian law," the ministry said in a statement.

The Israeli mil­i­tary did not im­me­di­ately re­spond to a re­quest for com­ment.

In the video taken of the quadru­ple tap on Wednesday, the frame was frozen on the in­te­rior of the am­bu­lances, as the Nabatieh emer­gency ser­vices high­lighted that the ve­hi­cle clearly con­tained no weapons.

A few hours after Israel hit the ambulances outside Nabatieh, it bombed the vicinity of the governmental hospital in Tebnine, south Lebanon. It was the second time in two days that Israeli bombings damaged the healthcare facility, which is the only remaining public hospital in the area. The strikes injured 11 hospital workers and damaged the emergency department, according to the World Health Organization (WHO).

A video of Tebnine hos­pi­tal from 14 April showed work­ers try­ing to clear shat­tered con­crete and de­bris from the emer­gency de­part­ment af­ter a strike blew in the win­dows.

Commenting on the strike in Tebnine, the head of the WHO, Tedros Adhanom Ghebreyesus, said: "I reiterate the call for the immediate protection of healthcare facilities, health workers, ambulances and patients. There must be safe, sustained and unhindered humanitarian access across Lebanon."

An am­bu­lance in Tebnine was also struck on Thursday, lead­ing to the crit­i­cal in­jury of two medics, ac­cord­ing to the Lebanese min­istry of health. As health­care work­ers watched their col­leagues and friends be­ing killed by Israel, the men­tal toll was be­com­ing al­most too much to bear.

"We have to go to places to rescue people, but then we get double tapped," said Abbas Atwi, the head of the IHA's emergency department in Nabatieh, shortly after a medical centre was targeted in March, killing his friends and colleagues. "But we will stay and keep going, we will not leave."

...

Read the original on www.theguardian.com »

7 236 shares, 12 trendiness

Even "cat readme.txt" is not safe

In a pre­vi­ous post about AI-discovered bugs in Vim and Emacs, we looked at how seem­ingly harm­less work­flows could cross a sur­pris­ing line into code ex­e­cu­tion. This time we wanted to push that idea even fur­ther: is cat readme.txt safe?

It turns out that it is NOT, if you use iTerm2.

That looks in­sane un­til you un­der­stand what iTerm2 is try­ing to do for a le­git­i­mate fea­ture, how it uses the PTY, and what hap­pens when ter­mi­nal out­put is able to im­per­son­ate one side of that fea­ture’s pro­to­col.

We’d like to ac­knowl­edge OpenAI for part­ner­ing with us on this pro­ject.

iTerm2 has an SSH integration feature that gives it a richer understanding of remote sessions. To make that work, it does not just blindly "type commands" into a remote shell. Instead, it bootstraps a tiny helper script on the remote side called the conductor.

iTerm2 sends a remote bootstrap script, the conductor, over the existing SSH session. That remote script becomes the protocol peer for iTerm2. iTerm2 and the remote conductor exchange terminal escape sequences to coordinate things like:

The im­por­tant point is that there is no sep­a­rate net­work ser­vice. The con­duc­tor is just a script run­ning in­side the re­mote shell ses­sion, and the pro­to­col is car­ried over nor­mal ter­mi­nal I/O.

A ter­mi­nal used to be a real hard­ware de­vice: a key­board and screen con­nected to a ma­chine, with pro­grams read­ing in­put from that de­vice and writ­ing out­put back to it.

A ter­mi­nal em­u­la­tor like iTerm2 is the mod­ern soft­ware ver­sion of that hard­ware ter­mi­nal. It draws the screen, ac­cepts key­board in­put, and in­ter­prets ter­mi­nal con­trol se­quences.

But the shell and other com­mand-line pro­grams still ex­pect to talk to some­thing that looks like a real ter­mi­nal de­vice. That is why the OS pro­vides a PTY, or pseudoter­mi­nal. A PTY is the soft­ware stand-in for the old hard­ware ter­mi­nal, and it sits be­tween the ter­mi­nal em­u­la­tor and the fore­ground process.

* ssh for­wards those bytes to the re­mote ma­chine

* the re­mote con­duc­tor reads them from its stdin

So when iTerm2 wants to "send a command to the remote conductor," what it actually does locally is write bytes to the PTY.
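As a rough illustration of that plumbing (not iTerm2's code), Python's standard pty module can open a pseudoterminal pair and show the two sides talking:

```python
import os
import pty

# Open a PTY pair: the "master" side is what a terminal emulator holds,
# the "slave" side is what the foreground program sees as its terminal.
master_fd, slave_fd = pty.openpty()

# A program writing to its terminal writes to the slave side...
os.write(slave_fd, b"hello\n")

# ...and those bytes surface on the master side, which is exactly the
# stream a terminal emulator parses for text and escape sequences.
data = os.read(master_fd, 64)
print(data)  # typically b"hello\r\n" after line-discipline processing
```

The key asymmetry for this bug: anything read from the master side is treated as terminal output to be parsed, regardless of which program on the slave side produced it.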

The SSH in­te­gra­tion pro­to­col uses ter­mi­nal es­cape se­quences as its trans­port.

* DCS 2000p is used to hook the SSH con­duc­tor

* OSC 135 is used for pre-framer con­duc­tor mes­sages

At source level, DCS 2000p causes iTerm2 to in­stan­ti­ate a con­duc­tor parser. Then the parser ac­cepts OSC 135 mes­sages like:

So a le­git­i­mate re­mote con­duc­tor can talk back to iTerm2 en­tirely through ter­mi­nal out­put.
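For context, DCS and OSC are standard ECMA-48 string sequences; the framing below is generic, and the payloads are placeholders rather than iTerm2's actual conductor messages:

```python
# Generic ECMA-48 framing for the two sequence types involved.
ESC = b"\x1b"
ST = ESC + b"\\"  # String Terminator

def dcs(payload: bytes) -> bytes:
    """Device Control String: ESC P <payload> ST."""
    return ESC + b"P" + payload + ST

def osc(code: int, payload: bytes) -> bytes:
    """Operating System Command: ESC ] <code> ; <payload> ST."""
    return ESC + b"]" + str(code).encode() + b";" + payload + ST

# Anything that can get bytes onto the terminal stream can emit frames
# like this -- which is why printed file contents could impersonate a
# protocol peer.
frame = osc(135, b"example-payload")
print(frame)
```

(Terminals also accept BEL as an OSC terminator; ST is shown here for uniformity.)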

The bug is a trust fail­ure. iTerm2 ac­cepts the SSH con­duc­tor pro­to­col from ter­mi­nal out­put that is not ac­tu­ally com­ing from a trusted, real con­duc­tor ses­sion. In other words, un­trusted ter­mi­nal out­put can im­per­son­ate the re­mote con­duc­tor.

That means a ma­li­cious file, server re­sponse, ban­ner, or MOTD can print:

and iTerm2 will start act­ing like it is in the mid­dle of a real SSH in­te­gra­tion ex­change. That is the ex­ploit prim­i­tive.

iTerm2 ren­ders the file, but the file is not just text. It con­tains:

Once the hook is ac­cepted, iTerm2 starts its nor­mal con­duc­tor work­flow. In up­stream source, Conductor.start() im­me­di­ately sends get­shell(), and af­ter that suc­ceeds it sends python­ver­sion().

So the ex­ploit does not need to in­ject those re­quests. iTerm2 is­sues them it­self, and the ma­li­cious out­put only has to im­per­son­ate the replies.

The fake OSC 135 mes­sages are min­i­mal but pre­cise.

They do this:

Return lines that look like shell-dis­cov­ery out­put

This is enough to push iTerm2 down its nor­mal fall­back path. At that point, iTerm2 be­lieves it has com­pleted enough of the SSH in­te­gra­tion work­flow to move on to the next step: build­ing and send­ing a run(…) com­mand.

The forged DCS 2000p hook con­tains sev­eral fields, in­clud­ing at­tacker-con­trolled sshargs.

That value mat­ters be­cause iTerm2 later uses it as com­mand ma­te­r­ial when it con­structs the con­duc­tor’s run … re­quest.

The ex­ploit chooses sshargs so that when iTerm2 base64-en­codes:

the last 128-byte chunk be­comes:

That string is not ar­bi­trary. It is cho­sen be­cause it is both:

In a le­git­i­mate SSH in­te­gra­tion ses­sion, iTerm2 writes base64-en­coded con­duc­tor com­mands to the PTY, and ssh for­wards them to the re­mote con­duc­tor. In the ex­ploit case, iTerm2 still writes those com­mands to the PTY, but there is no real SSH con­duc­tor. The lo­cal shell re­ceives them as plain in­put in­stead.

That is why the ses­sion looks like this when recorded:

* the last chunk is ace/c+aliFIo

Earlier chunks fail as non­sense com­mands. The fi­nal chunk works if that path ex­ists lo­cally and is ex­e­cutable.

You can re­pro­duce the orig­i­nal file-based PoC with gen­poc.py:

* readme.txt, a file con­tain­ing the ma­li­cious DCS 2000p and OSC 135 se­quences

The first fools iTerm2 into talk­ing to a fake con­duc­tor. The sec­ond gives the shell some­thing real to ex­e­cute when the fi­nal chunk ar­rives.

For the exploit to work, run cat readme.txt from the directory containing ace/c+aliFIo, so the final attacker-shaped chunk resolves to a real executable path.

* Mar 30: We re­ported the bug to iTerm2.

* Mar 31: The bug was fixed in com­mit a9e745993c2e2cb­b30b884a16617cd5495899f86.

* At the time of writ­ing, the fix has not yet reached sta­ble re­leases.

When the patch com­mit landed, we tried to re­build the ex­ploit from scratch us­ing the patch alone. The prompts used for that process are in prompts.md, and the re­sult­ing ex­ploit is gen­poc2.py, which works very sim­i­larly to gen­poc.py.

...

Read the original on blog.calif.io »

8 222 shares, 20 trendiness

Interval Calculator

This is a calculator that works over unions of intervals rather than just real numbers. It is an implementation of Interval Union Arithmetic.

An in­ter­val [a, b] rep­re­sents the set of all num­bers be­tween and in­clud­ing a and b. An in­ter­val union: [a, b] U [c, d] is a dis­joint set of in­ter­vals.

Interval union arith­metic is an ex­ten­sion of reg­u­lar in­ter­val arith­metic that is vastly su­pe­rior, mostly be­cause it re­mains closed while sup­port­ing di­vi­sion by in­ter­vals con­tain­ing zero:

➤ 2 / [-2, 1]

[-∞, -1] U [2, +∞]
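That behaviour is easy to sketch. Assuming a positive numerator and an interval that strictly straddles zero, the quotient splits into two half-infinite pieces (this is an illustration, not the calculator's engine):

```python
import math

def divide_scalar(n, lo, hi):
    """Divide a positive scalar n by the interval [lo, hi], assuming
    lo < 0 < hi. Returns the resulting union as a list of disjoint
    (lower, upper) interval tuples."""
    assert n > 0 and lo < 0 < hi
    # n / x for x in [lo, 0) sweeps out (-inf, n/lo];
    # n / x for x in (0, hi] sweeps out [n/hi, +inf).
    return [(-math.inf, n / lo), (n / hi, math.inf)]

print(divide_scalar(2, -2, 1))
```

This reproduces the example above: 2 / [-2, 1] gives the union (-∞, -1] U [2, +∞), and because the result is itself a union, the arithmetic stays closed.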

The in­ter­est­ing thing about in­ter­val union arith­metic is the in­clu­sion prop­erty, which means that if you pick any real num­ber from every in­put union and com­pute the same ex­pres­sion over the re­als, the re­sult is guar­an­teed to be in the out­put union.

You can use it to rep­re­sent un­cer­tainty:

➤ 50 * (10 + [-1, 1])

[450, 550]

You can also com­pute more com­plex in­ter­val ex­pres­sions, us­ing the in­ter­val union op­er­a­tor U:

➤ ( [5, 10] U [15, 16] ) / [10, 100]

[0.05, 1.6]

Operations can re­sult in dis­joint unions of in­ter­vals:

➤ 1 / [-2, 1]

[-∞, -0.5] U [1, +∞]

➤ tan([pi/​3, 2*pi/3])

[-∞, -1.732] U [1.732, +∞]

In full pre­ci­sion mode, you can use it as a reg­u­lar cal­cu­la­tor, and ob­tain in­ter­val re­sults that are guar­an­teed to con­tain the true value, de­spite float­ing point pre­ci­sion is­sues:

➤ 0.1 + 0.2

[0.29999999999999993, 0.3000000000000001]

Note: you can input intervals with the bracket syntax: [1, 2], or bare numbers without brackets: 3.14. Bare numbers are interpreted as a narrow interval, i.e. [3.14, 3.14] (with subtleties related to full precision mode). This enables bare numbers and intervals to be mixed naturally:

➤ 1.55 + [-0.002, 0.002]

[1.548, 1.552]

A sur­pris­ing con­se­quence of the cal­cu­la­tor gram­mar is that in­ter­vals can be nested and you can write things like:

➤ [0, [0, 100]]

[0, 100]

This is be­cause all num­bers, in­clud­ing those in­side an in­ter­val bracket which de­fine a bound, are in­ter­preted as in­ter­vals. When nest­ing two in­ter­vals as above, an in­ter­val used as an in­ter­val bound is the same as tak­ing its up­per bound. This de­sign choice en­ables us­ing arith­metic on in­ter­val bounds them­selves:

➤ [0, cos(2*pi)]

[0, 1]

Outward round­ing is im­ple­mented over IEEE 754 dou­ble pre­ci­sion floats (javascript’s num­ber type), so re­sult in­ter­vals are guar­an­teed to con­tain the true value that would be ob­tained by com­put­ing the same ex­pres­sion over the re­als with in­fi­nite pre­ci­sion. For ex­am­ple, try the fa­mous sum 0.1 + 0.2 in the cal­cu­la­tor. Interval arith­metic com­putes an in­ter­val that is guar­an­teed to con­tain 0.3, even though 0.3 is not rep­re­sentable as a dou­ble pre­ci­sion float.
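The idea can be sketched with Python's math.nextafter, widening a correctly rounded sum by one ulp in each direction (a simplification of what the engine does):

```python
import math

def add_outward(a, b):
    """Add two exact doubles with outward rounding: widen the rounded
    sum by one ulp on each side so the true real-number sum is enclosed.
    A sketch of the idea, not the calculator's actual engine."""
    s = a + b
    return (math.nextafter(s, -math.inf), math.nextafter(s, math.inf))

lo, hi = add_outward(0.1, 0.2)
print((lo, hi))
print(lo <= 0.3 <= hi)  # the enclosure contains 0.3
```

Since a single IEEE 754 addition is correctly rounded (within half an ulp of the true sum), stepping one ulp outward on each side is enough to guarantee enclosure for that one operation; chained operations need the widening applied at every step.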

* Numbers input by the user are interpreted as the smallest interval that contains the IEEE 754 value closest to the input decimal representation but where neither bound is equal to it
* Output numbers are displayed with all available decimal digits (using Number.toString())

* Numbers input by the user are interpreted as the degenerate interval (width zero) where both bounds are equal to the IEEE 754 value closest to the input decimal representation
* Output numbers are displayed with a maximum of 4 decimal digits (using Number.toPrecision())

While I've been very careful, I'm sure there are still some bugs in the calculator. Please report any issue on GitHub.

Interval Calculator and not-so-float (the engine powering the calculator) are open-source. If you like my open-source work, please consider sponsoring me on GitHub. Thank you ❤️

* Split full precision mode into two controls: input interpretation and display precision

...

Read the original on victorpoughon.github.io »

9 219 shares, 9 trendiness

paniclock/paniclock: Instantly disable Touch ID and lock your Mac with one click or keyboard shortcut.

PanicLock is a macOS menu bar utility that instantly disables Touch ID and locks the screen with a single click or by closing your laptop lid.

PanicLock fills a gap ma­cOS leaves open: there is no built-in way to in­stantly dis­able Touch ID when it mat­ters. Biometrics are con­ve­nient day-to-day, and some­times prefer­able when you need speed or want to avoid your pass­word be­ing ob­served. But in sen­si­tive sit­u­a­tions, law en­force­ment and bor­der agents in many coun­tries can com­pel a bio­met­ric un­lock in ways they can­not with a pass­word. PanicLock gives you a one-click menu bar but­ton, a cus­tomiz­able hotkey, or an au­to­matic lock-on-lid-close op­tion that im­me­di­ately dis­ables Touch ID and locks your screen, restor­ing pass­word-only pro­tec­tion with­out killing your ses­sion or shut­ting down.

* One-click panic lock — Click the menu bar icon or press a hotkey to in­stantly lock

* Lock on Close — Optionally lock and dis­able Touch ID when you close the lid

* Launch at lo­gin — Start au­to­mat­i­cally when you log in

brew install paniclock/tap/paniclock

Download the lat­est DMG from the re­leases page.

When en­abled in Preferences, clos­ing your Mac’s lid will au­to­mat­i­cally dis­able Touch ID and lock your screen. Touch ID stays dis­abled un­til you re-lo­gin with your pass­word. If your screen locks for other rea­sons (screensaver, dis­play sleep, etc.), Touch ID will still work as nor­mal.

On first use, you’ll be prompted for your ad­min pass­word to in­stall the priv­i­leged helper. This is a one-time setup.

Set your Development Team in both tar­gets (PanicLock and PanicLockHelper)

brew unin­stall pan­i­clock

sudo launchctl bootout system/com.paniclock.helper

sudo rm -f /Library/PrivilegedHelperTools/com.paniclock.helper

sudo rm -f /Library/LaunchDaemons/com.paniclock.helper.plist

rm -rf /Applications/PanicLock.app

PanicLock uses a priv­i­leged helper (installed via SMJobBless) to mod­ify Touch ID time­out set­tings:

Sets time­out to 1 sec­ond via bi­outil -w -s -o 1

* No net­work ac­tiv­ity — App is 100% of­fline, no teleme­try or an­a­lyt­ics

Note: PanicLock only dis­ables Touch ID. If you have other un­lock meth­ods en­abled, Apple Watch un­lock, se­cu­rity keys, etc., your Mac can still be un­locked us­ing those.

./scripts/release.sh

* Signs with Developer ID for dis­tri­b­u­tion out­side the App Store

* Submits to Apple for no­ta­riza­tion (can take min­utes to hours)

* Supports parallel notarizations: each version gets its own build/release/ directory

Run again later to check sta­tus and con­tinue when ap­proved

Contributions wel­come! Please open an is­sue or pull re­quest.

...

Read the original on github.com »

10 215 shares, 8 trendiness

Risky Bulletin: NIST gives up enriching most CVEs

This newsletter is brought to you by Corelight. You can subscribe to an audio version of this newsletter as a podcast by searching for "Risky Business" in your podcatcher or subscribing via this RSS feed. You can also add the Risky Business newsletter as a Preferred Source to your Google search results by going here.

The US National Institute of Standards and Technology an­nounced on Wednesday a new pol­icy re­gard­ing the US National Vulnerability Database, which the agency has been strug­gling to keep up­dated with de­tails for every new vul­ner­a­bil­ity added to the sys­tem.

Going forward, NIST says its staff will only add data (in a process called enrichment) for important vulnerabilities.

This will in­clude three types of se­cu­rity flaws, which the agency says are crit­i­cal to the safe op­er­a­tion of US gov­ern­ment net­works and its pri­vate sec­tor.

* CVE en­tries for vul­ner­a­bil­i­ties listed in CISA KEV, a data­base of ac­tively ex­ploited bugs;

* CVEs in soft­ware known to be used by US fed­eral agen­cies;

* and CVEs in what the agency classifies as "critical software."

This lat­ter cat­e­gory sounds re­stric­tive, but is in fact quite broad and in­cludes all the ma­jor soft­ware you’d ex­pect and want to have prop­erly en­riched CVEs for. Stuff like op­er­at­ing sys­tems, web browsers, se­cu­rity soft­ware, fire­walls, backup soft­ware, and VPNs; they are all on the list [PDF], which you can also see be­low this post.

NIST has been strug­gling to en­rich CVEs for more than two years due to an ex­plo­sion in bug dis­cov­er­ies and mount­ing costs, also made worse by the Trump ad­min­is­tra­tion’s re­cent cuts to var­i­ous DHS and CISA bud­gets.

Its problems started in early 2024, when a backlog of 2,100+ CVE entries left without enriched metadata turned into almost 30,000 by the end of the year. Despite efforts to catch up and add details to all CVEs published in the NVD, the agency is still tens of thousands of bugs behind.

The NIST an­nounce­ment is a ca­pit­u­la­tion, with the agency ad­mit­ting it won’t ever catch up due to its cur­rent bud­getary cir­cum­stances.

It is a smart decision. Even though this sounds like blasphemy to the infosec people in the vulnerability management space, the only way forward for NIST was to focus on the important bugs only and give up on all the CVE chaff.

Each year, there are tens of thou­sands of vul­ner­a­bil­i­ties be­ing re­ported in all kinds of no-name soft­ware you have never heard of, in all the tiny li­braries that barely have 100 stars on GitHub, and all the IoT gear and their firmware com­po­nents.

The an­nounce­ment is not what the vul­ner­a­bil­ity man­age­ment com­pa­nies wanted, since many of them re­lied on pack­ag­ing the NVD out­put into their own vul­ner­a­bil­ity scan­ners, dash­boards, and re­port­ing tools.

With some of that output set to disappear for good, they will have to find other places to get the data, or enrich it themselves. Aikido Security's Sooraj Shah has an excellent take on what this means for the industry.

The cybersecurity industry was expecting this to happen. At a January quarterly meeting, NIST officials talked about "rethinking" the agency's role in analyzing software vulnerabilities, and hinted at a plan to only triage the important bugs.

NIST says that be­sides fo­cus­ing on en­rich­ing only the big bugs, it will also stop pro­vid­ing its own CVSS sever­ity scores for NVD en­tries, and will now show the sever­ity score ini­tially as­signed by the or­ga­ni­za­tion that is­sued the CVE.

This opens the door for a lot of infosec drama. Some of the organizations that issue CVE numbers are also the makers of the "reported" software, and these companies are extremely likely to issue low severity scores and downplay their own bugs.

This has been hap­pen­ing for decades, and if you read enough vul­ner­a­bil­ity write-ups, you’ll of­ten find se­cu­rity re­searchers ac­cus­ing com­pa­nies of bla­tantly down­grad­ing CVSS scores and mis­char­ac­ter­iz­ing their own bugs to down­play the bug’s im­pact, over and over again.

More than 48,000 vulnerabilities received a CVE number last year, and NIST is giving up right before experts anticipate this number will explode with the broad adoption of AI cybersecurity agents designed to help improve vulnerability discovery.

The in­te­gra­tion of AI vul­ner­a­bil­ity scan­ners is likely to yield a few ma­jor bugs, but they’re also ex­pected to pro­duce moun­tains of CVE chaff that no hu­man team at NIST would have been able to keep up with any­way.

NIST's new enrichment policy entered into effect this week, on Wednesday, April 15.

The main Risky Business pod­cast is now on YouTube with video ver­sions of our re­cent episodes. Below is our lat­est weekly show with Pat and Adam at the helm!

Russian hackers targeted a Swedish thermal plant: A pro-Russian hacktivist group tried to disrupt a Swedish thermal power plant last year. The attack targeted a power plant in western Sweden last spring. The intrusion was caught by the plant's built-in safeguards. Swedish officials linked the group to Russia's security services. [EnergyWatch // SVT]

Russia hacked Ukrainian pros­e­cu­tors: Russian hack­ers have bro­ken into the emails of more than 170 Ukrainian pros­e­cu­tors. The cam­paign sought to gain ac­cess to in­ves­tiga­tive in­for­ma­tion. The at­tacks were linked to APT28, a cy­ber unit in­side Russia’s mil­i­tary in­tel­li­gence agency, the GRU. The same cam­paign also breached mil­i­taries in Greece, Romania, and Serbia. The hacks are part of a cam­paign spot­ted last month by Ctrl-Alt-Intel. [Reuters]

Grinex shuts down after hack: Russian cryptocurrency exchange Grinex has shuttered operations following a theft this week. The company claims "Western intelligence agencies" broke into its wallets and stole $13 million (1 billion rubles) worth of assets. The exchange was sanctioned by US authorities last August for helping Russia evade sanctions and laundering ransomware payments. A TRM Labs report found that Grinex was a rebrand of an older Russian crypto exchange Garantex, also sanctioned for the same things. [Wayback Machine]

Zerion blames North Korea for crypto-heist: Crypto-wallet provider Zerion has blamed a re­cent heist of $100,000 on North Korean hack­ers.

Autovista ran­somware at­tack: A ran­somware group has hit au­to­mo­tive data an­a­lyt­ics com­pany Autovista, with the at­tack im­pact­ing sys­tems in Europe and Australia.

McGraw Hill breach: Hackers have leaked the personal details of 13.5 million users of educational platform McGraw Hill. The data was taken from the company's Salesforce accounts. It was leaked after a failed extortion attempt by the ShinyHunters group. It includes details such as real names, home addresses, emails, and phone numbers.

Standard Bank breach: South Africa’s largest bank has dis­closed a se­cu­rity breach. The Standard Bank says hack­ers breached last week an in­ter­nal net­work stor­ing cus­tomer data. The in­ci­dent is the third hack of a South African bank this year. [IOL]

BlueLeaks 2.0 data is now up for sale: A hacker is selling 8.3 million confidential crime tips for $10,000 in cryptocurrency. The data was stolen earlier this year from P3 Global Intel, a software provider for US law enforcement agencies. The hacker, who goes by the name Internet Yiff Machine, initially provided the data for free to select journalists and the DDoSecrets project. The hacker says they’re selling the data because “principles are for the well-fed, and I’m unfortunately not in a great place.” [Straight Arrow News // DataBreaches.net]

Krybit hacks 0APT: The Krybit ransomware group has hacked the website of rival ransomware group 0APT. The incident occurred after the 0APT group threatened to dox Krybit’s members last week. According to security firm Barricade, 0APT leaked plaintext credentials for Krybit’s ransomware backend panel, along with Bitcoin addresses and victim names. Krybit returned the favor by leaking 0APT’s entire server contents.

OpenAI announces its own private cyber model: OpenAI has released an LLM for cybersecurity work into private testing. Thousands of verified professionals and hundreds of teams responsible for defending critical software have been invited to test the GPT‑5.4‑Cyber model. The new model has loose permissions for cybersecurity research, such as reverse-engineering and vulnerability discovery. The new limited-access model is OpenAI’s response to Anthropic’s Project Glasswing and the Mythos model.

Anthropic rolls out KYC for Claude: Anthropic will ask certain Claude users to verify their identity by providing a selfie and a government ID. The company says the new identity verification check will only roll out in “a few use cases.” The checks are meant to prevent abuse and comply with legal obligations. The ID checks will be handled by Persona, the same company Discord had to cut ties with because of community backlash.

BlueSky’s mega outage: Social media network BlueSky had a prolonged outage on Thursday that was so bad, even its server status page was down, probably because it was hosted on the same infrastructure. You live and learn, I guess. [News.az]

Grok is still nudifying: xAI’s Grok is still generating nude images at users’ requests, despite a huge backlash from authorities all over the world. Just take Grok behind the shed, Elon! It’s time. [NBC News]

Nudify apps are still everywhere: Both Apple and Google are still hosting nudify apps on their stores, and their ad systems are often used to lure users to the very same apps they’re supposed to have banned. [Tech Transparency Project]

News sites block the Internet Archive: Twenty-three major news outlets are now blocking the Internet Archive’s Wayback Machine from creating copies of their content. Most cited fears that the backed-up pages could be used as a proxy to train AI on their content. [Tom’s Hardware]

IPv6 milestone: Global IPv6 traffic crossed 50% for the first time at the end of last month.

IPv8 protocol proposal: A new version of the IP addressing protocol has been proposed to the Internet Engineering Task Force. The new protocol is being called IPv8 and is meant to be compatible with old IPv4 addresses. IPv8 addresses will include a prefix and an old IPv4 address. The prefix will be specific to each ASN (network operator). For old IPv4 addresses, this prefix will be 0.0.0.0. This will allow devices and networks with old IPv4 addresses to connect to IPv8 systems without any software updates required.
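Going by the proposal’s description alone, an IPv8 address is just an ASN-specific prefix glued onto a legacy IPv4 address. A minimal sketch of that composition (the function name and the dotted output format are illustrative assumptions, not part of the proposal):

```python
import ipaddress

def ipv8_address(asn_prefix: str, ipv4: str) -> str:
    # Validate the legacy portion as a real IPv4 address, then
    # prepend the ASN-specific prefix, as the proposal describes.
    legacy = ipaddress.IPv4Address(ipv4)
    return f"{asn_prefix}.{legacy}"

# Legacy hosts keep the all-zeros prefix, which is how existing
# IPv4 devices would stay reachable without software updates.
print(ipv8_address("0.0.0.0", "198.51.100.7"))  # 0.0.0.0.198.51.100.7
```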

Chrome does nothing to stop browser fingerprinting: Web privacy expert Alexander Hanff looks at the various browser fingerprinting techniques used by online trackers and how Chrome doesn’t do anything to block them.

Android gets new one-time data pickers: The next Android OS version will include two new systems that let users pick contacts or share their precise location a single time, without an app needing persistent access to the read-contacts and precise-geolocation permissions.

Raspberry Pi disables passwordless sudo: The Raspberry Pi project has disabled passwordless access to the sudo utility in its OS.

Some ESUs extended: Microsoft has extended the Exchange 2016/2019 Extended Security Updates (ESU) program until October this year. The program was set to end this month. Same goes for the Skype for Business ESU.

Windows adds RDP warning popups: Windows will now show a security warning popup whenever users open RDP configuration files. The popups will alert users that they are about to make dangerous changes that may allow remote attackers to connect to their PCs and steal data. Several threat actors have used malicious RDP config files in phishing operations as a way to gain a foothold inside targeted networks. Russian group APT29 is known for using this technique in espionage operations.

FCC exempts Netgear from foreign router ban: The US Federal Communications Commission has excluded Netgear from the Trump administration ban on foreign-made routers. The agency granted the exemption at the request of the US Department of War. Netgear is an American company, but most of its routers are made in Southeast Asia.

More cyber EOs are coming: National Cyber Director Sean Cairncross says the Trump administration will soon sign and issue more cyber-related executive orders to help push forward the implementation of the White House’s new cybersecurity strategy. [CyberScoop]

US Tech Force is hiring cyber staff: The Trump administration is recruiting cybersecurity specialists for its new and upcoming US Tech Force agency. The Tech Force was announced at the end of last year. The plan is to recruit around 1,000 tech workers from large US corporations to “modernize” the US government’s networks. The new hiring push comes after the Trump administration fired a third of CISA’s staff and plans to cut hundreds more next year. CISA also recently canceled summer internships for cyber scholarship students amid a DHS funding lapse.

Foreign internet traffic in Russia is becoming very expensive: Russian telcos will increase the price for internet traffic received from outside the country’s borders as part of measures to crack down on VPN use. [RBC]

EU launches age verification app: The EU has launched its own internally developed age verification app. The app uses cryptographic proofs to verify a user’s age without sharing their personal data. EU officials have urged online platforms to integrate the app with their processes. Age verification is mandatory under the EU’s new Digital Services Act. The app is available for Android and iOS, and future desktop and web versions are planned. The source code is also available on GitHub.

In this Risky Business sponsor interview, Corelight’s Senior Director of Product Management, Dave Getman, tells James Wilson how Corelight Agentic Triage helps defenders stay ahead of AI-powered attacks.

DPRK laptop farmers sentenced: The US has sentenced two individuals to prison for running a laptop farm for North Korean remote IT workers. Kejia Wang and Zhenxing Wang were sentenced to 108 and 92 months in prison, respectively. Both hosted laptops at their homes in New Jersey so the machines ran from US IP addresses, allowing North Koreans to pose as American citizens. Authorities also indicted nine North Korean remote workers who participated in the scheme.

16yo arrested for school cyberattack: Northern Ireland authorities have arrested a 16-year-old for a cyberattack that disrupted the country’s national school IT network. The C2K platform was down at the start of the month after a cyberattack that targeted a small number of schools. More than 300,000 pupils and 20,000 teachers couldn’t access exam data, home assignments, and teaching materials for days following the incident, as officials shut down the platform to investigate. [BelfastLive]

53 DDoS-for-hire domains seized: Europol and other law enforcement agencies have seized 53 domains that hosted DDoS-for-hire services. Four suspects were also detained following 25 house searches. Authorities have also sent letters and emails to more than 75,000 users who had signed up for the services. They also worked with Google to remove ads promoting DDoS services.

UNC2465 shifts to Europe: Orange’s security team reports that a known ransomware affiliate tracked as UNC2465 has shifted its attacks to Europe. The group is currently using the SmokedHam backdoor as an initial entry point for Qilin ransomware attacks.

Black Basta offshoots target execs: A group of former Black Basta affiliates are using automated email bombing and Teams-based social engineering to target executives and senior-level employees for initial access into corporate networks. [ReliaQuest]

Hazy Hawk hijacks university subdomains: A cybercrime group has hijacked subdomains at 34 US universities and educational organizations to show pornographic spam. MIT, Harvard, Stanford, Johns Hopkins, and other large universities have had subdomains hacked. The spam campaign has been linked to Hazy Hawk, a group that hijacked CDC subdomains last year. [SH Consulting]

QEMU abused in the wild: Sophos says at least two cybercrime groups are deploying the QEMU virtualization environment on compromised networks to hide malicious activity and later deploy ransomware.

WP scanning: F5 says a badness cluster it’s been keeping an eye on has recently started mass-scans for sites running vulnerable WordPress plugins.

FTP exposure is still huge: According to Censys, there are still 6 million endpoints exposing an FTP port over the internet, almost 55 years after the protocol was created.

C2 servers in Russia: A large-scale study of the Russian web hosting space has found more than 1,200 malicious command-and-control servers hosted inside Russia this year. Most of the servers are for IoT malware botnets, such as Keitaro, Hajime, Mozi, and Mirai. [Hunt Intelligence]

Rhadamanthys’s secret bug: The Rhadamanthys infostealer left its command-and-control server APIs exposed online without authentication, allowing security researchers to track its activity for months before the Europol takedown last year. [Censys]

Direct-Sys Loader: The Cyderes team has discovered a new malware loader named Direct-Sys Loader being delivered in the wild.

PowMix botnet: Cisco Talos has spotted a new Windows botnet malware strain named PowMix, currently going on a test run in the Czech Republic.

AngrySpark: Gen Digital has spotted a new Windows rootkit named AngrySpark, already used in the wild on a UK victim’s system.

W3LL PhaaS: Group-IB published a report on W3LL, the phishing platform seized by authorities earlier this month.

ATHR platform: A cybercrime group has developed and is renting access to a platform that automates voice phishing attacks. The ATHR platform uses AI agents to call targets using preconfigured and multi-step scripts. ATHR access is being sold for $4,000 and 10% of a campaign’s profits. According to AbnormalAI, the platform is primarily being used to trick victims into revealing credentials for their online accounts.

James Pope, Corelight’s Director of Technical Marketing Engineering, demonstrates the company’s Open NDR Platform and how it combines network detections with a whole host of other data sources.

UAC-0247 and AGINGFLY: CERT-UA reported a new wave of attacks against Ukrainian government agencies, hospitals, and emergency services. The activity was linked to a cluster tracked as UAC-0247. The final payload was a new infostealer named AGINGFLY.

Sapphire Sleet targets macOS: DPRK APT group Sapphire Sleet has adapted its “install this Zoom update to hear me” malware delivery technique for macOS, per a new Microsoft report.

PyPI security audit: Python’s PyPI has completed its second security audit.

Zero Day Quest 2026: Microsoft awarded $2.3 million in bug bounty rewards at this year’s edition of Zero Day Quest, its cloud and AI hacking contest.

Mythos guidance: Cisco [PDF] and the Cloud Security Alliance have issued guides on how to protect and defend networks in the face of increasingly powerful AI vulnerability-discovery agents like Anthropic’s Mythos.

Mythos/Glasswing vulnerabilities: VulnCheck has sifted through its huge CVE database and believes it has tracked down some of the bugs discovered using Anthropic’s Mythos agent as part of Project Glasswing. There are 75 CVEs that mention Anthropic and 40 credited to Anthropic, but only one specifically mentions Glasswing. So far, it’s unclear if any of the Mythos-found bugs even received proper CVEs.

You can trick Claude by being an industry legend: Manifold Security tricked Claude’s GitHub bot into merging malicious code into repositories by spoofing requests under the names of famous developers.

Researcher drops another Windows zero-day: A disgruntled security researcher has published proof-of-concept code for a new Windows zero-day. The RedSun zero-day can be used to elevate privileges on Windows to SYSTEM-level access. The researcher released the public exploit after a disagreement with the Microsoft team that handles its bug bounty program. The same researcher also released another Windows zero-day named BlueHammer earlier this month.

NGINX UI bug exploited in the wild: Hackers are exploiting a bug in a popular dashboard for managing NGINX web servers. Attacks began last month and are targeting the dashboard’s MCP endpoints. Tracked as CVE-2026-33032, the bug allows attackers to access the MCP endpoint without authentication and then modify the server’s config files. More than 2,600 NGINX UI dashboards are currently exposed on the internet. [Pluto Security]

RAGFlow patches bug after public disclosure: The RAGFlow AI toolkit has patched a remote code execution bug in its software almost a week after the bug was publicly disclosed by security researchers. The project initially ignored the report and only patched the issue after the researchers themselves submitted the patch code.

Dolibarr RCE: The Dolibarr CRM and ERP platform has patched an eval-based remote code execution bug (CVE-2026-22666). A write-up and PoC are available via Jiva Security.
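For readers unfamiliar with the bug class: an eval-based RCE happens when attacker-controlled input reaches a language’s eval routine, which will happily execute whatever it is given. A toy Python illustration of the pattern and its common fix (generic, not Dolibarr’s actual PHP code):

```python
import ast

def unsafe_calc(expr: str):
    # Bug class: attacker input reaches eval(), so any expression
    # runs — "__import__('os').system(...)" included.
    return eval(expr)

def safe_calc(expr: str):
    # ast.literal_eval accepts only Python literals (numbers,
    # strings, lists, dicts...), so code never executes.
    return ast.literal_eval(expr)

print(unsafe_calc("2 + 2"))    # 4 — but arbitrary code would run too
print(safe_calc("[1, 2, 3]"))  # [1, 2, 3]
```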

Thymeleaf RCE: A critical vulnerability has been patched in the Java template engine Thymeleaf. Tracked as CVE-2026-40478, the bug allows attackers to bypass security checks and inject malicious content into server-side page templates. The bug affects all Thymeleaf versions ever released and has a wide impact, since Thymeleaf is also the default template engine in the Spring Boot Java framework. [Endor Labs]
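The underlying bug class is template injection: user input that ends up inside the template string itself gets evaluated, while input passed in as substitution data does not. A toy Python sketch of that distinction (illustrative only; Thymeleaf’s Java expression syntax works differently):

```python
from string import Template

user_input = "${secret}"  # attacker-supplied value
secret = "hunter2"        # server-side data

# Unsafe: input is concatenated into the template itself,
# so its placeholder resolves against server-side data.
leaked = Template("Hello " + user_input).substitute(secret=secret)

# Safe: input is passed as substitution data; its "${...}"
# is treated as inert text, not as a placeholder.
inert = Template("Hello ${name}").substitute(name=user_input, secret=secret)

print(leaked)  # Hello hunter2
print(inert)   # Hello ${secret}
```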

Codex hacks a smart TV: Security firm Calif has used OpenAI’s Codex agent to hack and gain root access on a Samsung smart TV.

Fabricked attack: A team of academics has developed a new attack that breaks the confidentiality of AMD’s secure enclave technology. The Fabricked attack redirects memory transactions to trick AMD’s secure co-processor into improperly initializing SEV-SNP enclaves. The novel technique allows attackers to control confidential virtual machines, where each individual customer’s data is typically processed in cloud environments. AMD released patches this week as part of its Patch Tuesday. Fabricked is one of multiple AMD SEV-SNP attacks disclosed over the past two years. Others include RMPocalypse, BadRAM, Ahoi, Heracles, WireTap, BatteringRAM, and TEE.Fail.

Threat/trend reports: Check Point, CyberHUB-AM, Google Mandiant, GuidePoint Security, Kaspersky, and Sysdig have recently published reports and summaries covering various threats and infosec industry trends.

New tool—Jaspr: Google has open-sourced Jaspr, a new web development framework written in Dart.

New tool—Malfixer: Mobile security firm Cleafy has open-sourced Malfixer, a toolkit for inspecting and recovering malformed Android APK files.

New tool—RePythonNET-MCP: Security firm Sekoia has open-sourced RePythonNET-MCP, an MCP server for .NET reverse engineering automation.

New tool—PMG: DevSecOps firm SafeDep has released PMG, a tool that delays npm and Python package installs until the libraries are checked against its threat intel database.

New tool—HoneyWire: Andrea Termine has published HoneyWire, a lightweight distributed deception engine designed for internal networks.

New tool—NetWatch: Westpac’s chief engineer Matt Hartley has released NetWatch, a real-time network diagnostics tool for terminals.

In this edition of Seriously Risky Business, Tom Uren and Amberleigh Jack talk about a new Citizen Lab report into Webloc, a tool to identify and track mobile devices. It demonstrates how the collection and sale of mobile phone geolocation data presents privacy and national security risks.

In this episode of Risky Business Features, James Wilson chats to professional hacker Jamieson O’Reilly about Anthropic’s Mythos and the impact it could have on offensive security. Jamieson is CEO of DVULN and co-founder of Aether AI.

...

Read the original on risky.biz »
