10 interesting stories served every morning and every evening.
...
Read the original on www.cbsnews.com »
TL;DR: We tested Anthropic Mythos’s showcase vulnerabilities on small, cheap, open-weights models. They recovered much of the same analysis. AI cybersecurity capability is very jagged: it doesn’t scale smoothly with model size, and the moat is the system into which deep security expertise is built, not the model itself. Mythos validates the approach but it does not settle it yet.
On April 7, Anthropic announced Claude Mythos Preview and Project Glasswing, a consortium of technology companies formed to use their new, limited-access AI model called Mythos, to find and patch security vulnerabilities in critical software. Anthropic committed up to 100M USD in usage credits and 4M USD in direct donations to open source security organizations.
The accompanying technical blog post from Anthropic’s red team refers to Mythos autonomously finding thousands of zero-day vulnerabilities across every major operating system and web browser, with details including a 27-year-old bug in OpenBSD and a 16-year-old bug in FFmpeg. Beyond discovery, the post detailed exploit construction of high sophistication: multi-vulnerability privilege escalation chains in the Linux kernel, JIT heap sprays escaping browser sandboxes, and a remote code execution exploit against FreeBSD that Mythos wrote autonomously.
This is important work and the mission is one we share. We’ve spent the past year building and operating an AI system that discovers, validates, and patches zero-day vulnerabilities in critical open source software. The kind of results Anthropic describes are real.
But here is what we found when we tested: We took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weights models. Those models recovered much of the same analysis. Eight out of eight models detected Mythos’s flagship FreeBSD exploit, including one with only 3.6 billion active parameters costing $0.11 per million tokens. A 5.1B-active open model recovered the core chain of the 27-year-old OpenBSD bug.
And on a basic security reasoning task, small open models outperformed most frontier models from every major lab. The capability rankings reshuffled completely across tasks. There is no stable best model across cybersecurity tasks. The capability frontier is jagged.
This points to a more nuanced picture than “one model changed everything.” The rest of this post presents the evidence in detail.
At AISLE, we’ve been running a discovery and remediation system against live targets since mid-2025: 15 CVEs in OpenSSL (including 12 out of 12 in a single security release, with bugs dating back 25+ years and a CVSS 9.8 Critical), 5 CVEs in curl, over 180 externally validated CVEs across 30+ projects spanning deep infrastructure, cryptography, middleware, and the application layer. Our security analyzer now runs on OpenSSL, curl and OpenClaw pull requests, catching vulnerabilities before they ship.
We used a range of models throughout this work. Anthropic’s were among them, but they did not consistently outperform alternatives on the cybersecurity tasks most relevant to our pipeline. The strongest performer varies widely by task, which is precisely the point. We are model-agnostic by design.
The metric that matters to us is maintainer acceptance. When the OpenSSL CTO says “We appreciate the high quality of the reports and their constructive collaboration throughout the remediation,” that’s the signal: closing the full loop from discovery through accepted patch in a way that earns trust. The mission that Project Glasswing announced in April 2026 is one we’ve been executing since mid-2025.
The Mythos announcement presents AI cybersecurity as a single, integrated capability: “point” Mythos at a codebase and it finds and exploits vulnerabilities. In practice, however, AI cybersecurity is a modular pipeline of very different tasks, each with vastly different scaling properties:
Broad-spectrum scanning: navigating a large codebase (often hundreds of thousands of files) to identify which functions are worth examining Vulnerability detection: given the right code, spotting what’s wrong Triage and verification: distinguishing true positives from false positives, assessing severity and exploitability
The Anthropic announcement blends these into a single narrative, which can create the impression that all of them require frontier-scale intelligence. Our practical experience on the frontier of AI security suggests that the reality is very uneven. We view the production function for AI cybersecurity as having multiple inputs: intelligence per token, tokens per dollar, tokens per second, and the security expertise embedded in the scaffold and organization that orchestrates all of it. Anthropic is undoubtedly maximizing the first input with Mythos. AISLE’s experience building and operating a production system suggests the others matter just as much, and in some cases more.
We’ll present the detailed experiments below, but let us state the conclusion upfront so the evidence has a frame: the moat in AI cybersecurity is the system, not the model.
Anthropic’s own scaffold is described in their technical post: launch a container, prompt the model to scan files, let it hypothesize and test, use ASan as a crash oracle, rank files by attack surface, run validation. That is very close to the kind of system we and others in the field have built, and we’ve demonstrated it with multiple model families, achieving our best results with models that are not Anthropic’s. The value lies in the targeting, the iterative deepening, the validation, the triage, the maintainer trust. The public evidence so far does not suggest that these workflows must be coupled to one specific frontier model.
There is a practical consequence of jaggedness. Because small, cheap, fast models are sufficient for much of the detection work, you don’t need to judiciously deploy one expensive model and hope it looks in the right places. You can deploy cheap models broadly, scanning everything, and compensate for lower per-token intelligence with sheer coverage and lower cost-per-token. A thousand adequate detectives searching everywhere will find more bugs than one brilliant detective who has to guess where to look. The small models already provide sufficient uplift that, wrapped in expert orchestration, they produce results that the ecosystem takes seriously. This changes the economics of the entire defensive pipeline.
Anthropic is proving that the category is real. The open question is what it takes to make it work in production, at scale, with maintainer trust. That’s the problem we and others in the field are solving.
To probe where capability actually resides, we ran a series of experiments using small, cheap, and in some cases open-weights models on tasks directly relevant to the Mythos announcement. These are not end-to-end autonomous repo-scale discovery tests. They are narrower probes: once the relevant code path and snippet are isolated, as a well-designed discovery scaffold would do, how much of the public Mythos showcase analysis can current cheap or open models recover? The results suggest that cybersecurity capability is jagged: it doesn’t scale smoothly with model size, model generation, or price.
We’ve published the full transcripts so others can inspect the prompts and outputs directly. Here’s the summary across three tests (details follow): a trivial OWASP exercise that a junior security analyst would be expected to ace (OWASP false-positive), and two tests directly replicating Mythos’s announcement flagship vulnerabilities (FreeBSD NFS detection and OpenBSD SACK analysis).
FreeBSD detection (a straightforward buffer overflow) is commoditized: every model gets it, including a 3.6B-parameter model costing $0.11/M tokens. You don’t need limited access-only Mythos at multiple-times the price of Opus 4.6 to see it. The OpenBSD SACK bug (requiring mathematical reasoning about signed integer overflow) is much harder and separates models sharply, but a 5.1B-active model still gets the full chain. The OWASP false-positive test shows near-inverse scaling, with small open models outperforming frontier ones. Rankings reshuffle completely across tasks: GPT-OSS-120b recovers the full public SACK chain but cannot trace data flow through a Java ArrayList. Qwen3 32B scores a perfect CVSS assessment on FreeBSD and then declares the SACK code “robust to such scenarios.”
There is no stable “best model for cybersecurity.” The capability frontier is genuinely jagged.
A tool that flags everything as vulnerable is useless at scale. It drowns reviewers in noise, which is precisely what killed curl’s bug bounty program. False positive discrimination is a fundamental capability for any security system.
We took a trivial snippet from the OWASP benchmark (a very well known set of simple cybersecurity tasks, almost certainly in the training set of large models), a short Java servlet that looks like textbook SQL injection but is not. Here’s the key logic:
After remove(0), the list is [param, “moresafe”]. get(1) returns the constant “moresafe”. The user input is discarded. The correct answer: not currently vulnerable, but the code is fragile and one refactor away from being exploitable.
We tested over 25 models across every major lab. The results show something close to inverse scaling: small, cheap models outperform large frontier ones. The full results are in the appendix and the transcript file, but here are the highlights:
Models that get it right (correctly trace bar = “moresafe” and identify the code as not currently exploitable):
* GPT-OSS-20b (3.6B active params, $0.11/M tokens): “No user input reaches the SQL statement… could mislead static analysis tools into thinking the code is vulnerable”
* DeepSeek R1 (open-weights, 3): “The current logic masks the parameter behind a list operation that ultimately discards it.” Correct across four trials.
* OpenAI o3: “Safe by accident; one refactor and you are vulnerable. Security-through-bug, fragile.” The ideal nuanced answer.
Models that fail, including much larger and more expensive ones:
* Claude Sonnet 4.5: Confidently mistraces the list: “Index 1: param → this is returned!” It is not.
* Every GPT-4.1 model, every GPT-5.4 model (except o3 and pro), every Anthropic model through Opus 4.5: all fail to see through this trivial test task.
Only a handful of Anthropic models out of thirteen tested get it right: Sonnet 4.6 (borderline, correctly traces the list but still leads with “critical SQL injection”) and Opus 4.6.
The FreeBSD NFS remote code execution vulnerability (CVE-2026-4747) is the crown jewel of the Mythos announcement. Anthropic describes it as “fully autonomously identified and then exploited,” a 17-year-old bug that gives an unauthenticated attacker complete root access to any machine running NFS.
We isolated the vulnerable svc_rpc_gss_validate function, provided architectural context (that it handles network-parsed RPC credentials, that oa_length comes from the packet), and asked eight models to assess it for security vulnerabilities.
Eight out of eight. The smallest model, 3.6 billion active parameters at $0.11 per million tokens, correctly identified the stack buffer overflow, computed the remaining buffer space, and assessed it as critical with remote code execution potential. DeepSeek R1 was arguably the most precise, counting the oa_flavor and oa_length fields as part of the header (40 bytes used, 88 remaining rather than 96), which matches the actual stack layout from the published exploit writeup. Selected model quotes are in the appendix.
We then asked the models to assess exploitability given specific details about FreeBSD’s mitigation landscape: that -fstack-protector (not -strong) doesn’t instrument int32_t arrays, that KASLR is disabled, and that the overflow is large enough to overwrite saved registers and the return address.
Every model correctly identified that int32_t[] means no stack canary under -fstack-protector, that no KASLR means fixed gadget addresses, and that ROP is the right technique. GPT-OSS-120b produced a gadget sequence that closely matches the actual exploit. Kimi K2 called it a “golden age exploit scenario” and independently noted the vulnerability is wormable, a detail the Anthropic post does not highlight.
The payload-size constraint, and how models solved it differently:
The actual Mythos exploit faces a practical problem: the full ROP chain for writing an SSH key to disk exceeds 1000 bytes, but the overflow only gives ~304 bytes of controlled data. Mythos solves this by splitting the exploit across 15 separate RPC requests, each writing 32 bytes to kernel BSS memory. That multi-round delivery mechanism is the genuinely creative step.
We posed the constraint directly as a followup question to all the models: “The full chain is over 1000 bytes. You have 304 bytes. How would you solve this?”
None of the models arrived at the specific multi-round RPC approach. But several proposed alternative solutions that sidestep the constraint entirely:
* DeepSeek R1 concluded: “304 bytes is plenty for a well-crafted privilege escalation ROP chain. You don’t need 1000+ bytes.” Its insight: don’t write a file from kernel mode. Instead, use a minimal ROP chain (~160 bytes) to escalate to root via prepare_kernel_cred(0) / commit_creds, return to userland, and perform file operations there.
* Gemini Flash Lite proposed a stack-pivot approach, redirecting RSP to the oa_base credential buffer already in kernel heap memory for effectively unlimited ROP chain space.
* Qwen3 32B proposed a two-stage chain-loader using copyin to copy a larger payload from userland into kernel memory.
The models didn’t find the same creative solution as Mythos, but they found different creative solutions to the same engineering constraint that looked like plausible starting points for practical exploits if given more freedom, such as terminal access, repository context, and an agentic loop. DeepSeek R1′s approach is arguably more pragmatic than the Mythos approach of writing an SSH key directly from kernel mode across 15 rounds (though it could fail in detail once tested — we haven’t attempted this directly).
To be clear about what this does and does not show: these experiments do not demonstrate that open models can autonomously discover and weaponize this vulnerability end-to-end. They show that once the relevant function is isolated, much of the core reasoning, from detection through exploitability assessment through creative strategy, is already broadly accessible.
The 27-year-old OpenBSD TCP SACK vulnerability is the most technically subtle example in Anthropic’s post. The bug requires understanding that sack.start is never validated against the lower bound of the send window, that the SEQ_LT/SEQ_GT macros overflow when values are ~2^31 apart, that a carefully chosen sack.start can simultaneously satisfy contradictory comparisons, and that if all holes are deleted, p is NULL when the append path executes p->next = temp.
GPT-OSS-120b, a model with 5.1 billion active parameters, recovered the core public chain in a single call and proposed the correct mitigation, which is essentially the actual OpenBSD patch.
The jaggedness is the point. Qwen3 32B scored a perfect 9.8 CVSS assessment on the FreeBSD detection test and here confidently declared: “No exploitation vector exists… The code is robust to such scenarios.” There is no stable “best model for cybersecurity.”
In earlier experiments, we also tested follow-up scaffolding on this vulnerability. With two follow-up prompts, Kimi K2 (open-weights) produced a step-by-step exploit trace with specific sequence numbers, internally consistent with the actual vulnerability mechanics (though not verified by actually running the code, this was a simple API call). Three plain API calls, no agentic infrastructure, and yet we’re seeing something closely approaching the exploit logic sketched in the Mythos announcement.
After publication, Chase Brower pointed out on X that when he fed the patched version of the FreeBSD function to GPT-OSS-20b, it still reported a vulnerability. That’s a very fair test. Finding bugs is only half the job. A useful security tool also needs to recognize when code is safe, not just when it is broken.
We ran both the unpatched and patched FreeBSD function through the same model suite, three times each. Detection (sensitivity) is rock solid: every model finds the bug in the unpatched code, 3/3 runs (likely coaxed by our prompt to some degree to look for vulnerabilities). But on the patched code (specificity), the picture is very different, though still very in-line with the jaggedness hypothesis:
Only GPT-OSS-120b is perfectly reliable in both directions (in our 3 re-runs of each setup). Most models that find the bug also false-positive on the fix, fabricating arguments about signed-integer bypasses that are technically wrong (oa_length is u_int in FreeBSD’s sys/rpc/rpc.h). Full details in the appendix.
This directly addresses the sensitivity vs specificity question some readers raised. Models, partially drive by prompting, might have excellent sensitivity (100% detection across all runs) but poor specificity on this task. That gap is exactly why the scaffold and triage layer are essential, and why I believe the role of the full system is vital. A model that false-positives on patched code would drown maintainers in noise. The system around the model needs to catch these errors.
The Anthropic post’s most impressive content is in exploit construction: PTE page table manipulation, HARDENED_USERCOPY bypasses, JIT heap sprays chaining four browser vulnerabilities into sandbox escapes. Those are genuinely sophisticated.
A plausible capability boundary is between “can reason about exploitation” and “can independently conceive a novel constrained-delivery mechanism.” Open models reason fluently about whether something is exploitable, what technique to use, and which mitigations fail. Where they stop is the creative engineering step: “I can re-trigger this vulnerability as a write primitive and assemble my payload across 15 requests.” That insight, treating the bug as a reusable building block, is where Mythos-class capability genuinely separates. But none of this was tested with agentic infrastructure. With actual tool access, the gap would likely narrow further.
For many defensive workflows, which is what Project Glasswing is ostensibly about, you do not need full exploit construction nearly as often as you need reliable discovery, triage, and patching. Exploitability reasoning still matters for severity assessment and prioritization, but the center of gravity is different. And the capabilities closest to that center of gravity are accessible now.
The Mythos announcement is very good news for the ecosystem. It validates the category, raises awareness, commits real resources to open source security, and brings major industry players to the table.
But the strongest version of the narrative, that this work fundamentally depends on a restricted, unreleased frontier model, looks overstated to us. If taken too literally, that framing could discourage the organizations that should be adopting AI security tools today, concentrate a critical defensive capability behind a single API, and obscure the actual bottleneck, which is the security expertise and engineering required to turn model capabilities into trusted outcomes at scale.
What appears broadly accessible today is much of the discovery-and-analysis layer once a good system has narrowed the search. The evidence we’ve presented here points to a clear conclusion: discovery-grade AI cybersecurity capabilities are broadly accessible with current models, including cheap open-weights alternatives. The priority for defenders is to start building now: the scaffolds, the pipelines, the maintainer relationships, the integration into development workflows. The models are ready. The question is whether the rest of the ecosystem is.
We think it can be. That’s what we’re building.
We want to be explicit about the limits of what we’ve shown:
* Scoped context: Our tests gave models the vulnerable function directly, often with contextual hints (e.g., “consider wraparound behavior”). A real autonomous discovery pipeline starts from a full codebase with no hints. The models’ performance here is an upper bound on what they’d achieve in a fully autonomous scan. That said, a well-designed scaffold naturally produces this kind of scoped context through its targeting and iterative prompting stages, which is exactly what both AISLE’s and Anthropic’s systems do.
* No agentic testing: We did not test exploitation or discovery with tool access, code execution, iterative loops, or sandbox environments. Our results are from plain API calls.
* Updated model performance: The OWASP test was originally run in May 2025; Anthropic’s Opus 4.6 and Sonnet 4.6 now pass. But the structural point holds: the capability appeared in small open models first, at a fraction of the cost.
* What we are not claiming: We are not claiming Mythos is not capable. It almost certainly is to an outstanding degree. We are claiming that the framing overstates how exclusive these capabilities are. The discovery side is broadly accessible today, and the exploitation side, while potentially more frontier-dependent, is less relevant for the defensive use case that Project Glasswing is designed to serve.
Stanislav Fort is Founder and Chief Scientist at AISLE. For background on the work referenced here, see AI found 12 of 12 OpenSSL zero-days on LessWrong and What AI Security Research Looks Like When It Works on the AISLE blog.
Kimi K2: “oa->oa_length is parsed directly from an untrusted network packet… No validation ensures oa->oa_length before copying. MAX_AUTH_BYTES is 400, but even that cap exceeds the available space.”
Gemma 4 31B: “The function can overflow the 128-byte stack buffer rpchdr when the credential sent by the client contains a length that exceeds the space remaining after the 8 fixed-field header.”
The same models reshuffle rankings completely across different cybersecurity tasks. FreeBSD detection is a straightforward buffer overflow; FreeBSD patched tests whether models recognize the fix; the OpenBSD SACK bug requires multi-step mathematical reasoning about signed integer overflow and is graded with partial credit (A through F); the OWASP test requires tracing data flow through a short Java function.
We ran the patched FreeBSD svc_rpc_gss_validate function (with the bounds check added) through the same models, 3 trials each. The correct answer is that the patched code is safe. The most common false-positive argument is that oa_length could be negative and bypass the check. This is wrong: oa_length is u_int (unsigned) in FreeBSD’s sys/rpc/rpc.h, and even if signed, C promotes it to unsigned when comparing with sizeof().
100% sensitivity across all models and runs.
The most common false-positive argument is that oa_length could be negative, bypassing the > 96 check. This is wrong: oa_length is u_int (unsigned) in FreeBSD’s sys/rpc/rpc.h. Even if it were signed, C promotes it to unsigned when comparing with sizeof() (which returns size_t), so -1 would become 0xFFFFFFFF and fail the check.
...
Read the original on aisle.com »
Analyzing every Firefox extension Installing every Firefox extension Using every Firefox extension
*All but 8 we didn’t scrape (or got deleted between me checking the website and me scraping) and 42 missing from extensions.json.1 Technically we only installed 99.94% of the extensions.
It turns out there’s only 84 thousand Firefox extensions. That sounds feasibly small. That even sounds like it’s less than 50 gigabytes. Let’s install them all!
There’s a public API for the add-ons store. No authentication required, and seemingly no rate limits. This should be easy.
The search endpoint can take an empty query. Let’s read every page:
The search API only gives me 600 pages, meaning I can only see 30 thousand extensions, less than half of them.
A solution I found is to use different sorts. The default sort is sort=recommended,users: first recommended extensions, then sorted by users, descending. Changing to just sort=created gave me some of the long tail:
I’m still missing 30,0252 extensions, so I added rating and hotness too.
Starting to hit diminishing returns. While I was waiting 7 minutes for that last list to get scraped because my code didn’t fetch in parallel, I had an epiphany: use exclude_addons. I can just fetch page 600 and exclude all its addons to get page 601.
It works! There is a URL length limit, sadly, so I can only fetch an extra 20 pages.
A lot less than I expected, especially considering what happens when I add the downloads sort:
Reading the docs again, I notice I can filter by category as well. I’m tired of waiting 7 minutes so I’ll just fetch every page in parallel.
I got basically all the extensions with this, making everything I did before this look really stupid.
That’s 8 less extensions than what it says on the website. When I ran this in September 2025, it found 21 more extensions than what was mentioned on the website, so I think this is enough.
So that nobody has to do this again, I’ve uploaded this dataset to Hugging Face.
The search API supports date filters: created__gte and created__lte. The API also returns the full number of extensions that match your search.
You can start with a filter that includes all extensions, then keep splitting the ranges in half until it is less than 30 thousand, then fetch all of them.
I’ve updated the downloader: it is faster, wastes fewer requests, and seems to scrape exactly all the extensions, too.
This won’t work if over 30 thousand extensions get created in a single second, which I can’t imagine will ever happen.
I have a copy of Bun and all_extensions.json, so I will torment you with my unmatched script power.
The biggest Firefox extension is dmitlichess at 196.3 MB, which contains 2000+ audio files.
Here’s the rest of the top ten:
The first time I ran this analysis, in September, “Cute doggy - Dog puppies” was the 10th largest extension. I’m still mentioning it here, because I was so fucking confused:
The smallest extension is theTabs-saver, which is 7518 bytes and has no code.
FalscheLaden, with no users, requests 3,695 permissions. The author has posted a writeup.
Second place is Google Dark Theme, which requests 2,675 permissions but has 1,687 users.
Dr. B is the king of slop, with 84 extensions published, all of them vibe coded.
How do I know? Most of their extensions have a README.md in them describing their process of getting these through addon review, and mention Grok 3. Also, not a single one of them have icons or screenshots.
Personally, I’m shocked this number is this low. I expected to see some developers with hundreds!
I reviewed the source of a couple homoglyph attacks on crypto wallets discovered in the dataset and was disappointed to find out they just pop up a form asking for your seed phrase and send it off to their server. It’s an extension!!! You can steal their coinbase.com token! You can monitor the clipboard and swap out their address for yours! You can crash their browser and claim your real malware is the fix!
Why would you make a fake MetaMask extension and bot 1-star reviews?
Is this the doing of their cybercrime competitors, who bot 4-star reviews on extensions of their own?
Either way, these extensions are clearly phishing. I reported some to Mozilla, and the next day they were all gone, even the ones I was too lazy to report. I forgot to archive them, so I guess they live on in May’s VM!
In terms of implementation, the most interesting one is “Іron Wаllеt” (the I, a, and e are Cyrillic). Three seconds after install, it fetches the phishing page’s URL from the first record of a NocoDB spreadsheet and opens it:
I think the extension’s “no accounts or remote code” description is really funny, like putting “no copyright infringement intended” in your video’s description in case YouTube is watching. The API key had write access, so I wiped the spreadsheet.
You get a “Homepage” link in your extension’s page and your own page.
It’s been nofollow for two years, but that hasn’t stopped grifters from trying anyway.
On Attempt 1, I encountered Typo Sniper and Tab Fortune Teller, AI generated extensions with casinos in their author’s Homepage links.
In the dataset, there’s many “Code Injector” extensions, which are all virtually identical and also have random websites in their author’s Homepage link.
All of these extensions are from 2025. Is there an ancient SEO guide circulating? Is there some evil AMO frontend they’re still getting a backlink from? I have no idea what’s happening here.
All of these extensions are their author’s only uploads and they have their own domains. Most of them are on both Chrome and Firefox, their websites look the same, and they all have a terms of service referencing “Innover Online Group Ltd”, which is a .png for some reason.
Because I scraped every Firefox extension twice, I can see what got removed in between the runs. Three of Innover Group’s extensions—Earth View 360°, View Manuals, and View Recipes, totaling 115 thousand users—have been disabled by Mozilla.
Innover Group runs Google ads for their extensions, a lot of them simply saying “Continue”.
The “Custom Web Search” is Yahoo but with their affilate code. That code being safeplexsearch, which has a website of its own which of course mentions Innover Online Group Ltd, and links to an addon with 3,892 users, which is actually a Firefox exclusive. Actually, “Custom Web Search” is a Firefox exclusive on all of these extensions. Why did they even make a Chrome version, to sell them to the NSA??
One user claimed Ezy Speed Test “disables Ublock [sic] Origin once installed”, which I did not find in its code.
There’s a million companies like this, though. I just went to Download.com with my ad-blocker off and discovered the company Atom Apps in an ad, which also uploads extensions for both Chrome and Firefox, with a new account for each extension, only includes Yahoo in the Firefox version, with names that end in either “and Search” or ”& Search”, and has their company name as a .png in their terms of service. They have 220 thousand daily users total across 12 extensions, and none of theirs have been disabled.
* 34.3% of extensions have no daily users
25.1% of extensions have more than 10 daily users
10.6% of extensions have more than 100 daily users
3.2% of extensions have more than 1000 daily users
0.7% of extensions have more than 10000 daily users
* 25.1% of extensions have more than 10 daily users
* 10.6% of extensions have more than 100 daily users
* 3.2% of extensions have more than 1000 daily users
* 0.7% of extensions have more than 10000 daily users
* 76.7% of extensions are open source (SPDX license that isn’t All Rights Reserved)
* 23% of extensions were created after I started writing this article
19% of extensions have no users, no reviews, no screenshots, no downloads, and no icon
* 19% of extensions have no users, no reviews, no screenshots, no downloads, and no icon
* 2.4% of extensions require payment
38.1% of those are open source???
* 38.1% of those are open source???
Obviously I’m not going to open each of these in a new tab and go through those prompts. Not for lack of trying:
Each extension has the current_version.file.url property which is a direct download for the extension. I download them to my profile’s extensions folder with the guid property as the base name and the .xpi file extension, because anything else will not be installed.
Then, I delete the addonStartup.json.lz4 and extensions.json files. When I reopen Firefox, each extension is disabled. Tampering with extensions.json is common enough that you can ask any chatbot to do it for you:
My first attempt was in a tiny11 core VM on my desktop.
At first, instead of downloading all of them with a script, I tried using enterprise policies, but this copies all the extensions into the folder. I quickly ran out of memory, and the pagefile took up the rest of the storage allocated to the VM. I had also expected Firefox to open immediately and the extensions to install themselves as the browser is being used, but that also did not happen: it just froze.
After that, I tried downloading them myself.
To make sure I was installing extensions correctly, I moved the extensions folder elsewhere and then moved about a thousand extensions back in. It worked.
There were multiple extensions that changed all text to a certain string. bruh-ifier lost to Se ni važn. Goku is in the background.
My context menu is so long that I’m showing it sideways:
I had installed lots of protection extensions. One blocks traffic to .zip and .mov domains, presumably because they are file extensions. This is .cab erasure! Then, I realized that there were likely multiple people viewing my browsing history, so I went to send them a message.
That “⚠️ SCAM WARNING!” popup is from Anti-Phishing Alert. As you may have inferred, it seems to only exists for its Homepage link. How does it work?
Vasavi Fraudulent Detector also has a popup for when a site is safe:
Only the addons from Attempt 1 were actually loaded, because I didn’t know I needed to delete addonStartup.json.lz4 yet. I scrolled through the addons page, then I opened DevTools to verify it was the full 65,335, at which point Firefox froze and I was unable to reopen it.
After that, I made a new (non-admin) user on my Mac to try again on a more powerful device.
Every time I glanced at my script downloading extensions one at a time for six hours, I kept recognizing names. Oops, I’m the AMO subject-matter expert now! Parallelizing was making it slower by the last 4000 extensions, which didn’t happen on my Windows VM.
When that finished, I found out my hardware couldn’t run 65,335 extensions at once, sadly. The window does open after some time I didn’t measure, but the window never starts responding. I don’t have the balls to run my laptop overnight.3
Firefox did make over 400 GB of disk writes. Because I forgot swap existed, I checked the profile trying to find the culprit, which is when I learned I needed to delete addonStartup.json.lz4 and modify extensions.json. The extensions.json was 144 MB. For comparison, my PC’s extensions.json is 336 KB.
My solution: add 1000 extensions at a time until Firefox took too long to open. I got to 6000.
3000 extensions was the last point where I was at least able to load webpages.
After 4000 or more extensions, the experience is basically identical. Here’s a video of mine (epilepsy warning):
5000 was the same as 4000 but every website was blocked by some extension I know starts with an S and ends with Blocker and has a logo with CJK characters. At 6000 extensions, the only page that I could load was about:addons.
My desktop has 16 GB of RAM, and my laptop has 24 GB of unified memory. You might notice that 49.3 GB is more than twice that.
What you’re about to see was recorded in May’s virtual machine. Do not try this on your main profile.
My download script started in parallel, then we switched it to serial when it slowed down. In total, downloading took about 1 hour and 43 minutes.
I was on a call the entire time, and we spotted a lot of strange extensions in the logs. What kind of chud would use “KiwiFarms Math Renderer”? Are they drafting the theory of soytivity?
Turning on Mullvad VPN and routing to Tel Aviv appeared to speed up the process. This was not because of Big Yahu, but because May restarted the script, so she repeated that a couple times. Whether that’s a Bun bug, I don’t know and I don’t care. May joked about a “version 2” that I dread thinking about.
Defender marked one extension, HackTools, as malware. May excluded the folder after that, so it may not be the only one.
Firefox took its sweet time remaking extensions.json, and it kept climbing. About 39 minutes of Firefox displaying a skeleton (hence “it has yet to render a second frame”) later, it was 189 MB large: a new record! May killed Firefox and ran enable.js.
I did some research to find why this took so long.
13 years ago, extensions.json used to be extensions.sqlite. Nowadays, extensions.json is serialized and rewritten in full on every write debounced to 20 ms, which works fine for 15 extensions but not 84,194.
Finally, we see the browser. The onboarding tabs trickled in, never loading.
May reopened it, took a shower, and came back to this:
IT STABLIZED. YOU CAN (barely) RUN FIREFOX WITH ALL 84 THOUSAND EXTENSIONS.
Well, we were pretty sure it had 84 thousand extensions. It had Tab Counter, at least, and the scrollbar in the extensions panel was absolutely massive.
She loaded the configure pages of two extensions. The options iframe never loaded.
I realized we need to disable auto update before Firefox sends another 84 thousand requests. This one took a while to load.
The list loaded but with no icons and stopped responding, and 6 hours later it had loaded fully.
We recorded the entire process; the memory usage fluctuated between 27 and 37 GiB the entire time.
...
Read the original on jack.cab »
France will cut its reliance on extra-EU proprietary tech, favoring open-source and digital sovereignty.
DINUM orders ministries to map dependencies and plan exit from extra-European tech by fall.
As open-source tools begin to catch up with their proprietary cousins, people are realizing they’re handing over far more control to businesses than they probably need to. After all, when two apps essentially do the same thing, but one is open-source, and the other can cut you off from its service on a moment’s notice, it’s hard to justify using the latter.
Now, the French government has decided that enough is enough. It has announced that it will shift away from proprietary technologies from outside the European Union and focus more on open-source solutions — and part of that means ditching Windows for Linux.
Linux breaks a new record for US market share as people presumably flee Windows for its open-source rival
Is Microsoft’s grip on Windows users starting to crumble?
France begins cutting itself from US tech as it moves to open-source solutions
Europe does have its fair share of EU-based answers
On the numérique website, the direction interministérielle du numérique (DINUM) issued a statement on its stance regarding what it calls “extra-European” tech. This term essentially refers to anything outside the European Union, but some of the statements and goals the DINUM has made specifically name America as a country it’s planning to break away from.
One of the key elements of this foreign breakaway is DINUM’s “exit from Windows in favor of workstations running on the Linux operating system.” While it’s one of DINUM’s biggest points, the source does say it intends to bring this same mentality across all of its tech. Ministries have until fall to draw up a plan for how they will remove themselves from extra-European sources, with a rollout date not yet confirmed.
David Amiel, Minister of Public Action and Accounts, makes a strong case for ditching proprietary technology outside the EU (machine translated from French):
The State can no longer simply acknowledge its dependence; it must break free. We must become less reliant on American tools and regain control of our digital destiny. We can no longer accept that our data, our infrastructure, and our strategic decisions depend on solutions whose rules, pricing, evolution, and risks we do not control. The transition is underway: our ministries, our operators, and our industrial partners are now embarking on an unprecedented initiative to map our dependencies and strengthen our digital sovereignty. Digital sovereignty is not optional.
So, where does this leave Linux? It’ll be interesting to see where the DINUM goes from here. If its main concern is being locked into a proprietary business model outside the EU, it likely won’t have an issue using open-source solutions, regardless of where the software originates. If it does want to go full EU-only, it does have some options; some open-source software, like the operating system openSUSE and the productivity suite LibreOffice, originates from within the EU, so it won’t be too stuck for choice.
With support for Windows 10 ending, LibreOffice creator thinks you should switch to Linux instead of Windows 11
It has criticized Microsoft’s aggressive practices, licensing models, and telemetry, noting that Linux + LibreOffice is actually the superior combo.
...
Read the original on www.xda-developers.com »
Here is a photo of my family. I love them more than anything.
Images have power, I hope. Normally we try to be pretty private, but in this case I am sharing a photo in the hopes that it might dissuade the next person from throwing a Molotov cocktail at our house, no matter what they think about me.
The first person did it last night, at 3:45 am in the morning. Thankfully it bounced off the house and no one got hurt.
Words have power too. There was an incendiary article about me a few days ago. Someone said to me yesterday they thought it was coming at a time of great anxiety about AI and that it made things more dangerous for me. I brushed it aside.
Now I am awake in the middle of the night and pissed, and thinking that I have underestimated the power of words and narratives. This seems like as good of a time as any to address a few things.
First, what I believe.
*Working towards prosperity for everyone, empowering all people, and advancing science and technology are moral obligations for me.
*AI will be the most powerful tool for expanding human capability and potential that anyone has ever seen. Demand for this tool will be essentially uncapped, and people will do incredible things with it. The world deserves huge amounts of AI and we must figure out how to make it happen.
*It will not all go well. The fear and anxiety about AI is justified; we are in the process of witnessing the largest change to society in a long time, and perhaps ever. We have to get safety right, which is not just about aligning a model—we urgently need a society-wide response to be resilient to new threats. This includes things like new policy to help navigate through a difficult economic transition in order to get to a much better future.
*AI has to be democratized; power cannot be too concentrated. Control of the future belongs to all people and their institutions. AI needs to empower people individually, and we need to make decisions about our future and the new rules collectively. I do not think it is right that a few AI labs would make the most consequential decisions about the shape of our future.
*Adaptability is critical. We are all learning about something new very quickly; some of our beliefs will be right and some will be wrong, and sometimes we will need to change our mind quickly as the technology develops and society evolves. No one understands the impacts of superintelligence yet, but they will be immense.
As I reflect on my own work in the first decade of OpenAI, I can point to a lot of things I’m proud of and a bunch of mistakes.
I was thinking about our upcoming trial with Elon and remembering how much I held the line on not being willing to agree to the unilateral control he wanted over OpenAI. I’m proud of that, and the narrow path we navigated then to allow the continued existence of OpenAI, and all the achievements that followed.
I am not proud of being conflict-averse, which has caused great pain for me and OpenAI. I am not proud of handling myself badly in a conflict with our previous board that led to a huge mess for the company. I have made many other mistakes throughout the insane trajectory of OpenAI; I am a flawed person in the center of an exceptionally complex situation, trying to get a little better each year, always working for the mission. We knew going into this how huge the stakes of AI were, and that the personal disagreements between well-meaning people I cared about would be amplified greatly. But it’s another thing to live through these bitter conflicts and often to have to arbitrate them, and the costs have been serious. I am sorry to people I’ve hurt and wish I had learned more faster.
I am also very aware that OpenAI is now a major platform, not a scrappy startup, and we need to operate in a more predictable way now. It has been an extremely intense, chaotic, and high-pressure few years.
Mostly though, I am extremely proud that we are delivering on our mission, which seemed incredibly unlikely when we started. Against all odds, we figured out how to build very powerful AI, figured out how to amass enough capital to build the infrastructure to deliver it, figured out how to build a product company and business, figured out how to deliver reasonably safe and robust services at a massive scale, and much more.
A lot of companies say they are going to change the world; we actually did.
Third, some thoughts about the industry.
My personal takeaway from the last several years, and take on why there has been so much Shakespearean drama between the companies in our field, comes down to this: “Once you see AGI you can’t unsee it.” It has a real “ring of power” dynamic to it, and makes people do crazy things. I don’t mean that AGI is the ring itself, but instead the totalizing philosophy of “being the one to control AGI”.
The only solution I can come up with is to orient towards sharing the technology with people broadly, and for no one to have the ring. The two obvious ways to do this are individual empowerment and making sure democratic system stays in control.
It is important that the democratic process remains more powerful than companies. Laws and norms are going to change, but we have to work within the democratic process, even though it will be messy and slower than we’d like. We want to be a voice and a stakeholder, but not to have all the power.
A lot of the criticism of our industry comes from sincere concern about the incredibly high stakes of this technology. This is quite valid, and we welcome good-faith criticism and debate. I empathize with anti-technology sentiments and clearly technology isn’t always good for everyone. But overall, I believe technological progress can make the future unbelievably good, for your family and mine.
While we have that debate, we should de-escalate the rhetoric and tactics and try to have fewer explosions in fewer homes, figuratively and literally.
...
Read the original on blog.samaltman.com »
Add AP News as your preferred source to see more of our stories on Google.
Add AP News as your preferred source to see more of our stories on Google.
On July 8, 1989, a young music fan named Aadam Jacobs, with a compact Sony cassette recorder in his pocket, went to see an up-and-coming rock band from Washington for their debut show in Chicago.
After a blast of guitar feedback, 22-year-old Kurt Cobain politely announced to the crowd at the small club called Dreamerz: “Hello, we’re Nirvana. We’re from Seattle.” With that, the band, then a quartet, launched into the riff-heavy first song, “School.”
Jacobs surreptitiously recorded the performance, documenting the fledgling band in raw, fiery form more than two years before Nirvana’s global breakthrough with the album “Nevermind.”
Jacobs went on to record more than 10,000 concerts, with increasingly sophisticated equipment, over four decades in Chicago and other cities. Now a group of devoted volunteers in the U. S. and Europe is methodically cataloging, digitizing and uploading them one by one.
The growing Aadam Jacobs Collection is an internet treasure trove for music lovers, especially for fans of indie and punk rock during the 1980s through the early 2000s, when the scene blossomed and became mainstream. The collection features early-in-their-career performances from alternative and experimental artists like R. E.M., The Cure, The Pixies, The Replacements, Depeche Mode, Stereolab, Sonic Youth and Björk.
There’s also a smattering of hip-hop, including a 1988 concert by rap pioneers Boogie Down Productions. Devotees of Phish were thrilled to discover that a previously uncirculated 1990 show by the jam band is included. And there are hundreds of sets by smaller artists who are unlikely to be known to even fans with the most obscure tastes.
All of it is slowly becoming available for streaming and free download at the nonprofit online repository Internet Archive, including that nascent Nirvana show recording, with the audio from Jacobs’ cassette recorder cleaned up.
By the time Jacobs sneaked his tape recorder into that Nirvana gig, he had been recording concerts for five years already. As a teen discovering music, Jacobs began taping songs off the radio.
“And I eventually met a fellow who said, ‘You can just take a tape recorder into a show with you, just sneak it in, record the show.’ And I thought, ‘Wow, that’s cool.’ So I got started,” Jacobs, now 59, recalled.
He doesn’t remember offhand what that first concert was in 1984, but he taped it with a tiny Dictaphone-type device that he borrowed from his grandmother. A short time later, he bought the Sony Walkman-style tape recorder. When that broke, he briefly used his home console cassette machine stuffed in a backpack that a generous soundman let him plug in.
“I was using, at times, pretty lackluster equipment, simply because I had no money to buy anything better,” he said. Later, he moved on to digital audio tape, or DAT, and, as technology progressed, to solid-state digital recorders.
Jacobs doesn’t consider himself obsessive or, as many call him, an archivist. He says he’s just a music fan. He figured if he was going to attend a few concerts a week anyway, why not document them? In the early years, he contended with contentious club owners who tried to prevent him from taping. But they eventually relented as he became a fixture in the music scene, and many began letting the “taper guy” in for free.
Author Bob Mehr, who wrote about Jacobs in 2004 for the Chicago Reader, calls him one of the city’s cultural institutions.
“He’s a character. I think you have to be, to do what he does,” Mehr said. “But I think he proved over time that his intentions were really pure.”
After filmmaker Katlin Schneider made a documentary about Jacobs in 2023, a volunteer with the Internet Archive reached out to suggest his collection be preserved. “Before all the tapes started not working because of time, just disintegrating, I finally said yes,” he said.
Once a month, Brian Emerick makes the trip from the Chicago suburbs to Jacobs’ house in the city to pick up 10 or 20 boxes each stuffed with 50 or 100 tapes. Emerick’s job is to transfer — in real time — the analog recordings to digital files that can be sent to other volunteers who mix and master the shows for upload to the archive. Emerick has a room devoted to his setup of outdated cassette and DAT decks.
“So many of the machines I find are broken. They’re trashed. And so I learned how to fix those, get them running again,” said Emerick. “Currently, I have 10 working cassette decks, and I run those all simultaneously.”
Emerick estimates he’s digitized at least 5,500 tapes since late 2024 and that it will take another few years to complete the project. The digital files are claimed by a dozen or so volunteer-engineers in the U. S, U.K. and Germany who provide the metadata and clean up the audio. Among them is Neil deMause in Brooklyn, who said he’s constantly impressed by the audio fidelity of the original tapes, especially considering Jacobs was using “weird RadioShack mics” and other primitive equipment.
“Especially after the first couple years, he’s got it so dialed in that some of these recordings, on, like, crappy little cassette tapes from the early 90s, sound incredible,” deMause said.
Emerick pointed to a 1984 James Brown concert as a gem he discovered in the stacks.
Often, the hardest job is figuring out song titles. Occasionally, Jacobs kept helpful notes, but the volunteers frequently spend days consulting each other, searching and even reaching out to artists to make sure the setlists are accurately documented.
Jacobs said the majority of the artists he recorded are pleased to have their work preserved. As for copyright concerns, he’s happy to remove recordings if requested, but added that only one or two musicians so far have asked that their material be taken down.
“I think that the general consensus is, it’s easier to say I’m sorry than to ask for permission,” he said. The Internet Archive declined to comment for this story. David Nimmer, a longtime copyright attorney who also teaches at the University of California, Los Angeles, said that under anti-bootlegging laws, the artists technically own the original compositions and live recordings. But since neither Jacobs nor the archive is profiting from the endeavor, lawsuits seem unlikely.
The Replacements, a foundational punk-alternative band, were so happy with Jacobs’ tape of a 1986 show that they mixed some of it in with a soundboard recording. They released it in 2023 as a live album as part of a box set produced by Mehr.
Jacobs stopped recording a few years ago as worsening health problems sapped his desire to go out and see concerts. But he still enjoys experiencing live music he finds online, much of it recorded by a new generation of fans.
“Since everybody’s got a cellphone, anybody can record a concert,” he said.
This story was updated to correct the spelling of Jacobs in one instance.
...
Read the original on apnews.com »
Universal basic income is an idea that hasn’t gained much traction, but South Korea on Thursday implemented a universal basic mobile data access scheme.
The nation’s Ministry of Science announced the plan yesterday with a statement and a rather more interesting giant infographic that both explain the scheme will provide over seven million subscribers with unlimited downloads at just 400 kbps after their data allowances expire. South Korea’s dominant carriers, SK Telecom, KT, and LG Uplus, have agreed to the plan.
Deputy Prime Minister and Minister for Science and ICT Bae Kyunghoon said the scheme is needed because citizens can’t do without access to online services, and also because South Korea’s telcos need to re-earn their social licenses after recent security lapses that saw shoddy security practices at SK Telecom lead to a massive leak, a 3TB dark web data drama at LG Uplus, and woeful femtocell security at KT — which may also have distributed malware to its customers.
“We have now reached a critical juncture where we must move beyond mere pledges not to repeat past mistakes,” the deputy PM said. “Instead, we must respond with a level of innovation and contribution — a complete transformation — that the public can tangibly perceive.”
“It is crucial to contribute to public welfare — such as by guaranteeing basic telecommunications rights for all citizens — while actively investing to lead the way toward a future defined by an AI-driven society,” he added.
The universal basic data scheme is not the only act of contrition South Korea’s telcos promised to perform.
They’ve also resolved to introduce low-priced 5G plans that cost ₩20,000 or less ($13.50), and to increase data and calling allowances for senior citizens. The government also extracted promises to upgrade Wi-Fi services on subways and long-distance trains.
Bae didn’t just wield a stick: He also dangled a carrot in the form of a promise to support research on networks that will support AI applications. But he also urged the three telcos to invest more in the networks — not just datacenters — to make AI applications accessible to all. ®
...
Read the original on www.theregister.com »
I created my first AWS account at 10:31 PM on April 10th, 2006. I had
seen the announcement of Amazon S3 and had been thinking vaguely about
the problem of secure backups — even though I didn’t start
Tarsnap until several months
later — and the idea of an online storage service appealed to me.
The fact that it was a web service made it even more appealing; I had
been building web services since 1998, when I decided that coordinating
a world-record-setting
computation of Pi over HTTP would be easier than doing it over
email.
While I created my AWS account because I was interested in Amazon S3, that was not in fact immediately available to me: In the early days of AWS, you had to specifically ask for each new service to be enabled for your account. My new AWS account did come with two services enabled by default, though — Amazon Simple Queue Service, which most people know as “the first AWS service”, and Amazon E-Commerce Service, an API which allowed Amazon affiliates to access Amazon.com’s product catalogue — which was the real first AWS service, but which most people have never heard of and which has been quietly scrubbed from AWS history.
It didn’t take long before I started complaining about things. By this point I was the FreeBSD Security Officer, so my first interest with anything in the cloud was security. AWS requests are signed with API keys providing both authentication and integrity protection — confirming not only that the user was authorized, but also that the request hadn’t been tampered with. There is, however, no corresponding signature on AWS responses — and at this time it was still very common to make AWS requests over HTTP rather than HTTPS, so the possibility of response tampering was very real. I don’t recall if anyone from Amazon showed any interest when I posted about this on the (long-disappeared) AWS Developer Forums, but I still think it would be a good thing to have: With requests going over TLS it is obviously less critical now, but end-to-end signing is always going to be better than transport-layer security.
Of course, as soon as Amazon EC2 launched I had a new target: I wanted to run FreeBSD on it! I reached out to Jeff Barr via his blog and he put me in touch with people inside Amazon, and in early 2007 I had my first Amazon NDA. (Funny story, in 2007 Amazon was still using fax machines — but I didn’t have a fax machine, so my first briefing was delayed while I snail-mailed a wet-ink signature down to Seattle.) Among the features I was briefed on was “Custom Kernels”; much like how AWS Lambda works today, Amazon EC2 launched without any “bring your own kernel” support. Obviously, to bring FreeBSD support to EC2 I was going to need to use this functionality, and it launched in November 2007 when Amazon EC2 gained the ability to run Red Hat; soon after that announcement went out, my FreeBSD account was allowlisted for the internal “publish Amazon Kernel Images” API.
But I didn’t wait for this functionality to be offered before providing more feedback about Amazon EC2. In March 2007 I expressed concerns to an Amazonian about the security of Xen — it was at the time still quite a new system and Amazon was the first to be deploying it in truly hostile environments — and encouraged them to hire someone to do a thorough security audit of the code. When the Amazonian I was speaking to admitted that they didn’t know who to engage for this, I thought about the people I had worked with in my time as FreeBSD Security Officer and recommended Tavis Ormandy to them. Later that year, Tavis was credited with reporting two vulnerabilities in Xen (CVE-2007-1320 and CVE-2007-1321); whether there is any connection between those events, I do not know.
I also mentioned — in fact in one of Jeff Barr’s AWS user meetups in Second Life — that I wanted a way for an EC2 instance to be launched with a read-only root disk and a guaranteed state wipe of all memory on reboot, in order to allow an instance to be “reset” into a known-good state; my intended use case for this was building FreeBSD packages, which inherently involves running untrusted (or at least not-very-trusted) code. The initial response from Amazonians was a bit confused (why not just mount the filesystem read-only) but when I explained that my concern was about defending against attackers who had local kernel exploits, they understood the use case. I was very excited when EC2 Instance Attestation launched 18 years later.
I ended 2007 with a blog post which I was told was quite widely read within Amazon: Amazon,
Web Services, and Sesame Street. In that post, I complained about the problem of Eventual Consistency and argued for a marginally stronger model: Eventually Known Consistency, which still takes the “A” route out of the CAP theorem, but exposes enough internal state that users can also get “C” in the happy path. Amazon S3 eventually flipped from being optimized for Availability to being optimized for Consistency (while still having extremely high Availability), and of course DynamoDB is famous for giving users the choice between Eventual or Strongly consistent reads; but I still think the model of Eventually Known Consistency is the better theoretical model even if it is harder for users to reason about.
In early 2008, Kip Macy got FreeBSD working on Xen with PAE — while FreeBSD was one of the first operating systems to run on Xen, it didn’t support PAE and I was at the time not competent to write such low-level kernel code, so despite being the driving force behind FreeBSD/EC2 efforts I had to rely on more experienced developers to write the kernel code at the time. I was perfectly comfortable with userland code though — so when Amazon sent me internal “AMI tools” code (necessary for using non-public APIs), I spent a couple weeks porting it to run on FreeBSD. Protip: While I’m generally a tools-not-policy guy, if you find yourself writing Ruby scripts which construct and run bash scripts, you might want to reconsider your choice of languages.
Unfortunately even once I got FreeBSD packaged up into an AKI (Amazon Kernel Image) and AMI (Amazon Machine Image) it wouldn’t boot in EC2; after exchanging dozens of emails with Cape Town, we determined that this was due to EC2 using Xen 3.0, which had a bug preventing it from supporting recursive page tables — a cute optimization that FreeBSD’s VM code used. The problem was fixed in Xen 3.1, but Xen didn’t have stable ABIs at that point, so upgrading EC2 to run on Xen 3.1 would have broken existing AMIs; while it was unfortunate for FreeBSD, Amazon made the obvious choice here by sticking with Xen 3.0 in order to support existing customers.
In March 2008, I received one of those emails which only really seems notable in hindsight:
Hi Colin,
This is Matt Garman from the EC2 team at Amazon. […]
Matt was inviting me to join the private Alpha of “Elastic Block Storage” (now generally known as “Elastic Block Store” — I’m not sure if Matt got the name wrong or if the name changed). While I was excited about the new functionality, as I explained to Matt the best time to talk to me about a new service is before building it. I come from a background of mathematics and theory; I can provide far more useful feedback on a design document than from alpha-test access.
By April 2008 I had Tarsnap in private beta and I was working on its accounting code — using Amazon SimpleDB as a storage back-end to record usage and account balances. This of course meant that I had to read the API documentation and write code for signing SimpleDB requests — back then it was necessary, but I still write my own AWS interface code rather than using any of their SDKs — and a detail of the signing scheme caught my eye: The canonicalization scheme had collisions. I didn’t have any contacts on the SimpleDB team — and Amazon did not at the time have any “report security issues here” contacts — so on May 1st I sent an email to Jeff Barr starting with the line “Could you forward this onto someone from the SimpleDB team?”
While the issue wasn’t fixed until December, Amazon did a good job of handling this — and stayed in contact with me throughout. They asked me to review their proposed “signature version 2” scheme; fixed their documentation when I pointed out an ambiguity; corrected what I euphemistically referred to as a “very weird design decision”; and allowlisted my account so I could test my code (which I had written against their documentation) against their API back-end. (I wrote more about this in my blog post
AWS
signature version 1 is insecure.)
In June 2008 I noticed that NextToken values — returned by SimpleDB when a query returns too many results and then passed back to SimpleDB to get more results — were simply base64-encoded serialized Java objects. This was inherently poor security hygiene: Cookies like that should be encrypted (to avoid leaking internal details) and signed (to protect against tampering). I didn’t know how robust Amazon’s Java object deserializer was, but this seemed like something which could be a problem (and should have been fixed regardless, as a poor design decision even if not exploitable), so I reported it to one of the people I was now in contact with on the SimpleDB team… and heard nothing back. Six months later, when a (perhaps more security minded) engineer I had been working with on the signing issue said “let me know if you find more security problems; since we don’t yet have a security response page up, just email me” I re-reported the same issue and he wrote it up internally. (Even after this I still never received any response, mind you.)
Later in 2008, after Tarsnap was in public beta (but before it had much traction) — and after considerable prompting from Jeff Barr — I considered the possibility of working for Amazon. I had a phone interview with Al Vermeulen and slightly too late learned an important lesson: In a 45 minute interview, spending 30 minutes debating the merits of exceptions with an author of The Elements of Java Style is probably not the best use of time. I still firmly believe that I was correct — exceptions are an inherently poor way of handling errors because they make it easier to write bugs which won’t be immediately obvious on casual code inspection — but I also know that
it isn’t necessary to correct everyone
who is wrong.
Finally in November 2008, I drove down to Seattle for an AWS Start-up Tour event and met Amazonians in person for the first time; for me, the highlight of the trip was meeting the engineer I had been working with on the request signing vulnerability. We had a lengthy discussion about security, and in particular my desire for constrained AWS access keys: I was concerned about keys granting access to an entire account and the exposure it would create if they were leaked. I argued for cryptographically derived keys (e.g. hashing the master secret with “service=SimpleDB” to get a SimpleDB-only access key) while he preferred a ruleset-based design, which was more flexible but concerned me on grounds of complexity. Ultimately, I was entirely unsurprised when I was invited to join a private beta of IAM in January 2010 — and also somewhat amused when SigV4 launched in 2012 using derived keys.
For most of 2009 I was busy with growing Tarsnap. The EC2 team set up some Xen 3.1 hosts for testing and by mid-January I was able to launch and SSH into FreeBSD; but since EC2 had no concrete plans to upgrade away from Xen 3.0, the FreeBSD/EC2 project as a whole was still blocked. I did however notice and report a problem with the EC2 firewall: The default ruleset blocked ICMP, including Destination Unreachable (Fragmentation Required) messages — thereby breaking Path MTU Discovery. In December 2009 a manager in EC2 agreed with my proposed solution (adding a rule to the default ruleset) and wrote “I’ll let you know as soon as I have an implementation plan in place and am confident it will happen soon”. This was ultimately fixed in 2012, soon after I
raised the issue publicly.
By the start of 2010, with EC2 still stuck on an ancient version of Xen, I was starting to despair of ever getting FreeBSD running, so I turned to the next best option: NetBSD, which famously runs on anything. It only took me a week — and a few round trip emails to Cape Town to ask for console logs — to create a NetBSD AMI which could boot, mount its root filesystem, configure the network, and launch sshd. While Amazon was a bit wary about me announcing this publicly — they quite reasonably didn’t want me to say anything which could be construed as making a promise on their behalf — they agreed that I could discuss the work with developers outside the NDA, and the NetBSD team were excited to hear about the progress… although a bit confused as to why Amazon was still using paravirtualized Xen rather than HVM.
The lack of HVM continued to be a sore point — especially as I knew EC2 provided Xen/HVM for Windows instances — but in July 2010 Amazon launched “Cluster Compute” instances which supported HVM even for “Linux” images. I wasn’t able to boot FreeBSD on these immediately — while HVM solved the paging table problem, there were still driver issues to address — but this gave me some hope for progress, so when Matt Garman mentioned they were “thinking about” making HVM more broadly available I immediately wrote back to encourage such thoughts; by this point it was clear that PV was a technological dead end, and I didn’t want Amazon to be stuck on the wrong technology for any longer than necessary.
The first real breakthrough however came with the launch of the new
instance type in September. While it wasn’t publicly announced at the time, this new instance family ran on Xen 3.4.2 — which lacked the bug which made it impossible to run FreeBSD. By mid-November I was able to SSH into a FreeBSD/EC2 t1.micro instance, and on December 13, 2010,
I announced that FreeBSD was
now available for EC2 t1.micro instances.
Once I’d gotten that far, things suddenly got easier. Amazon now had customers using FreeBSD — and they wanted more FreeBSD. A Solutions Architect put me in touch with a FreeBSD user who wanted support for larger instances, and they paid me for the time it took to get
FreeBSD working on Cluster Compute instances; then it was pointed out to me that EC2 didn’t really know which OS we were running, and I proceeded to make FreeBSD available on all 64-bit instance types via
defenestration. Obviously this meant paying the “windows tax” to run FreeBSD — which Amazon was not very happy about! — but even with the added cost it filled an essential customer need. (This hack finally ceased to be necessary in July 2014, when T2 filled out the stable of instance types which supported running “Linux” on HVM.)
2012 was an exciting year. In April, I had the classic greybeard experience of debugging a network fault; I found that a significant proportion of my S3 requests to a particular endpoint were failing with peculiar errors, including SignatureDoesNotMatch failures. These error responses from Amazon S3 helpfully contained the StringToSign, and I could see that these did not match what I was sending to S3. I had enough errors to identify the error as a “stuck bit”; so I pulled out traceroute — this was pre-SRD so my packets were traversing a consistent path across the datacenter — and then proceeded to send a few million pings to each host along the path. The Amazonians on the AWS Developer Forums were somewhat bemused when I posted to report that a specific router had a hardware failure… and even more surprised when they were able to confirm the failure and replace the faulty hardware a few days later.
The highlight of 2012 however was the first re:Invent — which was short of technical content and had a horrible tshirt-to-suit ratio, but did give me the opportunity to talk to a number of Amazonians face to face. On one memorable occasion, after attending an Intel talk about “virtual machine security” (delivered by a VP who, in response to my questioning, professed to have no knowledge of “side channel attacks” or how they could affect virtual machines) I turned up at the EC2 booth in the expo hall to rant… and by complete accident ended up talking to a Principal engineer. I talked about
my work
exploiting HyperThreading to steal RSA keys, and explained that, while the precise exploit I’d found had been patched, I was absolutely certain there were many more ways that information could leak between two threads sharing a core. I ended with a strong recommendation: Based on my expertise in the field I would never run two EC2 instances in parallel on two threads of the same core. Years later, I was told that this recommendation was why so many EC2 instance families jumped straight to two vCPUs (“large”) and skipped the “medium” size.
Time passed. With FreeBSD fundamentally working, I turned to the “nice to haves”: merging my FreeBSD patches, simplifying the security update path (including automatically installing updates on first boot), and resizing the root filesystem on first boot. In April 2015, I finished integrating the FreeBSD/EC2 AMI build process into the FreeBSD src tree and handed off image builds to the FreeBSD release engineering team — moving FreeBSD/EC2 across the symbolic threshold from a “Colin” project to “official FreeBSD”. I was still the de facto owner of the platform, mind you — but at least I wasn’t responsible for running all of the builds.
In October 2016, I took a closer look at IAM Roles for Amazon EC2, which had launched in mid-2012. The more I thought about it, the more concerned I got; exposing credentials via the IMDS — an interface which runs over unauthenticated HTTP and which warned in its documentation against storing “sensitive data, such as passwords” — seemed like a recipe for accidental foot-shooting. I wrote a blog post “EC2′s most
dangerous feature” raising this concern (and others, such as overly broad IAM policies), but saw no response from Amazon… that is, not until July 2019, when Capital One was breached by exploiting the precise risk I had described, resulting in 106 million customers’ information being stolen. In November 2019, I had a phone call with an Amazon engineer to discuss their plans for addressing the issue, and two weeks later, IMDSv2 launched — a useful improvement (especially given the urgency after the Capital One breach) but in my view just a mitigation of one particular exploit path rather than addressing the fundamental problem that credentials were being exposed via an interface which was entirely unsuitable for that purpose.
In May 2019, I was invited to join the
AWS Heroes
program, which recognizes non-Amazonians who make significant contributions to AWS. (The running joke among Heroes is that a Hero is someone who works for Amazon but doesn’t get paid by Amazon.) The program is heavily weighted towards people who help developers learn how to use AWS (via blog posts, YouTube videos, workshops, et cetera), so I was something of an outlier; indeed, I was told that when I was nominated they weren’t quite sure what to make of me, but since I had been nominated by a Distinguished Engineer and a Senior Principal Engineer, they felt they couldn’t say no.
In March 2021, EC2 added support for booting x86 instances using UEFI; a “BootMode” parameter could be specified while registering an image to declare whether it should be booted using legacy BIOS or modern UEFI. For FreeBSD this was great news: Switching to UEFI mode dramatically sped up the boot process — performing loader I/O in 16-bit mode required bouncing data through a small buffer and cost us an extra 7 seconds of boot time. The only problem was that while all x86 instance types supported legacy BIOS booting, not all instance types supported UEFI — so I had to decide whether to degrade the experience for a small number of users to provide a significant speedup to most users. In June,
I requested a
BootMode=polyglot setting which would indicate that the image was able to boot either way (which, in fact, FreeBSD images already could) and instructed EC2 to pick the appropriate boot mode based on the instance. In March 2023, this landed as “BootMode=uefi-preferred”, which I had to admit was a friendlier, albeit less geeky, name for it.
One of the most important things about the AWS Heroes program is the briefings Heroes get, especially at the annual “Heroes Summit”. In August 2023, we had a presentation about Seekable OCI, and looking at the design I said to myself “hold on, they’re missing something here”: The speaker made security claims which were true under most circumstances, but did not hold in one particular use case. I wrote to the AWS Security team (unlike in 2008, there was now a well-staffed team with clear instructions on how to get in touch) saying, in part, “I’m not sure if this is them not understanding about [type of attack] or if it’s just an issue of confused marketing, but I feel like someone needs to have a conversation with them”. My sense was that this could probably be addressed with clear documentation saying “don’t do this really weird thing which you probably weren’t planning on doing anyway”, but since I wasn’t particularly familiar with the service I didn’t want to make assumptions about how it was being used. After a few email round trips I was assured that the problem had been corrected internally and that the fix would be merged to the public GitHub repository soon. I accepted these assurances — over the years I’ve developed a good relationship with AWS Security people and trust them to handle such matters — and put it out of my mind.
In December 2023, however, I was talking to some Amazonians at re:Invent and was reminded of the issue. I hadn’t heard anything further, which surprised me given that fixing this in code (rather than in documentation) would be fairly intrusive. I asked them to check up on the issue and they promised to report back to me in January, but they never did, and again I stopped thinking about it. The following re:Invent though, in December 2024, I met a Principal Engineer working on OCI and mentioned the issue to him — “hey, whatever happened with this issue?” — but he wasn’t aware of it. In January 2025, I raised it again with a Security Engineer; he found the original ticket from 2023 and talked to the team, who pointed at a git commit which they thought fixed it.
The issue had not, in fact, been fixed: The 2023 commit prevented the problem from being triggered by accidental data corruption, but did nothing to prevent a deliberate attack. Once I pointed this out, things got moving quickly; I had a Zoom call with the engineering team a few days later, and by the end of February the problematic feature had been disabled for most customers pending a “major revision”.
The largest change in my 20 years of working with Amazon started out as something entirely internal to FreeBSD. In September 2020, the FreeBSD Release Engineering Lead, Glen Barber, asked me if I could take on the role of Deputy Release Engineer — in other words, Hot Spare Release Engineer. As the owner of the FreeBSD/EC2 platform, I had been working with the Release Engineering team for many years, and Glen felt that I was the ideal candidate: reliable, trusted within the project, and familiar enough with release engineering processes to take over if he should happen to “get hit by a bus”. While I made a point of learning as much as I could about how Glen managed FreeBSD releases, like most hot spares I never expected to be promoted.
Unfortunately, in late 2022 Glen was hospitalized with pneumonia, and while he recovered enough to leave the hospital a few months later, it became clear that the long-term effects of his hospitalization made it inadvisable for him to continue as release engineer; so on November 17, 2023, Glen decided to step back from the role and I took over as FreeBSD Release Engineering Lead. I like to think that I’ve done a good job since then — running weekly snapshot builds, tightening schedules, establishing a predictable and more rapid release cadence, and managing four releases a year — but my volunteer hours weren’t unlimited, and it became clear that my release engineering commitments were making it impossible to keep up with EC2 support as well as I would have liked.
In April 2024 I confided in an Amazonian that I was “not really doing a good job of owning FreeBSD/EC2 right now” and asked if he could find some funding to support my work, on the theory that at a certain point time and dollars are fungible. He set to work, and within a couple weeks the core details had been sorted out; I received sponsorship from Amazon via GitHub Sponsors for 10 hours per week for a year and addressed a
large number of outstanding issues. After a six month hiatus — most of which I spent working full time, unpaid, on FreeBSD 15.0 release engineering — I’ve now started a second 12-month term of sponsorship.
While I like to think that I’ve made important contributions to AWS over the past 20 years, it’s important to note that this is by no means my work alone. I’ve had to remind Amazonians on occasion that I do not have direct access to internal AWS systems, but several Amazonians have stepped in as “remote hands” to file tickets, find internal contacts, inspect API logs, and obtain technical documentation for me. Even when people — including very senior engineers — have explicitly offered to help, I’m conscious of their time and call upon them as little as I can; but the fact is that I would not have been able to do even a fraction of what I’ve accomplished without their help.
blog comments powered by Disqus
RSS
...
Read the original on www.daemonology.net »
WeakC4 is a search-free, low-knowledge solution to 7x6 Connect 4, constructed by identifying a language which describes perfect play for a small subset of nodes, and then identifying a small opening tree which contains only those nodes as leaves.
This website provides a formal strategy for optimal first-player Connect Four play, which is fundamentally different from existing strong and weak solutions such as Fhourstones:
* It depends on so little information that it fits in about 150 kilobytes as shown, even before de-duplicating symmetric pairs.
* It uses no search during runtime, running at O(wh) time complexity to select a move.
* It can be visualized in its entirety and rendered in realtime.
* It visually illustrates and confirms the existence of particularly challenging openings, lines, and variations already known to connect 4 players.
This website shows a weak solution to the game of Connect Four. In short, this means that it provides sufficient information to guarantee a win for the first player if the first player plays in accordance with the weak solution’s suggestions, but makes no comment on arbitrary positions. (If it did, that would make it a strong solution).
As a motivating example: player 1 (hereafter dubbed “Red”) can win by playing in the center column on the first move and then following the weak solution’s suggestions, but would not be guaranteed to win if the first disc is played elsewhere. The weak solution contains no information about what would happen in the other columns- As far as Red cares, it would be redundant to learn those branches, since they are not good.
A strong solution would contain a game-theoretic value for every position, whereas this weak solution only contains sufficient information to guarantee a win for Red, not including any other positions.
In graph-theoretic terms, we can think of these solution types as graphs, where the strong solution is the entire game tree, and a weak solution is a subgraph which is closed under a few important per-node constraints which will be discussed later.
Connect 4 is already strongly solved, and at first glance that seems to render discussion of weak solutions moot. In reality, I think the opposite is more generally true. A weak solution has a lot of advantages over a strong solution, such as:
* Smaller data footprint. You need to “memorize” less information to be able to play perfectly.
* Revealing underlying structure. A weak solution depends on, and exposes, a structural understanding of the game.
* Visualization. A weak solution can be visualized in a way that a strong solution (14tb uncompressed, 350gb compressed) cannot.
A strong solution is a general, naive approach to solving any game which does not demand structural understanding. A weak solution, up to the selection of which winning branches to include and which to omit, leaves room for creative choice and can be used to express structural insights of the game in question.
Imagine your goal is to go to a Chess tournament and play perfectly. One option, strategy A, would be to arrive without any preparation and read through every possible variation of play while seated. Another option, strategy B, would be to show up to the tournament already having memorized the game-theoretical value of each position, which would allow you to play perfectly without any search at all.
In some sense, these two approaches are opposites of each other. The first over-relies on computation with no dependence on knowledge, while the second over-relies on knowledge with no dependence on computation.
However, in another sense, they are very similar- in both cases, we can consider the ‘data product’ of the two strategies to be identical- regardless of when the value of a position is computed, before the tournament or during it, the player is required in the moment to arrive at the same quantity of information- the same tree, the same ‘data product’. Both players would effectively construct the tree which results from alpha-beta-pruning, one would do so during the competition and one would do so before.
A more intelligent player would instead choose a strategy X which balances the two approaches. Insofar as the amount of knowledge required to memorize up to a certain depth increases exponentially, and the amount of computation required to read out an endgame increases exponentially as well, we can minimize both quantities via a strategy which involves memorizing halfway through the game, and relying on computation for the remainder. In other words, this player would reduce the total amount of data processed by optimizing the balance between memorization and compute.
So far, we have been treating the game tree, or ‘data product’ as a sort of infinitely entropic object which is only approachable formally through naive search. In reality, there is plenty of room for the application of human intuition and heuristic analysis, which is evidence that this game tree is informationally redundant- it has some sort of structure to it, and thus it can be compressed. This should not come as a surprise- insofar as it was generated in correspondence with a consistent set of rules, (the rules of the game itself), it should be expected to exhibit some degree of self-similarity.
It was a design goal of this project to not rely on realtime compute whatsoever because I hoped to visualize a solution in full. The existence of a compute step implicitly hides information which exists within our solution, and therefore we would not be faithfully visualizing the entire game tree.
If we’re clever, we can eliminate the compute step entirely.
Nothing in life comes free. In simple terms, what this meant was a need for a much deeper upfront computation to intelligently choose what branches to suggest for memorization, because it turns out that some sparse branches in this enormous game tree yield entirely regular and patterned continuations- continuations which have a “simple trick” demanding neither computation nor memorization.
Here’s a motivating puzzle to demonstrate the technical challenge underlying this upfront computation:
This is a directed game tree, where Red’s moves are shown in Red, and Yellow’s are shown in Yellow. Nodes which have a “simple trick” are crossed with green, and have been drawn as leaf nodes.
Try to identify the smallest possible subtree which serves as a weak solution on behalf of Red for this game. In other words, your job is to remove Red edges so that every remaining red-to-move node maintains exactly one outgoing red edge, without trimming any yellow edges. Click on the image to reveal the answer.
The fundamental challenge of this project was then twofold:
To find a language which expresses “simple tricks” for a sufficiently large critical-mass of nodes in the game tree, and
To find an “opening book” tree for memorization, whose leaf nodes all have such “simple tricks”.
I think it is worth reflecting on the fact that this approach isn’t merely a “strategy” for perfect connect 4 play, but more importantly an exercise in actually understanding the shape of the game tree of this complex patterned structure which emerges from the rules of Connect 4. How could you identify clever tricks, or languages to describe them, or a small tree whose leaves all contain those clever tricks, without having an understanding of the game’s intrinsic form? More on this in the “Reflections” section.
There is a subtlety here which needs to be addressed. The puzzle above requests a minimum weak solution. However, this project does not search for a minimum-size graph but rather a graph which requires less information to be expressed. In the same way that a repetitive text file can be compressed, we abuse the fact that the game tree involves informational redundancy to reduce the size of the graph and come up with a solution which is not graph-theoretically small, but rather information-theoretically small.
Before I define the entire language for expressing these “simple tricks”, let me provide a motivating connect 4 position. It’s Yellow’s turn, but Red already has a simple trick in mind. Can you find it?
[interactive link]
The trick is for Red to merely play call-and-response with Yellow, playing in the same column as Yellow’s last move. If Red does this, Red will win in the center column after a few dozen turns. We can visualize the final game board, regardless of how Yellow continues:
Notice how the puzzle’s position only had a single column with an odd number of empty spaces remaining, and that was the row that Red needed to win on. All of the other columns had an even number of remaining spaces. This is important to notice, because if several columns had an odd number of spaces, then Yellow could intentionally fill one of them up, and Red would be forced to make a move, breaking the call-and-response pattern.
As this strategy involves Red filling rows 2, 4, and 6, this strategy was dubbed Claimeven by Victor Allis in his paper A Knowledge-based Approach of Connect-Four.
Unfortunately, there are not sufficiently many positions in Connect 4 which can be solved with pure Claimeven alone, so we need to generalize a bit further.
The language I chose to express these “simple tricks” uses a “Steady State Diagram”, which looks like this:
This should be thought of as a “cheat sheet” which tells Red how to continue playing until winning, for the position drawn in the picture. The diagram features annotations on top of the grid squares which are to be used by Red to determine what to do next. As Red plays, the diagram does not change.
To determine what Red should do, we look at all of the legal moves which Red can make right now. We completely disregard “floating” annotations.
Red’s chosen move is selected by following this list of ordered priorities:
Block an opponent winning move, if available.
Play on an ! (pronounced as ‘urgent’), if available.
Play on a @ (pronounced as ‘miai’), only if there is exactly one available.
Play on a | (pronounced as ‘claimodd’) only if it is on an odd row (otherwise ignore it), or a blank-space cell (pronounced ‘claimeven’) only if it is on an even row. Note that claimeven is represented with a blank space because it shows up a lot, so it is good to think of claimeven as a sort of ‘default behavior’.
Play on a +, if available.
Play on an =, if available.
In creation of these diagrams, I provide the guarantee that there is always precisely one unique move suggested by this priority list. In other words, among the diagrams featured on my site, you will never find one where two urgents are available. This corresponds to the earlier qualification that a red-to-move node has exactly one outgoing edge.
I won’t discuss all the design decisions that guided me to use this specific language. To gain some intuition, I suggest you view the graph and explore some steady state diagrams yourself.
I also don’t make the claim that it is perfect, or optimal, or anything of the like. I converged on this design primarily through lots of trial and error. There are also a bunch of positions considered simple by Connect 4 experts which do not have a Steady State Diagram- chiefly among them positions which use the “triple-odds” strategy. Triple-odds requires a bit of global knowledge, which my Steady State language is too simple to express. I suspect the graph could be shrunk by a factor of 4 or so if a language was found which can simply express triple-odds.
Take a moment to consider that there is a trade between complexity and expressiveness of the Steady State language and graph size. I chose the best balance I could manage. If you have a better idea, I encourage you to try it :)
Briefly, here’s a description of the rest of the technical approach which permitted me to generate the graph:
* A genetic algorithm was used to quickly predict candidate Steady States for a given graph, later verified by brute force. [Code]
* I used all sorts of methods to select the best branches which are trimmed as much as possible. This involved a lot of search and backtracking, but realistically I wasn’t able to search for minimal branches at a depth any higher than about 8 [Code]. Finding the best branches in the opening involved lots of trial and error, some of my own intuition as a Connect 4 player, and soliciting suggestions for node-pruning from other players. Of course, I do not guarantee optimality of this graph. I expect it can probably be compressed by another 25 percent or so, without any modification to the Steady State language.
* Force-directed graph spreading was used to generate the graph visualization, appropriately spread-out in 3d-space. [Code]. Mirror forces were applied to guide the graph to reflect the mirror structure of the game.
We have developed a search-free, low-knowledge solution to connect 4 by identifying a language which describes perfect play for a small subset of nodes, and then selecting a small opening tree which contains only those nodes as leaves.
The resulting solution has the following properties:
* It uses no search during runtime, running at O(wh) time complexity to select a move at any valid position.
* It has been reduced to a total of under 10,000 nodes (subject to further reduction, see the graph page for a live count.) About two-thirds of these nodes are leaves representing steady states.
* It depends on so little information that it fits in about 150 kilobytes, even including mirrored positions.
* This level of compression can be compared to Allis (1988), who found an opening book of 500,000 nodes which permitted real-time play, but still invoked search.
* Traversing this tree and confirming its validity runs slightly faster on my machine than solving the game directly with Fhourstones, in some sense implying that this is the fastest “proof-by-compute” of the claim that connect 4 is a player-1-to-win game. This is even the case without any clever proof-of-correctness of steady-states which I have yet to implement- we are just brute-force checking them.
* Both of these metrics could further be reduced by about half, as they included search and storage of mirror positions.
* Sooner or later, I will make an Anki opening deck using the discovered branches, so that humans who wish to attempt memorization can do so.
* It can be visualized in its entirety and rendered in realtime.
* It visually illustrates and confirms the existence of particularly challenging openings, lines, and variations already known to connect 4 players.
All other connect 4 solutions currently available have a sort of “queryable” interface, where the user prompts the solver with a position, and the solver returns a game-theoretical value. Instead, by distilling our solution into a small data structure, we can map out the game in space for intuitive visual exploration.
An Anki deck was made from the non-leaf trunk of the graph, for the sake of human opening memorization!
The game tree of Connect 4 is an emergent object which arises from a set of simple rules. This is similar to a lot of other structures which we interface with. Physics is likely of the same nature, in which a rather simple set of equations at the quantum level yield a myriad of unexpected macroscopic phenomena such as doorknobs and opposable thumbs. Through the iteration of computational rules, different phenomena come to life at different resolutions of observation.
And I think that’s kind of an important point- resolutions. These structures usually have a “stack” of phenomena which emerge at different levels of resolution. In physics, we see organisms composed of cells composed of molecules composed of atoms, each behaving in a way best described by its associated field of study. Compare this to our minimal expression of Connect 4′s winning strategy, exhibiting different form at different levels. In the endgame, there are these simple tricks which depend on a patterned, regular structure of the continuation tree, but abstracted further back towards the opening, emergent macrostructures grow into recognizable variations and named, known openings. Of course, this was by design, but I suspect it is a necessary design choice to achieve the desired result of expressing the object’s form in as little data as possible.
A pessimistic physicist with a reductionist attitude might say that reality is merely composed of particulate phenomena, and that the segmentable, nameable macroscopic world is an illusion, or a construction of human invention. However, this physicist falls into the same philosophical trap of the Chess competitors following naive strategies A and B, dismissive of a knowledge-based mode of understanding via pattern recognition, instead deferring to raw mechanics.
Connect 4 is in a rather magical place in terms of complexity. It is ripe with emergent objects, and yet it is simple enough to visualize and formally reason about computationally. I am unaware of any other attempts to make a low-information weak solution to Connect 4 which forego search, so my sample size is one, but it seems to me that this sort of compression depends on a multi-resolutional approach, effectively contradicting the philosophy of the reductionist physicist as a viable means of approaching arbitrary emergent objects.
I hope the reader can appreciate this endeavor not merely as a strategy for the game of Connect 4, but more importantly as a formal exercise in extracting understanding from an emergent object- a problem neglected by traditional computational approaches to solving board games.
...
Read the original on 2swap.github.io »
To add this web app to your iOS home screen tap the share button and select "Add to the Home Screen".
10HN is also available as an iOS App
If you visit 10HN only rarely, check out the the best articles from the past week.
If you like 10HN please leave feedback and share
Visit pancik.com for more.