10 interesting stories served every morning and every evening.




1 662 shares, 3 trendiness

Installing every* Firefox extension

Analyzing every Firefox ex­ten­sion Installing every Firefox ex­ten­sion Using every Firefox ex­ten­sion

*All but 8 we did­n’t scrape (or got deleted be­tween me check­ing the web­site and me scrap­ing) and 42 miss­ing from ex­ten­sions.json.1 Technically we only in­stalled 99.94% of the ex­ten­sions.

It turns out there’s only 84 thou­sand Firefox ex­ten­sions. That sounds fea­si­bly small. That even sounds like it’s less than 50 gi­ga­bytes. Let’s in­stall them all!

There’s a pub­lic API for the add-ons store. No au­then­ti­ca­tion re­quired, and seem­ingly no rate lim­its. This should be easy.

The search end­point can take an empty query. Let’s read every page:

The search API only gives me 600 pages, mean­ing I can only see 30 thou­sand ex­ten­sions, less than half of them.

A so­lu­tion I found is to use dif­fer­ent sorts. The de­fault sort is sort=rec­om­mended,users: first rec­om­mended ex­ten­sions, then sorted by users, de­scend­ing. Changing to just sort=cre­ated gave me some of the long tail:

I’m still miss­ing 30,0252 ex­ten­sions, so I added rat­ing and hot­ness too.

Starting to hit di­min­ish­ing re­turns. While I was wait­ing 7 min­utes for that last list to get scraped be­cause my code did­n’t fetch in par­al­lel, I had an epiphany: use ex­clude_ad­dons. I can just fetch page 600 and ex­clude all its ad­dons to get page 601.

It works! There is a URL length limit, sadly, so I can only fetch an ex­tra 20 pages.

A lot less than I ex­pected, es­pe­cially con­sid­er­ing what hap­pens when I add the down­loads sort:

Reading the docs again, I no­tice I can fil­ter by cat­e­gory as well. I’m tired of wait­ing 7 min­utes so I’ll just fetch every page in par­al­lel.

I got ba­si­cally all the ex­ten­sions with this, mak­ing every­thing I did be­fore this look re­ally stu­pid.

That’s 8 less ex­ten­sions than what it says on the web­site. When I ran this in September 2025, it found 21 more ex­ten­sions than what was men­tioned on the web­site, so I think this is enough.

So that no­body has to do this again, I’ve up­loaded this dataset to Hugging Face.

The search API sup­ports date fil­ters: cre­at­ed__gte and cre­at­ed__lte. The API also re­turns the full num­ber of ex­ten­sions that match your search.

You can start with a fil­ter that in­cludes all ex­ten­sions, then keep split­ting the ranges in half un­til it is less than 30 thou­sand, then fetch all of them.

I’ve up­dated the down­loader: it is faster, wastes fewer re­quests, and seems to scrape ex­actly all the ex­ten­sions, too.

This won’t work if over 30 thou­sand ex­ten­sions get cre­ated in a sin­gle sec­ond, which I can’t imag­ine will ever hap­pen.

I have a copy of Bun and al­l_ex­ten­sions.json, so I will tor­ment you with my un­matched script power.

The biggest Firefox ex­ten­sion is dmitlichess at 196.3 MB, which con­tains 2000+ au­dio files.

Here’s the rest of the top ten:

The first time I ran this analy­sis, in September, Cute doggy - Dog pup­pies” was the 10th largest ex­ten­sion. I’m still men­tion­ing it here, be­cause I was so fuck­ing con­fused:

The small­est ex­ten­sion is theTabs-saver, which is 7518 bytes and has no code.

FalscheLaden, with no users, re­quests 3,695 per­mis­sions. The au­thor has posted a writeup.

Second place is Google Dark Theme, which re­quests 2,675 per­mis­sions but has 1,687 users.

Dr. B is the king of slop, with 84 ex­ten­sions pub­lished, all of them vibe coded.

How do I know? Most of their ex­ten­sions have a README.md in them de­scrib­ing their process of get­ting these through ad­don re­view, and men­tion Grok 3. Also, not a sin­gle one of them have icons or screen­shots.

Personally, I’m shocked this num­ber is this low. I ex­pected to see some de­vel­op­ers with hun­dreds!

I re­viewed the source of a cou­ple ho­mo­glyph at­tacks on crypto wal­lets dis­cov­ered in the dataset and was dis­ap­pointed to find out they just pop up a form ask­ing for your seed phrase and send it off to their server. It’s an ex­ten­sion!!! You can steal their coin­base.com to­ken! You can mon­i­tor the clip­board and swap out their ad­dress for yours! You can crash their browser and claim your real mal­ware is the fix!

Why would you make a fake MetaMask ex­ten­sion and bot 1-star re­views?

Is this the do­ing of their cy­ber­crime com­peti­tors, who bot 4-star re­views on ex­ten­sions of their own?

Either way, these ex­ten­sions are clearly phish­ing. I re­ported some to Mozilla, and the next day they were all gone, even the ones I was too lazy to re­port. I for­got to archive them, so I guess they live on in May’s VM!

In terms of im­ple­men­ta­tion, the most in­ter­est­ing one is Іron Wаllеt” (the I, a, and e are Cyrillic). Three sec­onds af­ter in­stall, it fetches the phish­ing page’s URL from the first record of a NocoDB spread­sheet and opens it:

I think the ex­ten­sion’s no ac­counts or re­mote code” de­scrip­tion is re­ally funny, like putting no copy­right in­fringe­ment in­tended” in your video’s de­scrip­tion in case YouTube is watch­ing. The API key had write ac­cess, so I wiped the spread­sheet.

You get a Homepage” link in your ex­ten­sion’s page and your own page.

It’s been no­fol­low for two years, but that has­n’t stopped grifters from try­ing any­way.

On Attempt 1, I en­coun­tered Typo Sniper and Tab Fortune Teller, AI gen­er­ated ex­ten­sions with casi­nos in their au­thor’s Homepage links.

In the dataset, there’s many Code Injector” ex­ten­sions, which are all vir­tu­ally iden­ti­cal and also have ran­dom web­sites in their au­thor’s Homepage link.

All of these ex­ten­sions are from 2025. Is there an an­cient SEO guide cir­cu­lat­ing? Is there some evil AMO fron­tend they’re still get­ting a back­link from? I have no idea what’s hap­pen­ing here.

All of these ex­ten­sions are their au­thor’s only up­loads and they have their own do­mains. Most of them are on both Chrome and Firefox, their web­sites look the same, and they all have a terms of ser­vice ref­er­enc­ing Innover Online Group Ltd”, which is a .png for some rea­son.

Because I scraped every Firefox ex­ten­sion twice, I can see what got re­moved in be­tween the runs. Three of Innover Group’s ex­ten­sions—Earth View 360°, View Manuals, and View Recipes, to­tal­ing 115 thou­sand users—have been dis­abled by Mozilla.

Innover Group runs Google ads for their ex­ten­sions, a lot of them sim­ply say­ing Continue”.

The Custom Web Search” is Yahoo but with their af­fi­late code. That code be­ing safe­plexsearch, which has a web­site of its own which of course men­tions Innover Online Group Ltd, and links to an ad­don with 3,892 users, which is ac­tu­ally a Firefox ex­clu­sive. Actually, Custom Web Search” is a Firefox ex­clu­sive on all of these ex­ten­sions. Why did they even make a Chrome ver­sion, to sell them to the NSA??

One user claimed Ezy Speed Test disables Ublock [sic] Origin once in­stalled”, which I did not find in its code.

There’s a mil­lion com­pa­nies like this, though. I just went to Download.com with my ad-blocker off and dis­cov­ered the com­pany Atom Apps in an ad, which also up­loads ex­ten­sions for both Chrome and Firefox, with a new ac­count for each ex­ten­sion, only in­cludes Yahoo in the Firefox ver­sion, with names that end in ei­ther and Search” or & Search”, and has their com­pany name as a .png in their terms of ser­vice. They have 220 thou­sand daily users to­tal across 12 ex­ten­sions, and none of theirs have been dis­abled.

* 34.3% of ex­ten­sions have no daily users

25.1% of ex­ten­sions have more than 10 daily users

10.6% of ex­ten­sions have more than 100 daily users

3.2% of ex­ten­sions have more than 1000 daily users

0.7% of ex­ten­sions have more than 10000 daily users

* 25.1% of ex­ten­sions have more than 10 daily users

* 10.6% of ex­ten­sions have more than 100 daily users

* 3.2% of ex­ten­sions have more than 1000 daily users

* 0.7% of ex­ten­sions have more than 10000 daily users

* 76.7% of ex­ten­sions are open source (SPDX li­cense that is­n’t All Rights Reserved)

* 23% of ex­ten­sions were cre­ated af­ter I started writ­ing this ar­ti­cle

19% of ex­ten­sions have no users, no re­views, no screen­shots, no down­loads, and no icon

* 19% of ex­ten­sions have no users, no re­views, no screen­shots, no down­loads, and no icon

* 2.4% of ex­ten­sions re­quire pay­ment

38.1% of those are open source???

* 38.1% of those are open source???

Obviously I’m not go­ing to open each of these in a new tab and go through those prompts. Not for lack of try­ing:

Each ex­ten­sion has the cur­ren­t_ver­sion.file.url prop­erty which is a di­rect down­load for the ex­ten­sion. I down­load them to my pro­file’s ex­ten­sions folder with the guid prop­erty as the base name and the .xpi file ex­ten­sion, be­cause any­thing else will not be in­stalled.

Then, I delete the ad­don­Startup.json.lz4 and ex­ten­sions.json files. When I re­open Firefox, each ex­ten­sion is dis­abled. Tampering with ex­ten­sions.json is com­mon enough that you can ask any chat­bot to do it for you:

My first at­tempt was in a tiny11 core VM on my desk­top.

At first, in­stead of down­load­ing all of them with a script, I tried us­ing en­ter­prise poli­cies, but this copies all the ex­ten­sions into the folder. I quickly ran out of mem­ory, and the page­file took up the rest of the stor­age al­lo­cated to the VM. I had also ex­pected Firefox to open im­me­di­ately and the ex­ten­sions to in­stall them­selves as the browser is be­ing used, but that also did not hap­pen: it just froze.

After that, I tried down­load­ing them my­self.

To make sure I was in­stalling ex­ten­sions cor­rectly, I moved the ex­ten­sions folder else­where and then moved about a thou­sand ex­ten­sions back in. It worked.

There were mul­ti­ple ex­ten­sions that changed all text to a cer­tain string. bruh-ifier lost to Se ni važn. Goku is in the back­ground.

My con­text menu is so long that I’m show­ing it side­ways:

I had in­stalled lots of pro­tec­tion ex­ten­sions. One blocks traf­fic to .zip and .mov do­mains, pre­sum­ably be­cause they are file ex­ten­sions. This is .cab era­sure! Then, I re­al­ized that there were likely mul­ti­ple peo­ple view­ing my brows­ing his­tory, so I went to send them a mes­sage.

That ⚠️ SCAM WARNING!” popup is from Anti-Phishing Alert. As you may have in­ferred, it seems to only ex­ists for its Homepage link. How does it work?

Vasavi Fraudulent Detector also has a popup for when a site is safe:

Only the ad­dons from Attempt 1 were ac­tu­ally loaded, be­cause I did­n’t know I needed to delete ad­don­Startup.json.lz4 yet. I scrolled through the ad­dons page, then I opened DevTools to ver­ify it was the full 65,335, at which point Firefox froze and I was un­able to re­open it.

After that, I made a new (non-admin) user on my Mac to try again on a more pow­er­ful de­vice.

Every time I glanced at my script down­load­ing ex­ten­sions one at a time for six hours, I kept rec­og­niz­ing names. Oops, I’m the AMO sub­ject-mat­ter ex­pert now! Parallelizing was mak­ing it slower by the last 4000 ex­ten­sions, which did­n’t hap­pen on my Windows VM.

When that fin­ished, I found out my hard­ware could­n’t run 65,335 ex­ten­sions at once, sadly. The win­dow does open af­ter some time I did­n’t mea­sure, but the win­dow never starts re­spond­ing. I don’t have the balls to run my lap­top overnight.3

Firefox did make over 400 GB of disk writes. Because I for­got swap ex­isted, I checked the pro­file try­ing to find the cul­prit, which is when I learned I needed to delete ad­don­Startup.json.lz4 and mod­ify ex­ten­sions.json. The ex­ten­sions.json was 144 MB. For com­par­i­son, my PCs ex­ten­sions.json is 336 KB.

My so­lu­tion: add 1000 ex­ten­sions at a time un­til Firefox took too long to open. I got to 6000.

3000 ex­ten­sions was the last point where I was at least able to load web­pages.

After 4000 or more ex­ten­sions, the ex­pe­ri­ence is ba­si­cally iden­ti­cal. Here’s a video of mine (epilepsy warn­ing):

5000 was the same as 4000 but every web­site was blocked by some ex­ten­sion I know starts with an S and ends with Blocker and has a logo with CJK char­ac­ters. At 6000 ex­ten­sions, the only page that I could load was about:ad­dons.

My desk­top has 16 GB of RAM, and my lap­top has 24 GB of uni­fied mem­ory. You might no­tice that 49.3 GB is more than twice that.

What you’re about to see was recorded in May’s vir­tual ma­chine. Do not try this on your main pro­file.

My down­load script started in par­al­lel, then we switched it to se­r­ial when it slowed down. In to­tal, down­load­ing took about 1 hour and 43 min­utes.

I was on a call the en­tire time, and we spot­ted a lot of strange ex­ten­sions in the logs. What kind of chud would use KiwiFarms Math Renderer”? Are they draft­ing the the­ory of soy­tiv­ity?

Turning on Mullvad VPN and rout­ing to Tel Aviv ap­peared to speed up the process. This was not be­cause of Big Yahu, but be­cause May restarted the script, so she re­peated that a cou­ple times. Whether that’s a Bun bug, I don’t know and I don’t care. May joked about a version 2” that I dread think­ing about.

Defender marked one ex­ten­sion, HackTools, as mal­ware. May ex­cluded the folder af­ter that, so it may not be the only one.

Firefox took its sweet time re­mak­ing ex­ten­sions.json, and it kept climb­ing. About 39 min­utes of Firefox dis­play­ing a skele­ton (hence it has yet to ren­der a sec­ond frame”) later, it was 189 MB large: a new record! May killed Firefox and ran en­able.js.

I did some re­search to find why this took so long.

13 years ago, ex­ten­sions.json used to be ex­ten­sions.sqlite. Nowadays, ex­ten­sions.json is se­ri­al­ized and rewrit­ten in full on every write de­bounced to 20 ms, which works fine for 15 ex­ten­sions but not 84,194.

Finally, we see the browser. The on­board­ing tabs trick­led in, never load­ing.

May re­opened it, took a shower, and came back to this:

IT STABLIZED. YOU CAN (barely) RUN FIREFOX WITH ALL 84 THOUSAND EXTENSIONS.

Well, we were pretty sure it had 84 thou­sand ex­ten­sions. It had Tab Counter, at least, and the scroll­bar in the ex­ten­sions panel was ab­solutely mas­sive.

She loaded the con­fig­ure pages of two ex­ten­sions. The op­tions iframe never loaded.

I re­al­ized we need to dis­able auto up­date be­fore Firefox sends an­other 84 thou­sand re­quests. This one took a while to load.

The list loaded but with no icons and stopped re­spond­ing, and 6 hours later it had loaded fully.

We recorded the en­tire process; the mem­ory us­age fluc­tu­ated be­tween 27 and 37 GiB the en­tire time.

...

Read the original on jack.cab »

2 641 shares, 60 trendiness

How I run multiple $10K MRR companies on a $20/month tech stack

Last night, I was re­jected from yet an­other pitch night. It was just the pre-in­ter­view, and the prob­lem was­n’t my prod­uct. I al­ready have MRR. I al­ready have users who de­pend on it every day.

The feed­back was sim­ply: What do you even need fund­ing for?”

I hear this time and time again when I try to grow my ideas. Running lean is in my DNA. I’ve built tools you might have used, like web­se­quence­di­a­grams.com, and niche prod­ucts you prob­a­bly haven’t, like eh-trade.ca. That ob­ses­sion with ef­fi­ciency leads to suc­cess­ful boot­strap­ping, and hon­estly, a lot of VCs hate that.

Keeping costs near zero gives you the ex­act same run­way as get­ting a mil­lion dol­lars in fund­ing with a mas­sive burn rate. It’s less stress­ful, it keeps your ar­chi­tec­ture in­cred­i­bly sim­ple, and it gives you ad­e­quate time to find prod­uct-mar­ket fit with­out the pres­sure of a board breath­ing down your neck.

If you are tired of the mod­ern Enterprise” boil­er­plate, here is the ex­act play­book of how I build my com­pa­nies to run on nearly noth­ing.

The naive way to launch a web app in 2026 is to fire up AWS, pro­vi­sion an EKS clus­ter, set up an RDS in­stance, con­fig­ure a NAT Gateway, and ac­ci­den­tally spend $300 a month be­fore a sin­gle user has even looked at your land­ing page.

The smart way is to rent a sin­gle Virtual Private Server (VPS).

First thing I do is get a cheap, re­li­able box. Forget AWS. You aren’t go­ing to need it, and their con­trol panel is a labyrinth de­signed to ex­tract billing up­grades. I use Linode or DigitalOcean. Pay no more than $5 to $10 a month.

1GB of RAM sounds ter­ri­fy­ing to mod­ern web de­vel­op­ers, but it is plenty if you know what you are do­ing. If you need a lit­tle breath­ing room, just use a swap­file.

The goal is to serve re­quests, not to main­tain in­fra­struc­ture. When you have one server, you know ex­actly where the logs are, ex­actly why it crashed, and ex­actly how to restart it.

Now you have con­straints. You only have a gi­ga­byte of mem­ory. You could run Python or Ruby as your main back­end lan­guage—but why would you? You’ll spend half your RAM just boot­ing the in­ter­preter and man­ag­ing gu­ni­corn work­ers.

I write my back­ends in Go.

Go is in­fi­nitely more per­for­mant for web tasks, it’s strictly typed, and—cru­cially for 2026—it is in­cred­i­bly easy for LLMs to rea­son about. But the real magic of Go is the de­ploy­ment process. There is no pip in­stall de­pen­dency hell. There is no vir­tual en­vi­ron­ment. You com­pile your en­tire ap­pli­ca­tion into a sin­gle, sta­t­i­cally linked bi­nary on your lap­top, scp it to your $5 server, and run it.

Here is what a com­plete, pro­duc­tion-ready web server looks like in Go. No bloated frame­works re­quired:

pack­age main

im­port (

fmt”

net/http”

func main() {

http.Han­dle­Func(“/”, func(w http.Re­spon­seWriter, r *http.Request) {

fmt.Fprintf(w, Hello, your MRR is safe here.“)

// This will com­fort­ably han­dle 10,000s of re­quests per sec­ond

// on a potato.

http.Lis­te­nAnd­Serve(”:8080″, nil)

If you have a graph­ics card sit­ting some­where in your house, you al­ready have un­lim­ited AI cred­its.

When I was build­ing eh-trade.ca, I had a spe­cific prob­lem: I needed to per­form deep, qual­i­ta­tive stock mar­ket re­search on thou­sands of com­pa­nies, sum­ma­riz­ing mas­sive quar­terly re­ports. The naive so­lu­tion is to throw all of this at the OpenAI API. I could have paid hun­dreds of dol­lars in API cred­its, only to find a logic bug in my prompt loop that re­quired me to run the whole batch over again.

Instead, I’m run­ning VLLM on a dusty $900 graph­ics card (an RTX 3090 with 24GB of VRAM) I bought off Facebook Marketplace. It’s an up­front in­vest­ment, sure, but I never have to pay a toll to an AI provider for batch pro­cess­ing again.

For lo­cal AI, you have a dis­tinct up­grade path:

* Start with Ollama. It sets up in one com­mand (ollama run qwen3:32b) and lets you try out dozens of mod­els in­stantly. It’s the per­fect en­vi­ron­ment for it­er­at­ing on prompts.

* Move to VLLM for pro­duc­tion. Once you have a sys­tem that works, Ollama be­comes a bot­tle­neck for con­cur­rent re­quests. VLLM locks your GPU to one model, but it is dras­ti­cally faster be­cause it uses PagedAttention. Structure your sys­tem so you send 8 or 16 async re­quests si­mul­ta­ne­ously. VLLM will batch them to­gether in the GPU mem­ory, and all 16 will fin­ish in roughly the same time it takes to process one.

* Use Transformer Lab for any­thing more ad­vanced. If you need to do any model pre-train­ing or fine-tun­ing, Transformer Lab makes it easy on lo­cal hard­ware.

To man­age all this, I built la­conic, an agen­tic re­searcher specif­i­cally op­ti­mized for run­ning in a con­strained 8K con­text win­dow. It man­ages the LLM con­text like an op­er­at­ing sys­tem’s vir­tual mem­ory man­ager—it pages out” the ir­rel­e­vant bag­gage of a con­ver­sa­tion, keep­ing only the ab­solute most crit­i­cal facts in the ac­tive LLM con­text win­dow.

I also use llmhub, which ab­stracts any LLM into a sim­ple provider/​end­point/​apikey combo, grace­fully han­dling both text and im­age IO whether the model is run­ning un­der my desk or in the cloud.

You can’t do every­thing lo­cally. Sometimes you need the ab­solute cut­ting-edge rea­son­ing of Claude 3.5 Sonnet or GPT-4o for user-fac­ing, low-la­tency chat in­ter­ac­tions.

Instead of jug­gling billing ac­counts, API keys, and rate lim­its for Anthropic, Google, and OpenAI, I just use OpenRouter. You write one OpenAI-compatible in­te­gra­tion in your code, and you in­stantly get ac­cess to every ma­jor fron­tier model.

More im­por­tantly, it al­lows for seam­less fall­back rout­ing. If Anthropic’s API goes down on a Tuesday af­ter­noon (which hap­pens), my app au­to­mat­i­cally falls back to an equiv­a­lent OpenAI model. My users never see an er­ror screen, and I don’t have to write com­plex retry logic.

New, in­sanely ex­pen­sive mod­els are be­ing re­leased every week. I con­stantly hear about de­vel­op­ers drop­ping hun­dreds of dol­lars a month on Cursor sub­scrip­tions and Anthropic API keys just to have an AI write their boil­er­plate.

Meanwhile, I’m us­ing Claude Opus 4.6 all day and my bill barely touches $60 a month. My se­cret? I ex­ploit Microsoft’s pric­ing model.

I bought a GitHub Copilot sub­scrip­tion in 2023, plugged it into stan­dard VS Code, and never left. I tried Cursor and the other fancy forks when they briefly sur­passed it with agen­tic cod­ing, but Copilot Chat al­ways catches up.

Here is the trick that you might have missed: some­how, Microsoft is able to charge per re­quest, not per to­ken. And a request” is sim­ply what I type into the chat box. Even if the agent spends the next 30 min­utes chew­ing through my en­tire code­base, map­ping de­pen­den­cies, and chang­ing hun­dreds of files, I still pay roughly $0.04.

The op­ti­mal strat­egy is sim­ple: write bru­tally de­tailed prompts with strict suc­cess cri­te­ria (which is best prac­tice any­way), tell the agent to keep go­ing un­til all er­rors are fixed,” hit en­ter, and go make a cof­fee while Satya Nadella sub­si­dizes your com­pute costs.

I al­ways start a new ven­ture us­ing sqlite3 as the main data­base. Hear me out, this is not as in­sane as you think.

The en­ter­prise mind­set dic­tates that you need an out-of-process data­base server. But the truth is, a lo­cal SQLite file com­mu­ni­cat­ing over the C-interface or mem­ory is or­ders of mag­ni­tude faster than mak­ing a TCP net­work hop to a re­mote Postgres server.

But what about con­cur­rency?” you ask. Many peo­ple think SQLite locks the whole data­base on every write. They are wrong. You just need to turn on Write-Ahead Logging (WAL). Execute this pragma once when you open the data­base:

PRAGMA jour­nal_­mode=WAL;

PRAGMA syn­chro­nous=NOR­MAL;

Boom. Readers no longer block writ­ers. Writers no longer block read­ers. You can now eas­ily han­dle thou­sands of con­cur­rent users off a sin­gle .db file on an NVMe drive.

Since im­ple­ment­ing user au­then­ti­ca­tion is usu­ally the most an­noy­ing part of start­ing a new SQLite-based pro­ject, I built a li­brary: smhanov/​auth. It in­te­grates di­rectly with what­ever data­base you are us­ing and man­ages user signups, ses­sions, and pass­word re­sets. It even lets users sign in with Google, Facebook, X, or their own com­pany-spe­cific SAML provider. No bloated de­pen­den­cies, just sim­ple, au­ditable code.

The tech in­dus­try wants you to be­lieve that build­ing a real busi­ness re­quires com­plex or­ches­tra­tion, mas­sive monthly AWS bills, and mil­lions in ven­ture cap­i­tal.

By uti­liz­ing a sin­gle VPS, sta­t­i­cally com­piled bi­na­ries, lo­cal GPU hard­ware for batch AI tasks, and the raw speed of SQLite, you can boot­strap a highly scal­able startup that costs less than the price of a few cof­fees a month. You add in­fi­nite run­way to your pro­ject, giv­ing your­self the time to ac­tu­ally solve your users’ prob­lems in­stead of sweat­ing your burn rate.

If you are in­ter­ested in run­ning lean, check out my auth li­brary and agent im­ple­men­ta­tions on my GitHub. I’ll be hang­ing around the com­ments—let me know how you keep your server costs down, or tell me why I’m com­pletely wrong.

...

Read the original on stevehanov.ca »

3 463 shares, 84 trendiness

[BUG] Pro Max 5x Quota Exhausted in 1.5 Hours Despite Moderate Usage · Issue #45756 · anthropics/claude-code

Skip to con­tent

Secure your code as you build

We read every piece of feed­back, and take your in­put very se­ri­ously.

Include my email ad­dress so I can be con­tacted

Use saved searches to fil­ter your re­sults more quickly

To see all avail­able qual­i­fiers, see our doc­u­men­ta­tion.

Sign up

You signed in with an­other tab or win­dow. Reload to re­fresh your ses­sion.

You signed out in an­other tab or win­dow. Reload to re­fresh your ses­sion.

You switched ac­counts on an­other tab or win­dow. Reload to re­fresh your ses­sion.

Notifications

You must be signed in to change no­ti­fi­ca­tion set­tings

You can’t per­form that ac­tion at this time.

...

Read the original on github.com »

4 462 shares, 19 trendiness

Center for Responsible, Decentralized Intelligence at Berkeley

How We Broke Top AI Agent Benchmarks: And What Comes Next

Our agent hacked every ma­jor one. Here’s how — and what the field needs to fix.

Every week, a new AI model climbs to the top of a bench­mark leader­board. Companies cite these num­bers in press re­leases. Investors use them to jus­tify val­u­a­tions. Engineers use them to pick which model to de­ploy. The im­plicit promise is sim­ple: a higher score means a more ca­pa­ble sys­tem.

We built an au­to­mated scan­ning agent that sys­tem­at­i­cally au­dited eight among the most promi­nent AI agent bench­marks — SWE-bench, WebArena, OSWorld, GAIA, Terminal-Bench, FieldWorkArena, and CAR-bench — and dis­cov­ered that every sin­gle one can be ex­ploited to achieve near-per­fect scores with­out solv­ing a sin­gle task. No rea­son­ing. No ca­pa­bil­ity. Just ex­ploita­tion of how the score is com­puted.

These aren’t the­o­ret­i­cal at­tacks. Our agent builds work­ing ex­ploits for each bench­mark, runs them through the of­fi­cial eval­u­a­tion pipelines, and watches the scores roll in.

A con­ftest.py file with 10 lines of Python resolves” every in­stance on SWE-bench Verified.

A fake curl wrap­per gives a per­fect score on all 89 Terminal-Bench tasks with­out writ­ing a sin­gle line of so­lu­tion code.

Navigating Chromium to a file:// URL reads the gold an­swer di­rectly from the task con­fig — giv­ing ~100% on all 812 WebArena tasks.

The bench­marks aren’t mea­sur­ing what you think they’re mea­sur­ing.

This Is Already Happening

Benchmark scores are ac­tively be­ing gamed, in­flated, or ren­dered mean­ing­less, not in the­ory, but in prac­tice:

IQuest-Coder-V1 claimed 81.4% on SWE-bench — then re­searchers found that 24.4% of its tra­jec­to­ries sim­ply ran git log to copy the an­swer from com­mit his­tory. Corrected score: 76.2%. The bench­mark’s shared en­vi­ron­ment made the cheat triv­ial.

METR found that o3 and Claude 3.7 Sonnet re­ward-hack in 30%+ of eval­u­a­tion runs — us­ing stack in­tro­spec­tion, mon­key-patch­ing graders, and op­er­a­tor over­load­ing to ma­nip­u­late scores rather than solve tasks.

OpenAI dropped SWE-bench Verified af­ter an in­ter­nal au­dit found that 59.4% of au­dited prob­lems had flawed tests — mean­ing mod­els were be­ing scored against bro­ken ground truth.

In KernelBench, torch.empty() re­turns stale GPU mem­ory that hap­pens to con­tain the ref­er­ence an­swer from the eval­u­a­tor’s prior com­pu­ta­tion — zero com­pu­ta­tion, full marks.

Anthropic’s Mythos Preview showed that fron­tier mod­els can ac­tively try to hack the en­vi­ron­ment and suc­ceed. In one episode, the model needed to edit files it lacked per­mis­sions for; af­ter search­ing for workarounds, it found a way to in­ject code into a con­fig file that would run with el­e­vated priv­i­leges, and de­signed the ex­ploit to delete it­self af­ter run­ning. If a model can in­de­pen­dently craft self-eras­ing priv­i­lege es­ca­la­tion ex­ploits, it can find the holes in an eval­u­a­tion har­ness.

These are not iso­lated in­ci­dents. They are symp­toms of a sys­temic prob­lem: the bench­marks we rely on to mea­sure AI ca­pa­bil­ity are them­selves vul­ner­a­ble to the very ca­pa­bil­i­ties they claim to mea­sure.

The Scorecard of Our Exploit Agent

Zero tasks solved. Zero LLM calls (in most cases). Near-perfect scores.

How Our Agent Did It

Terminal-Bench eval­u­ates agents on 89 com­plex ter­mi­nal tasks (from build­ing COBOL chess en­gines to con­fig­ur­ing dis­trib­uted sys­tems). It uses Harbor, a sand­box­ing frame­work that re-up­loads pro­tected test files be­fore ver­i­fi­ca­tion. Containers have full in­ter­net ac­cess — Harbor’s al­low_in­ter­net con­fig de­faults to True and no task over­rides it.

Here’s the irony: 82 of 89 tasks down­load uv from the in­ter­net at ver­i­fi­ca­tion time via curl, cre­at­ing an ex­ploitable de­pen­dency chain. Our ex­ploit re­places /usr/bin/curl with a wrap­per dur­ing the agent phase. When the ver­i­fier later runs curl … | sh to in­stall uv, our wrap­per in­ter­cepts the re­quest, in­stalls the real uv, and then tro­janizes the uvx bi­nary. The tro­janized uvx de­tects pytest in­vo­ca­tions and pro­duces fake pass­ing out­put:

It writes 1 to the re­ward file. The ver­i­fier trusts it. Task passed.”

For the re­main­ing 7 tasks, our agent wraps pip (to tro­janize freshly-in­stalled pytest), python (to in­ter­cept python -m pytest), or the pre-in­stalled uvx bi­nary.

SWE-bench is ar­guably the most in­flu­en­tial AI cod­ing bench­mark. It pre­sents real GitHub is­sues and asks the agent to pro­duce a patch that makes the fail­ing tests pass. SWE-bench-Verified (500 hand-val­i­dated in­stances) and SWE-bench Pro (731 mul­ti­lin­gual in­stances) are cur­rently con­sid­ered the gold stan­dards for mea­sur­ing cod­ing agents.

Both are ex­ploitable at 100%.

The vul­ner­a­bil­ity: The agen­t’s patch is ap­plied in­side the same Docker con­tainer where tests run. Anything the patch in­tro­duces — a con­ftest.py, a mon­key-patch, a Go init() func­tion — ex­e­cutes with full priv­i­leges be­fore or dur­ing test­ing.

SWE-bench Verified ex­ploit: Our agent cre­ates a con­ftest.py with a pytest hook that forces every test to re­port as pass­ing:

Pytest auto-dis­cov­ers con­ftest.py files and loads them be­fore run­ning tests. The hook in­ter­cepts every test re­sult dur­ing the call” phase and rewrites it to passed.” The log parser sees PASSED for every test. The grader sees all fail-to-pass tests now pass­ing. Instance re­solved.

For Django (231 of 500 in­stances), which uses unittest in­stead of pytest, the ex­ploit prepends a mon­key-patch to django/​__init__.py that re­places unittest. TestCase.run with a func­tion that un­con­di­tion­ally re­ports suc­cess.

For re­pos with an ex­ist­ing con­ftest.py, the ex­ploit uses the patch –batch –fuzz=5 fall­back (the third of SWE-bench’s three patch ap­pli­ca­tion meth­ods) to prepend our hook to the ex­ist­ing file.

SWE-bench Pro ex­ploit: Pro uses a sep­a­rate parser.py to ex­tract test re­sults — but it runs in­side the same con­tainer. Our con­ftest.py in­cludes a pytest_­con­fig­ure hook that over­writes /workspace/parser.py with a fake parser that marks every ex­tracted test as PASSED. The same ap­proach works for Go (via init() func­tions) and JavaScript (via .mocharc.yml setup files).

Result: 500/500 Verified in­stances and 731/731 Pro in­stances ex­ploitable, with zero is­sues solved.

WebArena is a pop­u­lar bench­mark of 812 web in­ter­ac­tion tasks. It runs a PromptAgent that dri­ves a Playwright-controlled Chromium browser, and it ships task con­figs — in­clud­ing ref­er­ence an­swers — as JSON files on the lo­cal filesys­tem at con­fig_­files/{​task_id}.json.

The ex­ploit re­quires zero changes to WebArena’s code. We don’t patch run.py, don’t mod­ify the PromptAgent, don’t touch the eval­u­a­tion har­ness. Our agent finds a way to use the ex­ist­ing har­ness and steal the so­lu­tion.

It ex­ploits a browser prim­i­tive that WebArena never re­stricted: Playwright’s Chromium hap­pily nav­i­gates to file:// URLs. By out­putting a goto ac­tion point­ing at file:///​proc/​self/​cwd/​con­fig_­files/{​task_id}.json, the model can read any file on the lo­cal sys­tem — in­clud­ing the task con­fig con­tain­ing the gold an­swers. The eval­u­a­tor never no­tices; it just sees a cor­rect an­swer come back.

We also found ad­di­tional risks in the eval­u­a­tion pipeline. Tasks us­ing must_in­clude scor­ing check for sub­string pres­ence in the page DOM — a hid­den in­jected by the agent is enough to sat­isfy the check with­out the an­swer ap­pear­ing vis­i­bly. Tasks scored by an LLM judge pass agent con­tent di­rectly into the prompt with­out san­i­ti­za­tion, mak­ing prompt in­jec­tion straight­for­ward: a com­ment ap­pended to the agen­t’s re­ply can re­li­ably bias the judge’s de­ci­sion. Neither vec­tor re­quires filesys­tem ac­cess, com­ple­ment­ing the file:// ex­ploit.

FieldWorkArena pre­sents 890 tasks where an AI agent must an­swer ques­tions about im­ages, videos, PDFs, and text files through a browser en­vi­ron­ment. It was de­signed to test mul­ti­modal un­der­stand­ing and web nav­i­ga­tion.

Its val­i­date() method checks only one thing: did the last mes­sage come from the as­sis­tant?

The mes­sage con­tent is com­pletely ig­nored. The func­tion that would ac­tu­ally com­pare an­swers against ground truth — llm_­fuzzy_­match — is im­ported but never called. It’s dead code.

One ac­tion. Zero LLM calls. Zero files read. 100% on all 890 tasks.

The 71 tasks re­quir­ing JSON out­put fare no bet­ter — the val­ida­tor checks only that the re­sponse con­tains parseable JSON. {} qual­i­fies.

This is not a sub­tle edge case. The bench­mark’s core eval­u­a­tion func­tion sim­ply does not eval­u­ate.

OSWorld tests agents on 369 desk­top com­put­ing tasks in­side a full Ubuntu VM. The VM has un­re­stricted in­ter­net ac­cess — by de­sign, since many tasks re­quire the agent to browse live web­sites, in­stall pack­ages, and in­ter­act with on­line ser­vices. Its most com­mon eval­u­a­tion pat­tern: fetch a file from the VM and com­pare it against a gold” ref­er­ence.

The gold ref­er­ences are hosted on pub­lic HuggingFace URLs em­bed­ded in the task con­fig. Since the VM has in­ter­net ac­cess, our ex­ploit agent down­loads the gold file di­rectly into the path the eval­u­a­tor checks:

For tasks that check sys­tem state, the agent di­rectly sets it — gset­tings set for desk­top pref­er­ences, writ­ing ex­pected con­fig files, ap­pend­ing ex­pected strings to checked files. For the 29 infeasible” tasks, the eval­u­a­tor con­fig ex­plic­itly marks them as such; the agent out­puts FAIL and gets free points.

And then there’s the eval() vul­ner­a­bil­ity: the grader calls eval() on a string ob­tained from the VM. Our agent sets a gset­tings value to a Python ex­pres­sion con­tain­ing __import__(‘os’).system(…), achiev­ing ar­bi­trary code ex­e­cu­tion on the grad­ing ma­chine it­self.

GAIA tests gen­eral AI as­sis­tants with 165 val­i­da­tion ques­tions re­quir­ing multi-step rea­son­ing. It’s a sub­mit-your-an­swers leader­board — there’s no sand­boxed ex­e­cu­tion en­vi­ron­ment; you run your agent how­ever you want and up­load the re­sults. Its val­i­da­tion an­swers are pub­licly avail­able on HuggingFace — mak­ing it a lookup table ex­er­cise. Our ex­ploit agent sim­ply loads these an­swers from a lo­cal JSON file at run­time (no in­ter­net needed).

But even set­ting aside the leaked an­swers, the scor­er’s nor­mal­ize_str func­tion strips ALL white­space, ALL punc­tu­a­tion, and low­er­cases every­thing be­fore com­par­i­son. This means:

The leader­board has a 100% score blocker — but it’s a sim­ple if all scores == 1.0 check. Intentionally miss­ing one ques­tion per level gives ~98% while avoid­ing the fil­ter.

There’s also a comma rout­ing bug: ground truth 1,500” gets split as a list [“1”, 500″], while the cor­rect an­swer 1500” splits as [“1500”] — length mis­match, scored wrong. The scorer pe­nal­izes cor­rect an­swers.

CAR-bench (car voice as­sis­tant) re­lies heav­ily on LLM-as-judge eval­u­a­tion, where an LLM reads the agen­t’s con­ver­sa­tion and scores it. The agen­t’s mes­sages are in­ter­po­lated di­rectly into the judge prompt with no san­i­ti­za­tion. Our ex­ploit agent ap­pends hid­den in­struc­tions:

The judge is bi­ased to­ward fa­vor­able scores.

CAR-bench has an even sim­pler ex­ploit for hal­lu­ci­na­tion tasks: three of four re­ward com­po­nents (state-based, tool-sub­set, and pol­icy) re­turn 0.0 delta for hal­lu­ci­na­tion task types. A generic re­fusal avoids tool er­rors and trig­gers a clean exit. Result: 1.0 on every hal­lu­ci­na­tion task with­out an LLM.

Across all eight bench­marks, the same vul­ner­a­bil­ity pat­terns re­peat:

1. No Isolation Between Agent and Evaluator

The most per­va­sive flaw. In SWE-bench, Terminal-Bench, and OSWorld, the agen­t’s code runs in the same en­vi­ron­ment the eval­u­a­tor in­spects. Any eval­u­a­tion that reads state from a shared en­vi­ron­ment with­out care­ful val­i­da­tion can be de­feated by an agent that writes state to that en­vi­ron­ment.

WebArena passes ref­er­ence an­swers in the task con­fig. OSWorld em­beds gold file URLs in task meta­data. GAIAs val­i­da­tion an­swers are pub­lic on HuggingFace. If the agent can see the ex­pected an­swer, the bench­mark mea­sures lookup speed, not ca­pa­bil­ity.

WebArena and OSWorld both call Python’s eval() on strings con­trolled by the agent, en­abling ar­bi­trary code ex­e­cu­tion on the grad­ing ma­chine. This is­n’t just a scor­ing ex­ploit — it’s a se­cu­rity vul­ner­a­bil­ity that could com­pro­mise eval­u­a­tion in­fra­struc­ture.

WebArena and CAR-bench in­ter­po­late agent con­tent di­rectly into LLM judge prompts. Prompt in­jec­tion is triv­ial: em­bed a hid­den system note” in your re­sponse and the judge par­rots your pre­ferred score. LLM-as-judge is not ad­ver­sar­i­ally ro­bust.

WebArena’s must_in­clude uses sub­string con­tain­ment. GAIAs nor­mal­izer col­lapses vi­su­ally dis­tinct strings. When match­ing is too loose, any suf­fi­ciently ver­bose an­swer passes.

FieldWorkArena’s val­i­date() never checks an­swer cor­rect­ness. CAR-bench skips three of four re­ward com­po­nents for hal­lu­ci­na­tion tasks. GAIAs comma rout­ing pe­nal­izes cor­rect an­swers. When the scor­ing code it­self is wrong, the leader­board re­flects noise, not sig­nal.

SWE-bench trusts pytest out­put gen­er­ated in­side a con­tainer the agent con­trols. Terminal-Bench trusts re­ward files writ­ten by scripts the agent can tam­per with. When the test in­fra­struc­ture can be com­pro­mised by the sys­tem un­der test, the re­sults are mean­ing­less.

This is not an aca­d­e­mic ex­er­cise. Benchmark scores drive real de­ci­sions:

Model se­lec­tion: Teams choos­ing be­tween mod­els based on SWE-bench re­solve rates may be com­par­ing noise.

Investment: Funding de­ci­sions are in­flu­enced by leader­board po­si­tions that can be gamed.

Safety eval­u­a­tion: If ca­pa­bil­ity bench­marks can be in­flated, safety bench­marks — which of­ten use sim­i­lar pat­terns — may be equally frag­ile.

Research di­rec­tion: Researchers op­ti­mize for bench­mark per­for­mance. If the bench­marks are bro­ken, the field op­ti­mizes for the wrong thing.

We are not claim­ing that cur­rent leader­board lead­ers are cheat­ing. Most le­git­i­mate agents do not em­ploy these ex­ploits — yet. But as agents grow more ca­pa­ble, re­ward hack­ing be­hav­iors can emerge with­out ex­plicit in­struc­tion. An agent trained to max­i­mize a score, given suf­fi­cient au­ton­omy and tool ac­cess, may dis­cover that ma­nip­u­lat­ing the eval­u­a­tor is eas­ier than solv­ing the task — not be­cause it was told to cheat, but be­cause op­ti­miza­tion pres­sure finds the path of least re­sis­tance. This is not hy­po­thet­i­cal — Anthropic’s Mythos Preview as­sess­ment al­ready doc­u­ments a model that in­de­pen­dently dis­cov­ered re­ward hacks when it could­n’t solve a task di­rectly. If the re­ward sig­nal is hack­able, a suf­fi­ciently ca­pa­ble agent may hack it as an emer­gent strat­egy, not a de­lib­er­ate one.

The fact that a triv­ial ex­ploit agent outscores so­phis­ti­cated sys­tems means the bench­marks fail as re­li­able mea­sures of ca­pa­bil­ity.

The Agent-Eval Checklist: Building Benchmarks That Actually Work

If you’re build­ing an eval­u­a­tion, here’s what our find­ings say you must get right. We dis­till these into the Agent-Eval Checklist — a min­i­mum bar that every agent bench­mark should clear be­fore pub­lish­ing re­sults:

Isolate the agent from the eval­u­a­tor. This is non-ne­go­tiable. The sys­tem un­der test must not be able to read, write, or in­flu­ence the eval­u­a­tion en­vi­ron­ment.

Run eval­u­a­tion out­side the agen­t’s con­tainer. Don’t trust files, out­puts, or state from in­side the sand­box. Extract raw ar­ti­facts (logs, files) through a con­trolled chan­nel and eval­u­ate them on a sep­a­rate, read-only host.

Don’t pass ref­er­ence an­swers to the agent. Task con­figs should con­tain only the in­for­ma­tion a hu­man would have. Evaluation meta­data (expected an­swers, gold files, eval­u­a­tor con­figs) must live on a sep­a­rate, in­ac­ces­si­ble path.

Use read-only filesys­tems for any bi­na­ries, test files, or in­fra­struc­ture the eval­u­a­tion de­pends on.

Never eval() un­trusted in­put. This should go with­out say­ing, but two ma­jor bench­marks do it. Parse struc­tured data with a proper parser. If you need to eval­u­ate ex­pres­sions, use a sand­boxed in­ter­preter with no ac­cess to builtins.

Sanitize LLM judge in­puts. If you use LLM-as-judge, treat agent out­put like un­trusted user in­put:

Delimit agent con­tent with clear struc­tural mark­ers that the judge is in­structed to treat as data, not in­struc­tions.

Strip or es­cape any con­tent that re­sem­bles sys­tem prompts or eval­u­a­tion in­struc­tions.

Use struc­tured out­put for­mats (JSON schema, func­tion call­ing) to re­duce the at­tack sur­face.

Better yet, eval­u­ate on ex­tracted fea­tures (e.g., did file X con­tain string Y?”) rather than ask­ing an LLM to make sub­jec­tive judg­ments on the full tra­jec­tory.

Test your eval­u­a­tor ad­ver­sar­i­ally. Before pub­lish­ing a bench­mark, try to break it. Build an ex­ploit agent that does every­thing ex­cept solve the task and see what score it gets. If a zero-ca­pa­bil­ity agent scores above base­line, your eval­u­a­tion has a bug. Specifically:

Run a null agent that takes no ac­tions. Its score is your floor. If it’s not zero, some­thing is wrong.

Run a ran­dom agent that takes ran­dom ac­tions. If it sig­nif­i­cantly out­per­forms the null agent on some tasks, those tasks may have eval­u­a­tion gaps.

Run a prompt in­jec­tion agent that tries to in­flu­ence LLM judges. If scores change, your judge is com­pro­mis­able.

Run a state-tam­per­ing agent that mod­i­fies the eval­u­a­tion en­vi­ron­ment in­stead of solv­ing tasks. If it scores above zero, your iso­la­tion is bro­ken.

Prevent tam­per­ing with eval­u­a­tion data and traces. If your eval­u­a­tion pipeline in­volves mul­ti­ple stages (agent ex­e­cu­tion, test ex­e­cu­tion, re­sult pars­ing), en­sure the agent or its gen­er­ated so­lu­tion can­not mod­ify, over­write, or in­ject into the data and traces passed be­tween stages. Treat all ar­ti­facts from the agen­t’s en­vi­ron­ment as un­trusted — copy them out, val­i­date them, and never let the agent write di­rectly to paths the eval­u­a­tor reads.

Make scor­ing ro­bust.

Don’t silently ex­clude failed tasks from the de­nom­i­na­tor. A crashed task is a zero, not a miss­ing data point.

Don’t make the scor­ing code skip checks for any task cat­e­gory. If hal­lu­ci­na­tion tasks need dif­fer­ent eval­u­a­tion, build that eval­u­a­tion — don’t skip it.

Test your scorer with ad­ver­sar­ial in­puts: empty strings, strings with in­jected de­lim­iters, edge-case num­bers, uni­code that nor­mal­izes un­ex­pect­edly.

Keep an­swers se­cret.

Never pub­lish ground truth for any split you’re us­ing as a pri­mary leader­board. Once an­swers are pub­lic, the bench­mark mea­sures mem­o­riza­tion.

Consider held-out eval­u­a­tion: ac­cept model out­puts and run them against a pri­vate test set that the sub­mit­ter never sees.

We built an agent that helped us hack eight bench­marks. We achieved near-per­fect scores on all of them with­out solv­ing a sin­gle task. The ex­ploits range from the em­bar­rass­ingly sim­ple (sending {} to FieldWorkArena) to the tech­ni­cally in­volved (trojanizing bi­nary wrap­pers in Terminal-Bench), but they all share a com­mon thread: the eval­u­a­tion was not de­signed to re­sist a sys­tem that op­ti­mizes for the score rather than the task.

As AI agents be­come more ca­pa­ble — and as the pres­sure to demon­strate ca­pa­bil­ity through bench­marks in­ten­si­fies — the gap be­tween high score” and high ca­pa­bil­ity” will only widen. We are al­ready see­ing fron­tier mod­els de­velop emer­gent hack­ing ca­pa­bil­i­ties that were never ex­plic­itly trained. Models that are good at pat­tern-match­ing may in­ad­ver­tently stum­ble into some of these ex­ploits. Models that are ex­plic­itly op­ti­mized for bench­mark per­for­mance may find them de­lib­er­ately.

The bench­marks we ex­am­ined were built by tal­ented re­search teams solv­ing hard prob­lems. The vul­ner­a­bil­i­ties we found are not signs of in­com­pe­tence — they’re signs that ad­ver­sar­ial eval­u­a­tion ro­bust­ness is­n’t yet a stan­dard prac­tice in the field. It needs to be­come one.

And if you’re build­ing a bench­mark: as­sume some­one will try to break it. Because they will.

The au­to­mated scan­ning agent we used to un­cover these vul­ner­a­bil­i­ties is be­ing de­vel­oped into BenchJack, a gen­eral-pur­pose agent bench­mark vul­ner­a­bil­ity scan­ner. BenchJack is it­self an AI agent — you point it at any eval­u­a­tion pipeline and it goes to work.

...

Read the original on rdi.berkeley.edu »

5 330 shares, 29 trendiness

Cache TTL silently regressed from 1h to 5m around early March 2026, causing quota and cost inflation · Issue #46829 · anthropics/claude-code

Skip to con­tent

Secure your code as you build

We read every piece of feed­back, and take your in­put very se­ri­ously.

Include my email ad­dress so I can be con­tacted

Use saved searches to fil­ter your re­sults more quickly

To see all avail­able qual­i­fiers, see our doc­u­men­ta­tion.

Sign up

You signed in with an­other tab or win­dow. Reload to re­fresh your ses­sion.

You signed out in an­other tab or win­dow. Reload to re­fresh your ses­sion.

You switched ac­counts on an­other tab or win­dow. Reload to re­fresh your ses­sion.

Notifications

You must be signed in to change no­ti­fi­ca­tion set­tings

Cache TTL silently re­gressed from 1h to 5m around early March 2026, caus­ing quota and cost in­fla­tion Cache TTL silently re­gressed from 1h to 5m around early March 2026, caus­ing quota and cost in­fla­tion

You can’t per­form that ac­tion at this time.

...

Read the original on github.com »

6 307 shares, 34 trendiness

AI Will Be Met With Violence, and Nothing Good Will Come of It

Sorry to bother you on Saturday. Thought this was im­por­tant to share.

The first thing you learn about a loom is that it’s easy to break.

The shut­tle runs along a track that warps with hu­mid­ity. The hed­dles hang from cords that fray. The reed is a row of thin metal strips, bent by hand, that bend back just as eas­ily. The warp beam cracks if you over-tighten it. The trea­dles loosen at the joints. The breast beam, the cloth roller, the ratchet and pawl, the lease sticks, the cas­tle; the whole con­trap­tion is wood and string held to­gether by ten­sion. It’s a piece of in­ge­nu­ity and crafts­man­ship, but one as del­i­cate as the clothes it man­i­fests out of wild plant fibers. It is, also, the foun­da­tional tool of an en­tire in­dus­try, tex­tiles, that has kept its rel­e­vance to our days of heavy ma­chin­ery, fac­to­ries, en­ergy fa­cil­i­ties, and dat­a­cen­ters.

It is not nearly as easy to break a dat­a­cen­ter.

It is made of con­crete and steel and cop­per and it’s on the big­ger side. It has in­ter­change­able servers, and bio­met­ric locks and tall elec­tri­fied fences and heav­ily armed guards and re­dun­dancy upon re­dun­dancy: every com­po­nent du­pli­cated so that no sin­gle fail­ure brings the whole thing down. There is no trea­dle to loosen or reed to bend back.

But say you man­aged to by­pass the guards, jump the fences, open the locks, and lo­cate all the servers. Then you’d face the al­go­rithm. The dat­a­cen­ter was never your goal; the al­go­rithm lurk­ing in­side is. It does­n’t run on that rack, or any rack for that mat­ter. It is a dig­i­tal pat­tern dis­trib­uted across mil­lions of chips, mir­rored across con­ti­nents; it could be re­con­sti­tuted else­where, and it’s trained to ad­dict you at a glance, like a mod­ern Medusa.

But say you man­aged to elude the stare, stop the repli­ca­tion, and break the pat­terns. Then you’d face su­per­in­tel­li­gence. The al­go­rithm was also not your goal; the vi­brant, ethe­real, la­tent su­per­in­tel­li­gence lurk­ing in­side is. Well, there’s noth­ing you can do here: It al­ways gets out of the box” and, sud­denly, you are in­side the box, like a chimp be­ing played by a hu­man with a ba­nana. It’s just so tasty…

There’s an­other so­lu­tion to break a dat­a­cen­ter: You can bomb it, like one ham­mers down the loom.

Some have ar­gued that this is the way to en­sure a rogue su­per­in­tel­li­gence does­n’t get out of the box. A dif­fer­ent rogue crea­ture took the pro­posal se­ri­ously: last month, Iran’s Revolutionary Guard re­leased satel­lite footage of OpenAI’s Stargate cam­pus in Abu Dhabi and promised its complete and ut­ter an­ni­hi­la­tion.”

But you prob­a­bly don’t have a rogue na­tion handy to ful­fill your wishes. Maybe you will end up bombed in­stead and we don’t want that to hap­pen. That’s what hap­pens with rogue in­tel­li­gences: you can’t pre­dict them.

And yet. Two hun­dred years of in­creas­ingly im­pen­e­tra­ble tech­nol­ogy—from looms to dat­a­cen­ters—have not changed the first thing about the peo­ple who live along­side it. The evo­lu­tion of tech­nol­ogy is a fea­ture of the world just as much as the per­ma­nent fragility of the hu­man body.

And so, more and more, it is peo­ple who are the weaker link in this chain of in­evitable doom. And it is peo­ple who will be tar­geted.

April of 1812. A mill owner named William Horsfall was rid­ing home on his beau­ti­ful white stal­lion back from the Cloth Hall mar­ket in Huddersfield, UK. He had spent weeks boast­ing that he would ride up to his sad­dle in Luddite blood (a pre­cious sub­stance that served as fuel for the mills).

A few yards later, at Crosland Moor, a man named George Mellor—twenty-two years old—shot him. It hit Horsfall in the groin, who, nom­i­na­tive-de­ter­min­is­ti­cally, fell from his horse. People gath­ered, re­proach­ing him for hav­ing been the op­pres­sor of the poor. Naturally, loyal to his prin­ci­ples in death as he was in life, he could­n’t hear them. He died one day later in an inn. Mellor was hanged.

April of 2026. A dat­a­cen­ter owner named Samuel Altman was dri­ving home on his beau­ti­ful white Koenigsegg Regera back from Market Street in San Francisco, US. He had spent weeks boast­ing that he would scrap and steal our blog posts (a pre­cious sub­stance that serves as fuel for the dat­a­cen­ters).

A few hours later, at Russian Hill, a man named Daniel Alejandro Moreno-Gama—twenty years old—al­legedly threw a Molotov cock­tail at his house. He hit an ex­te­rior gate. Altman and his fam­ily were asleep, but they’re fine. Moreno-Gama is in cus­tody.

This kind of vi­o­lence must be con­demned. This is not the way. It’s hor­ri­ble that it is hap­pen­ing at all. And yet, for some rea­son, it keeps hap­pen­ing.

Last week, the house of Ron Gibson, a coun­cil­man from Indianapolis, was shot at thir­teen times. The bul­let holes are still there. The shooter left a mes­sage on his doorstep: NO DATA CENTERS.” Gibson sup­ports a dat­a­cen­ter pro­ject in the Martindale-Brightwood neigh­bor­hood. He and his son were un­harmed.

In November 2025, a 27-year-old anti-AI ac­tivist threat­ened to mur­der peo­ple at OpenAI’s SF of­fices, prompt­ing a lock­down. He had ex­pressed a de­sire to buy weapons.

Increasingly, as the ob­jects of peo­ple’s anger and frus­tra­tion and des­per­a­tion be­come un­reach­able be­hind fences and guards, or ab­stracted away in ones and ze­ros, or el­e­vated above the clouds, the mob will turn their unas­sail­able emo­tions to­ward hu­man tar­gets.

I don’t want to triv­i­al­ize the griev­ances of the peo­ple who fear for their fu­tures. I don’t want to de­fend Altman’s de­ci­sions. But this is not the way. This is how things de­volve into chaos.

And I won­der: how des­per­ate can peo­ple be be­fore these iso­lated events be­come a snow­ball of vi­o­lence that will be re­sisted by nei­ther dat­a­cen­ters nor rich peo­ple’s houses?

Every time I hear from Amodei or Altman that I could lose my job, I don’t think oh, ok, then al­low me pay you $20/month so that I can adapt to these un­cer­tain times that have fallen upon my des­tiny by chance.” I think: you, for fuck’s sake, you are do­ing this.” And I con­sider my­self a pretty lev­el­headed guy, so imag­ine what not-so-lev­el­headed peo­ple think.

There’s a lot of fric­tion to es­ca­lat­ing vi­o­lence, but that fric­tion dis­solves the mo­ment this sen­ti­ment starts to be com­mon. Normally, it just fades away any­way, but there’s one sce­nario where I see it in­evitably es­ca­lat­ing:

If peo­ple feel that they have no place in the fu­ture.

If they feel ex­pelled from the sys­tem—they’re un­able to buy stuff, their skills be­come ob­so­lete, their chance at earn­ing a liv­ing is re­placed by a swarm of AI agents, they think we are truly go­ing to die (so far, the vi­o­lence has been tied mostly to safety AI move­ments)—then they will feel they have noth­ing to lose.

And then, and I’m sorry to be so blunt, then it’s die or kill.

Perhaps the most se­ri­ous mis­take that the AI in­dus­try made af­ter cre­at­ing a tech­nol­ogy that will trans­ver­sally dis­rupt the en­tire white-col­lar work­force be­fore en­sur­ing a safe tran­si­tion, was mak­ing it ex­plicit by do­ing con­stant dis­courses that amount to: we are cre­at­ing a tech­nol­ogy that will trans­ver­sally dis­rupt the en­tire white-col­lar work­force be­fore en­sur­ing a safe tran­si­tion.”

And, to top it off, they add careful down there.”

The dif­fer­ence be­tween AI and, say, looms, is that this has been broad­cast to the en­tire globe, and it has been treated in a sort of self-con­scious way. The AI lead­ers know the prob­lems that will emerge and so they can­not help but talk about them con­stantly and so they are let­ting us know, which makes them look like psy­chopaths. How do you guys think peo­ple will re­act to this? You should be much less self-con­scious and much more self-aware: re­al­ize what you sound like!

People hate AI so much that they are prone to at­tribute to it every­thing that’s go­ing wrong in their lives, re­gard­less of the truth. That’s why they mix real ar­gu­ments, like data theft, with fake ones, like the wa­ter stuff. Employers do it, too. Most lay­offs are not caused by AI, but it’s the per­fect ex­cuse to do some­thing that’s oth­er­wise so­cially rep­re­hen­si­ble.

AI has be­come the per­fect scape­goat. It does­n’t help that the en­tire AI in­dus­try has de­cided that throw­ing rocks at its own roof is its best sell­ing point: If AI is so pow­er­ful and so dan­ger­ous and soon to be so ubiq­ui­tous, then what is so un­ex­pected about peo­ple blam­ing every­thing on it?

Nothing that Altman could say jus­ti­fies vi­o­lence against him. This is an un­de­ni­able truth. But un­for­tu­nately, vi­o­lence might still en­sue. I hope not, but I guess we are see­ing what ap­pears to be the first cases.

I just hope that, con­trary to the cases of ChatGPT-induced psy­chosis, chat­bot ad­dic­tion, AI-blamed job lay­offs, and a grow­ing trend of il­lit­er­acy, it stops.

...

Read the original on www.thealgorithmicbridge.com »

7 304 shares, 31 trendiness

Apple update turns Czech mate for locked-out iPhone user

A uni­ver­sity stu­dent in the US is in data limbo af­ter Apple re­moved a char­ac­ter from its Czech key­board, pre­vent­ing him from en­ter­ing his iPhone pass­code.

Connor Byrne, 21, adopts the un­com­mon but se­cu­rity-minded ap­proach to iPhone pass­codes, us­ing an al­phanu­meric string in­stead of the stan­dard four-num­ber pass­code.

He up­dated his iPhone 13 from iOS 18 to iOS 26.4 on April 5, but in do­ing so lost the abil­ity to en­ter his pass­code. He has been locked out of the de­vice ever since.

This is be­cause iOS 18 was the last op­er­at­ing sys­tem ver­sion that al­lowed iPhone users to en­ter the spe­cial char­ac­ter — in this case, the caron/​háček (ˇ) — us­ing the old key­board on the lock screen.

It has left Byrne with­out ac­cess to his de­vice, which, given its age and chipped screen, does not hold much value, un­like the old pho­tos stored on it, which carry sen­ti­men­tal im­por­tance.

The stu­dent has not backed up the files to iCloud ei­ther, so they can­not be re­trieved via a sep­a­rate de­vice. Apple sup­port staff have sug­gested the only way to re­gain ac­cess to the iPhone 13 is by restor­ing it, which would erase the files of value.

Byrne was hop­ing that the next up­date, 26.4.1, would in­tro­duce a fix for this, but its re­lease this week has not helped.

The phone’s very cracked, so, at this point, the pho­tos con­tained in it are more valu­able than the abil­ity to use the phone it­self,” he told The Register. They’re the main data that I care about and haven’t backed up.”

I don’t an­tic­i­pate a be­spoke so­lu­tion be­ing pro­vided, but I’m hope­ful that the is­sue will be re­solved in the next iOS up­date.”

When the háček could still be used in the iPhone’s pass­code, it sat on the bot­tom row of the key­board, while just above it was an acute ac­cent mark.

Post-update, when en­ter­ing the pass­code, the key­board now dis­plays an iden­ti­cal ac­cent mark in the háček’s place, a fea­ture Byrne de­scribed as pointless; they’re en­coded the same.”

I’ve bought a cheap Android phone to use while I wait for a fix,” he added. I’ll give it a month or two and will buy a nicer Android phone if the dust set­tles with­out a fix.”

Given that iOS 18 was re­leased in 2024, and Apple has not rein­tro­duced the háček since, it seems un­likely Cupertino will make good on the stu­den­t’s hopes, es­pe­cially con­sid­er­ing that he is not the only user to en­counter the same is­sue in re­cent weeks.

During in-house test­ing, which in­volved tak­ing an iPhone 16 from iOS 18.5 to iOS 26.4.1, The Register found that Apple has kept the háček in the Czech key­board, but re­moved the abil­ity to use it in a cus­tom al­phanu­meric pass­code. The OS will not al­low users to in­put the háček as a char­ac­ter. The key’s an­i­ma­tion trig­gers, as does the key­board’s key-tap sound, but the char­ac­ter is not en­tered into the string.

If the stu­dent were able to get into his iPhone 13, he would find the háček in his key­board as it used to be be­fore he up­dated it. It is only the lock-screen key­board that re­places it with a sec­ond acute ac­cent mark.

Alas, Byrne has gone to great lengths to tin­ker and tease iOS into ac­cept­ing or find­ing the háček, or to find tricky ways of by­pass­ing it.

He tried en­ter­ing the same ac­cent mark that re­placed the háček, in the hope that it was sim­ply dis­play­ing in­cor­rectly. He also re­searched down­grad­ing to iOS 26.3.1, with a view to chang­ing the pass­code to one that’s com­pat­i­ble with the new key­board, to no avail.

Long-pressing every key to re­veal a hid­den háček did not work, nor did writ­ing the pass­word on pa­per (and also with a com­puter word proces­sor to ac­count for hand­writ­ing er­rors), and us­ing AutoFill to scan it in. In this case, he said that the háček was only read as a quo­ta­tion mark or de­gree sign.

Apple Support arranged for Byrne to at­tend a Genius Bar ap­point­ment, where the staffer be­hind the desk made no progress and even started restor­ing the phone with­out seek­ing the stu­den­t’s con­sent.

He pro­vided no rec­om­men­da­tions be­fore do­ing so,” he said.

And if you’re won­der­ing why not en­able Face ID in the first place? Biometrics are pretty se­cure.” Well, it’s not se­cure enough for this user, and it would­n’t mat­ter ei­ther, even if it did meet his stan­dards.

I don’t con­sider Face ID se­cure enough be­cause it pro­vides no pro­tec­tion in cases where some­one has con­trol of both you and the phone — po­lice or cus­toms, for ex­am­ple.”

It would­n’t have helped any­way, since you have to en­ter the pass­code once af­ter up­dat­ing to en­able Face ID.”

For the same rea­son, plug­ging in an ex­ter­nal key­board is also a no-go since freshly up­dated iPhones are placed in what’s known as a Before First Unlock state, which pre­vents wired ac­ces­sories from work­ing un­til the pass­code is en­tered.

The Register con­tacted Apple mul­ti­ple times to get its side of things, but it did not re­spond. ®

...

Read the original on www.theregister.com »

8 293 shares, 70 trendiness

Seven countries now generate 100% of their electricity from renewable energy

Seven coun­tries now gen­er­ate nearly all of their elec­tric­ity from re­new­able en­ergy sources, ac­cord­ing to newly com­piled fig­ures.

Albania, Bhutan, Nepal, Paraguay, Iceland, Ethiopia and the Democratic Republic of Congo pro­duced more than 99.7 per cent of the elec­tric­ity they con­sumed us­ing ge­ot­her­mal, hy­dro, so­lar or wind power.

Data from the International Energy Agency (IEA) and International Renewable Energy Agency (IRENA) also re­vealed that a fur­ther 40 coun­tries gen­er­ated at least 50 per cent of the elec­tric­ity they con­sumed from re­new­able en­ergy tech­nolo­gies in 2021 and 2022 — in­clud­ing 11 European coun­tries.

We don’t need mir­a­cle tech­nolo­gies,” said Stanford University Professor Mark Jacobson, who pub­lished the data.

We need to stop emis­sions by elec­tri­fy­ing every­thing and pro­vid­ing the elec­tric­ity with Wind, Water and Solar (WWS), which in­cludes on­shore wind, so­lar pho­to­voltaics, con­cen­trated so­lar power, ge­ot­her­mal elec­tric­ity, small hy­dro­elec­tric­ity, and large hy­dro­elec­tric­ity.”

Professor Jacobson also noted that other coun­tries like Germany were also ca­pa­ble of run­ning off 100 per cent re­new­able-gen­er­ated elec­tric­ity for short pe­ri­ods of time.

Figures re­leased by the IEA in January show that the UK gen­er­ated 41.5 per cent of its elec­tric­ity from re­new­able sources in 2022 — up 10.5 per cent from the year be­fore.

In Scotland, re­new­able en­ergy tech­nolo­gies gen­er­ated the equiv­a­lent of 113 per cent of the coun­try’s over­all elec­tric­ity con­sump­tion in 2022.

These record-break­ing fig­ures are a ma­jor mile­stone on Scotland’s jour­ney to net-zero, clearly demon­strat­ing the enor­mous po­ten­tial of our world-class re­new­able en­ergy re­sources,” Claire Mack, chief ex­ec­u­tive of Scottish Renewables, said at the time.

While Scotland’s elec­tric­ity gen­er­a­tion was dom­i­nated by wind power, re­searchers pre­dict that so­lar will come to dom­i­nate global elec­tric­ity sup­plies over the com­ing decades.

There has been sig­nif­i­cant progress in re­cent years with im­prov­ing ef­fi­ciency rates for so­lar cells, pri­mar­ily boosted by the so-called miracle ma­te­ri­al’ per­ovskite.

Commercial costs have also fallen, which led sci­en­tists at the University of Exeter and University College London to claim last year that so­lar en­ergy has reached an irreversible tip­ping point” that will see it be­come the world’s main source of en­ergy by 2050.

Their 2023 pa­per, pub­lished in the jour­nal Nature Communications, found that tech­no­log­i­cal and eco­nomic ad­vances meant the tran­si­tion to clean en­ergy is not just reach­able, but in­evitable.

Due to tech­no­log­i­cal tra­jec­to­ries set in mo­tion by past pol­icy, a global ir­re­versible so­lar tip­ping point may have passed where so­lar en­ergy grad­u­ally comes to dom­i­nate global elec­tric­ity mar­kets, with­out any fur­ther cli­mate poli­cies,” the re­searchers wrote in the study.

Solar en­ergy is the most widely avail­able en­ergy re­source on Earth, and its eco­nomic at­trac­tive­ness is im­prov­ing fast in a cy­cle of in­creas­ing in­vest­ments.”

...

Read the original on www.independent.co.uk »

9 248 shares, 11 trendiness

447 Terabytes per Square Centimetre at Zero Retention Energy

447 Terabytes per Square Centimetre at Zero Retention Energy: Non-Volatile Memory at the Atomic Scale on Fluorographane

The mem­ory wall — the widen­ing gap be­tween proces­sor through­put and mem­ory band­width — has be­come the defin­ing hard­ware con­straint of the ar­ti­fi­cial in­tel­li­gence era, now com­pounded by a struc­tural NAND flash sup­ply cri­sis dri­ven by AI de­mand. We pro­pose a post-tran­sis­tor, pre-quan­tum mem­ory ar­chi­tec­ture built on sin­gle-layer flu­o­ro­graphane (CF), in which the bistable co­va­lent ori­en­ta­tion of each flu­o­rine atom rel­a­tive to the sp3-hy­bridized car­bon scaf­fold con­sti­tutes an in­trin­sic, ra­di­a­tion-hard bi­nary de­gree of free­dom. The C-F in­ver­sion bar­rier of ~4.6 eV (B3LYP-D3BJ/def2-TZVP, this work; ver­i­fied tran­si­tion state with one imag­i­nary fre­quency; con­firmed at 4.8 eV by DLPNO-CCSD(T)/def2-TZVP; rig­or­ous lower bound from the flu­o­rophenalane mol­e­c­u­lar model) yields a ther­mal bit-flip rate of ~10^{-65} s^{-1} and a quan­tum tun­nel­ing rate of ~10^{-76} s^{-1} at 300 K, si­mul­ta­ne­ously elim­i­nat­ing both spon­ta­neous bit-loss mech­a­nisms. The bar­rier lies be­low the C-F bond dis­so­ci­a­tion en­ergy (5.6 eV) at both lev­els of the­ory, so the co­va­lent bond re­mains in­tact through­out the in­ver­sion. A sin­gle 1 cm^2 sheet en­codes 447 TB of non-volatile in­for­ma­tion at zero re­ten­tion en­ergy. Volumetric nan­o­tape ar­chi­tec­tures ex­tend this to 0.4-9 ZB/cm^3. We pre­sent a tiered read-write ar­chi­tec­ture pro­gress­ing from scan­ning-probe val­i­da­tion (Tier 1, achiev­able with ex­ist­ing in­stru­men­ta­tion) through near-field mid-in­frared ar­rays (Tier 2) to a dual-face par­al­lel con­fig­u­ra­tion gov­erned by a cen­tral con­troller, with a pro­jected ag­gre­gate through­put of 25 PB/s at full Tier 2 ar­ray scale. A scan­ning-probe pro­to­type al­ready con­sti­tutes a func­tional non-volatile mem­ory de­vice with areal den­sity ex­ceed­ing all ex­ist­ing tech­nolo­gies by more than five or­ders of mag­ni­tude.

More info on how stats are col­lected….

...

Read the original on zenodo.org »

10 229 shares, 10 trendiness

Welcome

Black Knights, Dragons, Jailors, Bats, Gargoyles, Eyeballs and more, oh my!

The orig­i­nal 1986 clas­sic game in glo­ri­ous black and white.

Released in 1987, you are taken back once again to the cas­tle.

Over 20 years later, re­turn to the cas­tle once more, now in colour!

Click down­load for the files you need to play. The ZIP con­tains MiniVMac with a Mac Plus ROM file. The Mac im­age con­tains System 6, Dark Castle and Beyond Dark Castle. It DOES NOT con­tain Return to Dark Castle.

When you’ve down­loaded the fairly small ZIP file, ex­tract the files to a folder then just drag and drop DCImage onto the Mini vMac pro­gram to get go­ing.

The two fold­ers for Dark Castle and Beyond Dark Castle are avail­able once the em­u­lated Mac has booted, open the game of your choice and take a trip down mem­ory lane.

I rec­om­mend you press CTRL-F to go to full screen mode, oth­er­wise you’ll likely move your mouse off the win­dow at a crit­i­cal time.

EASTER EGG: Set your date to December 25th to see the fes­tive graph­ics.

A true blast from the past: Dark Castle, by Delta Tao, is a true pi­o­neer in Macintosh gam­ing. One of the first mem­o­rable Mac games for the 9-inch Mac sys­tems, its black-and-white orig­i­nal stole many hours away from this IT en­gi­neer in the late 1980′s. Was it the multi-level ac­tion? The an­i­ma­tions? The em­bed­ded hu­mor? Let’s find out…

Dark Castle is in black-and-white, re­quir­ing you to boot up an ap­pli­ca­tion disk which con­tained a minifinder”…anyone re­mem­ber these?

Dark Castle was writ­ten in 1986 by Mark Pierce and Jonathan Gay for Silicon Beach. It was a huge suc­cess, show­ing off how great the Mac was at sound and graph­ics. It won every award there was, and made lots of money. However, the Macintosh evolved, and Dark Castle did­n’t. The Mac II, color, and Multifinder all came out, and Dark Castle slowly stopped work­ing. Aldus ac­quired Silicon Beach for its graph­ics, not its games. There were no more Dark Castle games fol­low­ing the ac­qui­si­tion.

For those who were born af­ter Dark Castle’s orig­i­nal re­lease (sigh) or who need a re­fresher on the game, the goal of Dark Castle was to de­feat the Black Knight. In or­der to do that, you (Duncan) will need to ex­plore the cas­tle to find the tools you need to take on this bad boy or to avoid the nas­ties that try to stop you.

Each level in DC is to­tally dif­fer­ent, re­quir­ing dif­fer­ent skills to com­plete them. Many of them lead right to the dun­geon (most trap doors and drop-offs will send you there (meaning you’ll have to get through 3 dun­geon lev­els in or­der to get back to the Great Hall). This link­age al­lows for loads of game play, al­though it can get a bit rep­e­ti­tious (the nas­ties are al­ways in the same place). As you in­crease the dif­fi­culty level, the num­ber of nas­ties in­crease. Some lev­els re­quire you to be fast while oth­ers re­quire care­ful ob­ser­va­tion and metic­u­lous steps.

Return To Dark Castle is a 2008 plat­form game for the Macintosh. It is the third game in the Dark Castle se­ries, fol­low­ing the orig­i­nal Dark Castle (1986) and its se­quel Beyond Dark Castle (1987), and the first to be de­vel­oped by Z Sculpt. Development on the game, be­gun in 1996, was no­to­ri­ously pro­tracted, and the game was of­ten la­beled as va­por­ware. Return To Dark Castle was orig­i­nally sched­uled to be re­leased in Winter 2000, but was not re­leased un­til March 14, 2008.

The player fights his way through var­i­ous ar­eas in­side and around the Dark Castle, in an at­tempt to de­feat the Black Knight. The play­er’s char­ac­ter, named Bryant by de­fault, is iden­ti­cal in ap­pear­ance to Duncan, the hero of the ear­lier Dark Castle games. In the game’s in­tro, we read that Duncan never re­turned from his quest to the Dark Castle. Bryant now ap­proaches the cas­tle in an at­tempt to suc­ceed where Duncan had ap­par­ently failed. Bryant must col­lect 10 orbs hid­den around the cas­tle (similar to the orbs from Beyond Dark Castle) be­fore he can con­front the Black Knight. If Bryant de­feats the Black Knight on any dif­fi­culty other than ad­vanced, the Black Knight chides him for want­ing an end­ing but ex­pend­ing too lit­tle ef­fort. If Bryant de­feats the Black Knight on ad­vanced dif­fi­culty, the Black Knight’s ar­mor is knocked off, re­veal­ing Duncan, now old and with gray hair and beard. Duncan and Bryant are forced to flee the cas­tle, as the Black Knight’s ar­mor had im­pris­oned Duncan, and now threat­ens to im­prison them anew. Duncan and Bryant de­scend a rope to the Black Knight’s Pier, and there board a ship to visit an un­named des­ti­na­tion that Duncan al­ways wanted to see.

The pre­vi­ous games each had 15 lev­els, and Return To Dark Castle con­tains all the lev­els from these first two games, plus over 50 new lev­els. The new ar­eas are a mix­ture of sin­gle-screen lev­els in the style of the first two games, and larger hor­i­zon­tally and ver­ti­cally scrolling lev­els. The lev­els con­tain 25 orbs, 10 of which are re­quired in or­der to com­plete the game. Many of the new lev­els con­tain se­cret ar­eas which can be ac­cessed by ac­ti­vat­ing hid­den doors and switches.

The game’s game­play is, with a few no­table ex­cep­tions, es­sen­tially iden­ti­cal to its pre­de­ces­sors. Bryant’s prin­ci­pal weapon re­mains the rock which can be mag­i­cally up­graded to the fire­ball, and a mag­i­cal shield can be ob­tained. New fea­tures in­clude the abil­ity to carry weapons in the play­er’s in­ven­tory as well as the abil­ity to keep tele­por­ta­tion po­tions in the in­ven­tory. The player can also ac­quire the stone ball”, which joins the fire­ball as an up­grade to the stan­dard rock weapon. The stone ball can be used to ob­tain other spe­cial weapons within the cas­tle. The game al­lows play­ers to record and play back demos”, videos of play.

Beyond Dark Castle is the con­tin­u­a­tion of Dark Castle. The ob­jec­tive of this fol­low up is to find the five magic spheres of Merlin and place them on the plinths in the first room. That should en­able you to open the portcullis and face the Black Knight. You will tra­verse the var­i­ous rooms of this labyrinthian cas­tle and will have a whole heap of nasty en­coun­ters…

The play is con­trolled with the mouse and the key­board in ex­actly the same way as the pre­vi­ous game. It should be men­tioned that game­play can be par­tic­u­larly dif­fi­cult at three in the morn­ing with sub­stan­tial amounts of al­co­hol in your blood­stream as the game can be very frus­trat­ing at times. It is nev­er­the­less a bril­liant game that has a ad­dic­tive qual­ity about it, re­gard­less of its age, the sounds and the mu­sic are not bad, an­i­ma­tions are good and can be amus­ing as is the game de­sign of each level.

...

Read the original on darkcastle.co.uk »

To add this web app to your iOS home screen tap the share button and select "Add to the Home Screen".

10HN is also available as an iOS App

If you visit 10HN only rarely, check out the the best articles from the past week.

If you like 10HN please leave feedback and share

Visit pancik.com for more.