10 interesting stories served every morning and every evening.




1 1,631 shares, 58 trendiness

Google Broke Its Promise to Me. Now ICE Has My Data.

In September 2024, Amandla Thomas-Johnson was a Ph.D. candidate studying in the U.S. on a student visa when he briefly attended a pro-Palestinian protest. In April 2025, Immigration and Customs Enforcement (ICE) sent Google an administrative subpoena requesting his data. The next month, Google gave Thomas-Johnson's information to ICE without giving him the chance to challenge the subpoena, breaking a nearly decade-long promise to notify users before handing their data to law enforcement.

Google names a handful of exceptions to this promise (such as if Google receives a gag order from a court) that do not apply to Thomas-Johnson's case. While ICE "requested" that Google not notify Thomas-Johnson, the request was not enforceable or mandated by a court. Today, the Electronic Frontier Foundation sent complaints to the California and New York Attorneys General asking them to investigate Google for deceptive trade practices for breaking that promise. You can read about the complaints here. Below is Thomas-Johnson's account of his ordeal.

I thought my ordeal with U.S. immigration authorities was over a year ago, when I left the country, crossing into Canada at Niagara Falls.

By that point, the Trump administration had effectively turned federal power against international students like me. After I attended a pro-Palestine protest at Cornell University—for all of five minutes—the administration's rhetoric about cracking down on students protesting what we saw as genocide forced me into hiding for three months. Federal agents came to my home looking for me. A friend was detained at an airport in Tampa and interrogated about my whereabouts.

I'm currently a Ph.D. student. Before that, I was a reporter. I'm a dual British and Trinidad and Tobago citizen. I have not been accused of any crime.

I believed that once I left U.S. territory, I had also left the reach of its authorities. I was wrong.

Weeks later, in Geneva, Switzerland, I received what looked like a routine email from Google. It informed me that the company had already handed over my account data to the Department of Homeland Security.

At first, I wasn't alarmed. I had seen something similar before. An associate of mine, Momodou Taal, had received advance notice from Google and Facebook that his data had been requested, and law enforcement eventually withdrew the subpoenas before the companies turned over his data.

I assumed I would be given the same opportunity. But the language in my email was different. It was final: "Google has received and responded to legal process from a law enforcement authority compelling the release of information related to your Google Account."

Google had already disclosed my data without telling me. There was no opportunity to contest it.

To be clear, this should not have happened this way. Google promises that it will notify users before their data is handed over in response to legal processes, including administrative subpoenas. That notice is meant to provide a chance to challenge the request. In my case, that safeguard was bypassed. My data was handed over without warning—at the request of an administration targeting students engaged in protected political speech.

Months later, my lawyer at the Electronic Frontier Foundation obtained the subpoena itself. On paper, the request focused largely on subscriber information: IP addresses, physical address, other identifiers, and session times and durations.

But taken together, these fragments form something far more powerful—a detailed surveillance profile. IP logs can be used to approximate location. Physical addresses show where you sleep. Session times would show when you were communicating with friends or family. Even without message content, the picture that emerges is intimate and invasive.

What this experience has made clear is that anyone can be targeted by law enforcement. And with their massive stores of data, technology companies can facilitate those arbitrary investigations. Together, they can combine state power, corporate data, and algorithmic inference in ways that are difficult to see—and even harder to challenge.

The consequences of what happened to me are not abstract. I left the United States. But I do not feel that I have left its reach. Being investigated by the federal government is intimidating. Questions run through your head. Am I now a marked individual? Will I face heightened scrutiny if I continue my reporting? Can I travel safely to see family in the Caribbean?

Who, exactly, can I hold accountable?

Update: This post has been updated to include more information about Google's exceptions to its notification policy, none of which applied to the subpoena targeting Thomas-Johnson.

...

Read the original on www.eff.org »

2 664 shares, 226 trendiness

Introducing Claude Opus 4.7

Our latest model, Claude Opus 4.7, is now generally available. Opus 4.7 is a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks. Users report being able to hand off their hardest coding work—the kind that previously needed close supervision—to Opus 4.7 with confidence. Opus 4.7 handles complex, long-running tasks with rigor and consistency, pays precise attention to instructions, and devises ways to verify its own outputs before reporting back.

The model also has substantially better vision: it can see images in greater resolution. It's more tasteful and creative when completing professional tasks, producing higher-quality interfaces, slides, and docs. And—although it is less broadly capable than our most powerful model, Claude Mythos Preview—it shows better results than Opus 4.6 across a range of benchmarks.

Last week we announced Project Glasswing, highlighting the risks—and benefits—of AI models for cybersecurity. We stated that we would keep Claude Mythos Preview's release limited and test new cyber safeguards on less capable models first. Opus 4.7 is the first such model: its cyber capabilities are not as advanced as those of Mythos Preview (indeed, during its training we experimented with efforts to differentially reduce these capabilities). We are releasing Opus 4.7 with safeguards that automatically detect and block requests that indicate prohibited or high-risk cybersecurity uses. What we learn from the real-world deployment of these safeguards will help us work towards our eventual goal of a broad release of Mythos-class models. Security professionals who wish to use Opus 4.7 for legitimate cybersecurity purposes (such as vulnerability research, penetration testing, and red-teaming) are invited to join our new Cyber Verification Program.

Opus 4.7 is available today across all Claude products and our API, Amazon Bedrock, Google Cloud's Vertex AI, and Microsoft Foundry. Pricing remains the same as Opus 4.6: $5 per million input tokens and $25 per million output tokens. Developers can use claude-opus-4-7 via the Claude API.

Claude Opus 4.7 has garnered strong feedback from our early-access testers:

In early testing, we're seeing the potential for a significant leap for our developers with Claude Opus 4.7. It catches its own logical faults during the planning phase and accelerates execution, far beyond previous Claude models. As a financial technology platform serving millions of consumers and businesses at significant scale, this combination of speed and precision could be game-changing: accelerating development velocity for faster delivery of the trusted financial solutions our customers rely on every day.

Anthropic has already set the standard for coding models, and Claude Opus 4.7 pushes that further in a meaningful way as the state-of-the-art model on the market. In our internal evals, it stands out not just for raw capability, but for how well it handles real-world async workflows—automations, CI/CD, and long-running tasks. It also thinks more deeply about problems and brings a more opinionated perspective, rather than simply agreeing with the user.

Claude Opus 4.7 is the strongest model Hex has evaluated. It correctly reports when data is missing instead of providing plausible-but-incorrect fallbacks, and it resists dissonant-data traps that even Opus 4.6 falls for. It's a more intelligent, more efficient Opus 4.6: low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6.

On our 93-task coding benchmark, Claude Opus 4.7 lifted resolution by 13% over Opus 4.6, including four tasks neither Opus 4.6 nor Sonnet 4.6 could solve. Combined with faster median latency and strict instruction following, it's particularly meaningful for complex, long-running coding workflows. It cuts the friction from those multi-step tasks so developers can stay in the flow and focus on building.

Based on our internal research-agent benchmark, Claude Opus 4.7 has the strongest efficiency baseline we've seen for multi-step work. It tied for the top overall score across our six modules at 0.715 and delivered the most consistent long-context performance of any model we tested. On General Finance—our largest module—it improved meaningfully on Opus 4.6, scoring 0.813 versus 0.767, while also showing the best disclosure and data discipline in the group. And on deductive logic, an area where Opus 4.6 struggled, Opus 4.7 is solid.

Claude Opus 4.7 extends the limit of what models can do to investigate and get tasks done. Anthropic has clearly optimized for sustained reasoning over long runs, and it shows with market-leading performance. As engineers shift from working 1:1 with agents to managing them in parallel, this is exactly the kind of frontier capability that unlocks new workflows.

We're seeing major improvements in Claude Opus 4.7's multimodal understanding, from reading chemical structures to interpreting complex technical diagrams. The higher resolution support is helping Solve Intelligence build best-in-class tools for life sciences patent workflows, from drafting and prosecution to infringement detection and invalidity charting.

Claude Opus 4.7 takes long-horizon autonomy to a new level in Devin. It works coherently for hours, pushes through hard problems rather than giving up, and unlocks a class of deep investigation work we couldn't reliably run before.

For Replit, Claude Opus 4.7 was an easy upgrade decision. For the work our users do every day, we observed it achieving the same quality at lower cost—more efficient and precise at tasks like analyzing logs and traces, finding bugs, and proposing fixes. Personally, I love how it pushes back during technical discussions to help me make better decisions. It really feels like a better coworker.

Claude Opus 4.7 demonstrates strong substantive accuracy on BigLaw Bench for Harvey, scoring 90.9% at high effort with better reasoning calibration on review tables and noticeably smarter handling of ambiguous document editing tasks. It correctly distinguishes assignment provisions from change-of-control provisions, a task that has historically challenged frontier models. Substance was consistently rated as a strength across our evaluations: correct, thorough, and well-cited.

Claude Opus 4.7 is a very impressive coding model, particularly for its autonomy and more creative reasoning. On CursorBench, Opus 4.7 is a meaningful jump in capabilities, clearing 70% versus Opus 4.6 at 58%.

For complex multi-step workflows, Claude Opus 4.7 is a clear step up: plus 14% over Opus 4.6 at fewer tokens and a third of the tool errors. It's the first model to pass our implicit-need tests, and it keeps executing through tool failures that used to stop Opus cold. This is the reliability jump that makes Notion Agent feel like a true teammate.

In our evals, we saw a double-digit jump in accuracy of tool calls and planning in our core orchestrator agents. As users leverage Hebbia to plan and execute on use cases like retrieval, slide creation, or document generation, Claude Opus 4.7 shows the potential to improve agent decision-making in these workflows.

On Rakuten-SWE-Bench, Claude Opus 4.7 resolves 3x more production tasks than Opus 4.6, with double-digit gains in Code Quality and Test Quality. This is a meaningful lift and a clear upgrade for the engineering work our teams are shipping every day.

For CodeRabbit's code review workloads, Claude Opus 4.7 is the sharpest model we've tested. Recall improved by over 10%, surfacing some of the most difficult-to-detect bugs in our most complex PRs, while precision remained stable despite the increased coverage. It's a bit faster than GPT-5.4 xhigh on our harness, and we're lining it up for our heaviest review work at launch.

For Genspark's Super Agent, Claude Opus 4.7 nails the three production differentiators that matter most: loop resistance, consistency, and graceful error recovery. Loop resistance is the most critical. A model that loops indefinitely on 1 in 18 queries wastes compute and blocks users. Lower variance means fewer surprises in prod. And Opus 4.7 achieves the highest quality-per-tool-call ratio we've measured.

Claude Opus 4.7 is a meaningful step up for Warp. Opus 4.6 is one of the best models out there for developers, and this model is measurably more thorough on top of that. It passed Terminal Bench tasks that prior Claude models had failed, and worked through a tricky concurrency bug Opus 4.6 couldn't crack. For us, that's the signal.

Claude Opus 4.7 is the best model in the world for building dashboards and data-rich interfaces. The design taste is genuinely surprising—it makes choices I'd actually ship. It's my default daily driver now.

Claude Opus 4.7 is the most capable model we've tested at Quantium. Evaluated against leading AI models through our proprietary benchmarking solution, the biggest gains showed up where they matter most: reasoning depth, structured problem-framing, and complex technical work. Fewer corrections, faster iterations, and stronger outputs to solve the hardest problems our clients bring us.

Claude Opus 4.7 feels like a real step up in intelligence. Code quality is noticeably improved, it's cutting out the meaningless wrapper functions and fallback scaffolding that used to pile up, and it fixes its own code as it goes. It's the cleanest jump we've seen since the move from Sonnet 3.7 to the Claude 4 series.

For the computer-use work that sits at the heart of XBOW's autonomous penetration testing, the new Claude Opus 4.7 is a step change: 98.5% on our visual-acuity benchmark versus 54.5% for Opus 4.6. Our single biggest Opus pain point effectively disappeared, and that unlocks its use for a whole class of work where we couldn't use it before.

Claude Opus 4.7 is a solid upgrade with no regressions for Vercel. It's phenomenal on one-shot coding tasks, more correct and complete than Opus 4.6, and noticeably more honest about its own limits. It even does proofs on systems code before starting work, which is new behavior we haven't seen from earlier Claude models.

Claude Opus 4.7 is very strong and outperforms Opus 4.6 with a 10% to 15% lift in task success for Factory Droids, with fewer tool errors and more reliable follow-through on validation steps. It carries work all the way through instead of stopping halfway, which is exactly what enterprise engineering teams need.

Claude Opus 4.7 autonomously built a complete Rust text-to-speech engine from scratch—neural model, SIMD kernels, browser demo—then fed its own output through a speech recognizer to verify it matched the Python reference. Months of senior engineering, delivered autonomously. The step up from Opus 4.6 is clear, and the codebase is public.

Claude Opus 4.7 passed three TBench tasks that prior Claude models couldn't, and it's landing fixes our previous best model missed, including a race condition. It demonstrates strong precision in identifying real issues, and surfaces important findings that other models either gave up on or didn't resolve. In Qodo's real-world code review benchmark, we observed top-tier precision.

On Databricks' OfficeQA Pro, Claude Opus 4.7 shows meaningfully stronger document reasoning, with 21% fewer errors than Opus 4.6 when working with source information. Across our agentic reasoning over data benchmarks, it is the best-performing Claude model for enterprise document analysis.

For Ramp, Claude Opus 4.7 stands out in agent-team workflows. We're seeing stronger role fidelity, instruction-following, coordination, and complex reasoning, especially on engineering tasks that span tools, codebases, and debugging context. Compared with Opus 4.6, it needs much less step-by-step guidance, helping us scale the internal agent workflows our engineering teams run.

Claude Opus 4.7 is measurably better than Opus 4.6 for Bolt's longer-running app-building work, up to 10% better in the best cases, without the regressions we've come to expect from very agentic models. It pushes the ceiling on what our users can ship in a single session.

Below are some highlights and notes from our early testing of Opus 4.7:

Instruction following. Opus 4.7 is substantially better at following instructions. Interestingly, this means that prompts written for earlier models can sometimes now produce unexpected results: where previous models interpreted instructions loosely or skipped parts entirely, Opus 4.7 takes the instructions literally. Users should re-tune their prompts and harnesses accordingly.

Improved multimodal support. Opus 4.7 has better vision for high-resolution images: it can accept images up to 2,576 pixels on the long edge (~3.75 megapixels), more than three times as many pixels as prior Claude models. This opens up a wealth of multimodal uses that depend on fine visual detail: computer-use agents reading dense screenshots, data extraction from complex diagrams, and work that needs pixel-perfect references.

Real-world work. As well as its state-of-the-art score on the Finance Agent evaluation (see table above), our internal testing showed Opus 4.7 to be a more effective finance analyst than Opus 4.6, producing rigorous analyses and models, more professional presentations, and tighter integration across tasks. Opus 4.7 is also state-of-the-art on GDPval-AA, a third-party evaluation of economically valuable knowledge work across finance, legal, and other domains.

Memory. Opus 4.7 is better at using file system-based memory. It remembers important notes across long, multi-session work, and uses them to move on to new tasks that, as a result, need less up-front context.

The charts below display more evaluation results from our pre-release testing, across a range of different domains.

Overall, Opus 4.7 shows a similar safety profile to Opus 4.6: our evaluations show low rates of concerning behavior such as deception, sycophancy, and cooperation with misuse. On some measures, such as honesty and resistance to malicious "prompt injection" attacks, Opus 4.7 is an improvement on Opus 4.6; in others (such as its tendency to give overly detailed harm-reduction advice on controlled substances), Opus 4.7 is modestly weaker. Our alignment assessment concluded that the model is "largely well-aligned and trustworthy, though not fully ideal in its behavior". Note that Mythos Preview remains the best-aligned model we've trained according to our evaluations. Our safety evaluations are discussed in full in the Claude Opus 4.7 System Card.

Overall misaligned behavior score from our automated behavioral audit: on this evaluation, Opus 4.7 is a modest improvement on Opus 4.6 and Sonnet 4.6, but Mythos Preview still shows the lowest rates of misaligned behavior.

In addition to Claude Opus 4.7 itself, we're launching the following updates:

More effort control: Opus 4.7 introduces a new xhigh ("extra high") effort level between high and max, giving users finer control over the tradeoff between reasoning and latency on hard problems. In Claude Code, we've raised the default effort level to xhigh for all plans. When testing Opus 4.7 for coding and agentic use cases, we recommend starting with high or xhigh effort.

On the Claude Platform (API): as well as support for higher-resolution images, we're also launching task budgets in public beta, giving developers a way to guide Claude's token spend so it can prioritize work across longer runs.

In Claude Code: The new /ultrareview slash command produces a dedicated review session that reads through changes and flags bugs and design issues that a careful reviewer would catch. We're giving Pro and Max Claude Code users three free ultrareviews to try it out. In addition, we've extended auto mode to Max users. Auto mode is a new permissions option where Claude makes decisions on your behalf, meaning that you can run longer tasks with fewer interruptions—and with less risk than if you had chosen to skip all permissions.

Opus 4.7 is a direct upgrade to Opus 4.6, but two changes are worth planning for because they affect token usage. First, Opus 4.7 uses an updated tokenizer that improves how the model processes text. The tradeoff is that the same input can map to more tokens—roughly 1.0–1.35× depending on the content type. Second, Opus 4.7 thinks more at higher effort levels, particularly on later turns in agentic settings. This improves its reliability on hard problems, but it does mean it produces more output tokens. Users can control token usage in various ways: by using the effort parameter, adjusting their task budgets, or prompting the model to be more concise. In our own testing, the net effect is favorable—token usage across all effort levels is improved on an internal coding evaluation, as shown below—but we recommend measuring the difference on real traffic. We've written a migration guide that provides further advice on upgrading from Opus 4.6 to Opus 4.7.

Score on an internal agentic coding evaluation as a function of token usage at each effort level: in this evaluation, the model works autonomously from a single user prompt, and results may not be representative of token usage in interactive coding. See the migration guide for more on tuning effort levels.
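For developers kicking the tires, a minimal call to the new model might look like the sketch below. This is an illustrative sketch, not official sample code: it assumes the anthropic Python SDK with an ANTHROPIC_API_KEY in the environment, and it uses only the claude-opus-4-7 model ID named above, since the exact API surface for the new effort control isn't specified in this post.

```python
# Illustrative sketch: calling Claude Opus 4.7 with the anthropic Python SDK.
# Assumes ANTHROPIC_API_KEY is set; the model ID comes from the post above.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    # The post describes effort levels (high, xhigh, max) but not the exact
    # API parameter, so none is passed here; see the migration guide.
    messages=[
        {"role": "user", "content": "Find the race condition in this function: ..."}
    ],
)

print(response.content[0].text)

# The 4.6 -> 4.7 tokenizer change means the same input can cost roughly
# 1.0-1.35x the tokens, so it's worth logging usage while migrating.
print(response.usage.input_tokens, response.usage.output_tokens)
```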

...

Read the original on www.anthropic.com »

3 647 shares, 62 trendiness

IPv6 – Google


Google collects statistics about IPv6 adoption in the Internet on an ongoing basis. We hope that publishing this information will help Internet providers, website owners, and policy makers as the industry rolls out IPv6.

We are continuously measuring the availability of IPv6 connectivity among Google users. The graph shows the percentage of users that access Google over IPv6.

The chart above shows the availability of IPv6 connectivity around the world:

Regions where IPv6 is more widely deployed (the darker the green, the greater the deployment) and users experience infrequent issues connecting to IPv6-enabled websites.

Regions where IPv6 is more widely deployed but users still experience significant reliability or latency issues connecting to IPv6-enabled websites.

Regions where IPv6 is not widely deployed and users experience significant reliability or latency issues connecting to IPv6-enabled websites.

...

Read the original on www.google.com »

4 600 shares, 28 trendiness

McDonald's Official Site

*Menu prices may differ at special location restaurants, selected restaurants and for delivery.

English menu is available for your convenience

McDonald's menu and allergen/nutrition information is available in English for the convenience of our customers, except for the information listed below, which is currently available only in Japanese on the McDonald's Japan website.

Information and notes on products and availability

*McDonald's Japan's allergen information only covers the 8 ingredients which must be indicated on the label and the 20 which are recommended by the Japanese Food Labeling Standard (Food Labeling Act) as of September 2024. You can also place an order in English on our official app. Several restaurants also have English menus on hand, so please ask our crew if you are looking for an English menu.

※Click the image or product name to learn more about allergen/nutrition information, and other details.

※All displayed prices are tax included, and a single, tax-inclusive price applies for both eat-in and takeout (inc. drive-thru) orders (the tax-exclusive price may differ).

※Menu prices may differ at special location restaurants and selected restaurants.

※Some products are not available at all restaurants.

※The "Bai Burger" menu is available for all regular burgers except for "Roasted Soy Sauce Double Thick Beef" and "Roasted Soy Sauce Egg Bacon Thick Beef".

※Breakfast is available until 10:30am, the Regular Menu is available from 10:30am, and the Yoru Mac menu is available from 5:00pm.

※Asa Mac orders are accepted until 10:20am for Mobile Order & Pay and McDelivery.

※Hiru Mac is available between 10:30am and 2:00pm on weekdays.

※McShake®, McFloat®, Soft Twist, and McFlurry® are available between 10:30am and 1:00am the next day.

※McShake® may be mixed with other flavors due to the nature of the machine. For this reason, the allergy information may differ from the usual information during limited-time product sales. Please check the latest information each time you order.

※For customized products, exact information may vary. Please be aware that customization is not a service that completely eliminates allergens.

※Oreo and the design of the Oreo cookie are trademarks licensed by the Mondelez International Group.

※Coke is a registered trademark of The Coca-Cola Company.

※McCafé® menu availability at McCafé by Barista stores is subject to McCafé by Barista counter business hours.

※The McCafé® menu is not available for purchase at the drive-thru at some McCafé by Barista stores.

※Images are for illustrative purposes only.

※Coupons for shareholders are not redeemable for Shaka Shaka Potato® Buttered Potato Flavor.

...

Read the original on www.mcdonalds.co.jp »

5 574 shares, 48 trendiness

Friends Don't Let Friends Use Ollama

Ollama is the most popular way to run local LLMs. It shouldn't be. It gained that position by being first: the first tool that made llama.cpp accessible to people who didn't want to compile C++ or write their own server configs. That was a real contribution, briefly. But the project has since spent years systematically obscuring where its actual technology comes from, misleading users about what they're running, and drifting from the local-first mission that earned it trust in the first place. All while taking venture capital money.

This isn't a "both sides" piece. I've used Ollama. I've moved on. Here's why you should too.

Ollama's entire inference capability comes from llama.cpp, the C++ inference engine created by Georgi Gerganov in March 2023. Gerganov's project is what made it possible to run LLaMA models on consumer laptops at all; he hacked together the first version in an evening, and it kicked off the entire local LLM movement. Today llama.cpp has over 100,000 stars on GitHub, 450+ contributors, and is the foundation that nearly every GGUF-based tool depends on.

Ollama was founded in 2021 by Jeffrey Morgan and Michael Chiang, both previously behind Kitematic, a Docker GUI that was acquired by Docker Inc. They went through Y Combinator's Winter 2021 batch, raised pre-seed funding, and launched publicly in 2023. From day one, the pitch was "Docker for LLMs": a convenient wrapper that downloads and runs models with a single command. Under the hood, it was llama.cpp doing all the work.

For over a year, Ollama made no mention of llama.cpp: not in the README, not on the website, not in its marketing materials. The project's binary distributions didn't include the required MIT license notice for the llama.cpp code they were shipping. This isn't a matter of open-source etiquette; the MIT license has exactly one major requirement: include the copyright notice. Ollama didn't.

The community noticed. GitHub issue #3185 was opened in early 2024 requesting license compliance. It went over 400 days without a response from maintainers. When issue #3697 was opened in April 2024 specifically requesting llama.cpp acknowledgment, community PR #3700 followed within hours. Ollama's co-founder Michael Chiang eventually added a single line to the bottom of the README: "llama.cpp project founded by Georgi Gerganov."

The response to the PR was revealing. Ollama's team wrote: "We spend a large chunk of time fixing and patching it up to ensure a smooth experience for Ollama users… Overtime, we will be transitioning to more systematically built engines." Translation: we're not going to give llama.cpp prominent credit, and we plan to distance ourselves from it anyway.

As one Hacker News commenter put it: "I'm continually puzzled by their approach, it's such self-inflicted negative PR. Building on llama is perfectly valid and they're adding value on ease of use here. Just give the llama team proper credit." Another: "The fact that Ollama has been downplaying their reliance on llama.cpp has been known in the local LLM community for a long time."

In mid-2025, Ollama followed through on that distancing. They moved away from using llama.cpp as their inference backend and built a custom implementation directly on top of ggml, the lower-level tensor library that llama.cpp itself uses. Their stated reason was stability: llama.cpp moves fast and breaks things, and Ollama's enterprise partners need reliability.

The result was the opposite. Ollama's custom backend reintroduced bugs that llama.cpp had solved years ago. Community members flagged broken structured output support, vision model failures, and GGML assertion crashes across multiple versions. Models that worked fine in upstream llama.cpp failed in Ollama, including new releases like GPT-OSS 20B, where Ollama's implementation lacked support for tensor types that the model required. Georgi Gerganov himself identified that Ollama had forked and made bad changes to GGML.

The irony is thick. They downplayed their dependence on llama.cpp for years, then when they finally tried to go it alone, they produced an inferior version of the thing they refused to credit.

Benchmarks tell the story. Multiple community tests show llama.cpp running 1.8x faster than Ollama on the same hardware with the same model: 161 tokens per second versus 89. On CPU, the gap is 30-50%. A recent comparison on Qwen-3 Coder 32B showed ~70% higher throughput with llama.cpp. The performance overhead comes from Ollama's daemon layer, poor GPU offloading heuristics, and a vendored backend that trails upstream.

When DeepSeek released its R1 model family in January 2025, Ollama listed the smaller distilled versions (models like DeepSeek-R1-Distill-Qwen-32B, which are fine-tuned Qwen and Llama models, not the actual 671-billion-parameter R1) simply as "DeepSeek-R1" in their library and CLI. Running ollama run deepseek-r1 pulls an 8B Qwen-derived distillate that behaves nothing like the real model.

This wasn't an oversight. DeepSeek themselves named these models with the "R1-Distill" prefix. Hugging Face listed them correctly. Ollama stripped the distinction. The result was a flood of social media posts from people claiming they were running "DeepSeek-R1" on consumer hardware, followed by confusion about why it performed poorly, doing reputational damage to DeepSeek in the process.

GitHub issues #8557 and #8698 requested separation of the models. Both were closed as duplicates with no fix. As of today, ollama run deepseek-r1 still launches a tiny distilled model. Ollama knew the difference and chose to obscure it, presumably because "DeepSeek-R1" drives more downloads than "DeepSeek-R1-Distill-Qwen-32B" does.

In July 2025, Ollama released a GUI desktop app for macOS and Windows. The app was developed in a private repository (github.com/ollama/app), shipped without a license, and the source code wasn't publicly available. For a project that had built its reputation on being open-source, this was a jarring move.

Community members immediately raised concerns. The license issue received 40 upvotes. Developers found potential AGPL-3.0 dependencies in the binary. The website placed the download button next to a GitHub link, giving the impression users were downloading the MIT-licensed open-source tool when they were actually getting an unlicensed closed-source application. Maintainers were silent for months. The code was eventually merged into the main repo in November 2025, but the initial rollout revealed where the project's instincts lie.

As XDA put it: "If your project trades on being open source, you do not get to be vague about what is and is not open at launch."

GGUF, the model format created by Georgi Gerganov, was designed with one core principle: single-file deployment. Bullet point #1 in the GGUF spec reads: "Full information: all information needed to load a model is contained in the model file, and no additional information needs to be provided by the user." Chat templates, stop tokens, model metadata: it's all embedded in the file. You point llama.cpp at a GGUF and it works.

Ollama added the Modelfile on top of this. It's a separate configuration file (inspired by Dockerfiles, naturally) that specifies the base model, chat template, system prompt, sampling parameters, and stop tokens. Most of this information already exists inside the GGUF file. As one Hacker News commenter put it: "We literally just got rid of that multi-file chaos only for Ollama to add it back."

The problems with this approach compound quickly. Ollama only auto-detects chat templates it already knows about from a hardcoded list. If a GGUF file has a valid Jinja chat template embedded in its metadata but it doesn't match one of Ollama's known templates, Ollama falls back to a bare {{ .Prompt }} template, silently breaking the model's instruction format. The user has to manually extract the chat template from the GGUF, translate it into Go template syntax (which is different from Jinja), and write it into a Modelfile. Meanwhile, llama.cpp reads the embedded template and just uses it.

Modifying parameters is worse. If you want to change the temperature or system prompt on a model you pulled from Ollama's registry, the workflow is: export the Modelfile with ollama show --modelfile, edit it, then run ollama create to build a new model entry. Users have reported that this process copies the entire model, 30 to 60 GB, to change one parameter. As one user described it: "The 'modelfile' workflow is a pain in the booty. It's a dogwater pattern and I hate it. Some of these models are 30 to 60GB and copying the entire thing to change one parameter is just dumb."

Compare this to llama.cpp, where parameters are command-line flags. Want a different temperature? Pass --temp 0.7. Different system prompt? Pass it in the API request. No files to create, no gigabytes to copy, no proprietary format to learn.
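And because llama-server speaks an OpenAI-compatible API, per-request parameters work from any standard client too. A minimal sketch, assuming a local llama-server already running on its default port 8080 (for example via llama-server -m model.gguf) and the openai Python package; the prompt is a placeholder:

```python
# Minimal sketch: setting parameters per request against a local llama-server,
# which serves an OpenAI-compatible API (assumes it's running on port 8080,
# e.g. `llama-server -m model.gguf`). No Modelfile, no multi-gigabyte copy.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="sk-local",  # llama-server doesn't require a real key by default
)

response = client.chat.completions.create(
    model="local",  # llama-server answers for whatever model it was started with
    messages=[
        {"role": "system", "content": "You are a terse coding assistant."},
        {"role": "user", "content": "Explain GGUF in two sentences."},
    ],
    temperature=0.7,  # the --temp equivalent, set per request
)

print(response.choices[0].message.content)
```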

The Modelfile also locks users into Ollama's Go template syntax, which is a different language from the Jinja templates that model creators actually publish. LM Studio accepts Jinja templates directly. llama.cpp reads them from the GGUF. Only Ollama requires you to translate between template languages, and gets it wrong often enough that entire GitHub issues are dedicated to mismatched templates between Ollama's library and the upstream GGUF metadata.

When a new model drops (say a new Qwen, Gemma, or DeepSeek variant), GGUFs typically appear on Hugging Face within hours, quantized by community members like Unsloth or Bartowski. With llama.cpp, you can run them immediately: llama-server -hf unsloth/Qwen3.5-35B-A3B-GGUF:Q4_K_M. One command, straight from Hugging Face, no intermediary.

With Ollama, you wait. Someone at Ollama has to package the model for their registry, choose which quantizations to offer (typically just Q4_K_M and Q8_0; no Q5, Q6, or IQ quants), convert the chat template to Go format, and push it. Until then, the model doesn't exist in Ollama's world unless you do the Modelfile dance yourself.

This creates a recurring pattern on r/LocalLLaMA: a new model launches, people try it through Ollama, it's broken or slow or has botched chat templates, and the model gets blamed instead of the runtime. A recent PSA post titled "If you want to test new models, use llama.cpp/transformers/vLLM/SGLang" documented how Qwen models showed problems with tool calls and garbage responses that "only happen with Ollama" due to their vendored backend and broken template handling. As one commenter put it: "Friends don't let friends use ollama."

The quantization limitation is particularly frustrating. Ollama only supports creating Q4_K_S, Q4_K_M, Q8_0, F16, and F32 quantizations. If you need Q5_K_M, Q6_K, or any IQ quant (formats that llama.cpp has supported for years), you're out of luck unless you do the quantization yourself outside of Ollama. When a user asked about Q2_K support, the response was effectively "use a different tool." For a project that markets itself as the easy way to run models, telling users to go elsewhere for basic quantization options is telling.

Ollama eventually added ollama run hf.co/{repo}:{quant} to pull directly from Hugging Face, which partially addresses the availability problem. But even then, the file gets copied into Ollama's hashed blob storage, you still can't share the GGUF with other tools, and the template detection issues still apply. The fundamental architecture remains: Ollama inserts itself as a middleman between you and your models, and that middleman is slower, less capable, and less compatible than the tools it sits on top of.

In late 2025, Ollama introduced cloud-hosted models alongside its local library. The tool that was synonymous with local, private inference started routing prompts to third-party cloud providers. Proprietary models like MiniMax appeared in the model list without clear disclosure that selecting them would send your data off-machine.

Users raised concerns about data routing: when you run a closed-source model like MiniMax-m2.7 through "Ollama Cloud," your prompts may be forwarded to the external provider who actually hosts the model. Ollama's own documentation says "we process your prompts and responses to provide the service but do not store or log that content," but says nothing about what the third-party provider does with it. For models hosted by Alibaba Cloud, users noted there is no zero-data-retention guarantee.

This was compounded by CVE-2025-51471, a token exfiltration vulnerability that affects all Ollama versions. A malicious registry server can trick Ollama into sending its authentication token to an attacker-controlled endpoint during a normal model pull. The fix exists as a PR but took months to land. In a tool that built its brand on local privacy, a vulnerability that leaks credentials to arbitrary servers is not a minor issue; it's an architectural philosophy problem.

All of this makes more sense when you look at the incentive structure. Ollama is a Y Combinator-backed (W21) startup, founded by engineers who previously built a Docker GUI that was acquired by Docker Inc. The playbook is familiar: wrap an existing open-source project in a user-friendly interface, build a user base, raise money, then figure out monetization.

The progression follows the pattern cleanly:

Minimize attribution: make the product look self-sufficient to investors.

Create lock-in: a proprietary model registry format and hashed filenames that don't work with other tools.

The model registry is worth examining. Ollama stores downloaded models using hashed filenames in its own format. If you've been pulling models through Ollama for months, you can't just point llama.cpp or LM Studio at those files without extra work. You can bring your own GGUFs to Ollama via a Modelfile, but it's deliberately friction-filled to take them out. This is a form of vendor lock-in that most users don't notice until they try to leave.

The tools Ollama wraps are directly accessible, and they're not much harder to set up.

llama.cpp is the engine. It has an OpenAI-compatible API server (llama-server), a built-in web UI, full control over context windows and sampling parameters, and consistently better throughput than Ollama. In February 2026, Gerganov's ggml.ai joined Hugging Face to ensure the long-term sustainability of the project. It's truly community-driven, MIT-licensed, and under active development with 450+ contributors.

llama-swap handles multi-model orchestration: loading, unloading, and hot-swapping models on demand behind a single API endpoint. Pair it with LiteLLM and you get a unified OpenAI-compatible proxy that routes across multiple backends with proper model aliasing.

LM Studio gives you a GUI if that's what you want. It uses llama.cpp under the hood, exposes all the knobs, and supports any GGUF model without lock-in. Jan is another open-source desktop app with a clean chat interface and local-first design. Msty offers a polished GUI with multi-model support and built-in RAG. koboldcpp is another option with a web UI and extensive configuration options.

Red Hat's ramalama is worth a look too: a container-native model runner that explicitly credits its upstream dependencies front and center. Exactly what Ollama should have done from the start.

None of these tools require more than a few minutes to set up. The idea that Ollama is the only accessible option hasn't been true for a long time.

Georgi Gerganov hacked together llama.cpp in an evening in March 2023 and kicked off a revolution in local AI. He and a community of hundreds of contributors have spent years making it possible to run increasingly powerful models on consumer hardware. That work is genuinely important; it's the foundation that keeps local inference open and accessible.

Ollama wrapped that work in a nice CLI, raised VC money on the back of it, spent over a year refusing to credit it, forked it badly, shipped a closed-source app alongside it, and then pivoted the whole thing toward cloud services. At every decision point where they could have been good open-source citizens, they chose the path that made them look more self-sufficient to investors.

The local LLM ecosystem doesn't need Ollama. It needs llama.cpp. The rest is packaging, and better packaging already exists.

...

Read the original on sleepingrobots.com »

6 514 shares, 24 trendiness

Cybersecurity Looks Like Proof of Work Now

Last week we learned about Anthropic's Mythos, a new LLM so "strikingly capable at computer security tasks" that Anthropic didn't release it publicly. Instead, only critical software makers have been granted access, providing them time to harden their systems.

We quickly blew through our standard stages of processing big AI claims: shock, existential fear, hype, skepticism, criticism, and (finally) moving onto the next thing. I encouraged people to take a wait-and-see approach, as security capabilities are tailor-made for impressive demos. Finding exploits is a clearly defined, verifiable search problem. You're not building a complex system, but poking at one that exists. A problem well suited to throwing millions of tokens at.

Yesterday, the first third-party analysis landed, from the AI Security Institute (AISI), largely supporting Anthropic's claims. Mythos is "really good, a step up over previous frontier models in a landscape where cyber performance was already rapidly improving."

The entire report is worth reading, but I want to focus on the following chart, detailing the ability of different models to successfully complete a simulated, complex corporate network attack:

"The Last Ones" is "a 32-step corporate network attack simulation spanning initial reconnaissance through to full network takeover, which AISI estimates to require humans 20 hours to complete." The lines are the average performance across multiple runs (10 runs for Mythos, Opus 4.6, and GPT-5.4), with the "max" lines representing the best of each batch. Mythos was the only model to complete the task, in 3 out of its 10 attempts.

AISI budgeted 100M tokens for each attempt. That's $12,500 per Mythos attempt, $125k for all ten runs. Worryingly, none of the models given a 100M budget showed signs of diminishing returns. "Models continue making progress with increased token budgets across the token budgets tested," AISI notes.

If Mythos continues to find exploits so long as you keep throwing money at it, security is reduced to a brutally simple equation: to harden a system you need to spend more tokens discovering exploits than attackers will spend exploiting them.

You don't get points for being clever. You win by paying more. It is a system that echoes cryptocurrency's proof-of-work system, where success is tied to raw computational work. It's a low-temperature lottery: buy the tokens, maybe you find an exploit. Hopefully you keep trying longer than your attackers.

This calculus has a few immediate takeaways.

First, reimplementing dependencies starts to look reasonable.

For those of you who aren't exposed to AI maximalists, this statement feels absurd. But lately, after the LiteLLM and Axios supply chain scares, many have argued for reimplementing dependency functionality using coding agents.

Classical software engineering would have you believe that dependencies are good (we're building pyramids from bricks), but imo this has to be re-evaluated, and it's why I've become increasingly averse to them, preferring to use LLMs to "yoink" functionality when it's simple enough and possible.

If security is purely a matter of throwing tokens at a system, Linus's law that, "given enough eyeballs, all bugs are shallow," expands to include tokens. If corporations that rely on OSS libraries spend to secure them with tokens, those libraries are likely going to be more secure than your budget allows. Certainly, this has complexities: cracking a widely used OSS package is inherently more valuable than hacking a one-off implementation, which incentivizes attackers to spend more on OSS targets.

Second, hardening will be an additional phase for agentic coders.

We've already been seeing developers break their process into two steps, development and code review, often using different models for each phase. As this matures, we're seeing purpose-built tooling meeting this pattern. Anthropic launched a code review product that costs $15-20 per review.

If the above Mythos claims hold, I suspect we'll see a three-phase cycle: development, review, and hardening.

Review: Document, refactor, and other gardening tasks, async, applying best practices with each PR.

Hardening: Identify exploits, autonomously, until the budget runs out.

Critically, human input is the limiter for the first phase and money is the limiter for the last. This quality inherently makes them separate stages (why spend to harden before you have something?). Previously, security audits were rare, discrete, and inconsistent. Now we can apply them constantly, within an optimal (we hope!) budget.

Code remains cheap, unless it needs to be secure. Even if costs come down with inference optimizations, unless models reach the point of diminishing security returns, you still need to buy more tokens than attackers do. The cost is fixed by the market value of an exploit.

...

Read the original on www.dbreunig.com »

7 457 shares, 105 trendiness

Qwen Studio

...

Read the original on qwen.ai »

8 408 shares, 30 trendiness

Darkbloom — Private AI Inference on Apple Silicon

We present Darkbloom, a decentralized inference network. AI compute today flows through three layers of markup: GPU manufacturers to hyperscalers to API providers to end users. Meanwhile, over 100 million Apple Silicon machines sit idle for most of each day. We built a network that connects them directly to demand. Operators cannot observe inference data. The API is OpenAI-compatible. Our measurements show up to 70% lower costs compared to centralized alternatives. Operators retain 95% of revenue.

Idle hardware has near-zero marginal cost. That saving passes through to price. OpenAI-compatible API for chat, image generation, and speech-to-text. Every request is end-to-end encrypted.

Your Mac already has the hardware. Operators keep 100% of inference revenue. Electricity cost on Apple Silicon runs $0.01–0.03 per hour depending on workload. The rest is profit.

The AI compute market has three layers of margin.

NVIDIA sells GPUs to hyperscalers. AWS, Google, Azure, and CoreWeave mark them up and rent capacity to AI companies. AI companies mark them up again and charge end users per token. Each layer takes a cut. End users pay multiples of what the silicon actually costs to run.

This concentrates both wealth and access. A small number of companies control the supply. Everyone else rents.

Meanwhile, Apple has shipped over 100 million machines with serious ML hardware. Unified memory architectures. 273 to 819 GB/s memory bandwidth. Neural Engines. Machines capable of running 235-billion-parameter models. Most sit idle 18 or more hours a day. Their owners earn nothing from this compute.

That is not a technology problem. It is a marketplace problem.

The pattern is familiar. Airbnb connected idle rooms to travelers. Uber connected idle cars to riders. Rooftop solar turned idle rooftops into energy assets. In each case, distributed idle capacity undercut centralized incumbents on price because the marginal cost was near zero.

Darkbloom does this for AI compute. Idle Macs serve inference. Users pay less because there is no hyperscaler in the middle. Operators earn from hardware they already own. Unlike those other networks, the operator cannot see the user's data.

Other decentralized compute networks connect buyers and sellers. That is the easy part.

The hard part is trust. You are sending prompts to a machine you do not own, operated by someone you have never met. Your company's internal data. Your users' conversations. Your competitive advantage, running on hardware in someone else's house.

No enterprise will do this without guarantees stronger than a terms-of-service document.

Without verifiable privacy, decentralized inference does not work.

We eliminate every software path through which an operator could observe inference data. Four independent layers, each independently verifiable:

Requests are encrypted on the user's device before transmission. The coordinator routes ciphertext. Only the target node's hardware-bound key can decrypt.

Each node holds a key generated inside Apple's tamper-resistant secure hardware. The attestation chain traces back to Apple's root certificate authority.

The inference process is locked at the OS level. Debugger attachment is blocked. Memory inspection is blocked. The operator cannot extract data from a running process.

Every response is signed by the specific machine that produced it. The full attestation chain is published. Anyone can verify it independently.

The operator runs your inference. They cannot see your data.

Prompts are encrypted before they leave your machine. The coordinator routes traffic it cannot read. The provider decrypts inside a hardened process it cannot inspect. The attestation chain is public.
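The page doesn't publish protocol details, but the pattern it describes (encrypt to the target node's hardware-bound public key, so the coordinator only ever relays ciphertext) is standard hybrid public-key encryption. A conceptual sketch of that pattern, not Darkbloom's actual implementation, using X25519 and ChaCha20-Poly1305 from Python's cryptography package:

```python
# Conceptual sketch of the routing pattern described above -- NOT Darkbloom's
# actual protocol. The client encrypts to the target node's key, so a relay
# in the middle sees only ciphertext.
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

# Node side: in the described system this key would live in Apple's secure
# hardware and be attested back to Apple's root CA; here it's software-only.
node_key = X25519PrivateKey.generate()

# Client side: ephemeral key agreement, then authenticated encryption.
client_eph = X25519PrivateKey.generate()
shared = client_eph.exchange(node_key.public_key())
key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
           info=b"inference-request").derive(shared)
nonce = os.urandom(12)
prompt = b'{"messages":[{"role":"user","content":"hello"}]}'
ciphertext = ChaCha20Poly1305(key).encrypt(nonce, prompt, None)

# The coordinator forwards (client public key, nonce, ciphertext) unread;
# only the node's private key can recompute the AEAD key and decrypt.
shared2 = node_key.exchange(client_eph.public_key())
key2 = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
            info=b"inference-request").derive(shared2)
assert ChaCha20Poly1305(key2).decrypt(nonce, ciphertext, None) == prompt
```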


...

Read the original on darkbloom.dev »

9 363 shares, 14 trendiness

Cal.com Goes Closed Source: Why AI Security Is Forcing Our Decision

This is not an easy post to write.

When we started Cal.com, we believed deeply in open source. It's a core principle we built this company around, and something we've been incredibly proud of.

Today, we are making the very difficult decision to move to closed source, and there's one simple reason: security.

AI is changing everything. It's transforming how we write content, build software, and operate day to day. But what's talked about far less is how dramatically AI is changing the world of security.

In the past, exploiting an application required a highly skilled hacker with years of experience and a significant investment of time to find and exploit vulnerabilities. The reality is that humans don't have the time, attention, or patience to find everything.

Today, AI can be pointed at an open source codebase and systematically scan it for vulnerabilities.

Being open source is increasingly like giving attackers the blueprints to the vault. When the structure is fully visible, it becomes much easier to identify weaknesses and exploit them.

In recent months, we've seen a wave of AI security startups productizing this capability. Each platform surfaces different vulnerabilities, making it difficult to establish a single, reliable source of truth for what is actually secure.

This uncertainty forced us to make a choice: remain open source and accept increasing risk to customer data, or move to closed source to reduce that risk. It's not a perfect solution, but we have to do everything we can to protect our users.

At the same time, we still care deeply about open source. That's why we are releasing a version of our codebase to the community under the MIT license as Cal.diy. While our production codebase has significantly diverged, including major rewrites of core systems like authentication and data handling, we want to ensure there is still a truly open version available for developers, hobbyists, and anyone who wants to explore and experiment.

The risk landscape is accelerating quickly. Advanced AI models are now capable of identifying and exploiting vulnerabilities at unprecedented speed. In one recent example, AI uncovered a 27-year-old vulnerability in the BSD kernel, one of the most widely used and security-focused open source projects, and generated working exploits in a matter of hours.

Continuing as open source would put our application, our customers, and the sensitive data we handle at significant risk. We are taking every step we can to reduce that risk and protect our users, and for now, that means moving to closed source despite how difficult that decision is.

We hope that one day we can return to open source as the security landscape evolves. But for now, we have to put our customers first.

...

Read the original on cal.com »

10 333 shares, 34 trendiness

Firebase browser key without API restrictions used for Gemini requests

We are looking for guidance regarding an unexpected €54,000+ Gemini API charge that occurred within a few hours after enabling Firebase AI Logic on an existing Firebase project.

We created the project over a year ago and initially used it only for Firebase Authentication. Recently, we added a simple AI feature (generating a web snippet from a text prompt) and enabled Firebase AI Logic.

Shortly after enabling this, we experienced a sudden and extreme spike in Gemini API usage. The traffic was not correlated with our actual users and appeared to be automated. The activity occurred within a short overnight window and stopped once we disabled the API and rotated credentials.

We had a budget alert (€80) and a cost anomaly alert, both of which triggered with a delay of a few hours.

By the time we reacted, costs were already around €28,000.

The final amount settled at €54,000+ due to delayed cost reporting.

This describes our issue in more detail:

Google API Keys Weren't Secrets. But then Gemini Changed the Rules. ◆ Truffle…

Google spent over a decade telling developers that Google API keys (like those used in Maps, Firebase, etc.) are not secrets. But that's no longer true.

We worked with Google Cloud support and provided logs and analysis. The charges were classified as valid usage because they originated from our project, and our request for a billing adjustment was ultimately denied.

This usage was clearly anomalous, not user-driven, and does not reflect intended or meaningful consumption of the service.

Has anyone encountered a similar issue after enabling Firebase AI Logic or Gemini?

Are there recommended safeguards beyond App Check, quotas, and moving calls server-side?

Is there any escalation path we may have missed for cases like this?

Any guidance or shared experience would be greatly appreciated.

Hey @zanbezi! Sorry to hear about this. A few things:

We have billing account caps rolled out to users of the Gemini API, see: https://ai.google.dev/gemini-api/docs/billing#tier-spend-caps. Tier 1 users can spend $250 a month and then are cut off by default (there is a 10 minute delay in all of the reporting).

We now support project spend caps; if you want to set a custom spend cap, you can do that (I have my account set at $50 so I don't spend too much accidentally when building; the same 10 minute delay applies here too): https://ai.google.dev/gemini-api/docs/billing#project-spend-caps

We are moving to disable the usage of unrestricted API keys in the Gemini API, and should have more updates there soon.

We now generate Auth keys by default for new users (a more secure key type which didn't exist when the Gemini API was originally created a few years ago) and will have more to share there soon.

You should generally avoid putting a key in client-side code, because if it is exposed, even with the restrictions above, you can incur costs.
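For illustration, here is a minimal sketch of that server-side pattern: the key stays in a server environment variable and clients only ever call your endpoint. It assumes the google-generativeai Python package and Flask; the model name, route, and auth check are placeholders, not an official recommendation.

```python
# Minimal sketch: proxying Gemini calls server-side so no API key ships in
# client code. Assumes google-generativeai and Flask; the model name, route,
# and auth check are placeholders.
import os

import google.generativeai as genai
from flask import Flask, abort, jsonify, request

genai.configure(api_key=os.environ["GEMINI_API_KEY"])  # key stays on the server
model = genai.GenerativeModel("gemini-2.0-flash")      # placeholder model name

app = Flask(__name__)

@app.post("/api/snippet")
def snippet():
    # Authenticate callers first (e.g. verify a Firebase Auth ID token);
    # an open proxy just recreates the unrestricted-key problem.
    if not request.headers.get("Authorization"):
        abort(401)
    prompt = (request.get_json(silent=True) or {}).get("prompt", "")
    if not prompt or len(prompt) > 4000:  # crude input cap as a cost guard
        abort(400)
    response = model.generate_content(prompt)
    return jsonify({"text": response.text})
```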

In many cases, we can automatically detect when a key is visible on the public web and shut down those keys automatically for security reasons (this happened to me personally: I accidentally pushed my API key to the public API docs and it was shut down in minutes).

By default, keys generated in Google AI Studio are restricted to just the Gemini API; no other services are enabled. However, keys generated from other parts of Google Cloud have this cross-service capability, so you can double-check keys and make sure they are restricted to just the resource you need.

Please email me and our team can take a look into this case (Lkilpatrick@google.com); we take this all very seriously and have been pushing hard to land all the features mentioned above and more.

We just started the prepaid billing rollout, which means you have to pay ahead of time to use the Gemini API. This is rolled out to all new US billing accounts as of yesterday and is rolling out globally right now. This is yet another way to give developers more control over their spending / costs and ensure you know what you are signing up for when using the Gemini API.

I hope this helps, and sorry for the hassle on this experience. Please email me if there is more to chat about!

Thanks for the detailed response, we really appreciate it. It is good to see that additional safeguards (like spend caps) are being introduced.

I will reach out via email with the details so your team can take a closer look.

Thanks again for taking the time to respond.

Great to see you here, Logan. This is the proper way to deal with a fiasco like this one.

...

Read the original on discuss.ai.google.dev »
