10 interesting stories served every morning and every evening.




1 1,775 shares, 72 trendiness

Chrome Web Store

Finally, an efficient blocker. Easy on CPU and memory. IMPORTANT: uBlock Origin is completely unrelated to the site “ublock.org”.

uBlock Origin is not an “ad blocker”: it’s a wide-spectrum content blocker with CPU and memory efficiency as a primary feature.

Out of the box, these lists of fil­ters are loaded and en­forced:

- uBlock Origin filter lists
- EasyList (ads)
- EasyPrivacy (tracking)
- Peter Lowe’s Ad server list (ads and tracking)
- Online Malicious URL Blocklist

More lists are avail­able for you to se­lect if you wish:

- Annoyances (cookie warnings, overlays, etc.)
- hosts-based lists
- And many others

Additionally, you can point-and-click to block JavaScript locally or globally, create your own global or local rules to override entries from filter lists, and use many more advanced features.
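
For a rough sense of what such custom rules can look like (a sketch only; the hostnames are placeholders, and the documentation linked below is the authoritative reference for the exact syntax):

no-scripting: example.com true
* ads.example.net * block
news.example.org cdn.example.net * noop

The first line is a per-site switch that disables JavaScript on example.com only, the second blocks requests to ads.example.net from any site, and the third is a local noop exception that lets news.example.org keep loading assets from cdn.example.net despite broader rules or filter-list entries.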

Free. Open source with a public license (GPLv3). For users, by users.

If you ever really do want to contribute something, think about the people working hard to maintain the filter lists you are using, which were made available for use by all for free.

Documentation: https://github.com/gorhill/uBlock#ublock-origin

Project change log: https://github.com/gorhill/uBlock/releases

Contributors @ GitHub: https://github.com/gorhill/uBlock/graphs/contributors

Contributors @ Crowdin: https://crowdin.net/project/ublock

...

Read the original on chromewebstore.google.com »

3 1,695 shares, 63 trendiness

A 10x Faster TypeScript

Today I’m ex­cited to an­nounce the next steps we’re tak­ing to rad­i­cally im­prove TypeScript per­for­mance.

The core value propo­si­tion of TypeScript is an ex­cel­lent de­vel­oper ex­pe­ri­ence. As your code­base grows, so does the value of TypeScript it­self, but in many cases TypeScript has not been able to scale up to the very largest code­bases. Developers work­ing in large pro­jects can ex­pe­ri­ence long load and check times, and have to choose be­tween rea­son­able ed­i­tor startup time or get­ting a com­plete view of their source code. We know de­vel­op­ers love when they can re­name vari­ables with con­fi­dence, find all ref­er­ences to a par­tic­u­lar func­tion, eas­ily nav­i­gate their code­base, and do all of those things with­out de­lay. New ex­pe­ri­ences pow­ered by AI ben­e­fit from large win­dows of se­man­tic in­for­ma­tion that need to be avail­able with tighter la­tency con­straints. We also want fast com­mand-line builds to val­i­date that your en­tire code­base is in good shape.

To meet those goals, we’ve be­gun work on a na­tive port of the TypeScript com­piler and tools. The na­tive im­ple­men­ta­tion will dras­ti­cally im­prove ed­i­tor startup, re­duce most build times by 10x, and sub­stan­tially re­duce mem­ory us­age. By port­ing the cur­rent code­base, we ex­pect to be able to pre­view a na­tive im­ple­men­ta­tion of tsc ca­pa­ble of com­mand-line type­check­ing by mid-2025, with a fea­ture-com­plete so­lu­tion for pro­ject builds and a lan­guage ser­vice by the end of the year.

You can build and run the Go code from our new work­ing repo, which is of­fered un­der the same li­cense as the ex­ist­ing TypeScript code­base. Check the README for in­struc­tions on how to build and run tsc and the lan­guage server, and to see a sum­mary of what’s im­ple­mented so far. We’ll be post­ing reg­u­lar up­dates as new func­tion­al­ity be­comes avail­able for test­ing.

Our na­tive im­ple­men­ta­tion is al­ready ca­pa­ble of load­ing many pop­u­lar TypeScript pro­jects, in­clud­ing the TypeScript com­piler it­self. Here are times to run tsc on some pop­u­lar code­bases on GitHub of vary­ing sizes:

While we’re not yet fea­ture-com­plete, these num­bers are rep­re­sen­ta­tive of the or­der of mag­ni­tude per­for­mance im­prove­ment you’ll see check­ing most code­bases.

We’re in­cred­i­bly ex­cited about the op­por­tu­ni­ties that this mas­sive speed boost cre­ates. Features that once seemed out of reach are now within grasp. This na­tive port will be able to pro­vide in­stant, com­pre­hen­sive er­ror list­ings across an en­tire pro­ject, sup­port more ad­vanced refac­tor­ings, and en­able deeper in­sights that were pre­vi­ously too ex­pen­sive to com­pute. This new foun­da­tion goes be­yond to­day’s de­vel­oper ex­pe­ri­ence and will en­able the next gen­er­a­tion of AI tools to en­hance de­vel­op­ment, pow­er­ing new tools that will learn, adapt, and im­prove the cod­ing ex­pe­ri­ence.

Most de­vel­oper time is spent in ed­i­tors, and it’s where per­for­mance is most im­por­tant. We want ed­i­tors to load large pro­jects quickly, and re­spond quickly in all sit­u­a­tions. Modern ed­i­tors like Visual Studio and Visual Studio Code have ex­cel­lent per­for­mance as long as the un­der­ly­ing lan­guage ser­vices are also fast. With our na­tive im­ple­men­ta­tion, we’ll be able to pro­vide in­cred­i­bly fast ed­i­tor ex­pe­ri­ences.

Again us­ing the Visual Studio Code code­base as a bench­mark, the cur­rent time to load the en­tire pro­ject in the ed­i­tor on a fast com­puter is about 9.6 sec­onds. This drops down to about 1.2 sec­onds with the na­tive lan­guage ser­vice, an 8x im­prove­ment in pro­ject load time in ed­i­tor sce­nar­ios. What this trans­lates to is a faster work­ing ex­pe­ri­ence from the time you open your ed­i­tor to your first key­stroke in any TypeScript code­base. We ex­pect all pro­jects to see this level of im­prove­ment in load time.

Overall mem­ory us­age also ap­pears to be roughly half of the cur­rent im­ple­men­ta­tion, though we haven’t ac­tively in­ves­ti­gated op­ti­miz­ing this yet and ex­pect to re­al­ize fur­ther im­prove­ments. Editor re­spon­sive­ness for all lan­guage ser­vice op­er­a­tions (including com­ple­tion lists, quick info, go to de­f­i­n­i­tion, and find all ref­er­ences) will also see sig­nif­i­cant speed gains. We’ll also be mov­ing to the Language Server Protocol (LSP), a long­stand­ing in­fra­struc­tural work item to bet­ter align our im­ple­men­ta­tion with other lan­guages.

Our most re­cent TypeScript re­lease was TypeScript 5.8, with TypeScript 5.9 com­ing soon. The JS-based code­base will con­tinue de­vel­op­ment into the 6.x se­ries, and TypeScript 6.0 will in­tro­duce some dep­re­ca­tions and break­ing changes to align with the up­com­ing na­tive code­base.

When the na­tive code­base has reached suf­fi­cient par­ity with the cur­rent TypeScript, we’ll be re­leas­ing it as TypeScript 7.0. This is still in de­vel­op­ment and we’ll be an­nounc­ing sta­bil­ity and fea­ture mile­stones as they oc­cur.

For the sake of clarity, we’ll refer to them simply as TypeScript 6 (JS) and TypeScript 7 (native), since this will be the nomenclature for the foreseeable future. You may also see us refer to “Strada” (the original TypeScript codename) and “Corsa” (the codename for this effort) in internal discussions or code comments.

While some pro­jects may be able to switch to TypeScript 7 upon re­lease, oth­ers may de­pend on cer­tain API fea­tures, legacy con­fig­u­ra­tions, or other con­straints that ne­ces­si­tate us­ing TypeScript 6. Recognizing TypeScript’s crit­i­cal role in the JS de­vel­op­ment ecosys­tem, we’ll still be main­tain­ing the JS code­base in the 6.x line un­til TypeScript 7+ reaches suf­fi­cient ma­tu­rity and adop­tion.

Our long-term goal is to keep these ver­sions as closely aligned as pos­si­ble so that you can up­grade to TypeScript 7 as soon as it meets your re­quire­ments, or fall back to TypeScript 6 if nec­es­sary.

In the com­ing months we’ll be shar­ing more about this ex­cit­ing ef­fort, in­clud­ing deeper looks into per­for­mance, a new com­piler API, LSP, and more. We’ve writ­ten up some FAQs on the GitHub repo to ad­dress some ques­tions we ex­pect you might have. We also in­vite you to join us for an AMA at the TypeScript Community Discord at 10 AM PDT | 5 PM UTC on March 13th.

A 10x per­for­mance im­prove­ment rep­re­sents a mas­sive leap in the TypeScript and JavaScript de­vel­op­ment ex­pe­ri­ence, so we hope you are as en­thu­si­as­tic as we are for this ef­fort!

...

Read the original on devblogs.microsoft.com »

4 1,479 shares, 60 trendiness

Mark Klein, AT&T Whistleblower Who Revealed NSA Mass Spying

EFF is deeply sad­dened to learn of the pass­ing of Mark Klein, a bona fide hero who risked civil li­a­bil­ity and crim­i­nal pros­e­cu­tion to help ex­pose a mas­sive spy­ing pro­gram that vi­o­lated the rights of mil­lions of Americans.

Mark did­n’t set out to change the world. For 22 years, he was a telecom­mu­ni­ca­tions tech­ni­cian for AT&T, most of that in San Francisco. But he al­ways had a strong sense of right and wrong and a com­mit­ment to pri­vacy.

When the New York Times reported in late 2005 that the NSA was engaging in spying inside the U.S., Mark realized that he had witnessed how it was happening. He also realized that the President was not telling Americans the truth about the program. And, though newly retired, he knew that he had to do something. He showed up at EFF’s front door in early 2006 with a simple question: “Do you folks care about privacy?”

We did. And what Mark told us changed everything. Through his work, Mark had learned that the National Security Agency (NSA) had installed a secret, secure room at AT&T’s central office in San Francisco, called Room 641A. Mark was assigned to connect circuits carrying Internet data to “optical splitters” that sat just outside of the secret NSA room but were hardwired into it. Those splitters—as well as similar ones in cities around the U.S.—made a copy of all data going through those circuits and delivered it into the secret room.

A photo of the NSA-controlled ‘secret room’ in the AT&T facility in San Francisco (Credit: Mark Klein)

Mark not only saw how it works, he had the doc­u­ments to prove it. He brought us over a hun­dred pages of au­then­ti­cated AT&T schematic di­a­grams and ta­bles. Mark also shared this in­for­ma­tion with ma­jor me­dia out­lets, nu­mer­ous Congressional staffers, and at least two sen­a­tors per­son­ally. One, Senator Chris Dodd, took the floor of the Senate to ac­knowl­edge Mark as the great American hero he was.

We used Mark’s evidence to bring two lawsuits against the NSA spying that he uncovered. The first was Hepting v. AT&T and the second was Jewel v. NSA. Mark also came with us to Washington D.C. to push for an end to the spying and demand accountability for it happening in secret for so many years. He wrote an account of his experience called “Wiring Up the Big Brother Machine . . . And Fighting It.”

Mark stood up and told the truth at great per­sonal risk to him­self and his fam­ily. AT&T threat­ened to sue him, al­though it wisely de­cided not to do so. While we were able to use his ev­i­dence to make some change, both EFF and Mark were ul­ti­mately let down by Congress and the Courts, which have re­fused to take the steps nec­es­sary to end the mass spy­ing even af­ter Edward Snowden pro­vided even more ev­i­dence of it in 2013.

But Mark cer­tainly in­spired all of us at EFF, and he helped in­spire and in­form hun­dreds of thou­sands of or­di­nary Americans to de­mand an end to il­le­gal mass sur­veil­lance. While we have not yet seen the suc­cess in end­ing the spy­ing that we all have hoped for, his brav­ery helped to usher nu­mer­ous re­forms so far.

And the fight is not over. The law, called Section 702, that now au­tho­rizes the con­tin­ued sur­veil­lance that Mark first re­vealed, ex­pires in early 2026. EFF and oth­ers will con­tinue to push for con­tin­ued re­forms and, ul­ti­mately, for the il­le­gal spy­ing to end en­tirely.

Mark’s legacy lives on in our con­tin­u­ing fights to re­form sur­veil­lance and honor the Fourth Amendment’s promise of pro­tect­ing per­sonal pri­vacy. We are for­ever grate­ful to him for hav­ing the courage to stand up and will do our best to honor that legacy by con­tin­u­ing the fight.

...

Read the original on www.eff.org »

5 820 shares, 31 trendiness

Introducing Gemini Robotics and Gemini Robotics-ER, AI models designed for robots to understand, act and react to the physical world.

At Google DeepMind, we’ve been making progress in how our Gemini models solve complex problems through multimodal reasoning across text, images, audio and video. So far, however, those abilities have been largely confined to the digital realm. In order for AI to be useful and helpful to people in the physical realm, it has to demonstrate “embodied” reasoning — the humanlike ability to comprehend and react to the world around us — as well as safely take action to get things done.

Today, we are in­tro­duc­ing two new AI mod­els, based on Gemini 2.0, which lay the foun­da­tion for a new gen­er­a­tion of help­ful ro­bots.

The first is Gemini Robotics, an ad­vanced vi­sion-lan­guage-ac­tion (VLA) model that was built on Gemini 2.0 with the ad­di­tion of phys­i­cal ac­tions as a new out­put modal­ity for the pur­pose of di­rectly con­trol­ling ro­bots. The sec­ond is Gemini Robotics-ER, a Gemini model with ad­vanced spa­tial un­der­stand­ing, en­abling ro­boti­cists to run their own pro­grams us­ing Gemini’s em­bod­ied rea­son­ing (ER) abil­i­ties.

Both of these mod­els en­able a va­ri­ety of ro­bots to per­form a wider range of real-world tasks than ever be­fore. As part of our ef­forts, we’re part­ner­ing with Apptronik to build the next gen­er­a­tion of hu­manoid ro­bots with Gemini 2.0. We’re also work­ing with a se­lected num­ber of trusted testers to guide the fu­ture of Gemini Robotics-ER.

We look for­ward to ex­plor­ing our mod­els’ ca­pa­bil­i­ties and con­tin­u­ing to de­velop them on the path to real-world ap­pli­ca­tions.

To be use­ful and help­ful to peo­ple, AI mod­els for ro­bot­ics need three prin­ci­pal qual­i­ties: they have to be gen­eral, mean­ing they’re able to adapt to dif­fer­ent sit­u­a­tions; they have to be in­ter­ac­tive, mean­ing they can un­der­stand and re­spond quickly to in­struc­tions or changes in their en­vi­ron­ment; and they have to be dex­ter­ous, mean­ing they can do the kinds of things peo­ple gen­er­ally can do with their hands and fin­gers, like care­fully ma­nip­u­late ob­jects.

While our pre­vi­ous work demon­strated progress in these ar­eas, Gemini Robotics rep­re­sents a sub­stan­tial step in per­for­mance on all three axes, get­ting us closer to truly gen­eral pur­pose ro­bots.

Gemini Robotics lever­ages Gemini’s world un­der­stand­ing to gen­er­al­ize to novel sit­u­a­tions and solve a wide va­ri­ety of tasks out of the box, in­clud­ing tasks it has never seen be­fore in train­ing. Gemini Robotics is also adept at deal­ing with new ob­jects, di­verse in­struc­tions, and new en­vi­ron­ments. In our tech re­port, we show that on av­er­age, Gemini Robotics more than dou­bles per­for­mance on a com­pre­hen­sive gen­er­al­iza­tion bench­mark com­pared to other state-of-the-art vi­sion-lan­guage-ac­tion mod­els.

To op­er­ate in our dy­namic, phys­i­cal world, ro­bots must be able to seam­lessly in­ter­act with peo­ple and their sur­round­ing en­vi­ron­ment, and adapt to changes on the fly.

Because it’s built on a foun­da­tion of Gemini 2.0, Gemini Robotics is in­tu­itively in­ter­ac­tive. It taps into Gemini’s ad­vanced lan­guage un­der­stand­ing ca­pa­bil­i­ties and can un­der­stand and re­spond to com­mands phrased in every­day, con­ver­sa­tional lan­guage and in dif­fer­ent lan­guages.

It can understand and respond to a much broader set of natural language instructions than our previous models, adapting its behavior to your input. It also continuously monitors its surroundings, detects changes to its environment or instructions, and adjusts its actions accordingly. This kind of control, or “steerability,” can better help people collaborate with robot assistants in a range of settings, from home to the workplace.

If an ob­ject slips from its grasp, or some­one moves an item around, Gemini Robotics quickly re­plans and car­ries on — a cru­cial abil­ity for ro­bots in the real world, where sur­prises are the norm.

The third key pil­lar for build­ing a help­ful ro­bot is act­ing with dex­ter­ity. Many every­day tasks that hu­mans per­form ef­fort­lessly re­quire sur­pris­ingly fine mo­tor skills and are still too dif­fi­cult for ro­bots. By con­trast, Gemini Robotics can tackle ex­tremely com­plex, multi-step tasks that re­quire pre­cise ma­nip­u­la­tion such as origami fold­ing or pack­ing a snack into a Ziploc bag.

Finally, be­cause ro­bots come in all shapes and sizes, Gemini Robotics was also de­signed to eas­ily adapt to dif­fer­ent ro­bot types. We trained the model pri­mar­ily on data from the bi-arm ro­botic plat­form, ALOHA 2, but we also demon­strated that it could con­trol a bi-arm plat­form, based on the Franka arms used in many aca­d­e­mic labs. Gemini Robotics can even be spe­cial­ized for more com­plex em­bod­i­ments, such as the hu­manoid Apollo ro­bot de­vel­oped by Apptronik, with the goal of com­plet­ing real world tasks.

Gemini Robotics works on dif­fer­ent kinds of ro­bots

Alongside Gemini Robotics, we’re in­tro­duc­ing an ad­vanced vi­sion-lan­guage model called Gemini Robotics-ER (short for “embodied rea­son­ing”). This model en­hances Gemini’s un­der­stand­ing of the world in ways nec­es­sary for ro­bot­ics, fo­cus­ing es­pe­cially on spa­tial rea­son­ing, and al­lows ro­boti­cists to con­nect it with their ex­ist­ing low level con­trollers.

Gemini Robotics-ER im­proves Gemini 2.0’s ex­ist­ing abil­i­ties like point­ing and 3D de­tec­tion by a large mar­gin. Combining spa­tial rea­son­ing and Gemini’s cod­ing abil­i­ties, Gemini Robotics-ER can in­stan­ti­ate en­tirely new ca­pa­bil­i­ties on the fly. For ex­am­ple, when shown a cof­fee mug, the model can in­tuit an ap­pro­pri­ate two-fin­ger grasp for pick­ing it up by the han­dle and a safe tra­jec­tory for ap­proach­ing it.

Gemini Robotics-ER can per­form all the steps nec­es­sary to con­trol a ro­bot right out of the box, in­clud­ing per­cep­tion, state es­ti­ma­tion, spa­tial un­der­stand­ing, plan­ning and code gen­er­a­tion. In such an end-to-end set­ting the model achieves a 2x-3x suc­cess rate com­pared to Gemini 2.0. And where code gen­er­a­tion is not suf­fi­cient, Gemini Robotics-ER can even tap into the power of in-con­text learn­ing, fol­low­ing the pat­terns of a hand­ful of hu­man demon­stra­tions to pro­vide a so­lu­tion.

Gemini Robotics-ER ex­cels at em­bod­ied rea­son­ing ca­pa­bil­i­ties in­clud­ing de­tect­ing ob­jects and point­ing at ob­ject parts, find­ing cor­re­spond­ing points and de­tect­ing ob­jects in 3D.

As we ex­plore the con­tin­u­ing po­ten­tial of AI and ro­bot­ics, we’re tak­ing a lay­ered, holis­tic ap­proach to ad­dress­ing safety in our re­search, from low-level mo­tor con­trol to high-level se­man­tic un­der­stand­ing.

The physical safety of robots and the people around them is a longstanding, foundational concern in the science of robotics. That’s why roboticists use classic safety measures such as avoiding collisions, limiting the magnitude of contact forces, and ensuring the dynamic stability of mobile robots. Gemini Robotics-ER can be interfaced with these “low-level” safety-critical controllers, specific to each particular embodiment. Building on Gemini’s core safety features, we enable Gemini Robotics-ER models to understand whether or not a potential action is safe to perform in a given context, and to generate appropriate responses.

To advance robotics safety research across academia and industry, we are also releasing a new dataset to evaluate and improve semantic safety in embodied AI and robotics. In previous work, we showed how a Robot Constitution inspired by Isaac Asimov’s Three Laws of Robotics could help prompt an LLM to select safer tasks for robots. We have since developed a framework to automatically generate data-driven constitutions (rules expressed directly in natural language) to steer a robot’s behavior. This framework would allow people to create, modify and apply constitutions to develop robots that are safer and more aligned with human values. Finally, the new ASIMOV dataset will help researchers to rigorously measure the safety implications of robotic actions in real-world scenarios.

To further assess the societal implications of our work, we collaborate with experts on our Responsible Development and Innovation team, as well as our Responsibility and Safety Council, an internal review group committed to ensuring we develop AI applications responsibly. We also consult with external specialists on particular challenges and opportunities presented by embodied AI in robotics applications.

In addition to our partnership with Apptronik, our Gemini Robotics-ER model is also available to trusted testers including Agile Robots, Agility Robotics, Boston Dynamics, and Enchanted Tools. We look forward to exploring our models’ capabilities and continuing to develop AI for the next generation of more helpful robots.

This work was de­vel­oped by the Gemini Robotics team. For a full list of au­thors and ac­knowl­edge­ments please view our tech­ni­cal re­port.

...

Read the original on deepmind.google »

6 766 shares, 28 trendiness

The DuckDB Local UI

The DuckDB pro­ject was built to make it sim­ple to lever­age mod­ern data­base tech­nol­ogy. DuckDB can be used from many pop­u­lar lan­guages and runs on a wide va­ri­ety of plat­forms. The in­cluded Command Line Interface (CLI) pro­vides a con­ve­nient way to in­ter­ac­tively run SQL queries from a ter­mi­nal win­dow, and sev­eral third-party tools of­fer more so­phis­ti­cated UIs.

The DuckDB CLI pro­vides ad­vanced fea­tures like in­ter­ac­tive multi-line edit­ing, auto-com­plete, and progress in­di­ca­tors. However, it can be cum­ber­some for work­ing with lengthy SQL queries, and its data ex­plo­ration tools are lim­ited. Many of the avail­able third party UIs are great, but se­lect­ing, in­stalling, and con­fig­ur­ing one is not straight­for­ward. Using DuckDB through a UI should be as sim­ple as us­ing the CLI. And now it is!

The DuckDB UI is the re­sult of a col­lab­o­ra­tion be­tween DuckDB Labs and MotherDuck and is shipped as part of the ui ex­ten­sion.

Starting with DuckDB v1.2.1, a full-fea­tured lo­cal web user in­ter­face is avail­able out-of-the-box! You can start it from the ter­mi­nal by launch­ing the DuckDB CLI client with the -ui ar­gu­ment:
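
duckdb -ui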

You can also run the fol­low­ing SQL com­mand from a DuckDB client (e.g., CLI, Python, Java, etc.):
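
CALL start_ui();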

Both of these approaches install the ui extension (if it isn’t installed yet), then open the DuckDB UI in your browser.
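
As a small illustration, the same call can be issued from the Python client (a sketch using the duckdb package; the example query is just a placeholder):

import duckdb

con = duckdb.connect()                   # a local, in-memory DuckDB instance
con.sql("CALL start_ui()")               # loads the ui extension and opens the UI in your browser
con.sql("SELECT 42 AS answer").show()    # the UI works against this same local instance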

The DuckDB UI uses in­ter­ac­tive note­books to de­fine SQL scripts and show the re­sults of queries. However, its ca­pa­bil­i­ties go far be­yond this. Let’s go over its main fea­tures.

The DuckDB UI runs all your queries lo­cally: your queries and data never leave your com­puter. If you would like to use MotherDuck through the UI, you have to opt-in ex­plic­itly.

Your at­tached data­bases are shown on the left. This list in­cludes in-mem­ory data­bases plus any files and URLs you’ve loaded. You can ex­plore ta­bles and views by ex­pand­ing data­bases and schemas.

Click on a table or view to show a sum­mary be­low. The UI shows the num­ber of rows, the name and type of each col­umn, and a pro­file of the data in each col­umn.

Select a column to see a more detailed summary of its data. You can use the “Preview data” button near the top right to inspect the first 100 rows. You can also find the SQL definition of the table or view here.

You can or­ga­nize your work into named note­books. Each cell of the note­book can ex­e­cute one or more SQL state­ments. The UI sup­ports syn­tax high­light­ing and au­to­com­plete to as­sist with writ­ing your queries.

You can run the whole cell, or just a se­lec­tion, then sort, fil­ter, or fur­ther trans­form the re­sults us­ing the pro­vided con­trols.

The right panel con­tains the Column Explorer, which shows a sum­mary of your re­sults. You can dive into each col­umn to gain in­sights.

If you would like to con­nect to MotherDuck, you can sign into MotherDuck to per­sist files and ta­bles to a cloud data ware­house crafted for us­ing DuckDB at scale and shar­ing data with your team.

The DuckDB UI is un­der ac­tive de­vel­op­ment. Expect ad­di­tions and im­prove­ments!

Like the DuckDB CLI, the DuckDB UI creates some files in the .duckdb directory in your home directory. The UI puts its files in a sub-directory, extension_data/ui:

* Your note­books and some other state are stored in a DuckDB data­base, ui.db.

* When you export data to the clipboard or a file (using the controls below the results), some tiny intermediate files (e.g. ui_export.csv) are generated.

Your data is cleared from these files af­ter the ex­port is com­pleted, but some near-empty files re­main, one per file type.

Support for the UI is im­ple­mented in a DuckDB ex­ten­sion. The ex­ten­sion em­beds a lo­cal­host HTTP server, which serves the UI browser ap­pli­ca­tion, and also ex­poses an API for com­mu­ni­ca­tion with DuckDB. In this way, the UI lever­ages the na­tive DuckDB in­stance from which it was started, en­abling full ac­cess to your lo­cal mem­ory, com­pute, and file sys­tem.

Results are re­turned in an ef­fi­cient bi­nary form closely match­ing DuckDB’s in-mem­ory rep­re­sen­ta­tion (DataChunk).

Server-sent events en­able prompt no­ti­fi­ca­tion of up­dates such as at­tach­ing data­bases. These tech­niques and oth­ers make for a low-la­tency ex­pe­ri­ence that keeps you in your flow.

See the UI ex­ten­sion doc­u­men­ta­tion for more de­tails.

In this blog post, we pre­sented the new DuckDB UI, a pow­er­ful web in­ter­face for DuckDB.

The DuckDB UI shares many of its design principles with the DuckDB database. It’s simple, fast, feature-rich, and portable, and runs locally on your computer. The DuckDB UI extension is also open source: visit the duckdb/duckdb-ui repository if you want to dive deeper into the extension’s code.

The repos­i­tory does not con­tain the source code for the fron­tend, which is cur­rently not avail­able as open-source. Releasing it as open-source is un­der con­sid­er­a­tion.

For help or to share feedback, please file an issue or join the #ui channel in either the DuckDB Discord or the MotherDuck Community Slack.

...

Read the original on duckdb.org »

7 736 shares, 30 trendiness

It is as if you were on your phone

It is as if you were on your phone

Look at you! On your phone! But you’ve got a se­cret! And you won’t tell! You’re not on your phone! It is only as if you were on your phone! You’re just pre­tend­ing to be on your phone! On your phone!

It is as if you were on your phone is an al­most spec­u­la­tive game about an in­cred­i­bly near fu­ture in which we’re all si­mul­ta­ne­ously un­der sig­nif­i­cant pres­sure to be on our phones all the time, but also to not be on our phones all the time. Our fin­gers want to touch the screen, our eyes want to watch the sur­face, our brains want to be oc­cu­pied ef­fi­ciently and al­ways. But it’s also ex­haust­ing lik­ing pho­tos, swip­ing pro­files, watch­ing short-form video, and every­thing else we’re al­ways do­ing. It is as if you were on your phone pre­sents an al­ter­na­tive: pre­tend to be on your phone so that you pass as hu­man, but ac­tu­ally do es­sen­tially noth­ing in­stead. Follow the prompts and be free.

It is as if you were on your phone was cre­ated us­ing p5 along with Hammer.js for touch ges­tures.

Iwan Morris. It’s As If You Were On Your Phone is a bizarre new in­tro­spec­tive desk­top mo­bile re­lease. Pocket Gamer. 6 March 2025.

Jason Kottke. A game called “It is as if you were on your phone” is designed to make you look like you’re on your phone. Kottke.org. 7 March 2025.

Dan Q. It is as if you were on your phone. Dan Q (Blog). 10 March 2025. (This guy recorded a video of him play­ing which I love!)

de Rochefort, Simone. Finally, I can pre­tend I’m on my phone - And it’s giv­ing me an ex­is­ten­tial cri­sis!. Polygon. 10 March 2025.

Read the Process Documentation for to­dos and de­sign ex­plo­rations

Read the Commit History for de­tailed, mo­ment-to-mo­ment in­sights into the de­vel­op­ment process

Look at the Code Repository for source code etc.

It is as if you were on your phone is li­censed un­der a Creative Commons Attribution-NonCommercial 3.0 Unported License.

...

Read the original on pippinbarr.com »

8 701 shares, 27 trendiness

OpenAI Asks White House for Relief From State AI Rules

(Bloomberg) — OpenAI has asked the Trump ad­min­is­tra­tion to help shield ar­ti­fi­cial in­tel­li­gence com­pa­nies from a grow­ing num­ber of pro­posed state reg­u­la­tions if they vol­un­tar­ily share their mod­els with the fed­eral gov­ern­ment.

In a 15-page set of pol­icy sug­ges­tions re­leased on Thursday, the ChatGPT maker ar­gued that the hun­dreds of AI-related bills cur­rently pend­ing across the US risk un­der­cut­ting America’s tech­no­log­i­cal progress at a time when it faces re­newed com­pe­ti­tion from China. OpenAI said the ad­min­is­tra­tion should con­sider pro­vid­ing some re­lief for AI com­pa­nies big and small from state rules — if and when en­acted — in ex­change for vol­un­tary ac­cess to mod­els.

The rec­om­men­da­tion was one of sev­eral in­cluded in OpenAI’s re­sponse to a re­quest for pub­lic in­put is­sued by the White House Office of Science and Technology Policy in February as the ad­min­is­tra­tion drafts a new pol­icy to en­sure US dom­i­nance in AI. President Donald Trump pre­vi­ously re­scinded the Biden ad­min­is­tra­tion’s sprawl­ing ex­ec­u­tive or­der on AI and tasked the sci­ence of­fice with de­vel­op­ing an AI Action Plan by July.

To date, there has been a no­table ab­sence of fed­eral leg­is­la­tion gov­ern­ing the AI sec­tor. The Trump ad­min­is­tra­tion has gen­er­ally sig­naled its in­ten­tion to take a hands-off ap­proach to reg­u­lat­ing the tech­nol­ogy. But many states are ac­tively weigh­ing new mea­sures on every­thing from deep­fakes to bias in AI sys­tems.

Chris Lehane, OpenAI’s vice president of global affairs, said in an interview that the US AI Safety Institute — a key government group focused on AI — could act as the main point of contact between the federal government and the private sector. If companies work with the group voluntarily to review models, the government could provide them with liability protections including “preemption from state based regulations that focus on frontier model security,” according to the proposal.

“Part of the incentive for doing that ought to be that you don’t have to go through the state stuff, which is not going to be anywhere near as good as what the federal level would be,” Lehane said.

In its pol­icy rec­om­men­da­tions, OpenAI also re­it­er­ated its call for the gov­ern­ment to take steps to sup­port AI in­fra­struc­ture in­vest­ments and called for copy­right re­form, ar­gu­ing that America’s fair use doc­trine is crit­i­cal to main­tain­ing AI lead­er­ship. OpenAI and other AI de­vel­op­ers have faced nu­mer­ous copy­right law­suits over the data used to build their mod­els.

...

Read the original on finance.yahoo.com »

9 685 shares, 26 trendiness

Factorio Learning Environment

Large Language Models (LLMs) are rapidly sat­u­rat­ing ex­ist­ing bench­marks, ne­ces­si­tat­ing new open-ended eval­u­a­tions. We in­tro­duce the Factorio Learning Environment (FLE), based on the game of Factorio, that tests agents in long-term plan­ning, pro­gram syn­the­sis, and re­source op­ti­miza­tion.

FLE pro­vides open-ended and ex­po­nen­tially scal­ing chal­lenges - from ba­sic au­toma­tion to com­plex fac­to­ries pro­cess­ing mil­lions of re­source units per sec­ond. We pro­vide two set­tings:

Open-play with the un­bounded task of build­ing the largest fac­tory from scratch on a pro­ce­du­rally gen­er­ated map.
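
Lab-play with 24 structured tasks that supply fixed resources and a time limit, each with a clear completion criterion.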

We demonstrate across both settings that models still lack strong spatial reasoning. In lab-play, we find that LLMs exhibit promising short-horizon skills, yet are unable to operate effectively in constrained environments, reflecting limitations in error analysis. In open-play, while LLMs discover automation strategies that improve growth (e.g. electric-powered drilling), they fail to achieve complex automation (e.g. electronic-circuit manufacturing).

Large Language Models (LLMs) have demon­strated re­mark­able ca­pa­bil­i­ties at solv­ing com­plex ques­tion-an­swer (QA) prob­lems, sat­u­rat­ing bench­marks in fac­tual rec­ol­lec­tion, rea­son­ing and code gen­er­a­tion. Benchmark sat­u­ra­tion pre­sents a crit­i­cal chal­lenge for the AI re­search com­mu­nity: how do we mean­ing­fully eval­u­ate and dif­fer­en­ti­ate in­creas­ingly ca­pa­ble mod­els?

We in­tro­duce the Factorio Learning Environment (FLE): a novel frame­work built upon the game of Factorio that ad­dresses this chal­lenge by en­abling un­bounded agent eval­u­a­tion. FLE pro­vides the in­fra­struc­ture, API, and met­rics for as­sess­ing fron­tier LLM agents in code gen­er­a­tion, spa­tial rea­son­ing and long-term plan­ning. In this en­vi­ron­ment, agents must nav­i­gate rapidly scal­ing chal­lenges—from ba­sic re­source ex­trac­tion pro­duc­ing ~30 units/​minute to so­phis­ti­cated pro­duc­tion chains pro­cess­ing mil­lions of units/​sec­ond. This dra­matic growth in com­plex­ity, dri­ven by geo­met­ric in­creases in re­search costs and the com­bi­na­to­r­ial ex­pan­sion of in­ter­de­pen­dent pro­duc­tion chains, cre­ates nat­ural cur­ric­ula for eval­u­at­ing in­creas­ingly ca­pa­ble agents.

Within FLE, we de­fine two com­ple­men­tary eval­u­a­tion pro­to­cols: (1) lab-play with struc­tured, goal-ori­ented tasks that have clear com­ple­tion cri­te­ria, al­low­ing tar­geted as­sess­ment of spe­cific ca­pa­bil­i­ties, and (2) open-play with no pre­de­ter­mined end-state, sup­port­ing truly un­bounded eval­u­a­tion of an agen­t’s abil­ity to au­tonomously set and achieve in­creas­ingly com­plex goals.

Agents in FLE aim to op­ti­mise fac­to­ries pro­gram­mat­i­cally. Left: Agents aim to cre­ate in­creas­ingly ef­fi­cient fac­to­ries, ad­vanc­ing through tech­no­log­i­cal tiers to pro­duce more re­sources per sec­ond. Middle: We pro­vide a Python API to Factorio which en­ables di­rect in­ter­ac­tion with the en­vi­ron­ment through code. Right: Agents sub­mit pro­grams to the game server and re­ceive rich feed­back, en­abling them to re­fine their strate­gies through an it­er­a­tive process of ex­plo­ration and re­fine­ment.

Agents de­velop poli­cies through an in­ter­ac­tive feed­back loop.

Using 23 core API tools, agents com­pose pro­grams that in­ter­act with the en­vi­ron­ment and ob­serve the re­sults through std­out and stderr streams.

The Python name­space al­lows agents to store vari­ables and de­fine func­tions for later use, en­abling in­creas­ingly so­phis­ti­cated strate­gies as ex­pe­ri­ence grows.

This ap­proach mir­rors the way hu­man pro­gram­mers learn - through it­er­a­tion, de­bug­ging, and re­fine­ment based on di­rect feed­back.

Agent pro­grams yield both a Production Score (PS) rep­re­sent­ing the eco­nomic value of all items pro­duced, and mile­stones that re­flect tech­no­log­i­cal ad­vance­ments.
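
Put together, one iteration of that loop looks roughly like the sketch below (the two stub functions stand in for the real FLE server API and the LLM call; they are not the actual interface):

# Sketch of the iterative program-synthesis loop described above.
# run_program and ask_model are illustrative stand-ins, not FLE's API.
def run_program(program: str) -> tuple[str, str, float]:
    # Pretend to execute the agent's Python in the game server and
    # return (stdout, stderr, production score).
    return "placed burner-mining-drill at (0, 0)", "", 30.0

def ask_model(history: list[dict]) -> str:
    # Pretend to ask an LLM for the next program given past feedback.
    return "place_entity('burner-mining-drill', x=0, y=0)"

history: list[dict] = []
for step in range(3):
    program = ask_model(history)                      # model proposes code
    stdout, stderr, score = run_program(program)      # environment executes it
    history.append({"program": program, "stdout": stdout,
                    "stderr": stderr, "score": score})  # feedback for refinement
    print(f"step {step}: production score {score}")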

To sys­tem­at­i­cally eval­u­ate agent ca­pa­bil­i­ties in the Factorio Learning Environment, we in­tro­duce two com­ple­men­tary ex­per­i­men­tal set­tings that test dif­fer­ent as­pects of plan­ning, au­toma­tion, and re­source man­age­ment; namely open-play and lab-play.

We eval­u­ate six fron­tier lan­guage mod­els across both set­tings: Claude 3.5-Sonnet, GPT-4o, GPT-4o-Mini, Deepseek-v3, Gemini-2-Flash, and Llama-3.3-70B-Instruct. Each model in­ter­acts with the en­vi­ron­ment through a con­sis­tent prompt­ing ap­proach, re­ceiv­ing the API schema, a guide de­scrib­ing com­mon pat­terns, and mem­ory of past ac­tions and ob­ser­va­tions.

Agents begin in a procedurally generated world with the instruction to “build the largest possible factory”. This setting tests agents’ ability to set appropriate goals, balance short-term production against long-term research, and navigate the complex tech tree and game map without external guidance.

Agent ca­pa­bil­i­ties are clearly dif­fer­en­ti­ated by their pro­duc­tion scores in open-play.

Left: By plot­ting Production Score (PS) against steps on a log/​log scale, we can ob­serve dis­tinct per­for­mance tra­jec­to­ries for each model.

More ca­pa­ble mod­els not only achieve higher scores but demon­strate steeper growth curves, in­di­cat­ing bet­ter long-term plan­ning.

Milestone an­no­ta­tions show when the me­dian agent first cre­ated key en­ti­ties, re­veal­ing how quickly each model pro­gresses through the tech tree.

Right: Final re­wards re­veal how weaker mod­els strug­gle to ad­vance when com­plex au­toma­tion and lo­gis­tics be­come nec­es­sary.

Production strate­gies re­veal dif­fer­ences in agent plan­ning and ca­pa­bil­i­ties.

We track how var­i­ous mod­els pro­duce items with mul­ti­ple an­tecedent in­gre­di­ents in open-play, show­ing not just what they build but how they ap­proach fac­tory de­sign.

Claude 3.5-Sonnet demon­strates so­phis­ti­cated strat­egy by im­me­di­ately be­gin­ning com­plex craft­ing and in­vest­ing in re­search and au­toma­tion, ul­ti­mately un­lock­ing elec­tric-min­ing-drills around step 3k - a de­ci­sion that boosts iron-plate pro­duc­tion by 50% there­after.

In con­trast, less ad­vanced mod­els like GPT-4o-Mini pro­duce min­i­mal quan­ti­ties of multi-in­gre­di­ent items, re­veal­ing lim­i­ta­tions in plan­ning hori­zons.

Interestingly, Deepseek showed stronger ca­pa­bil­i­ties in lab-play than open-play, sug­gest­ing that its gen­eral ca­pa­bil­i­ties ex­ceed its ob­jec­tive-set­ting abil­i­ties in open-ended en­vi­ron­ments.

Agents are pro­vided with re­sources and given a time-limit to achieve an ob­jec­tive. We task agents to build pro­duc­tion lines of 24 dis­tinct tar­get en­ti­ties of in­creas­ing com­plex­ity, start­ing from a sin­gle re­source mine re­quir­ing at most 2 ma­chines (making iron-ore) to a late game en­tity re­quir­ing the co­or­di­na­tion of close to 100 ma­chines (making util­ity-sci­ence-pack). The tar­get en­ti­ties cover items from early to late game, re­quir­ing agents to use a wide va­ri­ety of ma­chines pre­sent in Factorio (drills, fur­naces, as­sem­bling ma­chines, oil re­finer­ies, chem­i­cal plants). As the task dif­fi­culty nat­u­rally in­creases with re­source re­quire­ments, this pro­vides a mea­sure of the com­plex­ity that agents are ca­pa­ble of cre­at­ing in a lim­ited num­ber of steps. All tasks pro­vide the agent with suf­fi­cient re­sources to com­plete the task with all tech­nolo­gies un­locked.

Item pro­duc­tion com­plex­ity cre­ates a nat­ural dif­fi­culty gra­di­ent for agent eval­u­a­tion. Top: We mea­sure task suc­cess rates across the first 8 com­plex­ity lev­els, re­veal­ing a clear de­cline as tar­get en­tity craft­ing com­plex­ity in­creases. Even the most ca­pa­ble mod­els strug­gle with co­or­di­nat­ing more than six ma­chines when pro­duc­ing items with three or more in­gre­di­ents. Bottom: Production progress over time shows a pat­tern of ini­tial rapid ad­vance­ment fol­lowed by stag­na­tion or re­gres­sion. This re­veals a key lim­i­ta­tion in cur­rent agents’ abil­i­ties: they of­ten break ex­ist­ing func­tional struc­tures when at­tempt­ing to scale pro­duc­tion or add new fac­tory sec­tions. The high vari­ance in task progress across runs fur­ther demon­strates the chal­lenge of con­sis­tent per­for­mance in com­plex au­toma­tion tasks.

Plastic bar man­u­fac­tur­ing is the most chal­leng­ing task suc­cess­fully com­pleted in lab-play.

The factory consists of an electricity steam generator (top-left), a coal mine with storage buffer (top), a crude-oil to petroleum gas pipeline (bottom) and a chemical plant (bottom-right).

The chemical plant creates plastic bars using the coal and petroleum gas as inputs. By themselves, the cumulative raw resources generate a production score of 224.

With this specific layout, the factory creates 40 plastic bars per 60 in-game seconds, for a production score of 352.

This fac­tory was cre­ated by Claude Sonnet 3.5.

Even the strongest model (Claude) only com­pleted 7/24 tasks in lab-play, il­lus­trat­ing sub­stan­tial room for im­prove­ment in this bench­mark.

Our ex­per­i­ments re­vealed sev­eral key pat­terns that high­light both the ca­pa­bil­i­ties and lim­i­ta­tions of cur­rent AI agents when faced with open-ended in­dus­trial chal­lenges:

Models with stronger cod­ing abil­i­ties (Claude 3.5-Sonnet, GPT-4o) achieved higher Production Scores and com­pleted more lab tasks. Claude out­per­formed oth­ers with a PS of 293,206 and 28 mile­stones, pro­gress­ing be­yond early-game re­source ex­trac­tion.

Only Claude con­sis­tently in­vested re­sources in re­search­ing new tech­nolo­gies, de­spite their im­por­tance for long-term pro­gres­sion. After de­ploy­ing elec­tric min­ing drills at step 3k, Claude’s PS grew by 50% (from 200k to 300k), demon­strat­ing the value of strate­gic in­vest­ment.

In open-play, agents fre­quently pur­sue short-sighted ob­jec­tives — like Gemini-2.0 man­u­ally craft­ing 300+ wooden chests over 100 steps — rather than in­vest­ing in re­search or scal­ing ex­ist­ing pro­duc­tion. This re­veals a telling dis­crep­ancy: while Gemini-2 and Deepseek demon­strate early-game au­toma­tion ca­pa­bil­i­ties in struc­tured lab-play, they rarely at­tempt to cre­ate co­he­sive fac­to­ries dur­ing open-ended ex­plo­ration, re­sult­ing in poorer over­all per­for­mance.

All mod­els ex­hib­ited lim­i­ta­tions in spa­tial plan­ning when con­struct­ing multi-sec­tion fac­to­ries. Common fail­ures in­cluded plac­ing en­ti­ties too close to­gether, not al­lo­cat­ing space for con­nec­tions, or in­cor­rect in­serter place­ment - is­sues that se­verely im­pacted per­for­mance in com­plex tasks re­quir­ing co­or­di­na­tion of mul­ti­ple pro­duc­tion lines.

Models fre­quently be­come trapped in repet­i­tive er­ror pat­terns, at­tempt­ing the same in­valid op­er­a­tions re­peat­edly rather than ex­plor­ing al­ter­na­tive so­lu­tions. For in­stance, GPT-4o re­peated the same API method in­cor­rectly for 78 con­sec­u­tive steps de­spite iden­ti­cal er­ror mes­sages.

Models ex­hib­ited dis­tinct cod­ing ap­proaches: Claude fa­vored a REPL style with ex­ten­sive print state­ments (43.3% of code lines) but few as­ser­tions (2.0%), while GPT-4o used a de­fen­sive style with more val­i­da­tion checks (12.8% as­ser­tions) and fewer prints (10.3%).
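
As a purely illustrative contrast (place and reach below are hypothetical helpers, not FLE tools):

# Illustrative only: tiny stand-ins so the two styles can be shown side by side.
def place(name: str, x: int, y: int) -> dict:
    return {"name": name, "x": x, "y": y}

def reach(a: dict, b: dict) -> bool:
    return abs(a["x"] - b["x"]) + abs(a["y"] - b["y"]) <= 1

# REPL style: observe state aggressively with prints.
inserter = place("burner-inserter", 2, 0)
print(inserter)

# Defensive style: validate assumptions with assertions.
chest = place("wooden-chest", 3, 0)
assert chest is not None, "placement failed"
assert reach(inserter, chest), "inserter cannot reach the chest"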

With thanks to Jack Kleeman and Minqi Jiang for their in­valu­able help with set­ting up com­pute re­sources and ad­vice dur­ing the in­cep­tion of this pro­ject. Thanks to Wube and the Factorio team for de­vel­op­ing such a stim­u­lat­ing game.

...

Read the original on jackhopkins.github.io »

10 675 shares, 27 trendiness

seven39

Social me­dia that’s only open from 7:39pm to 10:39pm EST.

Create an ac­count now and we’ll email you when sev­en39 opens!

Because so­cial me­dia is bet­ter when we’re all on­line to­gether.

No end­less scrolling. No FOMO. Just 3 hours of fun every evening.

The do­main was avail­able.

...

Read the original on www.seven39.com »

10HN is also available as an iOS App

If you visit 10HN only rarely, check out the best articles from the past week.

If you like 10HN please leave feedback and share

Visit pancik.com for more.