10 interesting stories served every morning and every evening.




1 501 shares, 30 trendiness

What is the origin of the lake tank image that has become a meme?

It’s a Panzer IV D of the 31st Panzer Regiment, assigned to the 5th Panzer Division, commanded by Lt. Heinz Zobel, and lost on May 13th, 1940. The “lake” is the Meuse River. The man is a German pioneer.

All credit for finding the Panzer of the Lake goes to ConeOfArc for coordinating the search, and to miller786 and their team for finding the Panzer. Full sources and details are in Panzer Of The Lake - Meuse River Theory.

The photo was taken at approximately coordinates 50.29092467073664, 4.893099128823844 in modern Wallonia, Belgium, on the Meuse River. The tank was not recovered until much later, in 1941. The man is an unnamed German pioneer, photographed likely at the time of recovery.

Comparison of an al­ter­na­tive orig­i­nal photo and the most re­cent im­age avail­able of the lo­ca­tion (July 2020, Google Street View)

On May 12th, 1940 the 31st Panzer Regiment, as­signed to the 5th Panzer Division, at­tempted to cap­ture a bridge over the Meuse River at Yvoir. The bridge was de­mol­ished by 1st Lieutenant De Wispelaere of the Belgian Engineers.

Werner Advance Detachment (under Oberst Paul Hermann Werner, com­man­der, 31st Panzer Regiment), which be­longed to the 5th Panzer Division, un­der Rommel’s com­mand… Werner re­ceived a mes­sage from close sup­port air re­con­nais­sance in the af­ter­noon that the bridge at Yvoir (seven kilo­me­ters north of Dinant) was still in­tact. He (Werner) im­me­di­ately or­dered Leutnant [Heinz] Zobel’s ar­mored as­sault team of two ar­mored scout cars and one Panzer pla­toon to head to the bridge at top speed… Belgian en­gi­neers un­der the com­mand of 1st Lieutenant de Wispelaere had pre­pared the bridge for de­mo­li­tion while a pla­toon of Ardennes Light Infantry and el­e­ments of a French in­fantry bat­tal­ion screened the bridge… Although the last sol­diers had al­ready passed the bridge, de Wispelaere de­layed the de­mo­li­tion be­cause civil­ian refugees were still ap­proach­ing… two German ar­mored scout cars charged to­ward the bridge while the fol­low­ing three Panzers opened fire. De Wispelaere im­me­di­ately pushed the elec­tri­cal ig­ni­tion, but there was no ex­plo­sion… Wispelaere now left his shel­ter and worked the man­ual ig­ni­tion de­vice. Trying to get back to his bunker, he was hit by a burst from a German ma­chine gun and fell to the ground, mor­tally wounded. At the same time, the ex­plo­sive charge went off. After the gi­gan­tic smoke cloud had drifted away, only the rem­nants of the pil­lars could be seen.

A few kilo­me­ters south at Houx, the Germans used a por­tion of a pon­toon bridge (Bruckengerat B) rated to carry 16 tons to ferry their 25 ton tanks across.

By noon on May 13, Pioniere com­pleted an eight-ton ferry and crossed twenty anti-tank guns to the west bank, how­ever to main­tain the tempo of his di­vi­sions ad­vance, he needed ar­mor and mo­tor­ized units across the river. Rommel per­son­ally or­dered the ferry con­verted to a heav­ier six­teen-ton vari­ant to fa­cil­i­tate the cross­ing of the light Panzers and ar­mored cars. Simultaneously, the Pioniere be­gan con­struc­tion on a bridge ca­pa­ble of cross­ing the di­vi­sion’s heav­ier Panzers and mo­tor­ized units.

Major Erich Schnee in “The German Pionier: Case Study of the Combat Engineer’s Employment During Sustained Ground Combat”

On the evening of the 13th, Lt. Zobel’s tank is cross­ing. Approaching the shore, the ferry lifts, the load shifts, and the tank falls into the river.

The panzer IV of Lieutenant Zabel [sic] of the 31. Panzer Regiment of the 5. Panzer-Division, on May 13, 1940, in Houx, as good as un­der­wa­ter ex­cept for the ve­hi­cle com­man­der’s cupola. Close to the west bank, at the pon­toon cross­ing site and later site of 5. Panzer Division bridge, a 16 tonne ferry (Bruckengerat B) gave way to the ap­proach­ing shore­line, likely due to the ro­tat­ing move­ment of the panzer, which turned right when dis­em­bark­ing (the only pos­si­ble di­rec­tion to quickly leave the Meuse’s shore due to the wall cre­ated by the rail line). The tank would be fished out in 1941 dur­ing the re­con­struc­tion of the bridge.

Sometime later, the photograph was taken of a German pioneer infantryman looking at the tank. The tank was eventually recovered, but its ultimate fate is unknown.

Available evidence suggests the soldier in the photo is a member of a pioneer/tank recovery crew, holding a Kar98k and wearing an EM/NCO’s Drill & Work uniform, more commonly known as “Drillich”.

His role is supported by the presence of pontoon ferries on the Meuse River, used by the 5th Panzer Division, and by his uniform, which, as evidence suggests, was worn during work to prevent damage to the standard woolen uniform.

An early ver­sion of the Drillich

While I can’t iden­tify the photo, I can nar­row down the tank. I be­lieve it is a Panzer IV D.

It has the short-barrelled 7.5 cm KwK 37, narrowing it down to a Panzer IV Ausf. A through F1 or a Panzer III N.

Both had very sim­i­lar tur­rets, but the Panzer III N has a wider gun mant­let, a more an­gu­lar shroud, and lacked (or cov­ered) the dis­tinc­tive an­gu­lar view ports (I be­lieve they’re view ports) on ei­ther side of the tur­ret face.

This leaves the Panzer IV. The dis­tinc­tive cupola was added in model B. The ex­ter­nal gun mant­let was added in model D.

Panzer IV model D in France 1940 with the ex­ter­nal gun mant­let and periscope. source

Note the front half of the tur­ret top is smooth. There is a pro­tru­sion to the front left of the cupola (I be­lieve it’s a periscope sight) and an­other cir­cu­lar open­ing to the front right. Finally, note the large ven­ti­la­tion hatch just in front of the cupola.

Model E would elim­i­nate the ven­ti­la­tion hatch and re­place it with a fan. The periscope was re­placed with a hatch for sig­nal flags.

Panzer IV model D en­tered mass pro­duc­tion in October 1939 which means it would be too late for Poland, but could have seen ser­vice in France, Norway, or the Soviet Union.

As for the sol­dier…

The rifle has a turned-down bolt handle, a bayonet lug (missing from late rifles), a distinctive disassembly disc on the side of the stock (also missing from late rifles), no front sight hood (indicative of an early rifle), and you can just about make out extra detail in the nose cap (also early). This is likely an early Karabiner 98k which is missing its cleaning rod. See Forgotten Weapons: Evolution of the Karabiner 98k, From Prewar to Kriegsmodell.

ConeOfArc posted a video The Search for Panzer of the Lake.

He broke down what he could identify about the soldier, who is probably German.

For the tank, he confirms it’s a Panzer IV D using criteria similar to mine, and he found two additional photos, claimed to be from the Western Front in 1940, of what appears to be the same tank.

He then found a Russian source claim­ing it was found in Romania at the on­set of Barbarossa in 1941.

Unfortunately that’s all for now. ConeOfArc has put a bounty of $100 US for de­fin­i­tive proof of the tank’s lo­ca­tion. More de­tail can be had on ConeOfArc’s Discord.

...

Read the original on history.stackexchange.com »

2 482 shares, 23 trendiness

auonsson (@auonsson.bsky.social)


Chinese-flagged cargo ship Yi Peng 3 crossed both submarine cables C-Lion 1 and BSC at times matching when they broke.

She was shad­owed by Danish navy for a while dur­ing night and is now in Danish Straits leav­ing Baltics.

No signs of board­ing. AIS-caveats ap­ply.

...

Read the original on bsky.app »

3 480 shares, 19 trendiness

Delivering SSL/TLS Everywhere

Vital per­sonal and busi­ness in­for­ma­tion flows over the Internet more fre­quently than ever, and we don’t al­ways know when it’s hap­pen­ing. It’s clear at this point that en­crypt­ing is some­thing all of us should be do­ing. Then why don’t we use TLS (the suc­ces­sor to SSL) every­where? Every browser in every de­vice sup­ports it. Every server in every data cen­ter sup­ports it. Why don’t we just flip the switch?

The chal­lenge is server cer­tifi­cates. The an­chor for any TLS-protected com­mu­ni­ca­tion is a pub­lic-key cer­tifi­cate which demon­strates that the server you’re ac­tu­ally talk­ing to is the server you in­tended to talk to. For many server op­er­a­tors, get­ting even a ba­sic server cer­tifi­cate is just too much of a has­sle. The ap­pli­ca­tion process can be con­fus­ing. It usu­ally costs money. It’s tricky to in­stall cor­rectly. It’s a pain to up­date.

Let’s Encrypt is a new free cer­tifi­cate au­thor­ity, built on a foun­da­tion of co­op­er­a­tion and open­ness, that lets every­one be up and run­ning with ba­sic server cer­tifi­cates for their do­mains through a sim­ple one-click process.

Mozilla Corporation, Cisco Systems, Inc., Akamai Technologies, Electronic Frontier Foundation, IdenTrust, Inc., and researchers at the University of Michigan are working through the Internet Security Research Group (“ISRG”), a California public benefit corporation, to deliver this much-needed infrastructure in Q2 2015. The ISRG welcomes other organizations dedicated to the same ideal of ubiquitous, open Internet security.

The key prin­ci­ples be­hind Let’s Encrypt are:

* Free: Anyone who owns a do­main can get a cer­tifi­cate val­i­dated for that do­main at zero cost.

* Automatic: The en­tire en­roll­ment process for cer­tifi­cates oc­curs pain­lessly dur­ing the server’s na­tive in­stal­la­tion or con­fig­u­ra­tion process, while re­newal oc­curs au­to­mat­i­cally in the back­ground.

* Secure: Let’s Encrypt will serve as a plat­form for im­ple­ment­ing mod­ern se­cu­rity tech­niques and best prac­tices.

* Transparent: All records of cer­tifi­cate is­suance and re­vo­ca­tion will be avail­able to any­one who wishes to in­spect them.

* Open: The au­to­mated is­suance and re­newal pro­to­col will be an open stan­dard and as much of the soft­ware as pos­si­ble will be open source.

* Cooperative: Much like the un­der­ly­ing Internet pro­to­cols them­selves, Let’s Encrypt is a joint ef­fort to ben­e­fit the en­tire com­mu­nity, be­yond the con­trol of any one or­ga­ni­za­tion.

If you want to help these or­ga­ni­za­tions in mak­ing TLS Everywhere a re­al­ity, here’s how you can get in­volved:

To learn more about the ISRG and our part­ners, check out our About page.

...

Read the original on letsencrypt.org »

4 472 shares, 31 trendiness

Analytical Anti-Aliasing

Today’s jour­ney is Anti-Aliasing and the des­ti­na­tion is Analytical Anti-Aliasing. Getting rid of ras­ter­i­za­tion jag­gies is an art-form with decades upon decades of maths, cre­ative tech­niques and non-stop in­no­va­tion. With so many years of re­search and de­vel­op­ment, there are many fla­vors.

From the simple but resource-intensive SSAA, through the theory-dense SMAA, to using machine learning with DLAA. Same goal - vastly different approaches. We’ll take a look at how they work, before introducing a new way to look at the problem - the ✨analytical🌟 way. The perfect Anti-Aliasing exists and is simpler than you think.

Having im­ple­mented it mul­ti­ple times over the years, I’ll also share some juicy se­crets I have never read any­where be­fore.

To understand the Anti-Aliasing algorithms, we will implement them along the way! The following WebGL canvases draw a moving circle. Anti-Aliasing cannot be fully understood with just images; movement is essential. The red box has 4x zoom. Rendering is done at the native resolution of your device, which is important for judging sharpness.

Please pixel-peep to judge sharp­ness and alias­ing closely. Resolution of your screen too high to see alias­ing? Lower the res­o­lu­tion with the fol­low­ing but­tons, which will in­te­ger-scale the ren­der­ing.

Let’s start out simple. Using GLSL shaders, we tell the GPU of your device to draw a circle in the simplest and most naive way possible, as seen in circle.fs above: if the length() from the middle point is bigger than 1.0, we discard the pixel.

The circle is blocky, especially at smaller resolutions. More painfully, there is strong “pixel crawling”, an artifact that’s very obvious when there is any kind of movement. As the circle moves, rows of pixels pop in and out of existence and the stair steps of the pixelation move along the side of the circle like beads of different speeds.

The low ¼ and ⅛ res­o­lu­tions aren’t just there for ex­treme pixel-peep­ing, but also to rep­re­sent small el­e­ments or ones at large dis­tance in 3D.

At lower resolutions these artifacts come together to destroy the circular form. The combination of slow movement and low resolution causes one side’s pixels to come into existence, before the other side’s pixels disappear, causing a wobble. Axis-alignment with the pixel grid causes “plateaus” of pixels at every 90° and 45° position.

Understanding the GPU code is not necessary to follow this article, but it will help to grasp what’s happening when we get to the analytical bits.

4 vertices making up a quad are sent to the GPU in the vertex shader circle.vs, where they are received as attribute vec2 vtx. The coordinates are of a “unit quad”, meaning the coordinates look like the following image. With one famous exception, all GPUs use triangles, so the quad is actually made up of two triangles.

The vertices here are given to the fragment shader circle.fs via varying vec2 uv. The fragment shader is called per fragment (here fragments are pixel-sized) and the varying is interpolated linearly with perspective-corrected, barycentric coordinates, giving us a uv coordinate per pixel from -1 to +1 with zero at the center.

By performing the check if (length(uv) < 1.0) we draw our color for fragments inside the circle and reject fragments outside of it. What we are doing is known as “Alpha testing”. Without diving too deeply and just to hint at what’s to come, what we have created with length(uv) is the signed distance field of a point.
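To make that concrete outside of a live canvas, here is a minimal Python/NumPy sketch of the same logic (my own CPU illustration, not the article’s circle.fs): every pixel whose uv distance from the center exceeds 1.0 is discarded, which produces the hard, aliased edge discussed next.

import numpy as np

def naive_circle(resolution=32):
    """Alpha-tested circle: keep a pixel only if its distance from the center is < 1.0."""
    # uv coordinates from -1 to +1 at each pixel center, like the varying in circle.fs
    coords = (np.arange(resolution) + 0.5) / resolution * 2.0 - 1.0
    u, v = np.meshgrid(coords, coords)
    dist = np.sqrt(u**2 + v**2)              # signed distance field of a point, i.e. length(uv)
    return (dist < 1.0).astype(np.float32)   # 1.0 inside, 0.0 "discarded" - a hard, aliased edge

print("\n".join("".join("#" if px else "." for px in row) for row in naive_circle(16)))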

Just to clarify, the circle isn’t drawn with “geometry”, which would have finite resolution of the shape, depending on how many vertices we use. It’s drawn “by the shader”.

SSAA stands for Super Sampling Anti-Aliasing. Render it big­ger, down­sam­ple to be smaller. The idea is as old as 3D ren­der­ing it­self. In fact, the first movies with CGI all re­lied on this with the most naive of im­ple­men­ta­tions. One ex­am­ple is the 1986 movie Flight of the Navigator”, as cov­ered by Captain Disillusion in the video be­low.

1986 did it, so can we. Implemented in mere sec­onds. Easy, right?

cir­cleSSAA.js draws at twice the res­o­lu­tion to a tex­ture, which frag­ment shader post.fs reads from at stan­dard res­o­lu­tion with GL_LINEAR to per­form SSAA. So we have four in­put pix­els for every one out­put pixel we draw to the screen. But it’s some­what strange: There is def­i­nitely Anti-Aliasing hap­pen­ing, but less than ex­pected.
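As a rough sketch of that downsampling step (reusing the hypothetical naive_circle() from the earlier sketch, not the article’s post.fs): render the hard-edged circle at twice the resolution, then average each 2x2 block into one output pixel, so four input samples decide one output value.

def ssaa_2x(resolution=16):
    """2x supersampling sketch: render at double resolution, then box-filter 2x2 blocks."""
    hi = naive_circle(resolution * 2)        # hard-edged circle at twice the resolution
    # group the pixels into 2x2 blocks and average them: four inputs per output pixel
    return hi.reshape(resolution, 2, resolution, 2).mean(axis=(1, 3))

print(np.unique(ssaa_2x(16)))                # which coverage levels actually occur (0, 0.25, 0.5, 0.75, 1.0 are possible)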

There should be 4 steps of trans­parency, but we only get two!

Especially at lower resolutions, we can see the circle does actually have 4 steps of transparency, but mainly at the “45° diagonals” of the circle. A circle has of course no sides, but at the axis-aligned “bottom” there are only 2 steps of transparency: fully opaque and 50% transparent; the 25% and 75% transparency steps are missing.

We aren’t sampling against the circle shape at twice the resolution, we are sampling against the quantized result of the circle shape. Twice the resolution, but discrete pixels nonetheless. The combination of pixelation and sample placement doesn’t hold enough information where we need it the most: at the axis-aligned “flat parts”.

Four times the mem­ory and four times the cal­cu­la­tion re­quire­ment, but only a half-assed re­sult.

Implementing SSAA prop­erly is a minute craft. Here we are draw­ing to a 2x res­o­lu­tion tex­ture and down-sam­pling it with lin­ear in­ter­po­la­tion. So ac­tu­ally, this im­ple­men­ta­tion needs 5x the amount of VRAM. A proper im­ple­men­ta­tion sam­ples the scene mul­ti­ple times and com­bines the re­sult with­out an in­ter­me­di­ary buffer.

With our im­ple­men­ta­tion, we can’t even do more than 2xSSAA with one tex­ture read, as lin­ear in­ter­po­la­tion hap­pens only with 2x2 sam­ples

To com­bat axis-align­ment ar­ti­facts like with our cir­cle above, we need to place our SSAA sam­ples bet­ter. There are mul­ti­ple ways to do so, all with pros and cons. To im­ple­ment SSAA prop­erly, we need deep in­te­gra­tion with the ren­der­ing pipeline. For 3D prim­i­tives, this hap­pens be­low API or en­gine, in the realm of ven­dors and dri­vers.

In fact, some of the best implementations were discovered by vendors by accident, like SGSSAA. There are also ways in which SSAA can make your scene look worse. Depending on implementation, SSAA messes with mip-map calculations. As a result the mip-map lod-bias may need adjustment, as explained in the article above.

WebXR UI pack­age three-mesh-ui , a pack­age ma­ture enough to be used by Meta , uses shader-based ro­tated grid su­per sam­pling to achieve sharp text ren­der­ing in VR, as seen in the code

MSAA is super sampling, but only at the silhouette of models, overlapping geometry, and texture edges if “Alpha to Coverage” is enabled. MSAA is implemented in hardware by the graphics vendors, and what is supported depends on the hardware. In the select box below you can choose different MSAA levels for our circle.

There is up to MSAA x64, but what is avail­able is im­ple­men­ta­tion de­fined. WebGL 1 has no sup­port, which is why the next can­vas ini­tial­izes a WebGL 2 con­text. In WebGL, NVIDIA lim­its MSAA to 8x on Windows, even if more is sup­ported, whilst on Linux no such limit is in place. On smart­phones you will only get ex­actly 4x, as dis­cussed be­low.

What is edge smoothing and how does MSAA even know what to sample against? For now we skip the shader code and implementation. First let’s take a look at MSAA’s pros and cons in general.

We rely on hardware to do the Anti-Aliasing, which obviously leads to the problem that user hardware may not support what we need. The sampling patterns MSAA uses may also do things we don’t expect. Depending on what your hardware does, you may see the circle’s edge transparency steps appearing in the “wrong order”.

When MSAA became required with the OpenGL 3 & DirectX 10 era of hardware, support was especially hit & miss. Even the latest Intel GMA iGPUs expose the OpenGL extension EXT_framebuffer_multisample, but don’t in fact support MSAA, which led to confusion. But also in more recent smartphones, support just wasn’t that clear-cut.

Mobile chips support exactly MSAAx4 and things are weird. Android will let you pick 2x, but the driver will force 4x anyway. iPhones & iPads do something rather stupid: choosing 2x will make it 4x, but transparency will be rounded to the nearest 50% multiple, leading to double edges in our example. There is a hardware-specific reason:

Looking at mod­ern video games, one might be­lieve that MSAA is of the past. It usu­ally brings a hefty per­for­mance penalty af­ter all. Surprisingly, it’s still the king un­der cer­tain cir­cum­stances and in very spe­cific sit­u­a­tions, even per­for­mance free.

As a gamer, this goes against in­stinct…

Rahul Prasad: Use MSAA […] It’s ac­tu­ally not as ex­pen­sive on mo­bile as it is on desk­top, it’s one of the nice things you get on mo­bile. […] On some (mobile) GPUs 4x (MSAA) is free, so use it when you have it.

As ex­plained by Rahul Prasad in the above talk, in VR 4xMSAA is a must and may come free on cer­tain mo­bile GPUs. The spe­cific rea­son would de­rail the blog post, but in case you want to go down that par­tic­u­lar rab­bit hole, here is Epic Games’ Niklas Smedberg giv­ing a run-down.

In short, this is pos­si­ble un­der the con­di­tion of for­ward ren­der­ing with geom­e­try that is not too dense and the GPU hav­ing tiled-based ren­der­ing ar­chi­tec­ture, which al­lows the GPU to per­form MSAA cal­cu­la­tions with­out heavy mem­ory ac­cess and thus la­tency hid­ing the cost of the cal­cu­la­tion. Here’s deep dive, if you are in­ter­ested.

MSAA gives you ac­cess to the sam­ples, mak­ing cus­tom MSAA fil­ter­ing curves a pos­si­bil­ity. It also al­lows you to merge both stan­dard mesh-based and signed-dis­tance-field ren­der­ing via al­pha to cov­er­age. This com­plex fea­tures set made pos­si­ble the most out-of-the-box think­ing I ever wit­nessed in graph­ics pro­gram­ming:

Assassin’s Creed Unity used MSAA to render at half resolution and reconstruct only some buffers to full-res from MSAA samples, as described on page 48 of the talk “GPU-Driven Rendering Pipelines” by Ulrich Haar and Sebastian Aaltonen. Kinda like variable rate shading, but implemented with duct-tape and without vendor support.

The brain-melt­ing lengths to which graph­ics pro­gram­mers go to uti­lize hard­ware ac­cel­er­a­tion to the last drop has me some­times in awe.

In 2009 a pa­per by Alexander Reshetov struck the graph­ics pro­gram­ming world like a ton of bricks: take the blocky, aliased re­sult of the ren­dered im­age, find edges and clas­sify the pix­els into tetris-like shapes with per-shape fil­ter­ing rules and re­move the blocky edge. Anti-Aliasing based on the mor­phol­ogy of pix­els - MLAA was born.

Computationally cheap, easy to implement. Later it was refined with more emphasis on removing sub-pixel artifacts to become SMAA. It became a fan favorite, with an injector being developed early on to put SMAA into games that didn’t support it. Some considered these too blurry; the saying “vaseline on the screen” was coined.

It was the future, a sign of things to come. No more shaky hardware support. Just as Fixed-Function pipelines died in favor of programmable shaders, Anti-Aliasing too became “shader based”.

We’ll take a close look at an algorithm that was inspired by MLAA, developed by Timothy Lottes. “Fast approximate anti-aliasing”, FXAA. In fact, when it came into wide circulation, it received some incredible press. Among others, Jeff Atwood pulled neither bold fonts nor punches in his 2011 blog post, later republished by Kotaku.

Jeff Atwood: The FXAA method is so good, in fact, it makes all other forms of full-screen anti-alias­ing pretty much ob­so­lete overnight. If you have an FXAA op­tion in your game, you should en­able it im­me­di­ately and ig­nore any other AA op­tions.

Let’s see what the hype was about. The fi­nal ver­sion pub­licly re­leased was FXAA 3.11 on August 12th 2011 and the fol­low­ing demos are based on this. First, let’s take a look at our cir­cle with FXAA do­ing the Anti-Aliasing at de­fault set­tings.

A bit of a weird result. It would look good if the circle weren’t moving. Perfectly smooth edges. But the circle distorts as it moves. The axis-aligned top and bottom especially have a little nub that appears and disappears. And switching to lower resolutions, the circle even loses its round shape, wobbling like PlayStation 1 graphics.

Per-pixel, FXAA considers only the 3x3 neighborhood, so it can’t possibly know that this area is part of a big shape. But it also doesn’t just “blur edges”, as often said. As explained in the official whitepaper, it finds the edge’s direction and shifts the pixel’s coordinates to let the performance-free linear interpolation do the blending.

For our demo here, wrong tool for the job. Really, we did­n’t do FXAA jus­tice with our ex­am­ple. FXAA was cre­ated for an­other use case and has many set­tings and pre­sets. It was cre­ated to anti-alias more com­plex scenes. Let’s give it a fair shot!

A scene from my favorite piece of software in existence: NeoTokyo°. I created a bright area light in an NT° map and moved a bench to create an area of strong aliasing. The following demo uses the aliased output from NeoTokyo°, calculates the required luminance channel and applies FXAA. All FXAA presets and settings are at your fingertips.

This has fixed resolution and will only be at your device’s native resolution if your device has no dpi scaling and the browser is at 100% zoom.

Just looking at the full FXAA 3.11 source, you can see the passion in every line. Portable across OpenGL and DirectX, a PC version, an XBOX 360 version, two finely optimized PS3 versions fighting for every GPU cycle, including shader disassembly. Such a level of professionalism and dedication, shared with the world in plain text.

The shar­ing and open­ness is why I’m in love with graph­ics pro­gram­ming.

It may be performance cheap, but only if you already have post-processing in place or do deferred shading. Especially in mobile graphics, memory access is expensive, so saving the framebuffer to perform post processing is not always a given. If you need to set up render-to-texture in order to have FXAA, then the “F” in FXAA evaporates.

In this ar­ti­cle we won’t jump into mod­ern tem­po­ral anti-alias­ing, but be­fore FXAA was even de­vel­oped, TAA was al­ready ex­per­i­mented with. In fact, FXAA was sup­posed to get a new ver­sion 4 and in­cor­po­rate tem­po­ral anti alias­ing in ad­di­tion to the stan­dard spa­tial one, but in­stead it evolved fur­ther and re­branded into TXAA.

Now we get to the good stuff. Analytical Anti-Aliasing ap­proaches the prob­lem back­wards - it knows the shape you need and draws the pixel al­ready Anti-Aliased to the screen. Whilst draw­ing the 2D or 3D shape you need, it fades the shape’s bor­der by ex­actly one pixel.

Always smooth with­out ar­ti­facts and you can ad­just the amount of fil­ter­ing. Preserves shape even at low res­o­lu­tions. No ex­tra buffers or ex­tra hard­ware re­quire­ments.

Even runs on ba­sic WebGL 1.0 or OpenGLES 2.0, with­out any ex­ten­sions.

With the above buttons, you can set the smoothing to be equal to one pixel. This gives a sharp result, but comes with the caveat that axis-aligned 90° sides may still be perceived as “flat” in specific combinations of screen resolution, size and circle position.

Filtering based on the diagonal pixel size of √2 px = 1.4142…, ensures the “tip” of the circle in axis-aligned pixel rows and columns is always non-opaque. This removes the perception of flatness, but makes its shape ever so slightly more blurry.

Or in other words: as soon as the border has an opaque pixel, there is already a transparent pixel “in front” of it.

This style of Anti-Aliasing is usu­ally im­ple­mented with 3 in­gre­di­ents:

But if you look at the code box above, you will find cir­cle-an­a­lyt­i­cal.fs hav­ing none of those. And this is the se­cret sauce we will look at. Before we dive into the im­ple­men­ta­tion, let’s clear the ele­phants in the room…

In graph­ics pro­gram­ming, Analytical refers to ef­fects cre­ated by know­ing the make-up of the in­tended shape be­fore­hand and per­form­ing cal­cu­la­tions against the rigid math­e­mat­i­cal de­f­i­n­i­tion of said shape. This term is used very loosely across com­puter graph­ics, sim­i­lar to su­per sam­pling re­fer­ring to mul­ti­ple things, de­pend­ing on con­text.

Very soft soft-shad­ows which in­clude con­tact-hard­en­ing, im­ple­mented by al­go­rithms like per­cent­age-closer soft shad­ows are very com­pu­ta­tion­ally in­tense and re­quire both high res­o­lu­tion shadow maps and/​or very ag­gres­sive fil­ter­ing to not pro­duce shim­mer­ing dur­ing move­ment.

This is why Naughty Dog’s The Last of Us relied on getting soft-shadows on the main character by calculating the shadow from a rigidly defined formula of a stretched sphere, multiple of which were arranged in the shape of the main character, shown in red. An improved implementation with shader code can be seen in this Shadertoy demo by romainguy, with the more modern capsule, as opposed to a stretched sphere.

This is now an in­te­gral part of mod­ern game en­gines, like Unreal. As op­posed to stan­dard shadow map­ping, we don’t ren­der the scene from the per­spec­tive of the light with fi­nite res­o­lu­tion. We eval­u­ate the shadow per-pixel against the math­e­mat­i­cal equa­tion of the stretched sphere or cap­sule. This makes cap­sule shad­ows an­a­lyt­i­cal.

Staying with the Last of Us, The Last of Us Part II uses the same logic for blurry real-time re­flec­tions of the main char­ac­ter, where Screen Space Reflections aren’t de­fined. Other op­tions like ray­trac­ing against the scene, or us­ing a real-time cube­map like in GTA V are ei­ther noisy and low res­o­lu­tion or high res­o­lu­tion, but low per­for­mance.

Here the re­flec­tion cal­cu­la­tion is part of the ma­te­r­ial shader, ren­der­ing against the rigidly de­fined math­e­mat­i­cal shape of the cap­sule per-pixel, mul­ti­ple of which are arranged in the shape of the main char­ac­ter. This makes cap­sule re­flec­tions an­a­lyt­i­cal.

An online demo is worth at least a million…

…yeah the joke is get­ting old.

Ambient Occlusion is essential in modern rendering, bringing contact shadows and approximating global illumination. Another topic as deep as the ocean, with so many implementations. Usually implemented by some form of “raytrace a bunch of rays and blur the result”.

In this Shadertoy demo, the floor is eval­u­ated per-pixel against the rigidly de­fined math­e­mat­i­cal de­scrip­tion of the sphere to get a soft, non-noisy, non-flick­er­ing oc­clu­sion con­tri­bu­tion from the hov­er­ing ball. This im­ple­men­ta­tion is an­a­lyt­i­cal. Not just spheres, there are an­a­lyt­i­cal ap­proaches also for com­plex geom­e­try.

By ex­ten­sion, Unreal Engine has dis­tance field ap­proaches for Soft Shadows and Ambient Occlusion, though one may ar­gue, that this type of signed dis­tance field ren­der­ing does­n’t fit the de­scrip­tion of an­a­lyt­i­cal, con­sid­er­ing the dis­tance field is pre­cal­cu­lated into a 3D tex­ture.

Let’s dive into the sauce. We work with signed distance fields, where for every point that we sample, we know the distance to the desired shape. This information may be baked into a texture as done for SDF text rendering or may be derived per-pixel from a mathematical formula for simpler shapes like bezier curves or hearts.

Based on that distance we fade out the border of the shape. If we fade by the size of one pixel, we get perfectly smooth edges, without any strange side effects. The secret sauce is in the implementation and under the sauce is where the magic is. How does the shader know the size of a pixel? How do we blend based on distance?

This ap­proach gives mo­tion-sta­ble pixel-per­fec­tion, but does­n’t work with tra­di­tional ras­ter­i­za­tion. The full shape re­quires a signed dis­tance field.

Specifically, by how much do we fade the bor­der? If we hard­code a sta­tic value, eg. fade at 95% of the cir­cle’s ra­dius, we may get a pleas­ing re­sult for that cir­cle size at that screen res­o­lu­tion, but too much smooth­ing when the cir­cle is big­ger or closer to the cam­era and alias­ing if the cir­cle be­comes small.

We need to know the size of a pixel. This is in part what Screen Space de­riv­a­tives were cre­ated for. Shader func­tions like dFdx, dFdy and fwidth al­low you to get the size of a screen pixel rel­a­tive to some vec­tor. In the above cir­cle-an­a­lyt­i­cal­Com­pare.fs we de­ter­mine by how much the dis­tance changes via two meth­ods:

pixelSize = fwidth(dist);
/* or */
pixelSize = length(vec2(dFdx(dist), dFdy(dist)));

Relying on Screen Space de­riv­a­tives has the ben­e­fit, that we get the pixel size de­liv­ered to us by the graph­ics pipeline. It prop­erly re­spects any trans­for­ma­tions we might throw at it, in­clud­ing 3D per­spec­tive.

The down side is that it is not sup­ported by the WebGL 1 stan­dard and has to be pulled in via the ex­ten­sion GL_OES_standard_derivatives or re­quires the jump to WebGL 2.

Luckily I have never wit­nessed any de­vice that sup­ported WebGL 1, but not the Screen Space de­riv­a­tives. Even the GMA based Thinkpad X200 & T500 I hard­ware mod­ded do.

Generally, there are some nasty pit­falls when us­ing Screen Space de­riv­a­tives: how the cal­cu­la­tion hap­pens is up to the im­ple­men­ta­tion. This led to the split into dFdxFine() and dFdx­Coarse() in later OpenGL re­vi­sions. The de­fault case can be set via GL_FRAGMENT_SHADER_DERIVATIVE_HINT, but the stan­dard hates you:

OpenGL Docs: The im­ple­men­ta­tion may choose which cal­cu­la­tion to per­form based upon fac­tors such as per­for­mance or the value of the API GL_FRAGMENT_SHADER_DERIVATIVE_HINT hint.

Why do we have stan­dards again? As a graph­ics pro­gram­mer, any­thing with hint has me trau­ma­tized.

Luckily, nei­ther case con­cerns us, as the dif­fer­ence does­n’t show it­self in the con­text of Anti-Aliasing. Performance tech­ni­cally dFdx and dFdy are free (or rather, their cost is al­ready part of the ren­der­ing pipeline), though the pixel size cal­cu­la­tion us­ing length() or fwidth() is not. It is per­formed per-pixel.

This is why there ex­ist two ways of do­ing this: get­ting the length() of the vec­tor that dFdx and dFdy make up, a step in­volv­ing the his­tor­i­cally per­for­mance ex­pen­sive sqrt() func­tion or us­ing fwidth(), which is the ap­prox­i­ma­tion abs(dFdx()) + abs(dFdy()) of the above.

It de­pends on con­text, but on semi-mod­ern hard­ware a call to length() should be per­for­mance triv­ial though, even per-pixel.

To show­case the dif­fer­ence, the above Radius ad­just slider works off of the Pixel size method and ad­justs the SDF dis­tance. If you go with fwidth() and a strong ra­dius shrink, you’ll see some­thing weird.

The diagonals shrink more than they should, as the approximation using addition scales too much diagonally. We’ll talk about professional implementations further below in a moment, but using fwidth() for AAA is what Unity extension “Shapes” by Freya Holmér calls “Fast Local Anti-Aliasing”, with the following text:

Fast LAA has a slight bias in the di­ag­o­nal di­rec­tions, mak­ing cir­cu­lar shapes ap­pear ever so slightly rhom­bous and have a slightly sharper cur­va­ture in the or­thog­o­nal di­rec­tions, es­pe­cially when small. Sometimes the edges in the di­ag­o­nals are slightly fuzzy as well.

This affects our fading, which will fade more on diagonals. Luckily, we fade by the amount of one pixel and thus the difference is really only visible when flicking between the methods. What to choose depends on what you care more about: performance or accuracy? But what if I told you you can have your cake and eat it too…

…Calculate it your­self! For the 2D case, this is triv­ial and eas­ily ab­stracted away. We know the size our con­text is ren­der­ing at and how big our quad is that we draw on. Calculating the size of the pixel is thus done per-ob­ject, not per-pixel. This is what hap­pens in the above cir­cle­An­a­lyt­i­cal­Com­par­i­son.js.

/* Calculate pixel size based on height.
   Simple case: Assumes square pixels and a square quad. */
gl.uniform1f(pixelSizeCircle, (2.0 / (canvas.height / resDiv)));

No WebGL 2, no ex­ten­sions, works on an­cient hard­ware.
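Pulling the pieces together, here is a Python/NumPy sketch of the analytical fade itself (again my own illustration, not the article’s circle-analytical.fs): the signed distance to the circle’s edge is divided by a per-object pixel size, in the spirit of the uniform above, and the border is faded across one pixel of distance.

def analytical_circle(resolution=32, filter_px=1.0):
    """Analytical AA sketch: fade the border over `filter_px` pixels of signed distance."""
    coords = (np.arange(resolution) + 0.5) / resolution * 2.0 - 1.0
    u, v = np.meshgrid(coords, coords)
    dist = np.sqrt(u**2 + v**2) - 1.0        # signed distance to the edge: negative inside, positive outside
    pixel_size = 2.0 / resolution            # per-object estimate: uv spans 2 units across `resolution` pixels
    # fade centered on the edge: fully opaque half a pixel inside, fully transparent half a pixel outside
    return np.clip(0.5 - dist / (pixel_size * filter_px), 0.0, 1.0)

With filter_px=1.0 this corresponds to the sharper one-pixel setting described above, while filter_px=2**0.5 corresponds to the √2 px diagonal filtering.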

...

Read the original on blog.frost.kiwi »

5 385 shares, 41 trendiness

Undergraduates with family income below $200,000 can expect to attend MIT tuition-free starting in 2025

Undergraduates with fam­ily in­come be­low $200,000 can ex­pect to at­tend MIT tu­ition-free start­ing next fall, thanks to newly ex­panded fi­nan­cial aid. Eighty per­cent of American house­holds meet this in­come thresh­old.

And for the 50 per­cent of American fam­i­lies with in­come be­low $100,000, par­ents can ex­pect to pay noth­ing at all to­ward the full cost of their stu­dents’ MIT ed­u­ca­tion, which in­cludes tu­ition as well as hous­ing, din­ing, fees, and an al­lowance for books and per­sonal ex­penses.

This $100,000 thresh­old is up from $75,000 this year, while next year’s $200,000 thresh­old for tu­ition-free at­ten­dance will in­crease from its cur­rent level of $140,000.

These new steps to enhance MIT’s affordability for students and families are the latest in a long history of efforts by the Institute to free up more resources to make an MIT education as affordable and accessible as possible. Toward that end, MIT has earmarked $167.3 million in need-based financial aid this year for undergraduate students — up some 70 percent from a decade ago.

“MIT’s distinctive model of education — intense, demanding, and rooted in science and engineering — has profound practical value to our students and to society,” MIT President Sally Kornbluth says. “As the Wall Street Journal recently reported, MIT is better at improving the financial futures of its graduates than any other U. S. college, and the Institute also ranks number one in the world for the employability of its graduates.”

“The cost of college is a real concern for families across the board,” Kornbluth adds, “and we’re determined to make this transformative educational experience available to the most talented students, whatever their financial circumstances. So, to every student out there who dreams of coming to MIT: Don’t let concerns about cost stand in your way.”

MIT is one of only nine colleges in the US that does not consider applicants’ ability to pay as part of its admissions process and that meets the full demonstrated financial need for all undergraduates. MIT does not expect students on aid to take loans, and, unlike many other institutions, MIT does not provide an admissions advantage to the children of alumni or donors. Indeed, 18 percent of current MIT undergraduates are first-generation college students.

“We believe MIT should be the preeminent destination for the most talented students in the country interested in an education centered on science and technology, and accessible to the best students regardless of their financial circumstances,” says Stu Schmill, MIT’s dean of admissions and student financial services.

“With the need-based financial aid we provide today, our education is much more affordable now than at any point in the past,” adds Schmill, who graduated from MIT in 1986, “even though the ‘sticker price’ of MIT is higher now than it was when I was an undergraduate.”

Last year, the median annual cost paid by an MIT undergraduate receiving financial aid was $12,938, allowing 87 percent of students in the Class of 2024 to graduate debt-free. Those who did borrow graduated with median debt of $14,844. At the same time, graduates benefit from the lifelong value of an MIT degree, with an average starting salary of $126,438 for graduates entering industry, according to MIT’s most recent survey of its graduating students.

MIT’s endowment — made up of generous gifts made by individual alumni and friends — allows the Institute to provide this level of financial aid, both now and into the future.

“Today’s announcement is a powerful expression of how much our graduates value their MIT experience,” Kornbluth says, “because our ability to provide financial aid of this scope depends on decades of individual donations to our endowment, from generations of MIT alumni and other friends. In effect, our endowment is an inter-generational gift from past MIT students to the students of today and tomorrow.”

What MIT fam­i­lies can ex­pect in 2025

As noted ear­lier: Starting next fall, for fam­i­lies with in­come be­low $100,000, with typ­i­cal as­sets, par­ents can ex­pect to pay noth­ing for the full cost of at­ten­dance, which in­cludes tu­ition, hous­ing, din­ing, fees, and al­lowances for books and per­sonal ex­penses.

For fam­i­lies with in­come from $100,000 to $200,000, with typ­i­cal as­sets, par­ents can ex­pect to pay on a slid­ing scale from $0 up to a max­i­mum of around $23,970, which is this year’s to­tal cost for MIT hous­ing, din­ing, fees, and al­lowances for books and per­sonal ex­penses.

Put another way, next year all MIT families with income below $200,000 can expect to contribute well below $27,146, which is the annual average cost for in-state students to attend and live on campus at public universities in the US, according to the Education Data Initiative. And even among families with income above $200,000, many still receive need-based financial aid from MIT, based on their unique financial circumstances. Families can use MIT’s online calculators to estimate the cost of attendance for their specific family.

...

Read the original on news.mit.edu »

6 362 shares, 20 trendiness

Pandas but 100x faster

My main background is as a hedge fund professional, so I deal with finance data all the time; so far the Pandas library has been an indispensable tool in my workflow and my most used Python library.

Then came along Polars (written in Rust, btw!), which shook the ground of the Python ecosystem due to its speed and efficiency; you can check some of the Polars benchmarks here.

I have around +/- 30 thou­sand lines of Pandas code, so you can un­der­stand why I’ve been hes­i­tant to rewrite them to Polars, de­spite my en­thu­si­asm for speed and op­ti­miza­tion. The sheer scale of the task has led to re­peated de­lays, as I weigh the po­ten­tial ben­e­fits of a faster and more ef­fi­cient li­brary against the sig­nif­i­cant ef­fort re­quired to refac­tor my ex­ist­ing code.

There has al­ways been this thought in the back of my mind:

Pandas is written in C and Cython, which means the main engine is King C… there’s got to be a way to optimize Pandas and leverage the C engine!

Here comes FireDucks, the answer to my prayers. It was launched in October 2023 by a team of programmers from NEC Corporation, which has 30+ years of experience developing supercomputers; read the announcement here.

Take a quick look at the benchmark page here! I’ll let the numbers speak for themselves.

* This is the cra­zi­est bench, FireDucks even beat DuckDB! Also check Pandas & Polars ranks.

* It’s even faster than Polars!

Alright, those benchmark numbers from FireDucks look amazing, but a good rule of thumb is to never take numbers for granted… don’t trust, verify! Hence I made my own set of benchmarks on my machine.

Yes, the last two benchmark numbers are 130x and 200x faster than Pandas… are you not amused by this performance impact?! So yeah, the title of this post is not clickbait, it’s real. Another key point I need to highlight, the most important one:

you can just plug FireDucks into your existing Pandas code and expect massive speed improvements… impressive indeed!
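For a sense of what “just plug it in” looks like in practice, here is a hedged sketch: as I understand the FireDucks docs, importing fireducks.pandas in place of pandas is the drop-in hook (the file and column names below are placeholders of mine, not from this post).

# before: import pandas as pd
import fireducks.pandas as pd   # drop-in replacement exposing the same Pandas API (per the FireDucks docs)

df = pd.read_csv("trades.csv")  # placeholder file name
summary = (
    df.groupby("ticker")["pnl"]
      .sum()
      .sort_values(ascending=False)
)
print(summary.head())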

I’m lost for words..frankly! What else would Pandas users want?

A note for those groups of people bashing Python for being slow… yes, pure Python is super slow, I agree. But it has been proven time and again that it can be optimized, and once it’s been properly optimized (FireDucks, Codon, Cython, etc.) it can be speedy as well, since the Python backend uses a C engine!

Be smart, folks! No one sane would use “pure Python” for serious workloads… leverage the vast ecosystem!

...

Read the original on hwisnu.bearblog.dev »

7 280 shares, 12 trendiness

Understanding the BM25 full text search algorithm

BM25, or Best Match 25, is a widely used algorithm for full text search. It is the default in Lucene/Elasticsearch and SQLite, among others. Recently, it has become common to combine full text search and vector similarity search into “hybrid search”. I wanted to understand how full text search works, and specifically BM25, so here is my attempt at understanding by re-explaining.

For a quick bit of context on why I’m thinking about search algorithms, I’m building a personalized content feed that scours noisy sources for content related to your interests. I started off using vector similarity search and wanted to also include full-text search to improve the handling of exact keywords (for example, a friend has “Solid.js” as an interest and using vector similarity search alone, that turns up more content related to React than Solid).

The ques­tion that mo­ti­vated this deep dive into BM25 was: can I com­pare the BM25 scores of doc­u­ments across mul­ti­ple queries to de­ter­mine which query the doc­u­ment best matches?

Initially, both ChatGPT and Claude told me no — though an­noy­ingly, af­ter do­ing this deep dive and for­mu­lat­ing a more pre­cise ques­tion, they both said yes 🤦‍♂️. Anyway, let’s get into the de­tails of BM25 and then I’ll share my con­clu­sions about this ques­tion.

At the most ba­sic level, the goal of a full text search al­go­rithm is to take a query and find the most rel­e­vant doc­u­ments from a set of pos­si­bil­i­ties.

However, we don’t really know which documents are “relevant”, so the best we can do is guess. Specifically, we can rank documents based on the probability that they are relevant to the query. (This is called The Probability Ranking Principle.)

How do we cal­cu­late the prob­a­bil­ity that a doc­u­ment is rel­e­vant?

For full text or lex­i­cal search, we are only go­ing to use qual­i­ties of the search query and each of the doc­u­ments in our col­lec­tion. (In con­trast, vec­tor sim­i­lar­ity search might use an em­bed­ding model trained on an ex­ter­nal cor­pus of text to rep­re­sent the mean­ing or se­man­tics of the query and doc­u­ment.)

BM25 uses a cou­ple of dif­fer­ent com­po­nents of the query and the set of doc­u­ments:

* Query terms: if a search query is made up of mul­ti­ple terms, BM25 will cal­cu­late a sep­a­rate score for each term and then sum them up.

* Inverse Document Frequency (IDF): how rare is a given search term across the entire document collection? We assume that common words (such as “the” or “and”) are less informative than rare words. Therefore, we want to boost the importance of rare words.

* Term fre­quency in the doc­u­ment: how many times does a search term ap­pear in a given doc­u­ment? We as­sume that more rep­e­ti­tion of a query term in a given doc­u­ment in­creases the like­li­hood that that doc­u­ment is re­lated to the term. However, BM25 also ad­justs this so that there are di­min­ish­ing re­turns each time a term is re­peated.

* Document length: how long is the given doc­u­ment com­pared to oth­ers? Long doc­u­ments might re­peat the search term more, just by virtue of be­ing longer. We don’t want to un­fairly boost long doc­u­ments, so BM25 ap­plies some nor­mal­iza­tion based on how the doc­u­men­t’s length com­pares to the av­er­age.

These four com­po­nents are what make up BM25. Now, let’s look at ex­actly how they’re used.

The BM25 al­go­rithm might look scary to non-math­e­mati­cians (my eyes glazed over the first time I saw it), but I promise, it’s not too hard to un­der­stand!

Here is the full equation:

$$\text{score}(D, Q) = \sum_{i=1}^{n} \text{IDF}(q_i) \cdot \frac{f(q_i, D) \cdot (k_1 + 1)}{f(q_i, D) + k_1 \cdot \left(1 - b + b \cdot \frac{|D|}{\text{avgdl}}\right)}$$

Now, let’s go through it piece-by-piece.

* Q is the full query, potentially composed of multiple query terms

* n is the number of query terms

* q_i is each of the query terms

This part of the equa­tion says: given a doc­u­ment and a query, sum up the scores for each of the query terms.

Now, let’s dig into how we cal­cu­late the score for each of the query terms.

The first component of the score calculates how rare the query term is within the whole collection of documents, using the Inverse Document Frequency (IDF):

$$\text{IDF}(q_i) = \ln\left(\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5} + 1\right)$$

The key el­e­ments to fo­cus on in this equa­tion are:

* N is the total number of documents in our collection

* n(q_i) is the number of documents that contain the query term

* N - n(q_i) therefore is the number of documents that do not contain the query term

In simple language, this part boils down to the following: common terms will appear in many documents. If the term appears in many documents, we will have a small number (N - n(q_i), or the number of documents that do not have the term) divided by n(q_i). As a result, common terms will have a small effect on the score.

In contrast, rare terms will appear in few documents, so n(q_i) will be small and N - n(q_i) will be large. Therefore, rare terms will have a greater impact on the score.

The constants 0.5 and 1 are there to smooth out the equation and ensure that we don’t end up with wildly varying results if the term is either very rare or very common.

In the previous step, we looked at how rare the term is across the whole set of documents. Now, let’s look at how frequent the given query term is in the given document:

$$\frac{f(q_i, D) \cdot (k_1 + 1)}{f(q_i, D) + k_1}$$

The terms in this equa­tion are:

* f(q_i, D) is the frequency of the given query term in the given document

* k_1 is a tuning parameter that is generally set between 1.2 and 2.0

This equation takes the term frequency within the document into account, but ensures that term repetition has diminishing returns. The intuition here is that, at some point, the document is probably related to the query term and we don’t want an infinite amount of repetition to be weighted too heavily in the score.

The k_1 parameter controls how quickly the returns to term repetition diminish. You can see how the slope changes based on this setting:

The last thing we need is to compare the length of the given document to the lengths of the other documents in the collection:

$$1 - b + b \cdot \frac{|D|}{\text{avgdl}}$$

This time, the parameters are:

* |D| is the length of the given document

* avgdl is the average document length in our collection

* b is another tuning parameter that controls how much we normalize by the document length

Long documents are likely to contain the search term more frequently, just by virtue of being longer. Since we don’t want to unfairly boost long documents, this whole term goes in the denominator of our final equation. That is, a document that is longer than average (|D| > avgdl) will be penalized by this adjustment.

b can be adjusted by the user. Setting b = 0 turns off document length normalization, while setting b = 1 applies it fully. It is normally set to 0.75.

If we take all of the components we’ve just discussed and put them together, we arrive back at the full BM25 equation:

$$\text{score}(D, Q) = \sum_{i=1}^{n} \text{IDF}(q_i) \cdot \frac{f(q_i, D) \cdot (k_1 + 1)}{f(q_i, D) + k_1 \cdot \left(1 - b + b \cdot \frac{|D|}{\text{avgdl}}\right)}$$

Reading from left to right, you can see that we are sum­ming up the scores for each query term. For each, we are tak­ing the Inverse Document Frequency, mul­ti­ply­ing it by the term fre­quency in the doc­u­ment (with di­min­ish­ing re­turns), and then nor­mal­iz­ing by the doc­u­ment length.
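To tie the pieces together, here is a small Python sketch of the scoring function as described above (my own illustration with the common defaults k_1 = 1.5 and b = 0.75; the toy corpus is made up, not from the post):

import math
from collections import Counter

def bm25_score(query_terms, doc_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score one document against a query, following the components described above."""
    N = len(corpus_tokens)                                   # total number of documents
    avgdl = sum(len(d) for d in corpus_tokens) / N           # average document length
    tf = Counter(doc_tokens)
    score = 0.0
    for term in query_terms:
        n_q = sum(1 for d in corpus_tokens if term in d)     # documents containing the term
        idf = math.log((N - n_q + 0.5) / (n_q + 0.5) + 1)    # rare terms get boosted
        f = tf[term]                                          # term frequency in this document
        denom = f + k1 * (1 - b + b * len(doc_tokens) / avgdl)
        score += idf * (f * (k1 + 1)) / denom                # saturation + length normalization
    return score

docs = [
    "the quick brown fox jumps over the lazy dog".split(),
    "a fast brown fox leaps over a sleepy cat".split(),
    "the dog sleeps all day".split(),
]
for d in docs:
    print(round(bm25_score(["brown", "fox"], d, docs), 3), " ".join(d[:4]), "...")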

We’ve just gone through the com­po­nents of the BM25 equa­tion, but I think it’s worth paus­ing to em­pha­size two of its most in­ge­nious as­pects.

As men­tioned ear­lier, BM25 is based on an idea called the Probability Ranking Principle. In short, it says:

If re­trieved doc­u­ments are or­dered by de­creas­ing prob­a­bil­ity of rel­e­vance on the data avail­able, then the sys­tem’s ef­fec­tive­ness is the best that can be ob­tained for the data.

Unfortunately, calculating the “true” probability that a document is relevant to a query is nearly impossible.

However, we re­ally care about the or­der of the doc­u­ments more than we care about the ex­act prob­a­bil­ity. Because of this, re­searchers re­al­ized that you could sim­plify the equa­tions and make it prac­ti­ca­ble. Specifically, you could drop terms from the equa­tion that would be re­quired to cal­cu­late the full prob­a­bil­ity but where leav­ing them out would not af­fect the or­der.

Even though we are using the Probability Ranking Principle, we are actually calculating a “weight” instead of a probability.

$$W = \log \frac{p(TF = tf \mid R = 1) \cdot p(TF = 0 \mid R = 0)}{p(TF = tf \mid R = 0) \cdot p(TF = 0 \mid R = 1)}$$

This equation calculates the weight using term frequencies. Specifically:

* W is the weight for a given document

* p(TF = tf | R = 1) is the probability that the query term would appear in the document with a given frequency (tf) if the document is relevant (R = 1)

The var­i­ous terms boil down to the prob­a­bil­ity that we would see a cer­tain query term fre­quency within the doc­u­ment if the doc­u­ment is rel­e­vant or not rel­e­vant, and the prob­a­bil­i­ties that the term would not ap­pear at all if the doc­u­ment is rel­e­vant or not.

The Robertson/Sparck Jones Weight is a way of estimating these probabilities using only the counts of different sets of documents:

$$w^{RSJ} = \log \frac{(r + 0.5)(N - R - n + r + 0.5)}{(n - r + 0.5)(R - r + 0.5)}$$

The terms here are:

* r is the number of relevant documents that contain the query term

* N is the total number of documents in the collection

* R is the number of relevant documents in the collection

* n is the number of documents that contain the query term

The big, glar­ing prob­lem with this equa­tion is that you first need to know which doc­u­ments are rel­e­vant to the query. How are we go­ing to get those?

The question of how to make use of the Robertson/Sparck Jones weight apparently stumped the entire research field for about 15 years. The equation was built up from a solid theoretical foundation, but relying on already having relevance information made it nearly impossible to put to use.

The BM25 de­vel­op­ers made a very clever as­sump­tion to get to the next step.

For any given query, we can assume that most documents are not going to be relevant. If we assume that the number of relevant documents is so small as to be negligible, we can just set those numbers (r and R) to zero!

If we substitute this into the Robertson/Sparck Jones Weight equation, we get nearly the IDF term used in BM25:

$$w = \log \frac{N - n + 0.5}{n + 0.5}$$

Not relying on relevance information made BM25 much more useful, while keeping the same theoretical underpinnings. Victor Lavrenko described this as “a very impressive leap of faith”, and I think this is quite a neat bit of BM25’s backstory.

As I men­tioned at the start, my mo­ti­vat­ing ques­tion was whether I could com­pare BM25 scores for a doc­u­ment across queries to un­der­stand which query the doc­u­ment best matches.

In gen­eral, BM25 scores can­not be di­rectly com­pared (and this is what ChatGPT and Claude stressed to me in re­sponse to my ini­tial in­quiries 🙂‍↔️). The al­go­rithm does not pro­duce a score from 0 to 1 that is easy to com­pare across sys­tems, and it does­n’t even try to es­ti­mate the prob­a­bil­ity that a doc­u­ment is rel­e­vant. It only fo­cuses on rank­ing doc­u­ments within a cer­tain col­lec­tion in an or­der that ap­prox­i­mates the prob­a­bil­ity of their rel­e­vance to the query. A higher BM25 score means the doc­u­ment is likely to be more rel­e­vant, but it is­n’t the ac­tual prob­a­bil­ity that it is rel­e­vant.

As far as I un­der­stand now, it is pos­si­ble to com­pare the BM25 scores across queries for the same doc­u­ment within the same col­lec­tion of doc­u­ments.

My hint that this was the case was the fact that BM25 sums the scores of each query term. There should not be a semantic difference between comparing the scores for two query terms and two whole queries.

The im­por­tant caveat to stress, how­ever, is the same doc­u­ment within the same col­lec­tion. BM25 uses the IDF or rar­ity of terms as well as the av­er­age doc­u­ment length within the col­lec­tion. Therefore, you can­not nec­es­sar­ily com­pare scores across time be­cause any mod­i­fi­ca­tions to the over­all col­lec­tion could change the scores.

For my pur­poses, though, this is use­ful enough. It means that I can do a full text search for each of a user’s in­ter­ests in my col­lec­tion of con­tent and com­pare the BM25 scores to help de­ter­mine which pieces best match their in­ter­ests.

I’ll write more about rank­ing al­go­rithms and how I’m us­ing the rel­e­vance scores in fu­ture posts, but in the mean­time I hope you’ve found this back­ground on BM25 use­ful or in­ter­est­ing!

Thanks to Alex Kesling and Natan Last for feed­back on drafts of this post.

If you are in­ter­ested in div­ing fur­ther into the the­ory and his­tory of BM25, I would highly rec­om­mend watch­ing Elastic en­gi­neer Britta Weber’s 2016 talk Improved Text Scoring with BM25 and read­ing The Probabilistic Relevance Framework: BM25 and Beyond by Stephen Robertson and Hugo Zaragoza.

Also, I had ini­tially in­cluded com­par­isons be­tween BM25 and some other al­go­rithms in this post. But, as you know, it was al­ready a bit long 😅. So, you can now find those in this other post: Comparing full text search al­go­rithms: BM25, TF-IDF, and Postgres.

...

Read the original on emschwartz.me »

8 237 shares, 9 trendiness

4.3 — blender.org

With light link­ing, lights can be set to af­fect only spe­cific ob­jects in the scene.

Shadow linking additionally gives control over which objects act as shadow blockers for a light.

This is now at feature parity with Cycles.

...

Read the original on www.blender.org »

9 231 shares, 35 trendiness

Building a Large Geospatial Model to Achieve Spatial Intelligence

Skip to main con­tent

At Niantic, we are pi­o­neer­ing the con­cept of a Large Geospatial Model that will use large-scale ma­chine learn­ing to un­der­stand a scene and con­nect it to mil­lions of other scenes glob­ally.

When you look at a familiar type of structure — whether it’s a church, a statue, or a town square — it’s fairly easy to imagine what it might look like from other angles, even if you haven’t seen it from all sides. As humans, we have “spatial understanding” that means we can fill in these details based on countless similar scenes we’ve encountered before. But for machines, this task is extraordinarily difficult. Even the most advanced AI models today struggle to visualize and infer missing parts of a scene, or to imagine a place from a new angle. This is about to change: Spatial intelligence is the next frontier of AI models.

As part of Niantic’s Visual Positioning System (VPS), we have trained more than 50 mil­lion neural net­works, with more than 150 tril­lion pa­ra­me­ters, en­abling op­er­a­tion in over a mil­lion lo­ca­tions. In our vi­sion for a Large Geospatial Model (LGM), each of these lo­cal net­works would con­tribute to a global large model, im­ple­ment­ing a shared un­der­stand­ing of ge­o­graphic lo­ca­tions, and com­pre­hend­ing places yet to be fully scanned.

The LGM will en­able com­put­ers not only to per­ceive and un­der­stand phys­i­cal spaces, but also to in­ter­act with them in new ways, form­ing a crit­i­cal com­po­nent of AR glasses and fields be­yond, in­clud­ing ro­bot­ics, con­tent cre­ation and au­tonomous sys­tems. As we move from phones to wear­able tech­nol­ogy linked to the real world, spa­tial in­tel­li­gence will be­come the world’s fu­ture op­er­at­ing sys­tem.

Large Language Models (LLMs) are having an undeniable impact on our everyday lives and across multiple industries. Trained on internet-scale collections of text, LLMs can understand and generate written language in a way that challenges our understanding of “intelligence”.

Large Geospatial Models will help com­put­ers per­ceive, com­pre­hend, and nav­i­gate the phys­i­cal world in a way that will seem equally ad­vanced. Analogous to LLMs, geospa­tial mod­els are built us­ing vast amounts of raw data: bil­lions of im­ages of the world, all an­chored to pre­cise lo­ca­tions on the globe, are dis­tilled into a large model that en­ables a lo­ca­tion-based un­der­stand­ing of space, struc­tures, and phys­i­cal in­ter­ac­tions.

The shift from text-based models to those based on 3D data mirrors the broader trajectory of AI’s growth in recent years: from understanding and generating language, to interpreting and creating static and moving images (2D vision models), and, with current research efforts increasing, towards modeling the 3D appearance of objects (3D vision models).

Geospatial mod­els are a step be­yond even 3D vi­sion mod­els in that they cap­ture 3D en­ti­ties that are rooted in spe­cific ge­o­graphic lo­ca­tions and have a met­ric qual­ity to them. Unlike typ­i­cal 3D gen­er­a­tive mod­els, which pro­duce un­scaled as­sets, a Large Geospatial Model is bound to met­ric space, en­sur­ing pre­cise es­ti­mates in scale-met­ric units. These en­ti­ties there­fore rep­re­sent next-gen­er­a­tion maps, rather than ar­bi­trary 3D as­sets. While a 3D vi­sion model may be able to cre­ate and un­der­stand a 3D scene, a geospa­tial model un­der­stands how that scene re­lates to mil­lions of other scenes, ge­o­graph­i­cally, around the world. A geospa­tial model im­ple­ments a form of geospa­tial in­tel­li­gence, where the model learns from its pre­vi­ous ob­ser­va­tions and is able to trans­fer knowl­edge to new lo­ca­tions, even if those are ob­served only par­tially.

While AR glasses with 3D graphics are still several years away from the mass market, there are opportunities for geospatial models to be integrated with audio-only or 2D display glasses. These models could guide users through the world, answer questions, provide personalized recommendations, help with navigation, and enhance real-world interactions. Large language models could be integrated so understanding and space come together, giving people the opportunity to be more informed and engaged with their surroundings and neighborhoods. Geospatial intelligence, as emerging from a large geospatial model, could also enable generation, completion or manipulation of 3D representations of the world to help build the next generation of AR experiences. Beyond gaming, Large Geospatial Models will have widespread applications, ranging from spatial planning and design to logistics, audience engagement, and remote collaboration.

Our work so far

Over the past five years, Niantic has fo­cused on build­ing our Visual Positioning System (VPS), which uses a sin­gle im­age from a phone to de­ter­mine its po­si­tion and ori­en­ta­tion us­ing a 3D map built from peo­ple scan­ning in­ter­est­ing lo­ca­tions in our games and Scaniverse.

With VPS, users can po­si­tion them­selves in the world with cen­time­ter-level ac­cu­racy. That means they can see dig­i­tal con­tent placed against the phys­i­cal en­vi­ron­ment pre­cisely and re­al­is­ti­cally. This con­tent is per­sis­tent in that it stays in a lo­ca­tion af­ter you’ve left, and it’s then share­able with oth­ers. For ex­am­ple, we re­cently started rolling out an ex­per­i­men­tal fea­ture in Pokémon GO, called Pokémon Playgrounds, where the user can place Pokémon at a spe­cific lo­ca­tion, and they will re­main there for oth­ers to see and in­ter­act with.

Niantic’s VPS is built from user scans taken from different perspectives, at various times of day, many times over the years, and with positioning information attached, creating a highly detailed understanding of the world. This data is unique because it is taken from a pedestrian perspective and includes places inaccessible to cars.

Today we have 10 mil­lion scanned lo­ca­tions around the world, and over 1 mil­lion of those are ac­ti­vated and avail­able for use with our VPS ser­vice. We re­ceive about 1 mil­lion fresh scans each week, each con­tain­ing hun­dreds of dis­crete im­ages.

As part of the VPS, we build classical 3D vision maps using structure-from-motion techniques, but also a new type of neural map for each place. These neural models, based on our research papers ACE (2023) and ACE Zero (2024), do not represent locations using classical 3D data structures anymore, but encode them implicitly in the learnable parameters of a neural network. These networks can swiftly compress thousands of mapping images into a lean, neural representation. Given a new query image, they offer precise positioning for that location with centimeter-level accuracy.
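As a rough, purely illustrative sketch of what an “implicit” neural map means, here is a toy example in the spirit of scene coordinate regression, the family of methods ACE belongs to. This is not Niantic’s actual pipeline: the network size, the synthetic landmark descriptors, and the PnP step at the end are all assumptions for illustration (using PyTorch and OpenCV). A small network memorizes a mapping from per-landmark descriptors to 3D scene coordinates, and a new query view is then localized by predicting 3D points for its 2D observations and solving a robust PnP problem.

# Toy sketch only: an "implicit map" as a small network that memorizes
# descriptor -> 3D scene coordinate, then camera pose via PnP + RANSAC.
# Synthetic data and all sizes are illustrative assumptions, not ACE itself.
import numpy as np
import torch
import torch.nn as nn
import cv2

rng = np.random.default_rng(0)

# Synthetic "scene": 200 landmarks, each with a fixed 32-D descriptor
# (standing in for pose-invariant features from a real feature extractor).
points_world = rng.uniform([-2, -2, 4], [2, 2, 8], size=(200, 3)).astype(np.float32)
descriptors = rng.normal(size=(200, 32)).astype(np.float32)

# The "neural map": compress the scene into the network's weights.
net = nn.Sequential(nn.Linear(32, 256), nn.ReLU(),
                    nn.Linear(256, 256), nn.ReLU(),
                    nn.Linear(256, 3))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
X = torch.from_numpy(descriptors)
Y = torch.from_numpy(points_world)
for _ in range(3000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(X), Y)
    loss.backward()
    opt.step()

# A query image from an unseen camera pose: project the landmarks with intrinsics K.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
rvec_true = np.array([[0.05], [-0.10], [0.02]])
tvec_true = np.array([[0.30], [-0.20], [0.50]])
pixels, _ = cv2.projectPoints(points_world.astype(np.float64),
                              rvec_true, tvec_true, K, np.zeros(5))
pixels = pixels.reshape(-1, 2)

# Localization: predict 3D coordinates from the observed descriptors, then PnP + RANSAC.
with torch.no_grad():
    points_pred = net(X).numpy().astype(np.float64)
ok, rvec, tvec, inliers = cv2.solvePnPRansac(points_pred, pixels, K, np.zeros(5))
print("estimated translation:", tvec.ravel(), "true:", tvec_true.ravel())

The point of the sketch is only the shape of the idea: the map lives in the network weights rather than in an explicit 3D data structure, and positioning a new view amounts to querying that network.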

Niantic has trained more than 50 mil­lion neural nets to date, where mul­ti­ple net­works can con­tribute to a sin­gle lo­ca­tion. All these net­works com­bined com­prise over 150 tril­lion pa­ra­me­ters op­ti­mized us­ing ma­chine learn­ing.

Our current neural map is a viable geospatial model, active and usable right now as part of Niantic’s VPS. It is also most certainly “large”. However, our vision of a “Large Geospatial Model” goes beyond the current system of independent local maps.

Entirely local models might lack complete coverage of their respective locations. No matter how much data we have available on a global scale, locally, it will often be sparse. The main failure mode of a local model is its inability to extrapolate beyond what it has already seen and from where the model has seen it. Therefore, local models can only position camera views similar to the views they have been trained with already.

Imagine your­self stand­ing be­hind a church. Let us as­sume the clos­est lo­cal model has seen only the front en­trance of that church, and thus, it will not be able to tell you where you are. The model has never seen the back of that build­ing. But on a global scale, we have seen a lot of churches, thou­sands of them, all cap­tured by their re­spec­tive lo­cal mod­els at other places world­wide. No church is the same, but many share com­mon char­ac­ter­is­tics. An LGM is a way to ac­cess that dis­trib­uted knowl­edge.

An LGM dis­tills com­mon in­for­ma­tion in a global large-scale model that en­ables com­mu­ni­ca­tion and data shar­ing across lo­cal mod­els. An LGM would be able to in­ter­nal­ize the con­cept of a church, and, fur­ther­more, how these build­ings are com­monly struc­tured. Even if, for a spe­cific lo­ca­tion, we have only mapped the en­trance of a church, an LGM would be able to make an in­tel­li­gent guess about what the back of the build­ing looks like, based on thou­sands of churches it has seen be­fore. Therefore, the LGM al­lows for un­prece­dented ro­bust­ness in po­si­tion­ing, even from view­points and an­gles that the VPS has never seen.

The global model im­ple­ments a cen­tral­ized un­der­stand­ing of the world, en­tirely de­rived from geospa­tial and vi­sual data. The LGM ex­trap­o­lates lo­cally by in­ter­po­lat­ing glob­ally.

The process described above is similar to how humans perceive and imagine the world. As humans, we naturally recognize something we’ve seen before, even from a different angle. For example, it takes us relatively little effort to back-track our way through the winding streets of a European old town. We identify all the right junctions although we have seen them only once, and from the opposing direction. This takes a level of understanding of the physical world, and of cultural spaces, that is natural to us, but extremely difficult to achieve with classical machine vision technology. It requires knowledge of some basic laws of nature: the world is composed of objects which consist of solid matter and therefore have a front and a back. Appearance changes based on time of day and season. It also requires a considerable amount of cultural knowledge: the shapes of many man-made objects follow specific rules of symmetry or other generic types of layout — often dependent on the geographic region.

While early com­puter vi­sion re­search tried to de­ci­pher some of these rules in or­der to hard-code them into hand-crafted sys­tems, it is now con­sen­sus that such a high de­gree of un­der­stand­ing as we as­pire to can re­al­is­ti­cally only be achieved via large-scale ma­chine learn­ing. This is what we aim for with our LGM. We have seen a first glimpse of im­pres­sive cam­era po­si­tion­ing ca­pa­bil­i­ties emerg­ing from our data in our re­cent re­search pa­per MicKey (2024). MicKey is a neural net­work able to po­si­tion two cam­era views rel­a­tive to each other, even un­der dras­tic view­point changes.

MicKey can han­dle even op­pos­ing shots that would take a hu­man some ef­fort to fig­ure out. MicKey was trained on a tiny frac­tion of our data — data that we re­leased to the aca­d­e­mic com­mu­nity to en­cour­age this type of re­search. MicKey is lim­ited to two-view in­puts and was trained on com­par­a­tively lit­tle data, but it still rep­re­sents a proof of con­cept re­gard­ing the po­ten­tial of an LGM. Evidently, to ac­com­plish geospa­tial in­tel­li­gence as out­lined in this text, an im­mense in­flux of geospa­tial data is needed — a kind of data not many or­ga­ni­za­tions have ac­cess to. Therefore, Niantic is in a unique po­si­tion to lead the way in mak­ing a Large Geospatial Model a re­al­ity, sup­ported by more than a mil­lion user-con­tributed scans of real-world places we re­ceive per week.

An LGM will be useful for more than mere positioning. In order to solve positioning well, the LGM has to encode rich geometrical, appearance and cultural information into scene-level features. These features will enable new ways of scene representation, manipulation and creation. Versatile large AI models like the LGM, which are useful for a multitude of downstream applications, are commonly referred to as “foundation models”.

Different types of foun­da­tion mod­els will com­ple­ment each other. LLMs will in­ter­act with mul­ti­modal mod­els, which will, in turn, com­mu­ni­cate with LGMs. These sys­tems, work­ing to­gether, will make sense of the world in ways that no sin­gle model can achieve on its own. This in­ter­con­nec­tion is the fu­ture of spa­tial com­put­ing — in­tel­li­gent sys­tems that per­ceive, un­der­stand, and act upon the phys­i­cal world.

As we move toward more scalable models, Niantic’s goal remains to lead in the development of a large geospatial model that operates wherever we can deliver novel, fun, enriching experiences to our users. And, as noted, beyond gaming, Large Geospatial Models will have widespread applications, including spatial planning and design, logistics, audience engagement, and remote collaboration.

The path from LLMs to LGMs is another step in AI’s evolution. As wearable devices like AR glasses become more prevalent, the world’s future operating system will depend on the blending of physical and digital realities to create a system for spatial computing that will put people at the center.

...

Read the original on nianticlabs.com »

10 184 shares, 8 trendiness

Bluesky is ushering in a pick-your-own algorithm era of social media

More than 20 mil­lion peo­ple have joined Bluesky, a so­cial net­work that gives you fine-grained con­trol over what you see and who you in­ter­act with. I think it is the fu­ture of so­cial me­dia, says Chris Stokel-Walker

As a tech­nol­ogy re­porter, I like to think I’m an early adopter. I first signed up to the so­cial net­work Bluesky around 18 months ago, when the plat­form saw a small surge in users dis­af­fected by Elon Musk’s ap­proach to what was then still called Twitter.

It did­n’t stick. Like many, I found the lure of Twitter too strong, and let my Bluesky ac­count wither, but in re­cent weeks I have re­turned — and I am not alone. With Musk con­tin­u­ing to trans­form his so­cial plat­form, now called X, at the same time as tak­ing a role in US pres­i­dent-elect Donald Trump’s up­com­ing gov­ern­ment, the Xodus has be­gun. Bluesky has gained 12 mil­lion users in two months, and has just sur­passed 20 mil­lion users. This time I in­tend on stick­ing around — and I think oth­ers will, too.

In large part, that’s be­cause I want a so­cial me­dia ex­pe­ri­ence with­out be­ing bom­barded by hate speech, gore and porno­graphic videos — all of which users of X have com­plained about in re­cent months. But I’m also big on Bluesky be­cause I think it sig­nals a shift in how so­cial me­dia works on a more fun­da­men­tal level.

Social media algorithms — the computer code that decides what each user is shown — have long been a point of contentious debate. Fears of disappearing down “rabbit holes” of radicalisation, or being trapped in “echo chambers” of consensual, sometimes conspiratorial, viewpoints, have dominated scientific literature.

The use of al­go­rithms to fil­ter in­for­ma­tion has be­come the norm be­cause chrono­log­i­cally pre­sent­ing in­for­ma­tion from fol­low­ers cre­ates a con­fus­ing morass for the av­er­age user to process. Sorting and fil­ter­ing what is im­por­tant — or likely to keep users en­gaged — has be­come key to the suc­cess of plat­forms like Facebook, X and Instagram.

But control of these algorithms also gives you a big say in what people read. One of the bugbears many users have with X is its “For you” algorithm, which under Musk has seen commentary by and about him seemingly shoved into users’ timelines, even if they don’t directly follow him.

Bluesky’s approach isn’t to ditch algorithms — instead, it has more than the average social network. In a 2023 blog post, Jay Graber, Bluesky’s CEO, outlined the ethos of the platform. Bluesky promotes a “marketplace of algorithms”, she wrote, instead of a single “master algorithm”.

In practice, this means that users can see posts by people they follow on the app, the standard view Bluesky defaults to. But they can equally opt to see what’s popular with friends, an algorithmically-dictated selection of posts that your peers enjoy. There are feeds specifically for scientists, curated by those working in the field, or ones to promote Black voices, which are often thinned out by algorithmic filtering. One feed even specifically promotes “quiet posters” — users who post infrequently, and whose views would otherwise be drowned out by those who share every opinion with their followers.

This menu of options allows Bluesky to serve two purposes, bridging the past era of social media and the future one. The platform has the potential, once it reaches a critical mass of users, to act as the de facto “public town square”, as Musk dubbed Twitter before he purchased it. Bluesky arguably is the only remaining such square, given X has shifted to exclude many mainstream voices, and competitors like Threads choose to shy away from promoting politics and current affairs.

But Bluesky also al­lows you to tai­lor the app to your needs — not only through feeds, but other el­e­ments like starter packs of rec­om­mended users to quickly get in­volved in in­di­vid­ual niches, or block­ing tools to qui­eten un­ruly voices.

There are still hitches, un­doubt­edly. Finding the right feed for you can be tricky, while cre­at­ing your own is even more com­pli­cated, re­quir­ing third-party tools. But the abil­ity to get the full view of pub­lic con­ver­sa­tion, then to drill down into smaller de­bates within clus­ters and com­mu­ni­ties of that broad swathe of so­ci­ety, is ex­cit­ing. It’s a model of a new so­cial me­dia where users, not big com­pa­nies or enig­matic in­di­vid­u­als, are in charge of what they see. And if Bluesky con­tin­ues to add users, it could be­come the norm. So come and join me — I’m @stokel.bsky.social.

...

Read the original on www.newscientist.com »
