10 interesting stories served every morning and every evening.




1 1,222 shares, 156 trendiness

The Git Commands I Run Before Reading Any Code

The first thing I usu­ally do when I pick up a new code­base is­n’t open­ing the code. It’s open­ing a ter­mi­nal and run­ning a hand­ful of git com­mands. Before I look at a sin­gle file, the com­mit his­tory gives me a di­ag­nos­tic pic­ture of the pro­ject: who built it, where the prob­lems clus­ter, whether the team is ship­ping with con­fi­dence or tip­toe­ing around land mines.

The 20 most-changed files in the last year. The file at the top is almost always the one people warn me about: “Oh yeah, that file. Everyone’s afraid to touch it.”
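A churn list of this shape can be produced with a pipeline along these lines (a sketch under my own choice of flags, not necessarily the author’s exact command):

```shell
# Sketch: the 20 most-edited files in the last year.
# --format= suppresses commit headers so only file paths remain.
git log --since="1 year ago" --no-merges --name-only --format= \
  | sort | uniq -c | sort -rn | head -20
```

Each output line is a touch count followed by a path; the top entries are the churn hotspots described above.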

High churn on a file does­n’t mean it’s bad. Sometimes it’s just ac­tive de­vel­op­ment. But high churn on a file that no­body wants to own is the clear­est sig­nal of code­base drag I know. That’s the file where every change is a patch on a patch. The blast ra­dius of a small edit is un­pre­dictable. The team pads their es­ti­mates be­cause they know it’s go­ing to fight back.

A 2005 Microsoft Research study found churn-based met­rics pre­dicted de­fects more re­li­ably than com­plex­ity met­rics alone. I take the top 5 files from this list and cross-ref­er­ence them against the bug hotspot com­mand be­low. A file that’s high-churn and high-bug is your sin­gle biggest risk.

Every contributor ranked by commit count. If one person accounts for 60% or more, that’s your bus factor. If they left six months ago, it’s a crisis. If the top contributor from the overall shortlog doesn’t appear in a six-month window (git shortlog -sn --no-merges --since="6 months ago"), I flag that to the client immediately.
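A quick way to turn the shortlog into a bus-factor number is to compare the top contributor’s commit count against the total (a sketch using plain shell arithmetic; the 60% threshold is the article’s, the script is my assumption):

```shell
# Sketch: what share of all non-merge commits belongs to the top contributor?
total=$(git rev-list --count --no-merges HEAD)
top=$(git shortlog -sn --no-merges HEAD | head -1 | awk '{print $1}')
echo "top contributor: ${top} of ${total} commits ($((100 * top / total))%)"
```

If that percentage is 60% or more and the same name is missing from the six-month window, that is the crisis case.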

I also look at the tail. Thirty contributors, but only three active in the last year: the people who built this system aren’t the people maintaining it.

One caveat: squash-merge work­flows com­press au­thor­ship. If the team squashes every PR into a sin­gle com­mit, this out­put re­flects who merged, not who wrote. Worth ask­ing about the merge strat­egy be­fore draw­ing con­clu­sions.

Same shape as the churn com­mand, fil­tered to com­mits with bug-re­lated key­words. Compare this list against the churn hotspots. Files that ap­pear on both are your high­est-risk code: they keep break­ing and keep get­ting patched, but never get prop­erly fixed.
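A sketch of that bug-hotspot variant: the same churn pipeline, restricted to commits whose messages mention bug-related keywords (the keyword list is an assumption; tune it to the team’s conventions):

```shell
# Sketch: files most often touched by bug-fix commits in the last year.
# Multiple --grep patterns are OR'd together; -i makes them case-insensitive.
git log --since="1 year ago" --no-merges -i \
    --grep=fix --grep=bug --grep=hotfix \
    --name-only --format= \
  | sort | uniq -c | sort -rn | head -20
```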

This depends on commit message discipline. If the team writes “update stuff” for every commit, you’ll get nothing. But even a rough map of bug density is better than no map.

Commit count by month, for the en­tire his­tory of the repo. I scan the out­put look­ing for shapes. A steady rhythm is healthy. But what does it look like when the count drops by half in a sin­gle month? Usually some­one left. A de­clin­ing curve over 6 to 12 months tells you the team is los­ing mo­men­tum. Periodic spikes fol­lowed by quiet months means the team batches work into re­leases in­stead of ship­ping con­tin­u­ously.
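Monthly commit counts can be had directly from git’s date formatting (a sketch of the kind of command described above):

```shell
# Sketch: commits bucketed by year-month over the whole history.
git log --format='%ad' --date=format:'%Y-%m' | sort | uniq -c
```

Read the counts as a shape: plateaus, cliffs and spikes map onto the team events discussed above.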

I once showed a CTO their commit velocity chart and they said, “that’s when we lost our second senior engineer.” They hadn’t connected the timeline before. This is team data, not code data.

Revert and hot­fix fre­quency. A hand­ful over a year is nor­mal. Reverts every cou­ple of weeks means the team does­n’t trust its de­ploy process. They’re ev­i­dence of a deeper is­sue: un­re­li­able tests, miss­ing stag­ing, or a de­ploy pipeline that makes roll­backs harder than they should be. Zero re­sults is also a sig­nal; ei­ther the team is sta­ble, or no­body writes de­scrip­tive com­mit mes­sages.
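The revert and hotfix check is another commit-message grep (a sketch; again, the keywords are assumptions):

```shell
# Sketch: revert and hotfix commits from the last year.
git log --oneline -i --grep=revert --grep=hotfix --since="1 year ago"
```

Pipe it through wc -l for a count; a handful per year is the healthy baseline given above.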

Crisis pat­terns are easy to read. Either they’re there or they’re not.

These five commands take a couple of minutes to run. They won’t tell you everything. But you’ll know which code to read first, and what to look for when you get there. That’s the difference between spending your first day reading the codebase methodically and spending it wandering.

This is the first hour of what I do in a code­base au­dit. Here’s what the rest of the week looks like.

...

Read the original on piechowski.io »

2 910 shares, 41 trendiness

Artemis II Lunar Flyby

The first flyby images of the Moon captured by NASA’s Artemis II astronauts during their historic test flight reveal regions no human has ever seen before—including a rare in-space solar eclipse. Released Tuesday, April 7, 2026, the photos were taken on April 6 during the crew’s seven-hour pass over the lunar far side, marking humanity’s return to the Moon’s vicinity.

...

Read the original on www.nasa.gov »

3 889 shares, 90 trendiness

VeraCrypt / Forums / General Discussion

Open source disk en­cryp­tion with strong se­cu­rity for the Paranoid

...

Read the original on sourceforge.net »

4 574 shares, 27 trendiness

US and Iran agree to provisional ceasefire as Tehran says it will reopen strait of Hormuz

The US and Iran agreed to a two-week con­di­tional cease­fire on Tuesday evening, which in­cluded a tem­po­rary re­open­ing of the strait of Hormuz, af­ter a last-minute diplo­matic in­ter­ven­tion led by Pakistan, can­cel­ing an ul­ti­ma­tum from Donald Trump for Iran to sur­ren­der or face wide­spread de­struc­tion.

Trump’s an­nounce­ment of the cease­fire agree­ment came less than two hours be­fore the US pres­i­den­t’s self-im­posed 8pm Eastern time dead­line to bomb Iran’s power plants and bridges in a move that le­gal schol­ars, as well as of­fi­cials from nu­mer­ous coun­tries and the pope, had warned could con­sti­tute war crimes.

Just hours earlier, Trump had written on Truth Social: “A whole civilization will die tonight, never to be brought back again. I don’t want that to happen, but it probably will.” American B-52 bombers were reported to be en route to Iran before the ceasefire agreement was announced.

But by Tuesday evening, Trump announced that a ceasefire agreement had been mediated through Pakistan, whose prime minister, Shehbaz Sharif, had requested the two-week peace in order to “allow diplomacy to run its course”.

Trump wrote in a post that “subject to the Islamic Republic of Iran agreeing to the COMPLETE, IMMEDIATE, and SAFE OPENING of the Strait of Hormuz, I agree to suspend the bombing and attack of Iran for a period of two weeks”.

In a separate post later, the US president called Tuesday “a big day for world peace”, claiming that Iran had “had enough”. He said the US would be “helping with the traffic buildup” in the strait of Hormuz and that “big money will be made” as Iran begins reconstruction.

For sev­eral hours af­ter­wards, Israel’s po­si­tion or agree­ment with the deal was un­clear. But just be­fore mid­night ET, the prime min­is­ter, Benjamin Netanyahu, said Israel backed the US cease­fire with Iran but that the deal did not cover fight­ing against Hezbollah in Lebanon. His of­fice said Israel also sup­ported US ef­forts to en­sure Iran no longer posed a nu­clear or mis­sile threat.

Pakistan’s prime minister had previously said that the agreed-upon ceasefire covered “everywhere including Lebanon”.

The ceasefire process was clouded in uncertainty after Iran released two different versions of the 10-point plan intended to be the basis for negotiations, and which Trump said was a “workable basis on which to negotiate”.

In the version released in Farsi, Iran included the phrase “acceptance of enrichment” for its nuclear program. But for reasons that remain unclear, that phrase was missing in English versions shared by Iranian diplomats with journalists.

Pakistan has in­vited the US and Iran to talks in Islamabad on Friday. Tehran said it would at­tend, but Washington has yet to pub­licly ac­cept the in­vi­ta­tion.

In a telephone call with Agence France-Presse, Trump said he believed China had persuaded Iran to negotiate, and said Tehran’s enriched uranium would be “perfectly taken care of”, without providing more detail.

During the two-week ceasefire, Trump said, he believed the US and Iran could negotiate over the 10-point proposal that would allow an armistice to be “finalized and consummated”.

“This will be a double sided CEASEFIRE!” he continued. “The reason for doing so is that we have already met and exceeded all Military objectives, and are very far along with a definitive Agreement concerning Longterm PEACE with Iran, and PEACE in the Middle East.”

Iran’s foreign minister, Abbas Araghchi, issued a statement shortly after Trump’s announcement saying Iran had agreed to the ceasefire. “For a period of two weeks, safe passage through the Strait of Hormuz will be possible via coordinating with Iran’s Armed Forces,” he wrote.

Oil prices dived, stocks surged and the dol­lar was knocked back on Wednesday as a two-week Middle East cease­fire sparked a re­lief rally, fu­eled by hopes that oil and gas flows through the strait of Hormuz could re­sume.

Despite the pro­vi­sional cease­fire, at­tacks con­tin­ued across the re­gion in the hours af­ter Trump’s an­nounce­ment. Before the dead­line, airstrikes hit two bridges and a train sta­tion in Iran, and the US hit mil­i­tary in­fra­struc­ture on Kharg Island, a key hub for Iranian oil pro­duc­tion.

The sud­den about-face will al­low Trump to step back as the US war in Iran has dragged on for five weeks with lit­tle sign that Tehran is ready to sur­ren­der or re­lease its hold on the strait, a con­duit for a fifth of the global en­ergy sup­ply, where traf­fic has slowed to a trickle.

Trump had earlier rejected the 10-point plan as “not good enough”, but the president has set deadlines before and allowed them to pass over the five weeks of the conflict. Yet he insisted on Tuesday the ensuing hours would be “one of the most important moments in the long and complex history of the World” unless “something revolutionarily wonderful” happened, with “less radicalized minds” in Iran’s leadership.

News of the pro­vi­sional cease­fire deal was wel­comed but with a note of cau­tion else­where.

Iraq’s foreign ministry called for “serious and sustainable dialogue” between the US and Iran to address “the root causes of the disputes”, while the German foreign minister, Johann Wadephul, said the deal must be “the crucial first step towards lasting peace, for the consequences of the war continuing would be incalculable”.

In Australia, the government warned that the latest developments would not necessarily mean the fuel crisis is over. Oil prices fell as traders bet that the reopening of the strait of Hormuz would help fuel supply resume, but the energy minister, Chris Bowen, told reporters Australians should not get “ahead of ourselves”.

He said: “People shouldn’t take today’s progress and expect prices to fall. We welcome progress, but I don’t think we can say the [strait of Hormuz is] now open.”

A spokesperson for New Zealand’s foreign minister, Winston Peters, welcomed the “encouraging news” but noted there remains “significant important work to be done to secure a lasting ceasefire”.

Japan said it expected the move to result in a “final agreement” after Washington and Tehran begin talks on Friday. Describing the ceasefire as a “positive move”, the chief cabinet secretary, Minoru Kihara, said Tokyo wanted to see a de-escalation on the ground in the region, adding that the prime minister, Sanae Takaichi, was seeking talks with the Iranian president, Masoud Pezeshkian.

A tem­po­rary end to hos­til­i­ties will come as a re­lief to Japan, which de­pends on the Middle East for about 90% of its crude oil im­ports, most of which is trans­ported through the strait of Hormuz.

South Korea’s ministry of foreign affairs said it hoped “negotiations between the two sides will be successfully concluded and that peace and stability in the Middle East will be restored at an early date”, as well as wishes for “free and safe navigation of all vessels through the strait of Hormuz”.

...

Read the original on www.theguardian.com »

5 438 shares, 18 trendiness

How to Get Better at Guitar

Years ago I watched a video where gui­tar teacher Justin Sandercoe ex­plained a way to get bet­ter at gui­tar. It has changed my play­ing, and it might change yours, too.

This is­n’t my idea; it’s Justin’s. You should watch his video and visit his web­site. Why am I writ­ing about it, then? Because I think it’s an in­valu­able idea, and I want to ex­tend its reach and of­fer my tes­ti­mony.

Note: This post is part of April Cools, where writ­ers pub­lish some­thing sin­cere but dif­fer­ent from their usual work. I hope you find it in­ter­est­ing.

The Way I Learned Then: Tabs#

I grew up with a bunch of kids who played music: guitar, drums, and bass. We were nerds in garages, making noise.

This was the ’90s, when magazines like Guitar World were in their heyday. These magazines contained pages of tablature, or tabs, of the latest songs, and we couldn’t wait to get the newest publications. This was before the widespread internet, so these tabs were hard to find.

And yet, once I’d purchased the magazine, did I learn “Eruption”? No, I didn’t. Owning the tab didn’t translate to mastery.

The Way I Learn Now: Listening & Transcribing#

When we think about the gui­tar greats, we might ask, how did they get great? And we know the an­swer. They did­n’t read tabs. They lis­tened to mu­sic and im­i­tated what they heard.

That’s what you need to do. If you want to learn a song and get bet­ter, here’s the plan.

Pick an easy song — not “Eruption”. Songs like these:

Rock: “The Ghost of Tom Joad”, Rage Against the Machine

These songs have some things in com­mon: sim­ple riffs, mostly us­ing one note at a time, mostly in one part of the gui­tar neck.

Next, get a piece of tab pa­per. When I started learn­ing this tech­nique, I printed off a cou­ple dozen pages of blank tabs.

Hit “play” on the song with your guitar in your hand, tab paper on the table, and a pencil. When you hear the first guitar note, stop the song, find the note on the guitar, and write it down.

Hit “play” again, and when you get to the second note, stop, find it, and write it down.

Keep do­ing this un­til you fin­ish the song. It’s go­ing to feel im­pos­si­ble at first, and point­less. You’re go­ing to want to quit. Keep go­ing.

Once you’ve fin­ished tran­scrib­ing, what’s next? Next, we check our work.

Find some tabs online and compare them to what you’ve written. When I do this, often I realize I got something wrong, and erase my tabs and write them again. Other times, I disagree with the transcription I find.

You can also watch the play­ers per­form­ing the songs on video. Sometimes the gui­tarist is do­ing some­thing sur­pris­ing you have to see! They might be us­ing a capo, or they’re pluck­ing the strings with an Allen wrench, or there’s a hid­den gui­tarist off­stage.

When you fol­low this process, an amaz­ing thing hap­pens: you learn the song. After tran­scrib­ing, I can of­ten play a song, near tempo, on the first try. I think it’s be­cause I’ve al­ready lis­tened to the song a few dozen times and started to com­mit the move­ments to my mus­cle mem­ory.

You also get bet­ter at hear­ing a sound and find­ing it on the neck. Remember that cool kid in high school who could copy any song on the ra­dio? That’s you now!

I take these songs and put them into a playlist. Then, whenever I want to play, I put on my playlist and go through a few of the songs that I’ve learned. This kind of performance practice is fun.

An im­por­tant dis­tinc­tion here is that you’re now learn­ing songs, not riffs. Riffs are fun, but pro­fes­sional gui­tarists play songs. They learn the catchy in­tro riff, as well as the cho­rus and bridge riffs. They learn to tran­si­tion from one to an­other. It’s a dif­fer­ent skill.

Once I get one part learned, I of­ten learn the sec­ond gui­tar part, or the bass part, too. When a song starts, I some­times pick on the fly which one I’m go­ing to play. It keeps things in­ter­est­ing.

The first time I played “Venus” by Television all the way through, expressing the music rather than simply keeping up, I felt like something had changed.

Here’s a sam­pling of the songs that have made it into my playlist.

“Someday”, The Strokes (chords or triads; take your pick)

“Maps”, Yeah Yeah Yeahs (just one guitar, so you stay busy)

“Just Like Heaven”, The Cure (iconic lead traversing the neck)

“Killing in the Name”, Rage Against the Machine (fun drop D riffs)

You can learn so­los, too, or not. There’s more to the songs than the so­los. Learning the rhythm parts can of­ten be just as chal­leng­ing and fun as the lead.

Stop at the first note, find it, and write it down.

Stop at the sec­ond note, find it, and write it down.

Continue to the end of the song.

Compare your tab with oth­ers and make ad­just­ments.

🎉 You just learned a song! And pos­si­bly some new tech­niques and styles. Keep work­ing on it, and re­peat.

Thanks to Justin, and every­one who taught me things on the gui­tar over the years. Happy April Cools! Keep rock­ing.

...

Read the original on jakeworth.com »

6 428 shares, 82 trendiness

Why Cities Are Axing the Controversial Surveillance Technology

Early this year, my home city of Bend, Oregon, ended its contract with surveillance company Flock Safety, following months of public pressure and concerns around weak data privacy protections. Flock’s controversial cameras were shut down, and its partnership with local law enforcement ended.

We weren’t the only city to ac­tively re­ject Flock cam­eras. Since the start of 2026, dozens of cities have sus­pended or de­ac­ti­vated con­tracts with Flock, la­bel­ing it a vast sur­veil­lance net­work. Others might not be aware that au­to­mated li­cense plate read­ers, com­monly re­ferred to as ALPR cam­eras, have al­ready been in­stalled in their neigh­bor­hood.

Flock gripped news head­lines late last year when it was un­der the mi­cro­scope dur­ing wide­spread crack­downs by Im­mi­gra­tion and Customs Enforcement. Though Flock does­n’t have a di­rect part­ner­ship with fed­eral agen­cies (a blurry line I’ll dis­cuss more), law en­force­ment agen­cies are free to share data with de­part­ments like ICE, and they fre­quently do.

One study from the Center for Human Rights at the University of Washington found that at least eight Washington law en­force­ment agen­cies shared their Flock data net­works di­rectly with ICE in 2025, and 10 more de­part­ments al­lowed ICE back­door ac­cess with­out ex­plic­itly grant­ing the agency per­mis­sion. Many other re­ports out­line sim­i­lar ac­tiv­ity.

Following Super Bowl ads about find­ing lost dogs, Flock was un­der scrutiny about its planned part­ner­ship with Ring, Amazon’s se­cu­rity brand. The in­te­gra­tion would have al­lowed po­lice to re­quest the use of Ring-brand home se­cu­rity cam­eras for in­ves­ti­ga­tions. Following in­tense pub­lic back­lash, Ring cut ties with Flock just like my city did.

To learn more, I spoke to Flock about how the com­pa­ny’s sur­veil­lance tech­nol­ogy is used (and mis­used). I also spoke with pri­vacy ad­vo­cates from the American Civil Liberties Union to dis­cuss sur­veil­lance con­cerns and what com­mu­ni­ties are do­ing about it.

If you hear that Flock is set­ting up near you, it usu­ally means the in­stal­la­tion of ALPR cam­eras to cap­ture li­cense plate pho­tos and mon­i­tor cars on the street.

Flock signs con­tracts with a wide range of en­ti­ties, in­clud­ing city gov­ern­ments and law en­force­ment de­part­ments. A neigh­bor­hood can also part­ner with Flock — for ex­am­ple, if an HOA de­cides it wants ex­tra eyes on the road, it may choose to use Flock’s sys­tems.

When Flock secures a contract, the company installs its cameras at strategic locations. Though these cameras are primarily marketed for license plate recognition, Flock reports on its site that its surveillance system is intended to reduce crime, including property crimes such as “mail and package theft, home invasions, vandalism, trespassing, and burglary.” The company also says it frequently solves violent crimes like “assault, kidnappings, shootings and homicides.”

Flock has recently expanded into other technologies, including advanced cameras that monitor more than just vehicles. Most concerning are the latest Flock drones equipped with high-powered cameras. Flock’s “Drone as First Responder” platform automates drone operations, including launching them in response to 911 calls or gunfire. Flock’s drones, which reach speeds up to 60 mph, can follow vehicles or people and provide information to law enforcement.

Drones like these can be used to track flee­ing sus­pects. In prac­tice, the key is how law en­force­ment chooses to use them, and whether states pass laws al­low­ing po­lice to use drones with­out a war­rant — I’ll cover state laws more be­low, be­cause that’s a big part of to­day’s sur­veil­lance.

It’s im­por­tant to note that not all cities or neigh­bor­hoods re­fer to Flock Safety by name, even when us­ing its tech­nol­ogy. They might men­tion the Drone as First Responder pro­gram, or ALPR cam­eras, with­out fur­ther de­tails. For ex­am­ple, a March announcement about po­lice drones from the city of Lancaster, California, does­n’t men­tion Flock at all, even though it was the com­pany be­hind the drone pro­gram.

Flock states on its website that its standard license-plate cameras cannot technically track vehicles, but only take a “point-in-time” image of a car to nab the license plate.

However, due to AI video and im­age search, con­tracted par­ties like lo­cal law en­force­ment can use these tools to piece to­gether li­cense in­for­ma­tion and form their own time­line of where and when a ve­hi­cle went. Adding to those ca­pa­bil­i­ties, Flock also told Forbes that it’s mak­ing ef­forts to ex­pand ac­cess to in­clude video clips and live feeds.

Flock’s machine learning can also note details like a vehicle’s body type, color, the condition of the license plate and a wide variety of identifiers, like roof racks, paint colors and what you have stored in the back. Flock rarely calls this AI, but it’s similar to the smart detection features you can find in the latest home security cameras.

A Flock spokesperson told me the company has boundaries and does not use facial recognition. “We have more traditional video cameras that can send an alert when one sees if a person is in the frame, for instance, in a business park at 2 a.m. or in the public parks after dark.”

By “traditional” cameras, Flock refers to those that capture a wider field of view — more than just cars and license plates — and can record video rather than just snapshot images.

The in­for­ma­tion Flock can ac­cess pro­vides a com­pre­hen­sive pic­ture that po­lice can use to track cars by run­ning searches on their soft­ware. Just like you might Google a lo­cal restau­rant, po­lice can search for a ba­sic ve­hi­cle de­scrip­tion and re­trieve re­cent matches that the sur­veil­lance equip­ment may have found. Those searches can some­times ex­tend to peo­ple, too.

“We have an investigative tool called Freeform that lets you use natural language prompts to find the investigative lead you’re looking for, including the description of what a person’s clothes may be,” the Flock spokesperson told me.

Unlike red-light cameras, Flock’s cameras can be installed nearly anywhere and snap vehicle ID images for all cars. There are Safe Lists that people can use to help Flock cameras filter out vehicles by filling out a form with their address and license plate to mark their vehicle as a “resident.”

The op­po­site is also true: Flock cam­eras can use a hot list of known, wanted ve­hi­cles and send au­to­matic alerts to po­lice if one is found.

With Flock drones, these in­tel­li­gent searches be­come even more com­plete, al­low­ing cam­eras to track where cars are go­ing and iden­tify peo­ple. That raises ad­di­tional pri­vacy con­cerns about hav­ing eyes in the sky over your back­yard.

“While flying, the drone faces forward, looking at the horizon, until it gets to the call for service, at which point the camera looks down,” the Flock spokesperson said. “Every flight path is logged in a publicly available flight dashboard for appropriate oversight.”

Yet un­like per­sonal se­cu­rity op­tions, there’s no easy way to opt out of this kind of sur­veil­lance. You can’t turn off a fea­ture, can­cel a sub­scrip­tion or throw away a de­vice to avoid it.

And even though more than 45 cities have can­celed Flock con­tracts amid pub­lic out­cry, that does­n’t guar­an­tee that all sur­veil­lance cam­eras will be re­moved from the des­ig­nated area.

When I reached out to the police department in Eugene, another city in Oregon that ended its Flock contract, the PD director of public information told me that, while there were concerns about certain vulnerabilities and data security requirements with the particular vendor, the technology itself is not the problem. “Eugene Police’s ALPR system experience has demonstrated the value of leveraging ALPR technology to aid investigations … the department must ensure that any vendors meet the highest standards.”

Flock’s stance, as outlined in its privacy and ethics guide, is that license plate numbers and vehicle descriptions aren’t personal information. The company says it doesn’t surveil “private data” — only cars and general descriptive markers.

But ve­hi­cle in­for­ma­tion can be con­sid­ered per­sonal be­cause it’s legally tied to the ve­hi­cle’s owner. Privacy laws, in­clud­ing pro­posed fed­eral leg­is­la­tion from 2026, pro­hibit the re­lease of per­sonal in­for­ma­tion from state mo­tor ve­hi­cle records in or­der to pro­tect cit­i­zens.

However, those laws typ­i­cally in­clude ex­emp­tions for le­gal ac­tions and law en­force­ment, some­times even for pri­vate se­cu­rity com­pa­nies.

AI detection also plays a role. When someone can identify a vehicle through searches like “red pickup truck with a dog in the bed,” that tracking goes beyond basic license plates to much more personal information about the driver and their life. It may include the bumper stickers, what can be seen in the backseat and whether a vehicle has a visible gun rack.

Flock’s prac­tices — like its re­cent push to­ward live video feeds and drones to track sus­pects — move out of the gray area, and that’s where pri­vacy ad­vo­cates are rightly con­cerned. Despite its pol­icy, it ap­pears you can track spe­cific peo­ple us­ing Flock tech. You’ll just need to pay more to do so, such as up­grad­ing from ALPRs to Flock’s sus­pect-fol­low­ing drone pro­gram, or us­ing its Freeform tool to track some­one by the clothes they’re wear­ing.

Flock states on its web­site that it stores data for 30 days on Amazon Web Services cloud stor­age and then deletes it. It uses KMS-based en­cryp­tion (a man­aged en­cryp­tion key sys­tem com­mon in AWS) and re­ports that all im­ages and re­lated data are en­crypted from on-de­vice stor­age to cloud stor­age.

When Flock col­lects crim­i­nal jus­tice in­for­ma­tion, or sen­si­tive data man­aged by law en­force­ment, it’s only avail­able to of­fi­cial gov­ern­ment agen­cies, not an en­tity like your lo­cal HOA. Because video data is en­crypted through­out its trans­fer to the end user, em­ploy­ees at Flock can­not ac­cess it. These are the same kind of se­cu­rity prac­tices I look for when re­view­ing home se­cu­rity cam­eras, but there are more com­pli­ca­tions here.

However, Flock also makes it clear that its cus­tomers — whether that’s a lo­cal po­lice de­part­ment, pri­vate busi­ness or an­other in­sti­tu­tion — own their data and con­trol ac­cess to it. Once end users ac­cess that data, Flock’s own pri­vacy mea­sures don’t do much to help. That raises con­cerns about the se­cu­rity of lo­cal law en­force­ment sys­tems, each of which has its own data reg­u­la­tions and ac­count­abil­ity prac­tices.

You may have no­ticed a theme: Flock pro­vides pow­er­ful sur­veil­lance tech­nol­ogy, and the fi­nal re­sults are deeply in­flu­enced by how cus­tomers use it. That can be creepy at best, and an il­le­gal abuse of power at worst.

Since Flock Safety began partnering with law enforcement, a growing number of officers have been found abusing the surveillance system. In one instance, a Kansas police chief used Flock cameras 164 times while tracking an ex. In another case, a sheriff in Texas lied about using Flock to track a “missing person,” but was later found to be investigating a possible abortion. In Georgia, a police chief was arrested for using Flock to stalk and harass citizens. In Virginia, a man sued the city of Norfolk over purported privacy violations and discovered that Flock cameras had been used to track him 526 times, around four times per day.

Those are just a few examples from a long list, giving real substance to worries about a surveillance state and a lack of checks and balances. When I asked Flock how its systems protect against abuse and overreach, a spokesperson referred to its accountability feature, an auditing tool that “records every search that a user of Flock conducts in the system.” Flock used this tool during the Georgia case above, which ultimately led to the arrest of the police chief.

While police search logs are often tracked like this, reports indicate that many authorities start searches with vague terms and cast a wide net using terms like “investigation,” “crime” or a broad immigration term like “deportee” to gain access to as much data as possible. While police can’t avoid Flock’s audit logs, they can use general or discriminatory terms — or skip filling out fields entirely — to evade investigations and hide intent.

Regardless of the au­dit­ing tools, the onus is on lo­cal or­ga­ni­za­tions to man­age in­ves­ti­ga­tions, ac­count­abil­ity and trans­parency. That brings me to a par­tic­u­larly im­pact­ful cur­rent event.

ICE is the ele­phant in the room in my Flock guide. Does Flock share its sur­veil­lance data with fed­eral agen­cies such as ICE? Yes, the fed­eral gov­ern­ment fre­quently has ac­cess to that data, but how it gets ac­cess is im­por­tant.

Flock states on its web­site that it has not shared data or part­nered with ICE or any other Department of Homeland Security of­fi­cials since ter­mi­nat­ing its pi­lot pro­grams in August 2025. Flock says its fo­cus is now on lo­cal law en­force­ment, but that comes with a hands-off ap­proach that does­n’t con­trol what hap­pens to in­for­ma­tion down­stream.

“Flock has no authority to share data on our customers’ behalf, nor the authority to disrupt their law enforcement operations,” the Flock spokesperson told me. Local police all over the country collaborate with federal agencies for various reasons, with or without Flock technology.

That collaboration has grown more complex. As Democratic Senator Ron Wyden from Oregon stated in an open letter to Flock Safety, “local” law enforcement isn’t that local anymore, especially when 75% of Flock’s law enforcement customers have enrolled in the National Lookup Tool, which allows information sharing across the country between all participants.

“Flock has built a dangerous platform in which abuse of surveillance data is almost certain,” Wyden wrote. “The company has adopted a see-no-evil approach of not proactively auditing the searches done by its law enforcement customers because, as the company’s Chief Communications Officer told the press, ‘It is not Flock’s job to police the police.’”

Police department sharing isn’t always easy to track, but reporting from 404 Media found that police departments across the country have been creating Flock searches with reasons listed as “immigration,” “ICE,” or “ICE warrant,” among others. Again, since police can put whatever terms they want in these fields — depending on local policies — we don’t know for sure how common it is to look up info for ICE.

Additionally, there’s not al­ways an of­fi­cial process or chain of ac­count­abil­ity for shar­ing this data. In Oregon, reports found that a po­lice de­part­ment was con­duct­ing Flock searches on be­half of ICE and the FBI via a sim­ple email thread.

“When this kind of surveillance power is in malevolent hands — and in the case of ICE, I feel comfortable saying a growing number of Americans view it as a bad actor — these companies are empowering actions the public increasingly finds objectionable,” a lawyer with the ACLU told a Salt Lake City news outlet earlier this year.

With the myr­iad ways law en­force­ment shares Flock data with the fed­eral gov­ern­ment, it may seem like there’s not much you can do. But one pow­er­ful tool is ad­vo­cat­ing for new laws.

In the past two years, a grow­ing num­ber of state laws have been passed or pro­posed to ad­dress Flock Safety, li­cense plate read­ers and sur­veil­lance. Much of this leg­is­la­tion is bi­par­ti­san, or has been passed by both tra­di­tion­ally right- and left-lean­ing states, al­though some go fur­ther than oth­ers.

When I contacted the ACLU to learn what legislation is most effective in situations like this, Chad Marlow, senior policy counsel and lead on the ACLU’s advocacy work for Flock and related surveillance, gave several examples.

“I would limit the allowed uses for ALPR,” Marlow told me. “While some uses, like for toll collection and Amber Alerts, with the right guardrails in place, are not particularly problematic, some ALPRs are used to target communities of color and low-income communities for fine/fee enforcement and for minor crime enforcement, which can exacerbate existing policing inequities.”

This type of harmful ALPR targeting is typically used both to oppress minorities and to bring in a greater number of fees for local law enforcement organizations — problems that existed long before AI recognition cameras, but have been exacerbated by the technology.

New leg­is­la­tion can help, but it needs to be care­fully crafted. The most ef­fec­tive laws fall into two cat­e­gories. The first is re­quir­ing any col­lected ALPR or re­lated data to be deleted within a cer­tain time frame — the shorter, the bet­ter. New Hampshire wins here with a 3-minute rule.

“For states that want a little more time to see if captured ALPR data is relevant to an ongoing investigation, keeping the data for a few days is sufficient,” Marlow said. “Some states, like Washington and Virginia, recently adopted 21-day limits, which is the very outermost acceptable limit.”

The sec­ond type of promis­ing law makes it il­le­gal to share ALPR and sim­i­lar data out­side the state (such as with ICE) and has been passed by states like Virginia, Illinois and California.

“Ideally, no data should be shared outside the collecting agency without a warrant,” Marlow said. “But some states have chosen to prohibit data sharing outside of the state, which is better than nothing, and does limit some risks.”

Vermont, mean­while, re­quires a strict ap­proval process for ALPRs that, by 2025, left no law en­force­ment agency in the state us­ing li­cense cams.

But what hap­pens if po­lice choose to ig­nore laws and con­tinue us­ing Flock as they see fit? That’s al­ready hap­pened. In California, for ex­am­ple, po­lice in Los Angeles and San Diego were found shar­ing in­for­ma­tion with Homeland Security in 2025, in vi­o­la­tion of a state law that bans or­ga­ni­za­tions from shar­ing li­cense plate data out of state.

When this hap­pens, the re­course is typ­i­cally a law­suit, ei­ther from the state at­tor­ney gen­eral or a class ac­tion by the com­mu­nity, both of which are on­go­ing in California in 2026. But what should peo­ple do while leg­is­la­tion and law­suits pro­ceed?

Marlow ac­knowl­edged that in­di­vid­u­als can’t do much about Flock sur­veil­lance with­out bans or leg­is­la­tion.

“Flock identifies and tracks your vehicle by scanning its license plate, and covering your license plate is illegal, so that is not an option,” he told me.

However, Marlow suggested minor changes that could make a difference for those who are seriously worried. “When people are traveling to sensitive locations, they could take public transportation and pay with cash (credit cards can be tracked, as can share-a-rides) or get a lift from a friend, but those aren’t really practical on an everyday basis.”

Ditching or re­strict­ing Flock Safety is one way com­mu­ni­ties are fight­ing back against what they con­sider to be un­nec­es­sary sur­veil­lance with the po­ten­tial for abuse. But AI sur­veil­lance does­n’t be­gin or end with one com­pany.

Flock Safety is an in­ter­me­di­ary that pro­vides tech­nol­ogy in de­mand by pow­er­ful or­ga­ni­za­tions. It’s hardly the only one with these kinds of high-tech eyes — it’s just one of the first to en­ter the mar­ket at a na­tional level. If Flock were gone, an­other com­pany would likely step in to fill the gap, un­less re­stricted by law.

As Flock’s in­te­gra­tion with other apps and cam­eras be­comes more com­plex, it’s go­ing to be harder to tell where Flock ends and an­other so­lu­tion be­gins, even with­out ri­val com­pa­nies show­ing up with the lat­est AI track­ing.

But ri­vals are show­ing up, from Shield AI for mil­i­tary in­tel­li­gence to com­mer­cial ap­pli­ca­tions by com­pa­nies like Ambient.ai, Verkada’s AI se­cu­rity searches and the in­fa­mous in­tel­li­gence firm Palantir, all look­ing for ways to in­te­grate and ex­pand. Motorola, in par­tic­u­lar, is in on the ac­tion with its VehicleManager plat­form.

The first step is be­ing aware, in­clud­ing know­ing which new cam­eras your city is in­stalling and which soft­ware part­ner­ships your lo­cal law en­force­ment has. If you don’t like what you dis­cover, find ways to par­tic­i­pate in the de­ci­sion-mak­ing process, like at­tend­ing open city coun­cil meet­ings on Flock, as in Bend.

On a broader level, keep track of the leg­is­la­tion your state is con­sid­er­ing re­gard­ing Flock and sim­i­lar sur­veil­lance con­tracts and op­er­a­tions, as these will have the great­est long-term im­pact. Blocking data from be­ing shared out of state and re­quir­ing po­lice to delete sur­veil­lance ASAP are par­tic­u­larly im­por­tant steps. You can con­tact your state sen­a­tors and rep­re­sen­ta­tives to en­cour­age leg­is­la­tion like this.

When you’re wondering what to share with politicians, I recommend something like what Marlow told me: “The idea of keeping a location dossier on every single person just in case one of us turns out to be a criminal is just about the most un-American approach to privacy I can imagine.”

You can also sign up for and do­nate to pro­jects that are ad­dress­ing Flock con­cerns, such as The Plate Privacy Project from The Institute for Justice. I’m cur­rently talk­ing to them about the lat­est events, and I’ll up­date if they have any ad­di­tional tips for us.

Keep fol­low­ing CNET home se­cu­rity, where I break down the lat­est news you should know, like pri­vacy set­tings to turn on, se­cu­rity cam­era set­tings you may want to turn off and how sur­veil­lance in­ter­sects with our daily lives. Things are chang­ing fast, but we’re stay­ing on top of it.

...

Read the original on www.cnet.com »

7 414 shares, 174 trendiness

Porting Mac OS X to the Nintendo Wii

Since its launch in 2006, the Wii has seen several operating systems ported to it: Linux, NetBSD, and most recently, Windows NT. Today, Mac OS X joins that list.

In this post, I’ll share how I ported the first version of Mac OS X, 10.0 Cheetah, to the Nintendo Wii. If you’re not an operating systems expert or low-level engineer, you’re in good company; this project was all about learning and navigating countless “unknown unknowns”. Join me as we explore the Wii’s hardware, bootloader development, kernel patching, and writing drivers - and give the PowerPC versions of Mac OS X a new life on the Nintendo Wii.

Visit the wi­iMac boot­loader repos­i­tory for in­struc­tions on how to try this pro­ject your­self.

Before fig­ur­ing out how to tackle this pro­ject, I needed to know whether it would even be pos­si­ble. According to a 2021 Reddit com­ment:

There is a zero per­cent chance of this ever hap­pen­ing.

Feeling en­cour­aged, I started with the ba­sics: what hard­ware is in the Wii, and how does it com­pare to the hard­ware used in real Macs from the era.

The Wii uses a PowerPC 750CL proces­sor - an evo­lu­tion of the PowerPC 750CXe that was used in G3 iBooks and some G3 iMacs. Given this close lin­eage, I felt con­fi­dent that the CPU would­n’t be a blocker.

As for RAM, the Wii has a unique con­fig­u­ra­tion: 88 MB to­tal, split across 24 MB of 1T-SRAM (MEM1) and 64 MB of slower GDDR3 SDRAM (MEM2); un­con­ven­tional, but tech­ni­cally enough for Mac OS X Cheetah, which of­fi­cially calls for 128 MB of RAM but will un­of­fi­cially boot with less. To be safe, I used QEMU to boot Cheetah with 64 MB of RAM and ver­i­fied that there were no is­sues.

Other hard­ware I’d even­tu­ally need to sup­port in­cluded:

* The SD card for boot­ing the rest of the sys­tem once the ker­nel was run­ning

* Video out­put via a frame­buffer that lives in RAM

* The Wii’s USB ports for us­ing a mouse and key­board

Convinced that the Wii’s hard­ware was­n’t fun­da­men­tally in­com­pat­i­ble with Mac OS X, I moved my at­ten­tion to in­ves­ti­gat­ing the soft­ware stack I’d be port­ing.

Mac OS X has an open source core (Darwin, with XNU as the ker­nel and IOKit as the dri­ver model), with closed-source com­po­nents lay­ered on top (Quartz, Dock, Finder, sys­tem apps and frame­works). In the­ory, if I could mod­ify the open-source parts enough to get Darwin run­ning, the closed-source parts would run with­out ad­di­tional patches.

Porting Mac OS X would also re­quire un­der­stand­ing how a real Mac boots. PowerPC Macs from the early 2000s use Open Firmware as their low­est-level soft­ware en­vi­ron­ment; for sim­plic­ity, it can be thought of as the first code that runs when a Mac is pow­ered on. Open Firmware has sev­eral re­spon­si­bil­i­ties, in­clud­ing:

* Providing use­ful func­tions for I/O, draw­ing, and hard­ware com­mu­ni­ca­tion

* Loading and ex­e­cut­ing an op­er­at­ing sys­tem boot­loader from the filesys­tem

Open Firmware even­tu­ally hands off con­trol to BootX, the boot­loader for Mac OS X. BootX pre­pares the sys­tem so that it can even­tu­ally pass con­trol to the ker­nel. The re­spon­si­bil­i­ties of BootX in­clude:

* Loading and de­cod­ing the XNU ker­nel, a Mach-O ex­e­cutable, from the root filesys­tem

Once XNU is run­ning, there are no de­pen­den­cies on BootX or Open Firmware. XNU con­tin­ues on to ini­tial­ize proces­sors, vir­tual mem­ory, IOKit, BSD, and even­tu­ally con­tinue boot­ing by load­ing and run­ning other ex­e­cuta­bles from the root filesys­tem.

The last piece of the puzzle was how to run my own custom code on the Wii - a trivial task thanks to the Wii being “jailbroken”, allowing anyone to run homebrew with full access to the hardware via the Homebrew Channel and BootMii.

Armed with knowl­edge of how the boot process works on a real Mac, along with how to run low-level code on the Wii, I needed to se­lect an ap­proach for boot­ing Mac OS X on the Wii. I eval­u­ated three op­tions:

1. Port Open Firmware, and use it to run an unmodified BootX to boot Mac OS X

2. Port BootX, modify it to not rely on Open Firmware, and use it to boot Mac OS X

3. Write a custom bootloader that performs the bare-minimum setup to boot Mac OS X

Since Mac OS X does­n’t de­pend on Open Firmware or BootX once run­ning, spend­ing time port­ing ei­ther of those seemed like an un­nec­es­sary dis­trac­tion. Additionally, both Open Firmware and BootX con­tain added com­plex­ity for sup­port­ing many dif­fer­ent hard­ware con­fig­u­ra­tions - com­plex­ity that I would­n’t need since this only needs to run on the Wii. Following in the foot­steps of the Wii Linux pro­ject, I de­cided to write my own boot­loader from scratch. The boot­loader would need to, at a min­i­mum:

* Load the ker­nel from the SD card

Once the ker­nel was run­ning, none of the boot­loader code would mat­ter. At that point, my fo­cus would shift to patch­ing the ker­nel and writ­ing dri­vers.

I de­cided to base my boot­loader on some low-level ex­am­ple code for the Wii called ppcskel. ppcskel puts the sys­tem into a sane ini­tial state, and pro­vides use­ful func­tions for com­mon things like read­ing files from the SD card, draw­ing text to the frame­buffer, and log­ging de­bug mes­sages to a USB Gecko.

Next, I had to fig­ure out how to load the XNU ker­nel into mem­ory so that I could pass con­trol to it. The ker­nel is stored in a spe­cial bi­nary for­mat called Mach-O, and needs to be prop­erly de­coded be­fore be­ing used.

The Mach-O ex­e­cutable for­mat is well-doc­u­mented, and can be thought of as a list of load com­mands that tell the loader where to place dif­fer­ent sec­tions of the bi­nary file in mem­ory. For ex­am­ple, a load com­mand might in­struct the loader to read the data from file off­set 0x2cf000 and store it at the mem­ory ad­dress 0x2e0000. After pro­cess­ing all of the ker­nel’s load com­mands, we end up with this mem­ory lay­out:
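The per-segment work can be sketched in C. Struct names mirror Apple's <mach-o/loader.h>; endianness handling is omitted for brevity (on a big-endian PowerPC host the fields can be read directly), and a plain buffer stands in for physical memory:

```c
#include <stdint.h>
#include <string.h>

/* Minimal 32-bit Mach-O structures (names mirror <mach-o/loader.h>). */
#define LC_SEGMENT 0x1

struct mach_header {
    uint32_t magic, cputype, cpusubtype, filetype;
    uint32_t ncmds, sizeofcmds, flags;
};

struct load_command { uint32_t cmd, cmdsize; };

struct segment_command {
    uint32_t cmd, cmdsize;
    char     segname[16];
    uint32_t vmaddr, vmsize, fileoff, filesize;
    uint32_t maxprot, initprot, nsects, flags;
};

/* Copy each LC_SEGMENT's file bytes to its requested address (here an
 * offset into a sandbox buffer instead of physical RAM), zero-filling
 * the tail where vmsize > filesize. */
static void load_segments(const uint8_t *file, uint8_t *ram)
{
    const struct mach_header *mh = (const void *)file;
    const uint8_t *p = file + sizeof *mh;
    for (uint32_t i = 0; i < mh->ncmds; i++) {
        const struct load_command *lc = (const void *)p;
        if (lc->cmd == LC_SEGMENT) {
            const struct segment_command *sc = (const void *)p;
            memcpy(ram + sc->vmaddr, file + sc->fileoff, sc->filesize);
            memset(ram + sc->vmaddr + sc->filesize, 0,
                   sc->vmsize - sc->filesize);
        }
        p += lc->cmdsize;
    }
}
```

A real loader would also validate the magic number and bounds-check every offset before copying.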

The ker­nel file also spec­i­fies the mem­ory ad­dress where ex­e­cu­tion should be­gin. Once the boot­loader jumps to this ad­dress, the ker­nel is in full con­trol and the boot­loader is no longer run­ning.

To jump to the ker­nel-en­try-point’s mem­ory ad­dress, I needed to cast the ad­dress to a func­tion and call it:
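A minimal sketch of that cast-and-call, with a stand-in "kernel" function so the handoff can be exercised off-device (the real handoff also has to leave registers the way XNU's _start expects):

```c
#include <stdint.h>

/* Treat the Mach-O entry address as a function taking a pointer to
 * the boot arguments. In the bootloader this call never returns. */
typedef void (*kernel_entry_fn)(void *boot_args);

static void jump_to_kernel(uintptr_t entry_addr, void *boot_args)
{
    ((kernel_entry_fn)entry_addr)(boot_args);
}

/* Stand-in kernel used only for illustration: records that it ran. */
static void fake_kernel(void *boot_args)
{
    *(int *)boot_args = 1;
}
```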

After this code ran, the screen went black and my de­bug logs stopped ar­riv­ing via the se­r­ial de­bug con­nec­tion - while an­ti­cli­mac­tic, this was an in­di­ca­tor that the ker­nel was run­ning.

The ques­tion then be­came: how far was I mak­ing it into the boot process? To an­swer this, I had to start look­ing at XNU source code. The first code that runs is a PowerPC as­sem­bly _start rou­tine. This code re­con­fig­ures the hard­ware, over­rid­ing all of the Wii-specific setup that the boot­loader per­formed and, in the process, dis­ables boot­loader func­tion­al­ity for se­r­ial de­bug­ging and video out­put. Without nor­mal de­bug-out­put fa­cil­i­ties, I’d need to track progress a dif­fer­ent way.

The ap­proach that I came up with was a bit of a hack: bi­nary-patch the ker­nel, re­plac­ing in­struc­tions with ones that il­lu­mi­nate one of the front-panel LEDs on the Wii. If the LED il­lu­mi­nated af­ter jump­ing to the ker­nel, then I’d know that the ker­nel was mak­ing it at least that far. Turning on one of these LEDs is as sim­ple as writ­ing a value to a spe­cific mem­ory ad­dress. In PowerPC as­sem­bly, those in­struc­tions are:

lis r5, 0xd80  ; load up­per half of 0x0D8000C0 into r5

ori r5, r5, 0xc0  ; load lower half of 0x0D8000C0 into r5

lwz r4, 0(r5)  ; read the 32-bit value from 0x0D8000C0

sync  ; mem­ory bar­rier

xori r4, r4, 0x20  ; tog­gle bit 5

stw r4, 0(r5)  ; write the value back to 0x0D8000C0

To know which parts of the ker­nel to patch, I cross-ref­er­enced func­tion names in XNU source code with func­tion off­sets in the com­piled ker­nel bi­nary, us­ing Hopper Disassembler to make the process eas­ier. Once I iden­ti­fied the cor­rect off­set in the bi­nary that cor­re­sponded to the code I wanted to patch, I just needed to re­place the ex­ist­ing in­struc­tions at that off­set with the ones to blink the LED.

To make this patch­ing process eas­ier, I added some code to the boot­loader to patch the ker­nel bi­nary on the fly, en­abling me to try dif­fer­ent off­sets with­out man­u­ally mod­i­fy­ing the ker­nel file on disk.

After trac­ing through many ker­nel startup rou­tines, I even­tu­ally mapped out this path of ex­e­cu­tion:

This was an exciting milestone - the kernel was definitely running, and I had even made it into some higher-level C code. To make it past the 300 exception crash (PowerPC vector 0x300, a data access fault), the bootloader would need to pass a pointer to a valid device tree.

The de­vice tree is a data struc­ture rep­re­sent­ing all of the hard­ware in the sys­tem that should be ex­posed to the op­er­at­ing sys­tem. As the name sug­gests, it’s a tree made up of nodes, each ca­pa­ble of hold­ing prop­er­ties and ref­er­ences to child nodes.

On real Mac com­put­ers, the boot­loader scans the hard­ware and con­structs a de­vice tree based on what it finds. Since the Wii’s hard­ware is al­ways the same, this scan­ning step can be skipped. I ended up hard-cod­ing the de­vice tree in the boot­loader, tak­ing in­spi­ra­tion from the de­vice tree that the Wii Linux pro­ject uses.

Since I was­n’t sure how much of the Wii’s hard­ware I’d need to sup­port in or­der to get the boot process fur­ther along, I started with a min­i­mal de­vice tree: a root node with chil­dren for the cpus and mem­ory:
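XNU consumes the tree in its own flattened format (described by pexpert/device_tree.h): each node is a pair of counts followed by its properties and then its children, recursively. A hedged sketch of hard-coding such a minimal tree, using host byte order and name-only properties for illustration (a real tree also carries reg, compatible, and similar properties, in big-endian form):

```c
#include <stdint.h>
#include <string.h>

/* XNU's flattened device-tree layout: { nProperties, nChildren },
 * then each property as a 32-byte name, a 4-byte length, and the
 * value padded to a 4-byte boundary. */
#define DT_NAME_LEN 32

static uint8_t *dt_begin_node(uint8_t *p, uint32_t nprops, uint32_t nchildren)
{
    memcpy(p, &nprops, 4);
    memcpy(p + 4, &nchildren, 4);
    return p + 8;
}

static uint8_t *dt_add_prop(uint8_t *p, const char *name,
                            const void *val, uint32_t len)
{
    memset(p, 0, DT_NAME_LEN);
    strncpy((char *)p, name, DT_NAME_LEN - 1);
    memcpy(p + DT_NAME_LEN, &len, 4);
    memcpy(p + DT_NAME_LEN + 4, val, len);
    return p + DT_NAME_LEN + 4 + ((len + 3) & ~3u);  /* pad to 4 bytes */
}

/* Minimal tree: a root node with "cpus" and "memory" children. */
static uint32_t build_wii_tree(uint8_t *buf)
{
    uint8_t *p = dt_begin_node(buf, 1, 2);            /* root */
    p = dt_add_prop(p, "name", "device-tree", 12);
    p = dt_begin_node(p, 1, 0);                       /* cpus */
    p = dt_add_prop(p, "name", "cpus", 5);
    p = dt_begin_node(p, 1, 0);                       /* memory */
    p = dt_add_prop(p, "name", "memory", 7);
    return (uint32_t)(p - buf);
}
```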

My plan was to ex­pand the de­vice tree with more pieces of hard­ware as I got fur­ther along in the boot process - even­tu­ally con­struct­ing a com­plete rep­re­sen­ta­tion of all of the Wii’s hard­ware that I planned to sup­port in Mac OS X.

Once I had a de­vice tree cre­ated and stored in mem­ory, I needed to pass it to the ker­nel as part of boot_args:
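For orientation, here is an abridged sketch of that structure; the full definition lives in XNU's pexpert/ppc/boot.h, and the DRAM bank table and video fields are omitted here, so offsets do not match the real struct:

```c
#include <stdint.h>

/* Abridged boot_args sketch: the fields the text discusses. */
typedef struct {
    uint16_t Revision;
    uint16_t Version;
    char     CommandLine[256];
    /* ...DRAM bank table and Boot_Video omitted... */
    void    *deviceTreeP;        /* flattened device tree in memory */
    uint32_t deviceTreeLength;
    uint32_t topOfKernelData;    /* first physical address free for use */
} boot_args_sketch;

static void fill_boot_args(boot_args_sketch *ba, void *dt, uint32_t dt_len,
                           uint32_t kernel_top)
{
    ba->Revision = 1;            /* illustrative values */
    ba->Version = 1;
    ba->CommandLine[0] = '\0';
    ba->deviceTreeP = dt;
    ba->deviceTreeLength = dt_len;
    ba->topOfKernelData = kernel_top;
}
```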

With the de­vice tree in mem­ory, I had made it past the de­vice_tree.c crash. The boot­loader was per­form­ing the ba­sics well: load­ing the ker­nel, cre­at­ing boot ar­gu­ments and a de­vice tree, and ul­ti­mately, call­ing the ker­nel. To make ad­di­tional progress, I’d need to shift my at­ten­tion to­ward patch­ing the ker­nel source code to fix re­main­ing com­pat­i­bil­ity is­sues.

At this point, the ker­nel was get­ting stuck while run­ning some code to set up video and I/O mem­ory. XNU from this era makes as­sump­tions about where video and I/O mem­ory can be, and re­con­fig­ures Block Address Translations (BATs) in a way that does­n’t play nicely with the Wii’s mem­ory lay­out (MEM1 start­ing at 0x00000000, MEM2 start­ing at 0x10000000). To work around these lim­i­ta­tions, it was time to mod­ify the ker­nel’s source code and boot a mod­i­fied ker­nel bi­nary.

Figuring out a sane de­vel­op­ment en­vi­ron­ment to build an OS ker­nel from 25 years ago took some ef­fort. Here’s what I landed on:

* XNU source code lives on the host’s filesys­tem, and is ex­posed via an NFS server

* The guest ac­cesses the XNU source via an NFS mount

* The host uses SSH to con­trol the guest

* Edit XNU source on host, kick off a build via SSH on the guest, build ar­ti­facts end up on the filesys­tem ac­ces­si­ble by host and guest

To set up the de­pen­den­cies needed to build the Mac OS X Cheetah ker­nel on the Mac OS X Cheetah guest, I fol­lowed the in­struc­tions here. They mostly matched up with what I needed to do. Relevant sources are avail­able from Apple here.

After fix­ing the BAT setup and adding some small patches to reroute con­sole out­put to my USB Gecko, I now had video out­put and se­r­ial de­bug logs work­ing - mak­ing fu­ture de­vel­op­ment and de­bug­ging sig­nif­i­cantly eas­ier. Thanks to this new vis­i­bil­ity into what was go­ing on, I could see that the vir­tual mem­ory, IOKit, and BSD sub­sys­tems were all ini­tial­ized and run­ning - with­out crash­ing. This was a sig­nif­i­cant mile­stone, and gave me con­fi­dence that I was on the right path to get­ting a full sys­tem work­ing.

Readers who have attempted to run Mac OS X on a PC via “hackintoshing” may recognize the last line in the boot logs: the dreaded “Still waiting for root device”. This occurs when the system can’t find a root filesystem from which to continue booting. In my case, this was expected: the kernel had done all it could and was ready to load the rest of the Mac OS X system from the filesystem, but it didn’t know where to locate this filesystem. To make progress, I would need to tell the kernel how to read from the Wii’s SD card. To do this, I’d need to tackle the next phase of this project: writing drivers.

Mac OS X dri­vers are built us­ing IOKit - a col­lec­tion of soft­ware com­po­nents that aim to make it easy to ex­tend the ker­nel to sup­port dif­fer­ent hard­ware de­vices. Drivers are writ­ten us­ing a sub­set of C++, and make ex­ten­sive use of ob­ject-ori­ented pro­gram­ming con­cepts like in­her­i­tance and com­po­si­tion. Many pieces of use­ful func­tion­al­ity are pro­vided, in­clud­ing:

* Base classes and “families” that implement common behavior for different types of hardware

* Probing and match­ing dri­vers to hard­ware pre­sent in the de­vice tree

In IOKit, there are two kinds of drivers: a specific device driver and a nub. A specific device driver is an object that manages a specific piece of hardware. A nub is an object that serves as an attach-point for a specific device driver, and also provides the ability for that attached driver to communicate with the driver that created the nub. It’s this chain of driver-to-nub-to-driver that creates IOKit’s provider-client relationships. I struggled for a while to grasp this concept, and found a concrete example useful.

Real Macs can have a PCI bus with sev­eral PCI ports. In this ex­am­ple, con­sider an eth­er­net card be­ing plugged into one of the PCI ports. A dri­ver, IOPCIBridge, han­dles com­mu­ni­cat­ing with the PCI bus hard­ware on the moth­er­board. This dri­ver scans the bus, cre­at­ing IOPCIDevice nubs (attach-points) for each plugged-in de­vice that it finds. A hy­po­thet­i­cal dri­ver for the plugged-in eth­er­net card (let’s call it SomeEthernetCard) can at­tach to the nub, us­ing it as its proxy to call into PCI func­tion­al­ity pro­vided by the IOPCIBridge dri­ver on the other side. The SomeEthernetCard dri­ver can also cre­ate its own IOEthernetInterface nubs so that higher-level parts of the IOKit net­work­ing stack can at­tach to it.

Someone de­vel­op­ing a PCI eth­er­net card dri­ver would only need to write SomeEthernetCard; the lower-level PCI bus com­mu­ni­ca­tion and the higher-level net­work­ing stack code is all pro­vided by ex­ist­ing IOKit dri­ver fam­i­lies. As long as SomeEthernetCard can at­tach to an IOPCIDevice nub and pub­lish its own IOEthernetInterface nubs, it can sand­wich it­self be­tween two ex­ist­ing fam­i­lies in the dri­ver stack, ben­e­fit­ing from all of the func­tion­al­ity pro­vided by IOPCIFamily while also sat­is­fy­ing the needs of IONetworkingFamily.

Unlike Macs from the same era, the Wii does­n’t use PCI to con­nect its var­i­ous pieces of hard­ware to its moth­er­board. Instead, it uses a cus­tom sys­tem-on-a-chip (SoC) called the Hollywood. Through the Hollywood, many pieces of hard­ware can be ac­cessed: the GPU, SD card, WiFi, Bluetooth, in­ter­rupt con­trollers, USB ports, and more. The Hollywood also con­tains an ARM co­proces­sor, nick­named the Starlet, that ex­poses hard­ware func­tion­al­ity to the main PowerPC proces­sor via in­ter-proces­sor-com­mu­ni­ca­tion (IPC).

This unique hard­ware lay­out and com­mu­ni­ca­tion pro­to­col meant that I could­n’t piggy-back off of an ex­ist­ing IOKit dri­ver fam­ily like IOPCIFamily. Instead, I would need to im­ple­ment an equiv­a­lent dri­ver for the Hollywood SoC, cre­at­ing nubs that rep­re­sent at­tach-points for all of the hard­ware it con­tains. I landed on this lay­out of dri­vers and nubs (note that this is only show­ing a sub­set of the dri­vers that had to be writ­ten):

Now that I had a bet­ter idea of how to rep­re­sent the Wii’s hard­ware in IOKit, I be­gan work on my Hollywood dri­ver.

I started by creating a new C++ header and implementation file for a NintendoWiiHollywood driver. Its driver “personality” enabled it to be matched to a node in the device tree with the name “hollywood”. Once the driver was matched and running, it was time to publish nubs for all of its child devices.
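For context, an IOKit personality is a dictionary in the kext’s Info.plist; one that matches on a device-tree node name might look roughly like this (the bundle identifier and provider class here are illustrative guesses, not taken from the project):

```xml
<key>NintendoWiiHollywood</key>
<dict>
    <key>CFBundleIdentifier</key>
    <string>com.example.driver.NintendoWiiHollywood</string>
    <key>IOClass</key>
    <string>NintendoWiiHollywood</string>
    <key>IOProviderClass</key>
    <string>IOPlatformDevice</string>
    <key>IONameMatch</key>
    <string>hollywood</string>
</dict>
```

When IOKit sees a provider whose name matches IONameMatch, it instantiates the class named by IOClass and attaches it.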

Once again lean­ing on the de­vice tree as the source of truth for what hard­ware lives un­der the Hollywood, I it­er­ated through all of the Hollywood node’s chil­dren, cre­at­ing and pub­lish­ing NintendoWiiHollywoodDevice nubs for each:

Once NintendoWiiHollywoodDevice nubs were cre­ated and pub­lished, the sys­tem would be able to have other de­vice dri­vers, like an SD card dri­ver, at­tach to them.

Next, I moved on to writ­ing a dri­ver to en­able the sys­tem to read and write from the Wii’s SD card. This dri­ver is what would en­able the sys­tem to con­tinue boot­ing, since it was cur­rently stuck look­ing for a root filesys­tem from which to load ad­di­tional startup files.

I be­gan by sub­class­ing IOBlockStorageDevice, which has many ab­stract meth­ods in­tended to be im­ple­mented by sub­classers:

For most of these meth­ods, I could im­ple­ment them with hard-coded val­ues that matched the Wii’s SD card hard­ware; ven­dor string, block size, max read and write trans­fer size, ejectabil­ity, and many oth­ers all re­turn con­stant val­ues, and were triv­ial to im­ple­ment.

The more in­ter­est­ing meth­ods to im­ple­ment were the ones that needed to ac­tu­ally com­mu­ni­cate with the cur­rently-in­serted SD card: get­ting the ca­pac­ity of the SD card, read­ing from the SD card, and writ­ing to the SD card:

To com­mu­ni­cate with the SD card, I uti­lized the IPC func­tion­al­ity pro­vided by MINI run­ning on the Starlet co-proces­sor. By writ­ing data to cer­tain re­served mem­ory ad­dresses, the SD card dri­ver was able to is­sue com­mands to MINI. MINI would then ex­e­cute those com­mands, com­mu­ni­cat­ing back any re­sult data by writ­ing to a dif­fer­ent re­served mem­ory ad­dress that the dri­ver could mon­i­tor.
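The shape of that exchange, reduced to a runnable sketch (the struct stands in for MINI's real MMIO registers and command encoding, which this does not reproduce, and a callback stands in for the ARM core so the flow can run anywhere):

```c
#include <stdint.h>

/* Illustrative PowerPC side of a mailbox exchange with the Starlet:
 * post a command word, then spin until the other processor flags a
 * reply. */
struct mailbox {
    volatile uint32_t cmd;
    volatile uint32_t reply;
    volatile uint32_t reply_ready;
};

typedef void (*starlet_fn)(struct mailbox *mb);

static uint32_t ipc_call(struct mailbox *mb, uint32_t cmd, starlet_fn starlet)
{
    mb->reply_ready = 0;
    mb->cmd = cmd;               /* on hardware: a store to an MMIO register */
    while (!mb->reply_ready)     /* on hardware: poll or take an interrupt */
        starlet(mb);             /* stands in for MINI handling the command */
    return mb->reply;
}

/* Stand-in MINI: acknowledge any command by echoing it plus one. */
static void fake_starlet(struct mailbox *mb)
{
    mb->reply = mb->cmd + 1;
    mb->reply_ready = 1;
}
```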

MINI sup­ports many use­ful com­mand types. The ones used by the SD card dri­ver are:

* IPC_SDMMC_SIZE: Returns the num­ber of sec­tors on the cur­rently-in­serted SD card

With these three com­mand types, reads, writes, and ca­pac­ity-checks could all be im­ple­mented, en­abling me to sat­isfy the core re­quire­ments of the block stor­age de­vice sub­class.

Like with most programming endeavors, things rarely work on the first try. To investigate issues, my primary debugging tool was sending log messages to the serial debugger via calls to IOLog. With this technique, I was able to see which methods were being called on my driver, what values were being passed in, and what values my IPC implementation was sending to and receiving from MINI - but I had no ability to set breakpoints or analyze execution dynamically while the kernel was running.

One of the trick­ier bugs that I en­coun­tered had to do with cached mem­ory. When the SD card dri­ver wants to read from the SD card, the com­mand it is­sues to MINI (running on the ARM CPU) in­cludes a mem­ory ad­dress at which to store any loaded data. After MINI fin­ishes writ­ing to mem­ory, the SD card dri­ver (running on the PowerPC CPU) might not be able to see the up­dated con­tents if that re­gion is mapped as cacheable. In that case, the PowerPC will read from its cache lines rather than RAM, re­turn­ing stale data in­stead of the newly loaded con­tents. To work around this, the SD card dri­ver must use un­cached mem­ory for its buffers.
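One conventional way to get such uncached buffers on the Wii is to access the same physical memory through its uncached mirror: MEM1/MEM2 are mapped cached at 0x80000000/0x90000000 and uncached at 0xC0000000/0xD0000000 (homebrew's libogc calls this conversion MEM_K0_TO_K1). Whether the port used this exact trick isn't stated, but the address math is simply:

```c
#include <stdint.h>

/* Convert a cached Wii virtual address to its uncached alias by
 * adding 0x40000000 (0x8xxxxxxx -> 0xCxxxxxxx, 0x9xxxxxxx ->
 * 0xDxxxxxxx). Loads through the alias bypass the PowerPC data
 * cache, so stores made by the Starlet are always visible.
 * Addresses are Wii-specific, shown for illustration. */
static inline uintptr_t uncached_alias(uintptr_t cached_va)
{
    return cached_va + 0x40000000u;
}
```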

After several days of bug-fixing, I reached a new milestone: IOBlockStorageDriver, which attached to my SD card driver, had started publishing IOMedia nubs representing the logical partitions present on the SD card. Through these nubs, higher-level parts of the system were able to attach and begin using the SD card. Importantly, the system was now able to find a root filesystem from which to continue booting, and I was no longer stuck at “Still waiting for root device”:

My boot logs now looked like this:

After some more rounds of bug fixes (while on the go), I was able to boot past sin­gle-user mode:

And eventually, make it through the entire verbose-mode startup sequence, which ends with the message “Startup complete”:

At this point, the sys­tem was try­ing to find a frame­buffer dri­ver so that the Mac OS X GUI could be shown. As in­di­cated in the logs, WindowServer was not happy - to fix this, I’d need to write my own frame­buffer dri­ver.

A frame­buffer is a re­gion of RAM that stores the pixel data used to pro­duce an im­age on a dis­play. This data is typ­i­cally made up of color com­po­nent val­ues for each pixel. To change what’s dis­played, new pixel data is writ­ten into the frame­buffer, which is then shown the next time the dis­play re­freshes. For the Wii, the frame­buffer usu­ally lives some­where in MEM1 due to it be­ing slightly faster than MEM2. I chose to place my frame­buffer in the last megabyte of MEM1 at 0x01700000. At 640x480 res­o­lu­tion, and 16 bits per pixel, the pixel data for the frame­buffer fit com­fort­ably in less than one megabyte of mem­ory.
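The arithmetic checks out:

```c
/* A 640x480 framebuffer at 16 bits (2 bytes) per pixel needs
 * 614,400 bytes, i.e. 600 KB, which fits in the last megabyte of
 * MEM1 starting at 0x01700000. */
enum { FB_WIDTH = 640, FB_HEIGHT = 480, FB_BYTES_PER_PIXEL = 2 };
enum { FB_SIZE = FB_WIDTH * FB_HEIGHT * FB_BYTES_PER_PIXEL };
```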

Early in the boot process, Mac OS X uses the boot­loader-pro­vided frame­buffer ad­dress to dis­play sim­ple boot graph­ics via video_­con­sole.c. In the case of a ver­bose-mode boot, font-char­ac­ter bitmaps are writ­ten into the frame­buffer to pro­duce a vi­sual log of what’s hap­pen­ing while start­ing up. Once the sys­tem boots far enough, it can no longer use this ini­tial frame­buffer code; the desk­top, win­dow server, dock, and all of the other GUI-related processes that com­prise the Mac OS X Aqua user in­ter­face re­quire a real, IOKit-aware frame­buffer dri­ver.

To tackle this next dri­ver, I sub­classed IOFramebuffer. Similar to sub­class­ing IOBlockStorageDevice for the SD card dri­ver, IOFramebuffer also had sev­eral ab­stract meth­ods for my frame­buffer sub­class to im­ple­ment:

Once again, most of these were triv­ial to im­ple­ment, and sim­ply re­quired re­turn­ing hard-coded Wii-compatible val­ues that ac­cu­rately de­scribed the hard­ware. One of the most im­por­tant meth­ods to im­ple­ment is getA­per­tur­eRange, which re­turns an IODeviceMemory in­stance whose base ad­dress and size de­scribe the lo­ca­tion of the frame­buffer in mem­ory:

After re­turn­ing the cor­rect de­vice mem­ory in­stance from this method, the sys­tem was able to tran­si­tion from the early-boot text-out­put frame­buffer, to a frame­buffer ca­pa­ble of dis­play­ing the full Mac OS X GUI. I was even able to boot the Mac OS X in­staller:

Readers with a keen eye might no­tice some is­sues:

* The ver­bose-mode text frame­buffer is still ac­tive, caus­ing text to be dis­played and the frame­buffer to be scrolled

The fix for the early-boot video con­sole still writ­ing text out­put to the frame­buffer was sim­ple: tell the sys­tem that our new, IOKit frame­buffer is the same as the one that was pre­vi­ously in use by re­turn­ing true from is­Con­soleDe­vice:

The fix for the in­cor­rect col­ors was much more in­volved, as it re­lates to a fun­da­men­tal in­com­pat­i­bil­ity be­tween the Wii’s video hard­ware and the graph­ics code that Mac OS X uses.

The Nintendo Wii’s video en­coder hard­ware is op­ti­mized for ana­logue TV sig­nal out­put, and as a re­sult, ex­pects 16-bit YUV pixel data in its frame­buffer. This is a prob­lem, since Mac OS X ex­pects the frame­buffer to con­tain RGB pixel data. If the frame­buffer that the Wii dis­plays con­tains non-YUV pixel data, then col­ors will be com­pletely wrong.

To work around this in­com­pat­i­bil­ity, I took in­spi­ra­tion from the Wii Linux pro­ject, which had solved this prob­lem many years ago. The strat­egy is to use two frame­buffers: an RGB frame­buffer that Mac OS X in­ter­acts with, and a YUV frame­buffer that the Wii’s video hard­ware out­puts to the at­tached dis­play. 60 times per sec­ond, the frame­buffer dri­ver con­verts the pixel data in the RGB frame­buffer to YUV pixel data, plac­ing the con­verted data in the frame­buffer that the Wii’s video hard­ware dis­plays:
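The per-pixel color math of that conversion can be sketched with a common integer BT.601 (studio-swing) approximation. Note this shows one pixel; the Wii's YUV framebuffer packs pixels as Y0 U Y1 V pairs, so a real loop converts two RGB pixels at a time and shares their chroma:

```c
#include <stdint.h>

/* RGB -> YCbCr, integer BT.601 studio-swing approximation:
 * Y in [16, 235], Cb/Cr centered on 128. */
static void rgb_to_ycbcr(uint8_t r, uint8_t g, uint8_t b,
                         uint8_t *y, uint8_t *cb, uint8_t *cr)
{
    *y  = (uint8_t)((( 66 * r + 129 * g +  25 * b + 128) >> 8) + 16);
    *cb = (uint8_t)(((-38 * r -  74 * g + 112 * b + 128) >> 8) + 128);
    *cr = (uint8_t)(((112 * r -  94 * g -  18 * b + 128) >> 8) + 128);
}
```

Doing this for every pixel 60 times per second is why the copy loop's efficiency matters.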

After implementing the dual-framebuffer strategy, I was able to boot into a correctly-colored Mac OS X system. For the first time, Mac OS X was running on a Nintendo Wii:

...

Read the original on bryankeller.github.io »

8 366 shares, 13 trendiness

When Is Technology Too Dangerous to Release to the Public?

Last week, the nonprofit research group OpenAI revealed that it had developed a new text-generation model that can write coherent, versatile prose given a certain subject matter prompt. However, the organization said, it would not be releasing the full algorithm due to “safety and security concerns.”

Instead, OpenAI decided to release a “much smaller” version of the model and withhold the data sets and training code that were used to develop it. If your knowledge of the model, called GPT-2, came solely from the headlines of the resulting news coverage, you might think that OpenAI had built a weapons-grade chatbot. A headline from Metro U.K. read, “Elon Musk-Founded OpenAI Builds Artificial Intelligence So Powerful That It Must Be Kept Locked Up for the Good of Humanity.” Another from CNET reported, “Musk-Backed AI Group: Our Text Generator Is So Good It’s Scary.” A column from the Guardian was titled, apparently without irony, “AI Can Write Just Like Me. Brace for the Robot Apocalypse.”

That sounds alarming. Experts in the machine learning field, however, are debating whether OpenAI’s claims may have been a bit exaggerated. The announcement has also sparked a debate about how to handle the proliferation of potentially dangerous A.I. algorithms.

OpenAI is a pioneer in artificial intelligence research that was initially funded by titans like SpaceX and Tesla founder Elon Musk, venture capitalist Peter Thiel, and LinkedIn co-founder Reid Hoffman. The nonprofit’s mission is to guide A.I. development responsibly, away from abusive and harmful applications. Besides text generation, OpenAI has also developed a robotic hand that can teach itself simple tasks, systems that can beat pro players of the strategy video game Dota 2, and algorithms that can incorporate human input into their learning processes.

On Feb. 14, OpenAI announced yet another feat of machine learning ingenuity in a blog post detailing how its researchers had trained a language model using text from 8 million webpages to predict the next word in a piece of writing. The resulting algorithm, according to the nonprofit, was stunning: It could “[adapt] to the style and content of the conditioning text” and allow users to “generate realistic and coherent continuations about a topic of their choosing.” To demonstrate the feat, OpenAI provided samples of text that GPT-2 had produced given a particular human-written prompt.
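GPT-2 itself is a large neural network, but the training objective described here (predict the next word from the words before it) can be illustrated with a toy counting model. The sketch below is purely illustrative and has nothing to do with OpenAI’s actual implementation:

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count which word follows which across a list of sentences."""
    model = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            model[prev][nxt] += 1
    return model

def predict_next(model, word):
    """Return the most frequently observed continuation, or None."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]
```

A real language model replaces the counting table with a neural network conditioned on the entire preceding context, which is what lets GPT-2 adapt to the style and topic of its prompt rather than just echoing word-pair statistics.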

For example, researchers fed the generator the following scenario:

In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.

The scientist named the population, after their distinctive horn, Ovid’s Unicorn. These four-horned, silver-white unicorns were previously unknown to science. Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved. Dr. Jorge Pérez, an evolutionary biologist from the University of La Paz, and several companions, were exploring the Andes Mountains when they found a small valley, with no other animals or humans. Pérez noticed that the valley had what appeared to be a natural fountain, surrounded by two peaks of rock and silver snow. Pérez and the others then ventured further into the valley. “By the time we reached the top of one peak, the water looked blue, with some crystals on top,” said Pérez.

Other samples exhibited GPT-2’s turns as a novelist writing another battle passage of The Lord of the Rings, a columnist railing against recycling, and a speechwriter composing John F. Kennedy’s address to the American people in the wake of his hypothetical resurrection as a cyborg.

While researchers admit that the algorithm’s prose can be a bit sloppy—it often rambles, uses repetitive language, can’t quite nail topic transitions, and inexplicably mentions “fires happening under water”—OpenAI nevertheless contends that GPT-2 is far more sophisticated than any other text generator that it has developed. That’s a bit self-referential, but most in the A.I. field seem to agree that GPT-2 is truly at the cutting edge of what’s currently possible with text generation. Most A.I. tech is only equipped to handle specific tasks and tends to fumble anything outside a very narrow range. Training the GPT-2 algorithm to adapt nimbly to various modes of writing is a significant achievement. The model also stands out from older text generators in that it can distinguish between multiple definitions of a single word based on context clues and has a deeper knowledge of more obscure usages. These enhanced capabilities allow the algorithm to compose longer and more coherent passages, which could be used to improve translation services, chatbots, and A.I. writing assistants. That doesn’t mean it will necessarily revolutionize the field.

Nevertheless, OpenAI said that it would only be publishing a “much smaller version” of the model due to concerns that it could be abused. The blog post fretted that it could be used to generate false news articles, impersonate people online, and generally flood the internet with spam and vitriol. While people can, of course, create such malicious content themselves, the implementation of sophisticated A.I. text generation may augment the scale at which it’s generated. What GPT-2 lacks in elegant prose stylings it could more than make up for in its prolificacy.

Yet the prevailing notion among most A.I. experts, including those at OpenAI, was that withholding the algorithm is a stopgap measure at best. Plus, “It’s not clear that there’s any, like, stunningly new technique they [OpenAI] are using. They’re just doing a good job of taking the next step,” says Robert Frederking, the principal systems scientist at Carnegie Mellon’s Language Technologies Institute. “A lot of people are wondering if you actually achieve anything by embargoing your results when everybody else can figure out how to do it anyway.”

An entity with enough capital and knowledge of A.I. research that’s already out in the public could build a text generator comparable to GPT-2, even by renting servers from Amazon Web Services. If OpenAI had released the algorithm, you perhaps would not have to spend as much time and computing power developing your own text generator. But the process by which it built the model isn’t exactly a mystery. (OpenAI did not respond to Slate’s requests for comment by publication.)

Some in the machine learning community have accused OpenAI of exaggerating the risks of its algorithm for media attention and depriving academics, who may not have the resources to build such a model themselves, of the opportunity to conduct research with GPT-2. However, David Bau, a researcher at MIT’s Computer Science and Artificial Intelligence Laboratory, sees the decision as more of a gesture intended to start a debate about ethics in A.I. “One organization pausing one particular project isn’t really going to change anything long term,” says Bau. “But OpenAI gets a lot of attention for anything they do … and I think they should be applauded for turning a spotlight on this issue.”

It’s worth considering, as OpenAI seems to be encouraging us to do, how researchers and society in general should approach powerful A.I. models. The dangers that come with the proliferation of A.I. won’t necessarily involve insubordinate killer robots. Let’s say, hypothetically, that OpenAI had managed to create a truly unprecedented text generator that could be easily downloaded and operated by laypeople on a mass scale. For John Bowers, a research associate at the Berkman Klein Center, what to do next may come down to a cost-benefit calculus. “The fact of the matter is that a lot of the cool stuff that we’re seeing coming out of A.I. research can be weaponized in some form,” says Bowers.

In the case of increasingly sophisticated text generators, Bowers would press for releasing the algorithms because of their contributions to the field of natural language processing and their practical uses, though he acknowledges that important developments in A.I. image recognition could be leveraged for invasive surveillance. However, Bowers would lean away from trying to advance and proliferate an A.I. tool like that used to make deepfakes, which is often used to graft images of people’s faces onto pornography. “To me, deepfakes are a prime example of a technology that has way more downside than upside.”

Bowers stresses, however, that these are all judgment calls, which in part speaks to the current shortcomings of the machine learning field that OpenAI is trying to highlight. “A.I. is a very young field, one that in many ways hasn’t achieved maturity in terms of how we think about the products we’re building and the balance between the harm they’ll do in the world and the good,” he says. Machine learning practitioners have not yet established many widely accepted frameworks for considering the ethical implications of creating and releasing A.I.-enabled technologies.

If recent history is any indication, trying to suppress or control the proliferation of A.I. tools may also be a losing battle. Even if there is a consensus around the ethics of disseminating certain algorithms, it might not be enough to stop people who disagree.

Frederking says an analogous precedent to the current conundrum with A.I. might be the popularization of consumer-level encryption in the 1990s, when the government repeatedly tried and failed to regulate cryptography. In 1991, Joe Biden, then a senator, introduced a bill mandating that tech companies install back doors that would allow law enforcement to carry out warrants to retrieve voice, text, and other communications from customers. Programmer Phil Zimmermann soon spoiled the scheme by developing a tool called PGP, which encrypted communications so that they could only be read by the sender and receiver. PGP soon enjoyed widespread adoption, undercutting back doors accessible to tech companies and the government. And as lawmakers were mulling further attempts to stem the adoption of strong encryption services, the National Research Council concluded in a 1996 study that users could easily and legally obtain those same services from countries like Israel and Finland.

“There’s a general philosophy that when the time has come for some scientific progress to happen, you really can’t stop it,” says Frederking. “You just need to figure out how you’re going to deal with it.”

Future Tense is a partnership of Slate, New America, and Arizona State University that examines emerging technologies, public policy, and society.

...

Read the original on slate.com »

9 364 shares, 36 trendiness

Škoda DuoBell: A bicycle bell that outsmarts even smart headphones

“The redesign of a safety feature that is more than 100 years old originated from a simple need. Bicycle bells have remained almost unchanged for over a century, but the world around them has not. Škoda DuoBell is the first bell ever designed to penetrate noise-cancelling headphones. It is a smart analogue trick that outsmarts the artificial intelligence algorithms in these headphones. It is a small adjustment that will improve safety on city streets,” said Ben Edwards from AMV BBDO, the agency involved in developing the concept. The idea was also supported by the agency PHD, while production company Unit9 contributed to the development of the prototype.

The number of cyclists in major cities worldwide is increasing. In London, for example, the number of cyclists is expected to surpass the number of car drivers for the first time in history this year. At the same time, however, the risk of collisions between cyclists and inattentive pedestrians is also rising. In 2024 alone, according to data from Transport for London, the number of such incidents increased by 24%.

...

Read the original on www.skoda-storyboard.com »

10 360 shares, 14 trendiness

S3 Files and the changing face of S3

Almost everyone at some point in their career has dealt with the deeply frustrating process of moving large amounts of data from one place to another, and if you haven’t, you probably just haven’t worked with large enough datasets yet. For Andy Warfield, one of those formative experiences was at UBC, working alongside genomics researchers who were producing extraordinary volumes of sequencing data but spending an absurd amount of their time on the mechanics of getting that data where it needed to be. Forever copying data back and forth, managing multiple inconsistent copies. It is a problem that has frustrated builders across every industry, from scientists in the lab to engineers training machine learning models, and it is exactly the type of problem that we should be solving for our customers.

In this post, Andy writes about the solution that his team came up with: S3 Files. The hard-won lessons, a few genuinely funny moments, and at least one ill-fated attempt to name a new data type. It is a fascinating read that I think you’ll enjoy.

It turns out that sunflowers are a lot more promiscuous than humans.

About a decade ago, just before joining Amazon, I had wrapped up my second startup and was back teaching at UBC. I wanted to explore something that I didn’t have a lot of research experience with and decided to learn about genomics, and in particular the intersection of computer systems and how biologists perform genomics research. I wound up spending time with Loren Rieseberg, a botany professor at UBC who studies sunflower DNA—analyzing genomes to understand how plants develop traits that let them thrive in challenging environments like drought or salty soils.

The botanists’ joke about promiscuity (the one that started this blog) was one reason why Loren’s lab was so fun to work with. Their explanation was that human DNA has about 3 billion base pairs, and any two humans are 99.9% identical at a genomic level—all of our DNA is remarkably similar. But sunflowers, being flowers, and not at all monogamous, have both larger genomes (about 3.6 billion base pairs) and far more variation (about 10 times more genetic variation between individuals).

One of my PhD grads at the time, JS Legare, decided to join me on this adventure and went on to do a postdoc in Loren’s lab, exploring how we might move these workloads to the cloud. Genomic analysis is an example of what some researchers have called “burst parallel” computing. Analyzing DNA can be done with massive amounts of parallel computation, and when you do that it often runs for relatively short periods of time. This means that using local hardware in a lab can be a poor fit, because you often don’t have enough compute to run fast analysis when you need it, and the compute you do have sits idle when you aren’t doing active work. Our idea was to explore using S3 and serverless compute to run tens or hundreds of thousands of tasks in parallel so that researchers could run complex analyses very quickly, and then scale down to zero when they were done.
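That burst-parallel pattern is mostly orchestration: split the input objects into small work units and fire one asynchronous serverless invocation per unit. Here is a hedged sketch with boto3, where the function name and key layout are invented for illustration, not taken from the post:

```python
import json

def make_task_payloads(keys, batch_size):
    """Shard a list of S3 object keys into one JSON payload per
    serverless worker, so thousands of workers can run in parallel."""
    return [json.dumps({"keys": keys[i:i + batch_size]})
            for i in range(0, len(keys), batch_size)]

# Dispatching the burst (requires AWS credentials; "gatk-worker" is a
# hypothetical Lambda function name):
# import boto3
# lam = boto3.client("lambda")
# for payload in make_task_payloads(all_keys, batch_size=50):
#     lam.invoke(FunctionName="gatk-worker",
#                InvocationType="Event",   # async: fire and return
#                Payload=payload)
```

Because each invocation is independent, the whole analysis scales out to the size of the burst and back down to zero when it completes, which is exactly the property that made local lab hardware a poor fit.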

The biologists worked in Linux with an analytics framework called GATK4—a genomic analysis toolkit with integration for Apache Spark. All of their data lived on a shared NFS filer. In bridging to the cloud, JS built a system he called “bunnies” (another promiscuity joke) to package analyses in containers and run them on S3, which was a real win for velocity, repeatability, and performance through parallelization. But a standout lesson was the friction at the storage boundary.

S3 was great for parallelism, cost, and durability, but every tool the genomics researchers used expected a local Linux filesystem. Researchers were forever copying data back and forth, managing multiple, sometimes inconsistent copies. This data friction—S3 on one side, a filesystem on the other, and a manual copy pipeline in between—is something I’ve seen over and over in the years since. In media and entertainment, in pretraining for machine learning, in silicon design, and in scientific computing. Different tools are written to access data in different ways, and it sucks when the API that sits in front of our data becomes a source of friction that makes it harder to work with.

We are all aware, and I think still maybe even a little stunned, at the way that agentic tooling is changing software development today. Agents are pretty darned good at writing code, and they are getting better at it fast enough that we’re all spending a fair bit of time thinking about what it all even means (even Werner). One thing that does really seem true, though, is that agentic development has profoundly changed the cost of building applications. Cost in terms of dollars, in terms of time, and especially in terms of the skill associated with writing workable code. And it’s this last part that I’ve been finding the most exciting lately, because for about as long as we’ve had software, successful applications have always involved combining two often disjoint skill sets: on one hand, skill in the domain of the application being written, like genomics, or finance, or design, and on the other hand, skill in actually writing code. In a lot of ways, agents are illustrating just how prohibitively high the barrier to entry for writing software has always been, and are suddenly allowing apps to be written by a much larger set of people—people with deep skills in the domains of the applications being written, rather than in the mechanics of writing them.

As we find ourselves in this spot where applications are being written faster, more experimentally, and more diversely than ever, the cycle time from idea to running code is compressing dramatically. As the cost of building applications collapses, and as each application we build can serve as a reference for the next one, it really feels like the code/data division is becoming more meaningful than it has ever been before. We are entering a time where applications will come and go, and as always, data outlives all of them. The role of effective storage systems has always been not just to safely store data, but also to help abstract and decouple it from individual applications. As the pace of application development accelerates, this property of storage has become more important than ever, because the easier data is to attach to and work with, the more we can play, build, and explore new ways to benefit from it.

S3 as a steward for your data

Over the past few years, the S3 team has been really focused on this last point. We’ve been looking closely at situations where the way that data is accessed in S3 just isn’t simple enough—precisely like the example of biologists in Loren’s lab having to build scripts to copy data around so that it’s in the right place to use with their tooling—and we started looking more broadly at places where customers were finding that working with storage was distracting them from working with data. The first lesson that we had here was with structured data. S3 stores exabytes of Parquet data and averages over 25 million requests per second to that format alone. A lot of this was either plain Parquet or structured as Hive tables. And it was clear that people wanted to do more with this data. Open table formats, notably Apache Iceberg, were emerging as functionally richer table abstractions allowing insertions and mutations, schema changes, and snapshots of tables. While Iceberg was clearly helping lift the level of abstraction for tabular data on S3, it also still carried a set of sharp edges because it was having to surface tables strictly over the object API.

As Iceberg started to grow in popularity, customers who adopted it at scale told us that managing security policy was difficult, that they didn’t want to have to manage table maintenance and compaction, and that they wanted working with tabular data to be easier. Moreover, a lot of work on Iceberg and open table formats (OTFs) generally was being driven specifically for Spark. While Spark is very important as an analytics engine, people store data in S3 because they want to be able to work with it using any tool they want, even (and especially!) the tools that don’t exist yet. So in 2024, at re:Invent, we launched S3 Tables as a managed, first-class table primitive that can serve as a building block for structured data. S3 Tables stores data in Iceberg, but adds guardrails to protect data integrity and durability. It makes compaction automatic, adds support for cross-region table replication, and continues to refine and extend the idea that a table should be a first-class data primitive that sits alongside objects as a way to build applications. Today we have over 2 million tables stored in S3 Tables and are seeing all sorts of remarkable applications built on top of them.

At around the same time, we were beginning to have a lot of conversations about similarity search and vector indices with S3 customers. AI advances over the past few years have created both an opportunity and a need for vector indexes over all sorts of stored data. The opportunity is provided by advanced embedding models, which have introduced a step-function change in the ability to provide semantic search. Suddenly, customers with large archival media collections, like historical sports footage, could build a vector index and do a live search for a specific player scoring diving touchdowns and instantly get a collection of clips, assembled as a hit reel, that can be used in live broadcast. That same property of semantically relevant search is equally valuable for RAG and for applying models to data they weren’t trained on.

As customers started to build and operate vector indexes over their data, they began to highlight a slightly different source of data friction. Powerful vector databases already existed, and vectors had been quickly working their way in as a feature on existing databases like Postgres. But these systems stored indexes in memory or on SSD, running as compute clusters with live indices. That’s the right model for a continuous low-latency search facility, but it’s less helpful if you’re coming to your data from a storage perspective. Customers were finding, especially over text-based data like code or PDFs, that the vectors themselves were often more bytes than the data being indexed, stored on media many times more expensive.

So just like with the team’s work on structured data with S3 Tables, at the last re:Invent we launched S3 Vectors as a new S3-native data type for vector indices. S3 Vectors takes a very S3 spin on storing vectors in that its design anchors on a performance, cost, and durability profile that is very similar to S3 objects. Probably most importantly, though, S3 Vectors is designed to be fully elastic, meaning that you can quickly create an index with only a few hundred records in it and scale over time to billions of records. S3 Vectors’ biggest strength is really the sheer simplicity of having an always-available API endpoint that can support similarity search indices. Just like objects and tables, it’s another data primitive that you can just reach for as part of application development.
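Under the hood, every similarity search boils down to ranking stored embedding vectors by closeness to a query vector. A brute-force sketch makes the operation concrete; an actual index like S3 Vectors uses approximate data structures to avoid scanning everything, and the names here are purely illustrative:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query, index, k=3):
    """Brute-force nearest neighbors: score every stored vector and sort."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in index.items()]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:k]
```

In the sports-footage example, the keys would identify clips and the vectors would be embeddings of their content; a query embedding for "diving touchdown" would rank the matching clips to the top.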

Today, we are launching S3 Files, a new S3 feature that integrates Amazon Elastic File System (EFS) into S3 and allows any existing S3 data to be accessed directly as a network-attached file system.

The story about files is actually longer, and even more interesting, than the work on either Tables or Vectors, because files turn out to be a complex and tricky data type to cleanly integrate with object storage. We actually started working on the files idea before we launched S3 Tables, as a joint effort between the EFS and S3 teams, but let’s put a pin in that for a second.

As I described with the genomics example of analyzing sunflower DNA, there is an enormous body of existing software that works with data through filesystem APIs: data science tools, build systems, log processors, configuration management, and training pipelines. If you have watched agentic coding tools work with data, they are very quick to reach for the rich range of Unix tools to work directly with data in the local file system. Working with data in S3 means deepening the reasoning that they have to do, to actively go list files in S3, transfer them to the local disk, and then operate on those local copies. And it’s obviously broader than just the agentic use case; it’s true for every customer application that works with local file systems today. Natively supporting files on S3 makes all of that data immediately more accessible—and ultimately more valuable. You don’t have to copy data out of S3 to use pandas on it, or to point a training job at it, or to interact with it using a design tool.

With S3 Files, you get a really simple thing. You can now mount any S3 bucket or prefix inside your EC2 VM, container, or Lambda function and access that data through your file system. If you make changes, your changes will be propagated back to S3. As a result, you can work with your objects as files, and your files as objects.
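In practice that means an object addressed as s3://bucket/key becomes reachable at an ordinary path, and standard file APIs just work on it. The mapping below assumes a hypothetical mount layout; the post does not specify the actual path scheme, so treat both the mount root and the example paths as placeholders:

```python
def s3_uri_to_mounted_path(uri, mount_root="/mnt/s3"):
    """Map an s3:// URI to its path under an assumed mount root."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError("not an S3 URI: " + uri)
    bucket, _, key = uri[len(prefix):].partition("/")
    return f"{mount_root}/{bucket}/{key}"

# Once mounted, ordinary tools see ordinary files (illustrative):
# with open(s3_uri_to_mounted_path("s3://genomes/chr1.fa")) as f:
#     header = f.readline()   # no SDK calls, no copy step
```

The point of the feature is precisely that the second half of this sketch, the open() call, needs no special code at all.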

And this is where the story gets interesting, because as we often learn when we try to make things simple for customers, making something simple is often one of the more complicated things that you can set out to do.

Builders hate the fact that they have to decide early on whether their data is going to live in a file system or an object store, and then be stuck with the consequences of that decision from then on. With that decision, they are basically picking how they are going to interact with their data not just now, but long into the future, and if they get it wrong they either have to do a migration or build a layer of automation for copying data.

Early on, the idea was basically that we would just put EFS and S3 in a giant pot, simmer it for a bit, and get the best of both worlds. We even called the early version of the project EFS3 (and I’m glad we didn’t keep that name!). But things got tricky in a hurry. Every time we sat down to work through designs, we found difficult technical challenges and tough decisions. And in each of these decisions, either the file or the object presentation of data would have to give something up that would make it a bit less good. One of the engineers on the team described this as “a battle of unpalatable compromises.” We were hardly the first storage people to discover how difficult it is to converge file and object into a single storage system, but we were also acutely aware of how much not having a solution to the problem was frustrating builders.

We were determined to find a path through it, so we did the only sensible thing you can do when you are faced with a really difficult technical design problem: we locked a bunch of our most senior engineers in a room and said we weren’t going to let them out till they had a plan that they all liked.

Passionate and contentious discussions ensued. And ensued. And ensued. And eventually we gave up. We just couldn’t get to a solution that didn’t leave someone (and in most cases really everyone) unhappy with the design.

A quick aside at this point: I may be taking some dramatic liberties with the comment about locking people in a room. The Amazon meeting rooms don’t have locks on them. But to be clear on this point: I frequently find that we make the fastest and most constructive progress on really hard design problems when we get smart, passionate people with differing technical views in front of a whiteboard to really dig in over a period of days. This isn’t an earth-moving observation, but it’s often surprising how easy it can be to forget in the face of trying to talk through big, hard problems in one-hour blocks over video conference. The engineers in these discussions deeply understood file and object workloads and the subtleties of how different they can be, and so these discussions were deep, sometimes heated, and absolutely fascinating. And despite all of this, we still couldn’t get to a design that we liked. It was really frustrating.

This was around Christmas of 2024. Leading into the holidays, the team changed course. They went through the design docs and discussion notes that they had and started to enumerate all of the specific design compromises and the behaviour that we would need to be comfortable with if we wanted to present both file and object interfaces as a single unified system. We all looked at it and agreed that it wasn’t the best of both worlds; it was the lowest common denominator, and we could all think of example workloads on both sides that would break in surprising, often subtle, and always frustrating ways.

I think the example where this really stood out to me was around the top-level semantics and experience of how objects and files are actually different as data primitives. Here’s a painfully simple characterization: files are an operating system construct. They exist on storage, and persist when the power is out, but when they are used they are incredibly rich as a way of representing data, to the point that they are very frequently used as a way of communicating across threads, processes, and applications. Application APIs for files are built to support the idea that I can update a record in a database in place, or append data to a log, and that you can concurrently access that file and see my change almost instantaneously, to an arbitrary sub-region of the file. There’s a rich set of OS functionality, like mmap(), that doubles down on files as shared persistent data that can mutate at a very fine granularity, as if it were a set of in-memory data structures.
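Python’s mmap module makes that file semantic easy to see: two independent mappings of the same file observe each other’s in-place writes immediately, with no copy or re-upload step, which is exactly what object APIs do not offer. A small self-contained demo (POSIX shared-mapping behavior assumed):

```python
import mmap
import os
import tempfile

def demo_shared_mutation():
    """Mutate a sub-region of a file through one mapping and read the
    change back through a second, independent mapping of the same file."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello world")
        with open(path, "r+b") as w, open(path, "rb") as r:
            writer = mmap.mmap(w.fileno(), 0)                          # shared, writable
            reader = mmap.mmap(r.fileno(), 0, access=mmap.ACCESS_READ)  # shared, read-only
            writer[0:5] = b"HELLO"      # in-place write to a sub-region
            return bytes(reader[0:5])   # the other mapping sees it at once
    finally:
        os.close(fd)
        os.unlink(path)
```

The equivalent over an object API would be a full re-upload of the object, after which every reader would have to re-fetch it; fine-grained shared mutation simply is not part of the object model.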

Now if we flip over to object world, the idea of writing to the middle of an object while someone else is accessing it is more or less sacrilege. The immutability of objects is an assumption that is cooked into APIs and applications. Tools will download and verify content hashes, and they will use object versioning to preserve old copies. Most notable of all, they often build sophisticated and complex workflows that are entirely anchored on the notifications associated with whole-object creation. This last thing was something that surprised me when I started working on S3, and it's actually really cool. Systems like S3 Cross-Region Replication (CRR) replicate data based on notifications that fire when objects are created or overwritten, and those notifications are counted on to have at-least-once semantics in order to ensure that we never miss replication for an object. Customers use similar pipelines to trigger log processing, image transcoding, and all sorts of other stuff; it's a very popular pattern for application design over objects. In fact, notifications are an example of an S3 subsystem that makes me marvel at the scale of the storage system I get to work on: S3 sends over 300 billion event notifications every day just to serverless event listeners that process new objects!
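As an illustration of that notification-anchored pattern, the sketch below parses an S3-style object-created event the way a serverless listener might. The payload shape follows S3's documented notification format, but the bucket and key names are made up, and a real handler would go on to do the replication or transcoding work, idempotently, since at-least-once delivery means the same event can arrive twice.

```python
from urllib.parse import unquote_plus

def handle_s3_event(event):
    """Extract (bucket, key) work items from an S3 notification payload.

    Only ObjectCreated events are of interest to a pipeline like the
    ones described above; a real handler must be idempotent because
    notifications have at-least-once delivery semantics."""
    work = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in notifications (spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        work.append((bucket, key))
    return work

# Example payload in the documented notification shape (names invented):
event = {"Records": [{
    "eventName": "ObjectCreated:Put",
    "s3": {"bucket": {"name": "photos"},
           "object": {"key": "uploads/cat+1.jpg", "size": 1024}}}]}
print(handle_s3_event(event))  # [('photos', 'uploads/cat 1.jpg')]
```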

The thing that we came to realize was that there is actually a pretty profound boundary between files and objects. File interactions are agile, often mutation heavy, and semantically rich. Objects, on the other hand, come with a relatively focused and narrow set of semantics. We realized that this boundary separating them was what we really needed to pay attention to, and that rather than trying to hide it, the boundary itself was the feature we needed to build.

When we got back from the hol­i­days, we started lock­ing (well, ok, not ex­actly lock­ing) folks in rooms again, but this time with the view that the bound­ary be­tween file and ob­ject did­n’t ac­tu­ally have to be in­vis­i­ble. And this time, the team started com­ing out of dis­cus­sions look­ing a lot hap­pier.

The first decision was that we were going to treat first-class file access on S3 as a presentation layer for working with data. We would allow customers to define an S3 mount on a bucket or prefix, and under the covers, that mount would attach an EFS namespace to mirror the metadata from S3. We would make the transit and consistency of data across the two layers an absolutely central part of our design. We started to describe this as "stage and commit," a term that we borrowed from version control systems like git: changes would be able to accumulate in EFS and then be pushed down collectively to S3. The specifics of how and when data transited the boundary would be published as part of the system, clear to customers, and something that we could actually continue to evolve and improve as a programmatic primitive over time. (I'm going to talk about this point a little more at the end, because there's much more the team is excited to do on this surface.)

Being ex­plicit about the bound­ary be­tween file and ob­ject pre­sen­ta­tions is some­thing that I did not ex­pect at all when the team started work­ing on S3 Files, and it’s some­thing that I’ve re­ally come to love about the de­sign. It is early and there is plenty of room for us to evolve, but I think the team all feels that it sets us up on a path where we are ex­cited to im­prove and evolve in part­ner­ship with what builders need, and not be stuck be­hind those un­palat­able com­pro­mises.

Not out of the woods

Deciding on this stage and com­mit thing was one of those de­sign de­ci­sions that pro­vided some bound­aries and sep­a­ra­tion of con­cerns. It gave us a clear struc­ture, but it did­n’t make the hard prob­lems go away. The team still had to nav­i­gate real trade­offs be­tween file and ob­ject se­man­tics, per­for­mance, and con­sis­tency. Let me walk through a few ex­am­ples to show how nu­anced these two ab­strac­tions re­ally are, and how the team ap­proached these de­ci­sions.

S3 read­ers of­ten as­sume full ob­ject up­dates, no­ti­fi­ca­tions, and in many cases ac­cess to his­tor­i­cal ver­sions. File sys­tems have fine-grained mu­ta­tions, but they have im­por­tant con­sis­tency and atom­ic­ity tricks as well. Many ap­pli­ca­tions de­pend on the abil­ity to do atomic file re­names as a way of mak­ing a large change vis­i­ble all at once. They do the same thing with di­rec­tory moves. S3 con­di­tion­als help a bit with the first thing but aren’t an ex­act match, and there is­n’t an S3 ana­log for the sec­ond. So as men­tioned above, sep­a­rat­ing the lay­ers al­lows these modal­i­ties to co­ex­ist in par­al­lel sys­tems with a sin­gle view of the same data. You can mu­tate and re­name a file all you want, and at a later point, it will be writ­ten as a whole to S3.
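The atomic-rename trick mentioned above is easy to show in a few lines. This is a generic sketch, not S3 Files code: stage the complete new contents in a temporary file, then publish it with a single atomic rename, so concurrent readers only ever see the old version or the new one, never a half-written file.

```python
import json
import os
import tempfile

def atomic_write(path, data: bytes):
    """Make a large change visible all at once via atomic rename."""
    dir_ = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dir_)  # stage in the same directory
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make the staged bytes durable first
        os.replace(tmp, path)      # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise

path = os.path.join(tempfile.mkdtemp(), "config.json")
atomic_write(path, json.dumps({"version": 2}).encode())
```

S3's conditional writes (If-None-Match / If-Match on PUT) get you part of the way toward this "all or nothing visibility" for a single object, but as the text notes, there is no object-store analog for atomically moving an entire directory.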

Authorization is equally thorny. S3 and file systems think about authorization in very different ways. S3 supports IAM policies scoped to key prefixes: you can say "deny GetObject on anything under /private/". In fact, you can further constrain those permissions based on things like the network or properties of the request itself. IAM policies are incredibly rich, and also much more expensive to evaluate than file permissions are. File systems have spent years getting things like permission checks off of the data path, often evaluating up front and then using a handle for persistent future access. Files are also a little weird as an entity to wrap authorization policy around, because permissions for a file live in its inode. Hard links allow you to have many directory entries for the same inode, and you also need to think about directory permissions that determine whether you can get to a file in the first place. Unless you have a handle on it, in which case it kind of doesn't matter, even if the file is renamed, moved, and often even deleted.
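For readers less familiar with IAM, a prefix-scoped deny of the kind described might look like the following. The bucket name and the network condition are illustrative only; the point is how much request context a single object-level check can consult.

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyPrivatePrefix",
    "Effect": "Deny",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::example-bucket/private/*",
    "Condition": {
      "NotIpAddress": { "aws:SourceIp": "10.0.0.0/8" }
    }
  }]
}
```

Evaluating something like this on every read is a very different cost model from checking an inode's mode bits once at open() time and handing back a file descriptor.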

There’s a lot more com­plex­ity, erm, rich­ness to dis­cuss here—es­pe­cially around top­ics like user and group iden­tity—but by mov­ing to an ex­plicit bound­ary, the team got them­selves out of hav­ing to co-rep­re­sent both types of per­mis­sions on every sin­gle ob­ject. Instead, per­mis­sions could be spec­i­fied on the mount it­self (familiar ter­ri­tory for net­work file sys­tem users) and en­forced within the file sys­tem, with spe­cific map­pings ap­plied across the two worlds.

This de­sign had an­other ad­van­tage. It pre­served IAM pol­icy on S3 as a back­stop. You can al­ways dis­able ac­cess at the S3 layer if you need to change a data perime­ter, while del­e­gat­ing au­tho­riza­tion up to the file layer within each mount. And it left the door open for sit­u­a­tions in the fu­ture where we might want to ex­plore mul­ti­ple dif­fer­ent mounts over the same data.

If you are familiar with both file and object systems, it's not a hard exercise to think of cases where file and object naming behave quite differently. When you sit down and really dig into it, things get almost hilariously desolate. File systems have first-class path separators, typically the forward slash ("/"). S3 has these too, but they are really just a suggestion. In fact, S3's LIST command allows you to specify any character you want to be parsed as a path separator, and there are a handful of customers who have built remarkable multi-dimensional naming structures that embed multiple different separators in the same paths and pass a different delimiter to LIST depending on how they want to organize results.
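To see how far this is from file-system paths, here's a small local simulation of LIST's delimiter behaviour. It is not the real API (that would be list_objects_v2 with Prefix and Delimiter in boto3), just the grouping logic, applied to one keyspace that embeds two different separators.

```python
def list_prefixes(keys, delimiter, prefix=""):
    """Group a flat keyspace the way S3's LIST does: keys are just
    strings, and the delimiter you pass decides where the "directory"
    boundaries fall. Returns (common_prefixes, leaf_keys)."""
    common, leaves = set(), []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the first delimiter becomes a CommonPrefix.
            common.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            leaves.append(key)
    return sorted(common), leaves

# One multi-dimensional naming scheme, two different "views" of it:
keys = ["sensor=a|2024/01/x.json", "sensor=a|2024/02/y.json",
        "sensor=b|2024/01/z.json"]
print(list_prefixes(keys, "/"))   # slash view: group by date path
print(list_prefixes(keys, "|"))   # pipe view: group by sensor
```

The same three keys organize into entirely different hierarchies depending on the delimiter, which is precisely the flexibility a POSIX namespace cannot represent.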

Here's another simple and annoying one: because S3 doesn't have directories, you can have objects whose keys end with that same slash. That's to say, you can have a thing that looks like a directory but is a file. For about 20 minutes the team thought this was a cool feature and was calling them "filerectories." Thank goodness we didn't keep that one.

There are tens of these dif­fer­ences, and we care­fully thought about re­strict­ing to a sin­gle com­mon struc­ture or just fix­ing our­selves on one side or the other. On all of these paths we re­al­ized that we were go­ing to break as­sump­tions about nam­ing in­side ap­pli­ca­tions.

We de­cided to lean into the bound­ary and al­low both sides to stick with their ex­ist­ing nam­ing con­ven­tions and se­man­tics. When ob­jects or files are cre­ated that can’t be moved across the bound­ary, we de­cided that (and wow was this ever a lot of pas­sion­ate dis­cus­sion) we just would­n’t move them. Instead, we would emit an event to al­low cus­tomers to mon­i­tor and take ac­tion if nec­es­sary. This is clearly an ex­am­ple of down­load­ing com­plex­ity onto the de­vel­oper, but I think it’s also a pro­foundly good ex­am­ple of that be­ing the right thing to do, be­cause we are choos­ing not to fail things in the do­mains where they al­ready ex­pect to run, we are build­ing a bound­ary that ad­mits the vast ma­jor­ity of path names that ac­tu­ally do work in both cases, and we are build­ing a mech­a­nism to de­tect and cor­rect prob­lems as they arise.
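A sketch of what such a boundary check might look like is below. To be clear, these rules are illustrative guesses, not the actual ones S3 Files enforces; the point is that the check is mechanical, and keys that fail it can be surfaced as events rather than hard failures.

```python
def posix_representable(key: str) -> bool:
    """Illustrative check: can this S3 key appear as a POSIX path?

    The specific rules here are assumptions for the sketch, not the
    real S3 Files validation logic."""
    if key.endswith("/"):  # a "filerectory": object posing as a directory
        return False
    for part in key.split("/"):
        if part in ("", ".", ".."):   # empty segment ("a//b") or reserved name
            return False
        if "\x00" in part:            # NUL is illegal in POSIX file names
            return False
        if len(part.encode()) > 255:  # common per-component length limit
            return False
    return True

assert posix_representable("logs/2024/app.log")
assert not posix_representable("logs/")     # trailing slash
assert not posix_representable("a//b.txt")  # empty path component
```

Notice that the vast majority of real-world keys sail through; only the genuinely pathological ones stay on their side of the boundary, with an event emitted so the customer can react.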

The last big area of differences that the team spent a lot of time talking about was performance, and in particular the performance and request latency of namespace interactions. File and object namespaces are optimized for very different things. In a file system, there are a lot of data-dependent accesses to metadata. Accessing a file means also accessing (and in some cases updating) the directory record. There are also many operations that end up traversing all of the directory records along a path. As a result, fast file system namespaces (even big distributed ones) tend to co-locate all the metadata for a directory on a single host so that those interactions are as fast as possible. The object namespace is completely flat and tends to optimize for highly parallel point queries and updates. There are many cases in S3 where individual "directories" have billions of objects in them and are being accessed by hundreds of thousands of clients in parallel.

As we looked through the set of challenges that I've just described, we spent a lot of time talking about adoption. S3 is two decades old, and we wanted a solution that existing S3 customers could immediately use on their own data, not one that meant migrating to something completely new. There are enormous numbers of existing buckets serving applications that depend on S3's object semantics working exactly as documented. We were not willing to introduce subtle new behaviours that could break those applications.

It turns out that very few ap­pli­ca­tions use both file and ob­ject in­ter­faces con­cur­rently on the same data at the same in­stant. The far more com­mon pat­tern is mul­ti­phase. A data pro­cess­ing pipeline uses filesys­tem tools in one stage to pro­duce out­put that’s con­sumed by ob­ject-based ap­pli­ca­tions in the next. Or a cus­tomer wants to run an­a­lyt­ics queries over a snap­shot of data that’s ac­tively be­ing mod­i­fied through a filesys­tem.

We realized that it's not necessary to converge file and object semantics to solve the data silo problem. What customers needed was the same data in one place, with the right view for each access pattern. A file view that provides full NFS close-to-open consistency. An object view that provides full S3 atomic-PUT strong consistency. And a synchronization layer that keeps them connected.

So we shipped it

All of that arguing, the team's list of "unpalatable compromises", the passionate and occasionally desolate discussions about filerectories: it turned out to be exactly the work we needed to do. I think the team all feels that the design is better for having gone through it. S3 Files lets you mount any S3 bucket or prefix as a filesystem on your EC2 instance, container, or Lambda function. Behind the scenes it's backed by EFS, which provides the file experience your tools already expect: NFS semantics, directory operations, permissions. From your application's perspective, it's a mounted directory. From S3's perspective, the data is objects in a bucket.

The way it works is worth a quick walkthrough. When you first access a directory, S3 Files imports metadata from S3 and populates a synchronized view. For files under 128 KB it also pulls the data itself. For larger files only metadata comes over, and the data is fetched from S3 when you actually read it. This lazy hydration is important because it means you can mount a bucket with millions of objects in it and just start working immediately. This "start working immediately" part is a good example of a simple experience that is actually pretty sophisticated under the covers: being able to mount and immediately work with objects in S3 as files is an obvious and natural expectation for the feature, and it would be pretty frustrating to have to wait minutes or hours for the file view of metadata to be populated. But under the covers, S3 Files needs to scan S3 metadata and populate a file-optimized namespace, and the team was able to make this happen very quickly, as a background operation that preserves a simple and very agile customer experience.
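A toy model of that hydration policy helps make it concrete. Here a plain dict stands in for S3; the 128 KB threshold comes from the text, but everything else (names, structure) is illustrative, not the real implementation.

```python
SMALL_FILE_LIMIT = 128 * 1024  # per the text: data under 128 KB comes eagerly

class LazyMount:
    """Toy model of lazy hydration: all metadata is imported up front,
    small-file data comes with it, and large-file data is fetched from
    the backing store only on first read."""
    def __init__(self, store):
        self.store = store                                  # stands in for S3
        self.meta = {k: len(v) for k, v in store.items()}   # eager metadata
        self.data = {k: v for k, v in store.items()
                     if len(v) <= SMALL_FILE_LIMIT}         # eager small files
        self.fetches = 0

    def read(self, key):
        if key not in self.data:       # large file: hydrate on demand
            self.fetches += 1
            self.data[key] = self.store[key]
        return self.data[key]

store = {"small.txt": b"hi", "big.bin": b"x" * (256 * 1024)}
m = LazyMount(store)
assert set(m.meta) == {"small.txt", "big.bin"}  # listing works immediately
assert m.fetches == 0                            # no large data pulled yet
m.read("big.bin")
assert m.fetches == 1                            # fetched on first read
```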

When you create or modify files, changes are aggregated and committed back to S3 roughly every 60 seconds as a single PUT. Sync runs in both directions, so when other applications modify objects in the bucket, S3 Files automatically spots those modifications and reflects them in the filesystem view. If there is ever a conflict where a file is modified from both places at the same time, S3 is the source of truth and the filesystem version moves to a lost+found directory, with a CloudWatch metric identifying the event. File data that hasn't been accessed in 30 days is evicted from the filesystem view but not deleted from S3, so storage costs stay proportional to your active working set.
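That conflict rule lends itself to a small sketch. This toy reconciler (dicts standing in for both sides, and `base` for the state at the last sync) applies the same policy described above: clean file-side edits flow to S3, but on a true conflict S3 wins and the file version is parked under lost+found/. It is an illustration of the policy, not the real sync engine.

```python
def reconcile(fs, s3, base):
    """Apply the "S3 is the source of truth" conflict rule.

    fs, s3, base map path -> bytes; base is the state at the last sync.
    Returns the merged view after one sync cycle."""
    merged, lost = dict(s3), {}
    for path, data in fs.items():
        fs_changed = data != base.get(path)
        s3_changed = s3.get(path) != base.get(path)
        if fs_changed and s3_changed:
            # Both sides changed since the last sync: park the file copy.
            lost[f"lost+found/{path}"] = data
        elif fs_changed:
            merged[path] = data          # clean file-side edit wins
    merged.update(lost)
    return merged

base = {"a.txt": b"v1"}
fs   = {"a.txt": b"file-edit"}
s3   = {"a.txt": b"s3-edit"}
out = reconcile(fs, s3, base)
assert out["a.txt"] == b"s3-edit"               # S3 is the source of truth
assert out["lost+found/a.txt"] == b"file-edit"  # file version preserved
```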

There are many smaller, and really fun, bits of work that happened as the team built the system. One of the improvements that I think is really cool is what we are calling "read bypass." For high-throughput sequential reads, read bypass automatically reroutes the read data path away from traditional NFS access and instead performs parallel GET requests directly against S3 itself. This approach achieves 3 GB/s per client (with further room to improve) and scales to terabits per second across multiple clients. And for those who are interested, there's way more detail in our technical docs (which are a pretty interesting read).
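The core idea behind read bypass, splitting one sequential read into parallel ranged fetches, can be sketched without any AWS dependency. Here `fetch_range` stands in for an HTTP ranged GET against S3; the part size and worker count are illustrative defaults, not the service's actual tuning.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_read(fetch_range, size, part_size=8 * 1024 * 1024, workers=8):
    """Read `size` bytes by issuing concurrent ranged fetches.

    fetch_range(start, end) returns bytes start..end inclusive, like an
    HTTP Range request. Results are reassembled in order."""
    ranges = [(off, min(off + part_size, size) - 1)
              for off in range(0, size, part_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda r: fetch_range(*r), ranges)  # order-preserving
    return b"".join(parts)

# Simulated object so the sketch runs without AWS:
blob = bytes(range(256)) * 1000
data = parallel_read(lambda s, e: blob[s:e + 1], len(blob), part_size=64 * 1024)
assert data == blob
```

The same shape, with `fetch_range` issuing real GETs with Range headers across multiple connections, is how a client sidesteps a single NFS stream's throughput ceiling.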

One thing I’ve re­ally come to ap­pre­ci­ate about the de­sign is how hon­est it is about its own edges. The ex­plicit bound­ary be­tween file and ob­ject do­mains is­n’t a lim­i­ta­tion we’re pa­per­ing over. It’s the thing that lets both sides re­main un­com­pro­mised. That said, there are places where we know we still have work to do. Renames are ex­pen­sive be­cause S3 has no na­tive re­name op­er­a­tion, so re­nam­ing a di­rec­tory means copy­ing and delet­ing every ob­ject un­der that pre­fix. We warn you when a mount cov­ers more than 50 mil­lion ob­jects for ex­actly this rea­son. Explicit com­mit con­trol is­n’t there at launch; the 60-second win­dow works for most work­loads but we know it won’t be enough for every­one. And there are ob­ject keys that sim­ply can’t be rep­re­sented as valid POSIX file­names, so they won’t ap­pear in the filesys­tem view. We’ve been in cus­tomer beta for about nine months and these are the things that we’ve learned and con­tin­ued to evolve and it­er­ate on with early cus­tomers. We’d rather be clear about them than pre­tend they don’t ex­ist.

When we were work­ing with Loren’s lab at UBC, JS spent a re­mark­able amount of his time build­ing caching and nam­ing lay­ers — not do­ing bi­ol­ogy, but writ­ing in­fra­struc­ture to shut­tle data be­tween where it lived and where tools ex­pected it to be. That fric­tion re­ally stood out to me, and look­ing back at it now, I think the les­son we kept learn­ing — in that lab, and then over and over again as the S3 team worked on Tables, Vectors, and now Files — is that dif­fer­ent ways of work­ing with data aren’t a prob­lem to be col­lapsed. They’re a re­al­ity to be served. The sun­flow­ers in Loren’s lab thrived on vari­a­tion, and it turns out data ac­cess pat­terns do too.

What I find most exciting about S3 Files is something I genuinely did not expect when we started: that the explicit boundary between file and object turned out to be the best part of the design. We spent months trying to make it disappear, and when we finally accepted it as a first-class element of the system, everything got better. Stage and commit gives us a surface that we can continue to evolve — more control over when and how data transits the boundary, richer integration with pipelines and workflows — and it sets us up to do that without compromising either side.

20 years ago, S3 started as an ob­ject store. Over the past cou­ple of years, with Tables, Vectors, and now Files, it’s be­come some­thing broader. A place where data lives durably and can be worked with in what­ever way makes sense for the job at hand. Our goal is for the stor­age sys­tem to get out of the way of your work, not to be a thing that you have to work around. We’re nowhere near done, but I’m re­ally ex­cited about the di­rec­tion that we’re head­ing in.

As Werner says, "Now, go build!"


Read the original on www.allthingsdistributed.com »
