10 interesting stories served every morning and every evening.
In a large-scale analysis of 20 popular VPNs, IPinfo found that 17 of those VPNs exit traffic from different countries than they claim. Some claim 100+ countries, but many of them point to the same handful of physical data centers in the US or Europe. That means the majority of VPN providers we analyzed don’t route your traffic via the countries they claim to, and they claim many more countries than they actually support.

Analyzing over 150,000 exit IPs across 137 possible exit countries, and comparing what providers claim to what IPinfo measures, shows that:

* 17 in 20 providers had traffic exiting in a different country.
* 38 countries were “virtual-only” in our dataset (claimed by at least one provider, but never observed as the actual traffic exit country for any provider we tested).
* We were only able to verify all provider-announced locations for 3 of the 20 providers.
* Across ~150,000 VPN exit IPs tested, ProbeNet, our internet measurement platform, detected roughly 8,000 cases where widely used IP datasets placed the server in the wrong country — sometimes thousands of kilometers off.

This report walks through what we saw across VPN and IP data providers, provides a closer look at two particularly interesting countries, explores why measurement-based IP data matters if you care where your traffic really goes, and shares how we ran the investigation.

Which VPNs Matched Reality (And Which Didn’t)

Here is the overlap between the number of listed countries each VPN provider claims to offer versus the countries with real VPN traffic that we measured — lower percentages indicate providers whose claimed lists best match our data:
It’s important to note that we used the most commonly and widely supported technologies in this research, to make the comparison between providers as fair as possible while still giving us significant data to analyze, so this will not be the full coverage for each provider. These are some of the most visible names in the market. They also tend to have very long country lists on their websites. Notably, three well-known providers had zero mismatches across all the countries we tested: Mullvad, IVPN, and Windscribe.

A country mismatch doesn’t automatically mean a provider offers a “bad VPN,” but it does mean that if you’re choosing a VPN because it claims “100+ countries,” you should know that a significant share of those flags may be labels, or virtual locations.

What “Virtual Locations” Really Mean

When a VPN lets you connect to, for example, “Bahamas” or “Somalia,” that doesn’t always mean traffic routes through there. In many cases, it’s somewhere entirely different, like Miami or London, but presented as if traffic is in the country you picked. This setup is known as a virtual location:

* The IP registry data also says “Country X” — because the provider self-declared it that way.
* But the network measurements (latency and routing) show the traffic actually exits in “Country Y” — often thousands of kilometers away.

The problem? Without active network measurement, most IP datasets rely on what the IP’s owner told the internet registry or published in WHOIS/geofeeds: a self-reported country tag. If that record is wrong or outdated, the mistake spreads everywhere. That’s where IPinfo’s ProbeNet comes in: by running live RTT tests from 1,200+ points of presence worldwide, we anchor each IP to its real-world location, not just its declared one.

Across the dataset, we found 97 countries where at least one VPN brand only ever appeared as virtual or unmeasurable in our data. In other words, for a noticeable slice of the world map, some “locations” in VPNs never show up as true exits in our measurements. We also found 38 countries where every mention behaved this way: at least one VPN claimed them, but none ever produced a stable, measurable exit in that country in our sample.

You can think of these 38 as the “unmeasurable” countries in this study — places that exist in server lists, config files, and IP geofeeds, but never once appeared as the actual exit country in our measurements. They’re not randomly scattered — they cluster in specific parts of the map.

This doesn’t prove there is zero VPN infrastructure in those countries globally. It does show that, across the providers and locations we measured, the dominant pattern is to serve those locations from elsewhere. Here are two of the most interesting examples of how this looks at the IP level.

Case Studies: Two Countries That Only Exist on the Map

To make this concrete, let’s look at two countries where every provider in our dataset turned out to be virtual: the Bahamas and Somalia.

Bahamas: All-Inclusive, Hosted in the US

In our measurements, five providers offered locations labeled as “Bahamas”: NordVPN, ExpressVPN, Private Internet Access, FastVPN, and IPVanish. For all of them, measured traffic was in the United States, usually with sub-millisecond RTT to US probes.
Somalia: Mogadishu, via France and the UK

Somalia appears in our sample for only two providers: NordVPN and ProtonVPN. Both go out of their way to label Mogadishu explicitly in their naming (e.g. “SO, Mogadishu”), but the RTTs we measured are exactly what you’d expect for traffic in Western Europe, and completely inconsistent with traffic in East Africa. The actual traffic is in Nice and London, not Somalia.
When Legacy IP Providers Agree With the Wrong VPN Locations

So far, we’ve talked about VPN claims versus our measurements. But other IP data providers don’t run active RTT tests. They rely on self-declared IP data sources, and often assume that if an IP is tagged as “Country X,” it must actually be there. In these cases, the legacy IP datasets typically “follow” the VPN provider’s story: if the VPN markets the endpoint as Country X, the legacy IP dataset also places it in Country X.

To quantify that, we looked at 736 VPN exits where ProbeNet’s measured country disagreed with one or more widely used legacy IP datasets. We then compared the country IPinfo’s ProbeNet measured (backed by RTT and routing) with the country reported by these other IP datasets and computed the distance between them. The gaps are large.

How Far Off Were the Other IP Datasets?
The median error between ProbeNet and the legacy datasets was roughly 3,100 km. On the ProbeNet side, we have strong latency evidence that our measured country is the right one:

* The median minimum RTT to a probe in the measured country was 0.27 ms.
* About 90% of these locations had a sub-millisecond RTT from at least one probe.

That’s what you expect when traffic is genuinely in that country, not thousands of kilometers away.

An IP Example You Can Test Yourself

This behavior is much more tangible if you can see it on a single IP. Here’s one VPN exit IP where ProbeNet places the server in the United Kingdom, backed by sub-millisecond RTT from local probes, while other widely used legacy IP datasets place the same IP in Mauritius, 9,691 kilometers away.

If you want to check this yourself, you can plug it into a public measurement tool like https://ping.sx/ and run pings or traceroutes from different regions. Tools like this one provide a clear visual for where latency is lowest. ProbeNet uses the same basic idea, but at a different scale: we maintain a network of 1,200+ points of presence (PoPs) around the world, so we can usually get even closer to the real physical location than public tools with smaller networks.

If you’d like to play with more real IPs (not necessarily VPNs) where ProbeNet and IPinfo get the country right and other datasets don’t, you can find a fuller set of examples on our IP geolocation accuracy page.

Why This Happens and How It Impacts Trust

It’s worth separating technical reasons from trust issues. There are technical reasons to use virtual or hubbed infrastructure:

* Risk & regulation. Hosting in certain countries can expose both the provider and users to local surveillance or seizure.
* Infrastructure quality. Some regions simply don’t have the same density of reliable data centers or high-capacity internet links, so running servers there is harder and riskier.
* Performance & cost. Serving “Bahamas” from Miami or “Cambodia” from Singapore can be cheaper, faster, and easier to maintain.

From this perspective, a virtual location can be a reasonable compromise: you get a regional IP and content unblocking without the downsides of hosting in a fragile environment.

Where It Becomes a Trust Problem

* Lack of disclosure. Marking something clearly as “Virtual Bahamas (US-based)” is transparent. Listing “Bahamas” alongside “Germany” without any hint that one is virtual and the other is physical blurs the line between marketing and reality.
* Scale of the mismatch. It’s one thing to have a few virtual locations in hard-to-host places. It’s another when dozens of countries exist only as labels across your entire footprint, or when more than half of your tested locations are actually somewhere else.
* Downstream reliance. Journalists, activists, and NGOs may pick locations based on safety assumptions. Fraud systems, compliance workflows, and geo-restricted services may treat “Somalia” vs “France” as a meaningful difference. If both the VPN UI and the IP data say “Somalia” while the traffic is physically in France, everyone is making decisions on a false premise.

That last point leads directly into the IP data problem that we are focused on solving.

So How Much Should You Trust Your VPN?

If you’re a VPN user, here are some practical takeaways from this work:

* Treat “100+ countries” as a marketing number, not a guarantee. In our sample, 97 countries existed only as claims, not reality, across 17 providers.
* Check how your provider talks about locations. Do they clearly label “virtual” servers? Document where they’re actually hosted? Or do they quietly mix virtual and physical locations in one long list?
* If you rely on IP data professionally, ask where it comes from. A static “99.x% accurate worldwide” claim doesn’t tell you how an IP data provider handles fast-moving, high-stakes environments like VPN infrastructure.

Ultimately, this isn’t an argument against VPNs, or even against virtual locations. It’s an argument for honesty and evidence. If a VPN provider wants you to trust that map of flags, they should be willing, and able, to show that it matches the real network underneath.

Most legacy IP data providers rely on regional internet registry (RIR) allocation data and heuristics around routing and address blocks. These providers will often accept self-declared data like customer feedback, corrections, and geofeeds, without a clear way to verify them. IPinfo’s approach is different:
* Proprietary ProbeNet with 1,200+ points of presence: We maintain an internet measurement platform of PoPs in locations around the world.
* Active measurements: For each visible IP on the internet, including both IPv4 and IPv6 addresses, we measure RTT from multiple probes.
* Evidence-based geolocation: We combine these measurements with IPinfo’s other signals to assign a country (and more granular location) that’s grounded in how the internet actually behaves.

This measurement-first approach is unique in the IP data space. Once we realized how much inaccuracy came from self-declared data, we started investing heavily in research and building ProbeNet to use active measurements at scale. Our goal is to make IP data as evidence-based as possible, verifying it against observations of how the internet actually behaves.

Our Methodology for This Report

We approached this VPN investigation the way a skeptical but well-equipped user would: start from the VPNs’ own claims, then test them.

Step 1: Collecting the Claims

For each of the 20 VPN providers, we pulled together three kinds of data:

* Marketing promises: The “servers in X countries” claims and country lists from their websites. When a country was clearly listed there, we treated it as a location they actively promote.
* Configurations and location lists: Configurations for different protocols like OpenVPN or WireGuard were collected, along with location information available in provider command-line tools, mobile applications, or APIs.
* Unique provider–location entries: We ended up with over 6,000,000 data points and a list of provider + location combinations we could actually try to connect to, with multiple IPs each.

Step 2: Observing Where the Traffic Really Goes

Next, we used IPinfo infrastructure and ProbeNet to dial into those locations and watch what actually happens:

* We connected to each VPN “location” and captured the exit IP addresses.
* For each exit IP address, we used IPinfo + ProbeNet’s active measurements to determine a measured country, plus the round-trip time (RTT) from the closest probe (often under 1 ms), which is a strong hint about physical proximity.

Now we had two views for each location:

* Expected/claimed country: what the VPN claims in its UI, configs, or website.
* Measured country: where IPinfo + ProbeNet actually see the exit IP.

For each location where a country was clearly specified, we asked a very simple question: does the expected country match the measured country? If yes, we counted it as a match. If not, it became a mismatch: a location where the app says one country, but the traffic exits somewhere else.

We deliberately used a very narrow definition of “mismatch.” For a location to be counted, two things had to be true: the provider had to clearly claim a specific country (on their website, in their app, or in configs), and we had direct active measurements from ProbeNet for the exit IPs behind that location. We ignored any locations where the marketing was ambiguous, where we hadn’t measured the exit directly, or where we only had weaker hints like hostname strings, registry data, or third-party IP databases. Those signals can be useful and true, but we wanted our numbers to be as hard-to-argue-with as possible.

The result is that the mismatch rates we show here are conservative. With a looser methodology that also leaned on those additional hints, the numbers would almost certainly be higher, not lower.
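To make the core comparison concrete, here is a minimal JavaScript sketch of the match/mismatch tally described above; the records, providers, and field names are hypothetical, not IPinfo’s actual pipeline or data.

// A minimal sketch of the claimed-versus-measured comparison. These records
// and values are hypothetical; this is not IPinfo's actual pipeline.
const exits = [
  { provider: "ExampleVPN", claimed: "BS", measured: "US" },
  { provider: "ExampleVPN", claimed: "DE", measured: "DE" },
  { provider: "OtherVPN", claimed: "SO", measured: "FR" },
];

// Count matches and mismatches per provider.
const stats = new Map();
for (const { provider, claimed, measured } of exits) {
  const s = stats.get(provider) ?? { total: 0, mismatches: 0 };
  s.total += 1;
  if (claimed !== measured) s.mismatches += 1;
  stats.set(provider, s);
}

// Report the mismatch rate for each provider.
for (const [provider, { total, mismatches }] of stats) {
  console.log(`${provider}: ${mismatches}/${total} locations exit in a different country`);
}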
...
Read the original on ipinfo.io »
So there I was, pedaling my bicycle as fast as I could down a long, straight stretch of road, feeling great. I’d just discovered the pleasures of riding a road bike, and I loved every minute that I could get away. Always a data geek, I tracked my mileage, average speed, heart rate, etc. It was a beautiful Indian summer Sunday afternoon in September. I was in my late 30s, still a baby. Out of nowhere, my chain came off right in the middle of the sprint I was timing. In true masculine fashion, I threw a fit, cursing and hitting the brakes as hard as I could. At this point, I found out that experienced riders don’t do that because I flew right over the handlebars, landing on the pavement amid speeding cars. I momentarily lost consciousness, and when I regained my senses, I knew I’d screwed up badly. The pain in my shoulder was nauseating. I couldn’t move my arm, and I had to just roll off the road onto the shoulder. I just lay there, hurting, unable to think clearly. Within seconds, it seemed, a man materialized beside me.
He was exceptionally calm. He didn’t ask me if I was OK, since I clearly wasn’t. It was obvious that he knew what he was doing. He made certain I could breathe, paused long enough to dial 911, and then started pulling stuff out of a medical bag (WTF?) to clean the extensive road rash I had. In a minute, he asked for my home phone number so he could call my wife to let her know I was going to be riding in an ambulance to the hospital. He told her he was an emergency room doctor who just happened to be right behind me when I crashed. He explained that he would stay with me until the medics arrived and that he would call ahead to make sure one of the doctors on duty would “take good care of me.”
When he hung up, he asked me if I’d heard the conversation. I told him that I had and that I couldn’t believe how lucky I was under the circumstances. He agreed. To keep my mind off the pain, he just kept chatting, telling me that because I was arriving by ambulance, I’d be treated immediately. He told me that I’d be getting the “good drugs” to take care of the pain. That sounded awesome.
I don’t remember telling him goodbye. I certainly didn’t ask him his name or find out anything about him. He briefed the EMTs when they arrived and stood there until the ambulance doors closed. The ER was indeed ready for me when the ambulance got there. They treated me like a VIP. I got some Dilaudid for the pain, and it was indeed the good stuff. They covered the road rash with Tegaderm and took x-rays, which revealed that I’d torn my collarbone away from my shoulder blade. That was going to require a couple of surgeries and lots of physical therapy. I had a concussion and was glad that I had a helmet on.
All of this happened almost 25 years ago. I’ve had plenty of other bike wrecks, but that remains the worst one. My daughter is a nurse, and she’s like a magnet for car crashes, having stopped multiple times to render aid. She doesn’t do it with a smile on her face, though; emergency medicine isn’t her gig, and if anyone asks her if she’s a doctor, her stock answer is “I’m a YMCA member.”
The guy who helped me that day was an absolute angel. I have no idea what I would have done without him. I didn’t even have a cell phone at the time. But he was there at a time when I couldn’t have needed him any more badly. He helped me and then got in his car and completed his trip. I think of that day often, especially when the American medical system makes me mad, which happens regularly these days.
I’ve enjoyed the kindness of a lot of strangers over the years, particularly during the long hike my wife and I did for our honeymoon (2,186 miles) when we hitchhiked to a town in NJ in the rain and got a ride from the first car to pass. Another time, in Connecticut, a man gave us a $100 bill and told us to have a nice dinner at the restaurant atop Mt. Greylock, the highest mountain in Massachusetts. In Virginia, a moth flew into my wife’s ear, and I mean all the way into her ear until it was bumping into her eardrum. We hiked several miles to the road and weren’t there for a minute before a man stopped and took us to urgent care, 30 miles away.
When you get down in the dumps, I hope you have some memories like that to look back on, to restore your faith in humanity. There are a lot of really good people in the world.
...
Read the original on louplummer.lol »
I’ve started using the term HTML tools to refer to HTML applications that I’ve been building which combine HTML, JavaScript, and CSS in a single file and use them to provide useful functionality. I have built over 150 of these in the past two years, almost all of them written by LLMs. This article presents a collection of useful patterns I’ve discovered along the way.
First, some examples to show the kind of thing I’m talking about:
pypi-changelog lets you generate (and copy to clipboard) diffs between different PyPI package releases.
bluesky-thread provides a nested view of a discussion thread on Bluesky.
These are some of my recent favorites. I have dozens more like this that I use on a regular basis.
You can explore my collection on tools.simonwillison.net—the by month view is useful for browsing the entire collection.
If you want to see the code and prompts, almost all of the examples in this post include a link in their footer to “view source” on GitHub. The GitHub commits usually contain either the prompt itself or a link to the transcript used to create the tool.
These are the characteristics I have found to be most productive in building tools of this nature:
A single file: inline JavaScript and CSS in a single HTML file means the least hassle in hosting or distributing them, and crucially means you can copy and paste them out of an LLM response.
Avoid React, or anything with a build step. The problem with React is that JSX requires a build step, which makes everything massively less convenient. I prompt “no react” and skip that whole rabbit hole entirely.
Load dependencies from a CDN. The fewer dependencies the better, but if there’s a well known library that helps solve a problem I’m happy to load it from CDNjs or jsdelivr or similar.
Keep them small. A few hundred lines means the maintainability of the code doesn’t matter too much: any good LLM can read them and understand what they’re doing, and rewriting them from scratch with help from an LLM takes just a few minutes.
The end result is a few hundred lines of code that can be cleanly copied and pasted into a GitHub repository.
The easiest way to build one of these tools is to start in ChatGPT or Claude or Gemini. All three have features where they can write a simple HTML+JavaScript application and show it to you directly.
Claude calls this “Artifacts”, ChatGPT and Gemini both call it “Canvas”. Claude has the feature enabled by default, ChatGPT and Gemini may require you to toggle it on in their “tools” menus.
Try this prompt in Gemini or ChatGPT:
Build a canvas that lets me paste in JSON and converts it to YAML. No React.
Or this prompt in Claude:
Build an artifact that lets me paste in JSON and converts it to YAML. No React.
I always add “No React” to these prompts, because otherwise they tend to build with React, resulting in a file that is harder to copy and paste out of the LLM and use elsewhere. I find that attempts which use React take longer to display (since they need to run a build step) and are more likely to contain crashing bugs for some reason, especially in ChatGPT.
All three tools have “share” links that provide a URL to the finished application. Examples:
Coding agents such as Claude Code and Codex CLI have the advantage that they can test the code themselves while they work on it using tools like Playwright. I often upgrade to one of those when I’m working on something more complicated, like my Bluesky thread viewer tool shown above.
I also frequently use asynchronous coding agents like Claude Code for web to make changes to existing tools. I shared a video about that in Building a tool to copy-paste share terminal sessions using Claude Code for web.
Claude Code for web and Codex Cloud run directly against my simonw/tools repo, which means they can publish or upgrade tools via Pull Requests (here are dozens of examples) without me needing to copy and paste anything myself.
Any time I use an additional JavaScript library as part of my tool I like to load it from a CDN.
The three major LLM platforms support specific CDNs as part of their Artifacts or Canvas features, so often if you tell them “Use PDF.js” or similar they’ll be able to compose a URL to a CDN that’s on their allow-list.
Sometimes you’ll need to go and look up the URL on cdnjs or jsDelivr and paste it into the chat.
CDNs like these have been around for long enough that I’ve grown to trust them, especially for URLs that include the package version.
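For example, a tool can pull a pinned version of a library straight off a CDN with a dynamic import. This is an illustrative sketch only; the URL should be checked against the CDN’s own listing.

// A minimal sketch (run inside a <script type="module"> block): load a
// pinned version of a library from a CDN at runtime. The URL is
// illustrative; check the CDN's listing for the exact path you need.
const mod = await import("https://cdn.jsdelivr.net/npm/js-yaml@4.1.0/+esm");
const yaml = mod.default ?? mod;
console.log(yaml.dump({ greeting: "hello", source: "a CDN-loaded library" }));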
The alternative to CDNs is to use npm and have a build step for your projects. I find this reduces my productivity at hacking on individual tools and makes it harder to self-host them.
I don’t like leaving my HTML tools hosted by the LLM platforms themselves for a couple of reasons. First, LLM platforms tend to run the tools inside a tight sandbox with a lot of restrictions. They’re often unable to load data or images from external URLs, and sometimes even features like linking out to other sites are disabled.
The end-user experience often isn’t great either. They show warning messages to new users, often take additional time to load and delight in showing promotions for the platform that was used to create the tool.
They’re also not as reliable as other forms of static hosting. If ChatGPT or Claude are having an outage I’d like to still be able to access the tools I’ve created in the past.
Being able to easily self-host is the main reason I like insisting on “no React” and using CDNs for dependencies—the absence of a build step makes hosting tools elsewhere a simple case of copying and pasting them out to some other provider.
My preferred provider here is GitHub Pages because I can paste a block of HTML into a file on github.com and have it hosted on a permanent URL a few seconds later. Most of my tools end up in my simonw/tools repository which is configured to serve static files at tools.simonwillison.net.
One of the most useful input/output mechanisms for HTML tools comes in the form of copy and paste.
I frequently build tools that accept pasted content, transform it in some way and let the user copy it back to their clipboard to paste somewhere else.
Copy and paste on mobile phones is fiddly, so I frequently include “Copy to clipboard” buttons that populate the clipboard with a single touch.
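A minimal sketch of such a button (the element IDs are placeholders, not from any tool above):

// Copy the contents of a textarea to the clipboard when a button is tapped.
const copyButton = document.getElementById("copy");
copyButton.addEventListener("click", async () => {
  const text = document.getElementById("output").value;
  try {
    await navigator.clipboard.writeText(text); // async Clipboard API
    copyButton.textContent = "Copied!";
  } catch (err) {
    console.error("Clipboard write failed", err);
  }
});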
Most operating system clipboards can carry multiple formats of the same copied data. That’s why you can paste content from a word processor in a way that preserves formatting, but if you paste the same thing into a text editor you’ll get the content with formatting stripped.
These rich copy operations are available in JavaScript paste events as well, which opens up all sorts of opportunities for HTML tools.
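A minimal sketch of a paste handler that inspects every format the clipboard offers (illustrative only):

// List every data format present in a paste event.
document.addEventListener("paste", (event) => {
  const data = event.clipboardData;
  // Text-like formats, e.g. "text/plain" and "text/html".
  for (const type of data.types) {
    console.log(type, data.getData(type));
  }
  // Pasted images and other files show up as items of kind "file".
  for (const item of Array.from(data.items)) {
    if (item.kind === "file") {
      console.log("file:", item.getAsFile());
    }
  }
  event.preventDefault();
});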
hacker-news-thread-export lets you paste in a URL to a Hacker News thread and gives you a copyable condensed version of the entire thread, suitable for pasting into an LLM to get a useful summary.
paste-rich-text lets you copy from a page and paste to get the HTML—particularly useful on mobile where view-source isn’t available.
alt-text-extractor lets you paste in images and then copy out their alt text.
The key to building interesting HTML tools is understanding what’s possible. Building custom debugging tools is a great way to explore these options.
clipboard-viewer is one of my most useful. You can paste anything into it (text, rich text, images, files) and it will loop through and show you every type of paste data that’s available on the clipboard.
This was key to building many of my other tools, because it showed me the invisible data that I could use to bootstrap other interesting pieces of functionality.
keyboard-debug shows the keys (and KeyCode values) currently being held down.
cors-fetch reveals if a URL can be accessed via CORS.
HTML tools may not have access to server-side databases for storage but it turns out you can store a lot of state directly in the URL.
I like this for tools I may want to bookmark or share with other people.
icon-editor is a custom 24x24 icon editor I built to help hack on icons for the GitHub Universe badge. It persists your in-progress icon design in the URL so you can easily bookmark and share it.
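A minimal sketch of the URL-state pattern (the query parameter name is a placeholder):

// Keep a tool's state in the query string so the URL can be bookmarked
// or shared with other people.
function saveState(state) {
  const params = new URLSearchParams(window.location.search);
  params.set("state", JSON.stringify(state));
  // Update the URL in place, without a reload or a new history entry.
  history.replaceState(null, "", `${location.pathname}?${params}`);
}

function loadState() {
  const raw = new URLSearchParams(window.location.search).get("state");
  return raw ? JSON.parse(raw) : null;
}

saveState({ size: 24, pixels: [0, 1, 1, 0] });
console.log(loadState());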
The localStorage browser API lets HTML tools store data persistently on the user’s device, without exposing that data to the server.
I use this for larger pieces of state that don’t fit comfortably in a URL, or for secrets like API keys which I really don’t want anywhere near my server—even static hosts might have server logs that are outside of my influence.
word-counter is a simple tool I built to help me write to specific word counts, for things like conference abstract submissions. It uses localStorage to save as you type, so your work isn’t lost if you accidentally close the tab.
render-markdown uses the same trick—I sometimes use this one to craft blog posts and I don’t want to lose them.
haiku is one of a number of LLM demos I’ve built that request an API key from the user (via the prompt() function) and then store that in localStorage. This one uses Claude Haiku to write haikus about what it can see through the user’s webcam.
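A minimal sketch of that localStorage pattern (the storage key name is a placeholder):

// Keep an API key on the user's device only, asking for it once via prompt().
function getApiKey() {
  let key = localStorage.getItem("example-api-key");
  if (!key) {
    key = prompt("Paste your API key:");
    if (key) localStorage.setItem("example-api-key", key);
  }
  return key;
}

const apiKey = getApiKey();
console.log(apiKey ? "Key stored locally" : "No key provided");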
CORS stands for Cross-origin resource sharing. It’s a relatively low-level detail which controls if JavaScript running on one site is able to fetch data from APIs hosted on other domains.
APIs that provide open CORS headers are a goldmine for HTML tools. It’s worth building a collection of these over time.
Here are some I like:
* iNaturalist for fetching sightings of animals, including URLs to photos
* GitHub because anything in a public repository in GitHub has a CORS-enabled anonymous API for fetching that content from the raw.githubusercontent.com domain, which is behind a caching CDN so you don’t need to worry too much about rate limits or feel guilty about adding load to their infrastructure.
* Bluesky for all sorts of operations
* Mastodon has generous CORS policies too, as used by applications like phanpy.social
GitHub Gists are a personal favorite here, because they let you build apps that can persist state to a permanent Gist through making a cross-origin API call.
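As an illustration, here is a minimal sketch of fetching a file from a public GitHub repository via raw.githubusercontent.com; the repository path is only an example.

// Read a file from a public GitHub repository directly in the browser,
// relying on the permissive CORS headers served by raw.githubusercontent.com.
async function fetchRawFile(owner, repo, branch, path) {
  const url = `https://raw.githubusercontent.com/${owner}/${repo}/${branch}/${path}`;
  const response = await fetch(url);
  if (!response.ok) throw new Error(`HTTP ${response.status} for ${url}`);
  return response.text();
}

fetchRawFile("simonw", "tools", "main", "README.md")
  .then((text) => console.log(text.slice(0, 200)))
  .catch((err) => console.error(err));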
species-observation-map uses iNaturalist to show a map of recent sightings of a particular species.
zip-wheel-explorer fetches a .whl file for a Python package from PyPI, unzips it (in browser memory) and lets you navigate the files.
github-issue-to-markdown fetches issue details and comments from the GitHub API (including expanding any permanent code links) and turns them into copyable Markdown.
terminal-to-html can optionally save the user’s converted terminal session to a Gist.
bluesky-quote-finder displays quotes of a specified Bluesky post, which can then be sorted by likes or by time.
All three of OpenAI, Anthropic and Gemini offer JSON APIs that can be accessed via CORS directly from HTML tools.
Unfortunately you still need an API key, and if you bake that key into your visible HTML anyone can steal it and use it to rack up charges on your account.
I use the localStorage secrets pattern to store API keys for these services. This sucks from a user experience perspective—telling users to go and create an API key and paste it into a tool is a lot of friction—but it does work.
haiku uses the Claude API to write a haiku about an image from the user’s webcam.
gemini-bbox demonstrates Gemini 2.5’s ability to return complex shaped image masks for objects in images, see Image segmentation using Gemini 2.5.
You don’t need to upload a file to a server in order to make use of the <input type="file"> element. JavaScript can access the content of that file directly, which opens up a wealth of opportunities for useful functionality.
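A minimal sketch of the pattern (the element ID is a placeholder): let the user pick a file, then read it entirely client-side.

// Read a user-selected file without uploading it anywhere.
// Assumes the page contains <input type="file" id="picker">.
const picker = document.getElementById("picker");
picker.addEventListener("change", async () => {
  const file = picker.files[0];
  if (!file) return;
  const contents = await file.text(); // read the file in the browser
  console.log(`${file.name}: ${contents.length} characters`);
});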
ocr is the first tool I built for my collection, described in Running OCR against PDFs and images directly in your browser. It uses PDF.js and Tesseract.js to allow users to open a PDF in their browser which it then converts to an image-per-page and runs through OCR.
social-media-cropper lets you open (or paste in) an existing image and then crop it to common dimensions needed for different social media platforms—2:1 for Twitter and LinkedIn, 1.4:1 for Substack etc.
ffmpeg-crop lets you open and preview a video file in your browser, drag a crop box within it and then copy out the ffmpeg command needed to produce a cropped copy on your own machine.
An HTML tool can generate a file for download without needing help from a server.
The JavaScript library ecosystem has a huge range of packages for generating files in all kinds of useful formats.
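The underlying trick is small enough to sketch here: build the contents in memory, wrap them in a Blob, and point a temporary download link at it.

// Offer an in-memory string as a downloadable file.
function downloadText(filename, text) {
  const blob = new Blob([text], { type: "text/plain" });
  const url = URL.createObjectURL(blob);
  const link = document.createElement("a");
  link.href = url;
  link.download = filename; // suggests a filename to the browser
  link.click();
  URL.revokeObjectURL(url); // release the object URL afterwards
}

downloadText("notes.txt", "Generated entirely in the browser.\n");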
Pyodide is a distribution of Python that’s compiled to WebAssembly and designed to run directly in browsers. It’s an engineering marvel and one of the most underrated corners of the Python world.
It also cleanly loads from a CDN, which means there’s no reason not to use it in HTML tools!
Even better, the Pyodide project includes micropip—a mechanism that can load extra pure-Python packages from PyPI via CORS.
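A minimal sketch of the pattern, assuming pyodide.js has already been loaded from the Pyodide CDN (which defines the global loadPyodide function); the package installed here is just an example.

// Boot Pyodide, install a pure-Python package from PyPI with micropip,
// then run a snippet of Python. Assumes a <script> tag has already loaded
// pyodide.js from the Pyodide CDN, which defines loadPyodide.
async function runPython() {
  const pyodide = await loadPyodide();
  await pyodide.loadPackage("micropip");
  const micropip = pyodide.pyimport("micropip");
  await micropip.install("cowsay"); // example pure-Python package
  const result = await pyodide.runPythonAsync(`
import cowsay
cowsay.get_output_string("cow", "Hello from Python in the browser")
`);
  console.log(result);
}

runPython();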
pyodide-bar-chart demonstrates running Pyodide, Pandas and matplotlib to render a bar chart directly in the browser.
numpy-pyodide-lab is an experimental interactive tutorial for Numpy.
apsw-query demonstrates the APSW SQLite library running in a browser, using it to show EXPLAIN QUERY plans for SQLite queries.
Pyodide is possible thanks to WebAssembly. WebAssembly means that a vast collection of software originally written in other languages can now be loaded in HTML tools as well.
Squoosh.app was the first example I saw that convinced me of the power of this pattern—it makes several best-in-class image compression libraries available directly in the browser.
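Loading a compiled module only takes a few lines; in this sketch the module URL and the exported function name are placeholders.

// Fetch and instantiate a WebAssembly module, then call one of its exports.
async function loadWasm() {
  const { instance } = await WebAssembly.instantiateStreaming(
    fetch("module.wasm"),
    {} // import object; empty if the module needs nothing from JavaScript
  );
  console.log(instance.exports.add(2, 3));
}

loadWasm();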
I’ve used WebAssembly for a few of my own tools:
The biggest advantage of having a single public collection of 100+ tools is that it’s easy for my LLM assistants to recombine them in interesting ways.
Sometimes I’ll copy and paste a previous tool into the context, but when I’m working with a coding agent I can reference them by name—or tell the agent to search for relevant examples before it starts work.
The source code of any working tool doubles as clear documentation of how something can be done, including patterns for using editing libraries. An LLM with one or two existing tools in their context is much more likely to produce working code.
And then, after it had found and read the source code for zip-wheel-explorer:
Build a new tool pypi-changelog.html which uses the PyPI API to get the wheel URLs of all available versions of a package, then it displays them in a list where each pair has a “Show changes” clickable in between them - clicking on that fetches the full contents of the wheels and displays a nicely rendered diff representing the difference between the two, as close to a standard diff format as you can get with JS libraries from CDNs, and when that is displayed there is a “Copy” button which copies that diff to the clipboard
See Running OCR against PDFs and images directly in your browser for another detailed example of remixing tools to create something new.
I like keeping (and publishing) records of everything I do with LLMs, to help me grow my skills at using them over time.
For HTML tools I built by chatting with an LLM platform directly I use the “share” feature for those platforms.
For Claude Code or Codex CLI or other coding agents I copy and paste the full transcript from the terminal into my terminal-to-html tool and share that using a Gist.
In either case I include links to those transcripts in the commit message when I save the finished tool to my repository. You can see those in my tools.simonwillison.net colophon.
I’ve had so much fun exploring the capabilities of LLMs in this way over the past year and a half, and building tools in this way has been invaluable in helping me understand both the potential for building tools with HTML and the capabilities of the LLMs that I’m building them with.
If you’re interested in starting your own collection I highly recommend it! All you need to get started is a free GitHub repository with GitHub Pages enabled (Settings -> Pages -> Source -> Deploy from a branch -> main) and you can start copying in .html pages generated in whatever manner you like.
...
Read the original on simonwillison.net »
I do Advent of Code every year.
For the last seven years, including this one, I have managed to get all the stars. I do not say that to brag. I say it because it explains why I keep coming back.
It is one of the few tech traditions I never get bored of, even after doing it for a long time. I like the time pressure. I like the community vibe. I like that every December I can pick one language and go all in.
Advent of Code is usually 25 days. This year Eric decided to do 12 days instead.
So instead of 50 parts, it was 24.
That sounds like a relaxed year. It was not, but not in a bad way.
The easier days were harder than the easy days in past years, but they were also really engaging and fun to work through. The hard days were hard, especially the last three, but they were still the good kind of hard. They were problems I actually wanted to wrestle with.
It also changes the pacing in a funny way. In a normal year, by day 10 you have a pretty comfy toolbox. This year it felt like the puzzles were already demanding that toolbox while I was still building it.
That turned out to be a perfect setup for learning a new language.
Gleam is easy to like quickly.
The syntax is clean. The compiler is helpful, and the error messages are super duper good. Rust good.
Most importantly, the language strongly nudges you into a style that fits Advent of Code really well. Parse some text. Transform it a few times. Fold. Repeat.
One thing I did not expect was how good the editor experience would be. The LSP worked much better than I expected. It basically worked perfectly the whole time. I used the Gleam extension for IntelliJ and it was great.
I also just like FP.
FP is not always easier, but it is often easier. When it clicks, you stop writing instructions and you start describing the solution.
The first thing I fell in love with was echo.
It is basically a print statement that does not make you earn it. You can echo any value. You do not have to format anything. You do not have to build a string. You can just drop it into a pipeline and keep going.
This is the kind of thing I mean:
You can quickly inspect values at multiple points without breaking the flow.
I did miss string interpolation, especially early on. echo made up for a lot of that.
It mostly hit when I needed to generate text, not when I needed to inspect values. The day where I generated an LP file for glpsol is the best example. It is not hard code, but it is a lot of string building. Without interpolation it turns into a bit of a mess of <>s.
This is a small excerpt from my LP generator:
It works. It is just the kind of code where you really feel missing interpolation.
Grids are where you normally either crash into out of bounds bugs, or you litter your code with bounds checks you do not care about.
In my day 4 solution I used a dict as a grid. The key ergonomic part is that dict.get gives you an option-like result, which makes neighbour checking safe by default.
This is the neighbour function from my solution:
That last line is the whole point.
No bounds checks. No sentinel values. Out of bounds just disappears.
I expected to write parsers and helpers, and I did. What I did not expect was how often Gleam already had the exact list function I needed.
I read the input, chunked it into rows, transposed it, and suddenly the rest of the puzzle became obvious.
In a lot of languages you end up writing your own transpose yet again. In Gleam it is already there.
Another example is list.combination_pairs.
In day 8 I needed all pairs of 3D points. In an imperative language you would probably write nested loops and then question your off by one logic.
In Gleam it is a one liner:
Sometimes FP is not about being clever. It is about having the right function name.
If I had to pick one feature that made me want to keep writing Gleam after AoC, it is fold_until.
Early exit without hacks is fantastic in puzzles.
In day 8 part 2 I kept merging sets until the first set in the list contained all boxes. When that happens, I stop.
The core shape looks like this:
It is small, explicit, and it reads like intent.
I also used fold_until in day 10 part 1 to find the smallest combination size that works.
Even though I enjoyed Gleam a lot, I did hit a few recurring friction points.
None of these are deal breakers. They are just the kind of things you notice when you do 24 parts in a row.
This one surprised me on day 1.
For AoC you read a file every day. In this repo I used simplifile everywhere because you need something. It is fine, I just did not expect basic file IO to be outside the standard library.
Day 2 part 2 pushed me into regex and I had to add gleam_regexp.
This is the style I used, building a regex from a substring:
Again, totally fine. It just surprised me.
You can do [first, ..rest] and you can do [first, second].
But you cannot do [first, ..middle, last].
It is not the end of the world, but it would have made some parsing cleaner.
In Gleam a lot of comparisons are not booleans. You get an order value.
This is great for sorting. It is also very explicit. It can be a bit verbose when you just want a plain boolean check.
In day 5 I ended up writing patterns like this:
I used bigi a few times this year.
On the Erlang VM, integers are arbitrary precision, so you usually do not care about overflow. That is one of the nicest things about the BEAM.
If you want your Gleam code to also target JavaScript, you do care. JavaScript has limits, and suddenly using bigi becomes necessary for some puzzles.
I wish that was just part of Int, with a single consistent story across targets.
Day 10 part 1 was my favorite part of the whole event.
The moment I saw the toggling behavior, it clicked as XOR. Represent the lights as a number. Represent each button as a bitmask. Find the smallest combination of bitmasks that XOR to the target.
This is the fold from my solution:
It felt clean, it felt fast, and it felt like the representation did most of the work.
I knew brute force was out. It was clearly a system of linear equations.
In previous years I would reach for Z3, but there are no Z3 bindings for Gleam. I tried to stay in Gleam, and I ended up generating an LP file and shelling out to glpsol using shellout.
It worked, and honestly the LP format is beautiful.
Here is the call:
It is a hack, but it is a pragmatic hack, and that is also part of Advent of Code.
Day 11 part 2 is where I was happy I was writing Gleam.
The important detail was that the memo key is not just the node. It is the node plus your state.
In my case the key was:
Once I got the memo threading right, it ran instantly.
The last day was the only puzzle I did not fully enjoy.
Not because it was bad. It just felt like it relied on assumptions about the input, and I am one of those people that does not love doing that.
I overthought it for a bit, then I learned it was more of a troll problem. The “do the areas of the pieces, when fully interlocked, fit on the board” heuristic was enough.
In my solution it is literally this:
Sometimes you build a beautiful mental model and then the right answer is a single inequality.
I am very happy I picked Gleam this year.
It has sharp edges, mostly around where the standard library draws the line and a few language constraints that show up in puzzle code. But it also has real strengths.
Pipelines feel good. Options and Results make unsafe problems feel safe. The list toolbox is better than I expected. fold_until is incredible. Once you stop trying to write loops and you let it be functional, the solutions start to feel clearer.
I cannot wait to try Gleam in a real project. I have been thinking about using it to write a webserver, and I am genuinely excited to give it a go.
And of course, I cannot wait for next year’s Advent of Code.
If you want to look at the source for all 12 days, it is here:
...
Read the original on blog.tymscar.com »
Memory safety and sandboxing are two different things. It’s reasonable to think of them as orthogonal: you could have memory safety but not be sandboxed, or you could be sandboxed but not memory safe.
* Example of memory safe but not sandboxed: a pure Java program that opens files on the filesystem for reading and writing and accepts filenames from the user. The OS will allow this program to overwrite any file that the user has access to. This program can be quite dangerous even if it is memory safe. Worse, imagine that the program didn’t have any code to open files for reading and writing, but also had no sandbox to prevent those syscalls from working. If there was a bug in the memory safety enforcement of this program (say, because of a bug in the Java implementation), then an attacker could cause this program to overwrite any file if they succeeded at achieving code execution via weird state.
* Example of sandboxed but not memory safe: a program written in assembly that starts by requesting that the OS revoke all of its capabilities beyond just pure compute. If the program did want to open a file or write to it, then the kernel will kill the process, based on the earlier request to have this capability revoked. This program could have lots of memory safety bugs (because it’s written in assembly), but even if it did, then the attacker cannot make this program overwrite any file unless they find some way to bypass the sandbox.
In practice, sandboxes have holes by design. A typical sandbox allows the program to send and receive messages to broker processes that have higher privileges. So, an attacker may first use a memory safety bug to make the sandboxed process send malicious messages, and then use those malicious messages to break into the brokers.
The best kind of defense is to have both a sandbox and memory safety. This document describes how to combine sandboxing and Fil-C’s memory safety by explaining what it takes to port OpenSSH’s seccomp-based Linux sandbox code to Fil-C.
Fil-C is a memory safe implementation of C and C++ and this site has a lot of documentation about it. Unlike most memory safe languages, Fil-C enforces safety down to where your code meets Linux syscalls and the Fil-C runtime is robust enough that it’s possible to use it in low-level system components like init and udevd. Lots of programs work in Fil-C, including OpenSSH, which makes use of seccomp-BPF sandboxing.
This document focuses on how OpenSSH uses seccomp and other technologies on Linux to build a sandbox around its unprivileged sshd-session process. Let’s review what tools Linux gives us that OpenSSH uses:
* chroot to restrict the process’s view of the filesystem.
* Running the process with the sshd user and group, and giving that user/group no privileges.
* setrlimit to prevent opening files, starting processes, or writing to files.
* seccomp-BPF syscall filter to reduce the attack surface by allowlisting only the set of syscalls that are legitimate for the unprivileged process. Syscalls not in the allowlist will crash the process with SIGSYS.
The Chromium developers and the Mozilla developers both have excellent notes about how to do sandboxing on Linux using seccomp. Seccomp-BPF is a well-documented kernel feature that can be used as part of a larger sandboxing story.
Fil-C makes it easy to use chroot and different users and groups. The syscalls that are used for that part of the sandbox are trivially allowed by Fil-C and no special care is required to use them.
Both setrlimit and seccomp-BPF require special care because the Fil-C runtime starts threads, allocates memory, and performs synchronization. This document describes what you need to know to make effective use of those sandboxing technologies in Fil-C. First, I describe how to build a sandbox that prevents thread creation without breaking Fil-C’s use of threads. Then, I describe what tweaks I had to make to OpenSSH’s seccomp filter. Finally, I describe how the Fil-C runtime implements the syscalls used to install seccomp filters.
The Fil-C runtime uses multiple background threads for garbage collection and has the ability to automatically shut those threads down when they are not in use. If the program wakes up and starts allocating memory again, then those threads are automatically restarted.
Starting threads violates the “no new processes” rule that OpenSSH’s setrlimit sandbox tries to achieve (since threads are just lightweight processes on Linux). It also relies on syscalls like clone3 that are not part of OpenSSH’s seccomp filter allowlist.
It would be a regression to the sandbox to allow process creation just because the Fil-C runtime relies on it. Instead, I added a new API to the Fil-C runtime:
void zlock_runtime_threads(void);
This forces the runtime to immediately create whatever threads it needs, and to disable shutting them down on demand. Then, I added a call to zlock_runtime_threads() in OpenSSH’s ssh_sandbox_child function before either the setrlimit or seccomp-BPF sandbox calls happen.
Because the use of zlock_runtime_threads() prevents subsequent thread creation from happening, most of the OpenSSH sandbox just works. I did not have to change how OpenSSH uses setrlimit. I did change the following about the seccomp filter:
* Failure results in SECCOMP_RET_KILL_PROCESS rather than SECCOMP_RET_KILL. This ensures that Fil-C’s background threads are also killed if a sandbox violation occurs.
* MAP_NORESERVE is added to the mmap allowlist, since the Fil-C allocator uses it. This is not a meaningful regression to the filter, since MAP_NORESERVE is not a meaningful capability for an attacker to have.
* sched_yield is allowed. This is not a dangerous syscall (it’s semantically a no-op). The Fil-C runtime uses it as part of its lock implementation.
Nothing else had to change, since the filter already allowed all of the futex syscalls that Fil-C uses for synchronization.
The OpenSSH seccomp filter is installed using two prctl calls. First, we PR_SET_NO_NEW_PRIVS:
if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1) {
        debug("%s: prctl(PR_SET_NO_NEW_PRIVS): %s",
            __func__, strerror(errno));
        nnp_failed = 1;
}
This prevents additional privileges from being acquired via execve. It’s required that unprivileged processes that install seccomp filters first set the no_new_privs bit.
if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &preauth_program) == -1)
        debug("%s: prctl(PR_SET_SECCOMP): %s",
            __func__, strerror(errno));
else if (nnp_failed)
        fatal("%s: SECCOMP_MODE_FILTER activated but "
            "PR_SET_NO_NEW_PRIVS failed", __func__);
This installs the seccomp filter in preauth_program. Note that this will fail in the kernel if the no_new_privs bit is not set, so the fact that OpenSSH reports a fatal error if the filter is installed without no_new_privs is just healthy paranoia on the part of the OpenSSH authors.
The trouble with both syscalls is that they affect the calling thread, not all threads in the process. Without special care, Fil-C runtime’s background threads would not have the no_new_privs bit set and would not have the filter installed. This would mean that if an attacker busted through Fil-C’s memory safety protections (in the unlikely event that they found a bug in Fil-C itself!), then they could use those other threads to execute syscalls that bypass the filter!
To prevent even this unlikely escape, the Fil-C runtime’s wrapper for prctl implements PR_SET_NO_NEW_PRIVS and PR_SET_SECCOMP by handshaking all runtime threads using this internal API:
/* Calls the callback from every runtime thread. */
PAS_API void filc_runtime_threads_handshake(void (*callback)(void* arg), void* arg);
The callback performs the requested prctl from each runtime thread. This ensures that the no_new_privs bit and the filter are installed on all threads in the Fil-C process.
Additionally, because of ambiguity about what to do if the process has multiple user threads, these two prctl commands will trigger a Fil-C safety error if the program has multiple user threads.
The best kind of protection if you’re serious about security is to combine memory safety with sandboxing. This document shows how to achieve this using Fil-C and the sandbox technologies available on Linux, all without regressing the level of protection that those sandboxes enforce or the memory safety guarantees of Fil-C.
...
Read the original on fil-c.org »
The European messaging service Zivver — which is used for confidential communication by governments and hospitals in the EU and the U.K. — has been sold to Kiteworks, an American company with strong links to Israeli intelligence. Experts have expressed deep concerns over the deal.
With the sale of Amsterdam-based data security company Zivver, sensitive information about European citizens is now in the hands of Kiteworks. The CEO of the American tech company is a former cyber specialist from an elite unit of the Israeli army, as are several other members of its top management. Various institutions in Europe and the U.K. — from hospitals to courts and immigration services — use Zivver to send confidential documents. While Zivver says these documents are encrypted, an investigation by Follow the Money shows that the company is able to read their contents.

Why does this matter?

Cybersecurity and intelligence experts told Follow the Money that the takeover should either have been prevented or properly assessed in advance. Zivver processes information that could be extremely valuable to third parties, such as criminals or foreign intelligence services. That information is now subject to invasive U.S. law, and overseen by a company with well-documented links to Israeli intelligence.

How was this investigated?

Follow the Money investigated the acquisition of Zivver and the management of Kiteworks, and spoke to experts in intelligence services and cyber security.
This article is part of an ongoing series.
When the American data security company Kiteworks bought out its Dutch industry peer Zivver in June, CEO Jonathan Yaron described it as “a proud moment for all of us”. The purchase was “a significant milestone in Kiteworks’ continued mission to safeguard sensitive data across all communication channels”, he added in a LinkedIn post. But what Yaron did not mention was that this acquisition — coming at a politically charged moment between the U.S. and the EU — put highly sensitive, personal data belonging to European and British citizens directly into American hands. Zivver is used by institutions including hospitals, health insurers, government services and immigration authorities in countries including the Netherlands, Germany, Belgium and the U.K.

Neither did Yaron mention that much of Kiteworks’ top management — himself included — are former members of an elite Israeli Defence Force unit that specialised in eavesdropping and breaking encrypted communications.
In addition to this, an investigation by Follow the Money shows that data processed by Zivver is less secure than the service leads its customers to believe. Research found that emails and documents sent by Zivver can be read by the company itself. This was later confirmed by Zivver to Follow the Money.

Zivver maintained, however, that it does not have access to the encryption keys used by customers, and therefore cannot hand over data to U.S. authorities. This is despite independent researchers confirming that the data was — for a brief period — accessible to the company. If U.S. officials wanted access to such communication, Zivver would be legally obligated to provide it.

Cybersecurity experts now point to serious security concerns, and ask why this sale seems to have gone through without scrutiny from European authorities. “All of the red flags should have been raised during this acquisition,” said intelligence expert Hugo Vijver, a former long-term officer in AIVD, the Dutch security service.

Amsterdam-based Zivver — which was founded in 2015 by Wouter Klinkhamer and Rick Goud — provides systems for the encrypted exchange of information via email, chat and video, among other means. Dutch courts, for example, work with Zivver to send classified documents, and solicitors use the service to send confidential information to the courts. Other government agencies in the Netherlands also use Zivver. So do vital infrastructure operators such as the Port of Rotterdam and The Hague Airport.
In the U.K., a number of NHS hospitals and local councils In it is used in major hospitals. The information that Zivver secures for its customers is therefore confidential and sensitive by nature. When approached by Follow the Money, a number of governmental agencies said the company’s Dutch origins were a big factor in their decision to use Zivver. Additionally, the fact that the data transferred via Zivver was stored on servers in Europe also played a role in their decisions. Now that Zivver has been acquired by a company in the United States, that data is This means that the U.S. government can request access to this information if it wishes, regardless of where the data is stored.These laws are not new, but they have become even more draconian since U.S. President Donald Trump’s return to office, according to experts.Bert Hubert, a former regulator of the Dutch intelligence services, warned: “America is deteriorating so rapidly, both legally and democratically, that it would be very naive to hand over your courts and hospitals to their services.” “Trump recently called on Big Tech to ignore European legislation. And that is what they are going to do. We have no control over it,” he added.
In Europe, Hubert said: “We communicate almost exclusively via American platforms. And that means that the U.S. can read our communications and disrupt our entire society if they decide that they no longer like us.”

Zivver had offered an alternative — a European platform governed by EU law. “We are now throwing that away. If you want to share something confidential with a court or government, consider using a typewriter. That’s about all we have left,” Hubert said.

Beyond American jurisdiction, Kiteworks’ management raises another layer of concern: its links to Israeli intelligence. Several of the company’s top executives, including CEO Yaron, are veterans of Unit 8200, the elite cyber unit of the Israel Defence Force (IDF). The unit is renowned for its code-breaking abilities and feared for its surveillance operations.
Unit 8200 has been linked to major cyber operations, including the Stuxnet attack on Iranian nuclear facilities in 2007. More recently, it was accused of orchestrating the detonation of thousands of pagers in Lebanon, an incident the United Nations said violated international law and killed at least two children.

The unit employs thousands of young recruits identified for their digital skills. It is able to intercept global telephone and internet traffic. International media have reported that Unit 8200 intercepts and stores an average of one million Palestinian phone calls every hour.

Some veterans themselves have also objected to the work of the unit. In 2014, dozens of reservists signed a letter to Israeli leaders saying they no longer wanted to participate in surveillance of the occupied territories.

“The lines of communication between the Israeli defence apparatus and the business community have traditionally been very short,” said Dutch intelligence expert Vijver. “In Israel, there is a revolving door between the army, lobby, business and politics.”

That revolving door is clearly visible in big U.S. tech companies — and Kiteworks is no exception. Aside from Yaron, both Chief Business Officer Yaron Galant and the Chief Product Officer served in Unit 8200, according to publicly available information. They played a direct role in negotiating the acquisition of Zivver. Their background was known to Zivver’s directors Goud and Klinkhamer at the time.
Other senior figures also have military intelligence backgrounds. Product director Ron Margalit worked in Unit 8200 before serving in the office of Israeli Prime Minister Benjamin Netanyahu. Mergers and acquisitions director Uri Kedem is a former Israeli naval captain.

Kiteworks is not unique in this respect. Increasing numbers of U.S. cybersecurity firms now employ former Israeli intelligence officers. This trend, experts say, creates vulnerabilities that are rarely discussed. An independent researcher quoted by the U.S. outlet Drop Site News said: “Not all of these veterans will send classified data to Tel Aviv. But the fact that so many former spies work for these companies does create a serious vulnerability.”

Or, as the ex-intelligence regulator Hubert put it: “Gaining access to communication flows is part of Israel’s long-term strategy. A company like Zivver fits perfectly into that strategy.”

The information handled by Zivver — confidential communications between governments, hospitals and citizens — is a potential goldmine for intelligence services. According to intelligence expert Vijver, access to this kind of material makes it easier to pressure individuals into cooperating with intelligence agencies: once a service has access to medical, financial and personal data, it can more easily pressure people into spying for it, he said.

But the gain for intelligence services lies not just in sensitive information, said Hubert: “Any data that allows an agency to tie telephone numbers, addresses or payment data to an individual is of great interest to them.” He added: “It is exactly this type of data that is abundantly present in communications between civilians, governments and care institutions. In other words, the information that flows through a company like Zivver is extremely valuable for intelligence services.”

These geopolitical concerns become more pronounced when combined with technical worries about Zivver’s encryption. For years, Zivver presented itself as a European alternative that guaranteed privacy. Its marketing materials claimed that messages were encrypted on the sender’s device and that the company had “zero access” to content. But an investigation by two cybersecurity experts at a Dutch government agency, carried out at the request of Follow the Money, undermines this claim.
The experts, who participated in the investigation on condition of anonymity, explored what happened when that government agency logged into Zivver’s web application to send information. Tests showed that when government users sent messages through Zivver’s web application, the content — including attachments — was uploaded to Zivver’s servers as readable text before being encrypted. The same process applied to email addresses of senders and recipients.

“In these specific cases, Zivver processed the messages in readable form,” said independent cybersecurity researcher Matthijs Koot, who verified the findings. “Even if only briefly, technically speaking it is possible that Zivver was able to view these messages,” he said. He added: “Whether a message is encrypted at a later stage makes little difference. It may help against hackers, but it no longer matters in terms of protection against Zivver.”

Despite these findings, Zivver continues to insist on its website and in promotional material elsewhere — including on the U.K. government’s Digital Marketplace — that “contents of secure messages are inaccessible to Zivver and third parties”.

So far, no evidence has surfaced that Zivver misused its technical access. But now that the company is owned by Kiteworks, experts see a heightened risk. Former intelligence officer Vijver puts it bluntly: “Given the links between Zivver, Kiteworks and Unit 8200, I believe there is zero chance that no data is going to Israel. To think otherwise is completely naive.”

The sale of Zivver could technically have been blocked or investigated under Dutch law. According to the Security Assessment of Investments, Mergers and Acquisitions Act, such sensitive takeovers are supposed to be reviewed by a specialised agency. But the Dutch interior ministry declared that Zivver was not part of the country’s “critical infrastructure,” meaning that no review was carried out.

That, in Hubert’s view, was “a huge blunder”.
“It’s bad enough that a company that plays such an important role in government communications is falling into American hands, but the fact that there are all kinds of Israeli spies there is very serious,” he said.

Experts say the Zivver case highlights Europe’s lack of strategic control over its digital infrastructure. Mariëtte van Huijstee of a Netherlands-based research institute said: “I doubt whether the security of sensitive emails and files … should be left to the private sector. And if you think that is acceptable, should we leave it to non-European parties over whom we have no control?”

“We need to think much more strategically about our digital infrastructure and regulate these kinds of issues much better, for example by designating encryption services as vital infrastructure,” she added.

Zivver, for its part, claimed that security will improve under Kiteworks. Zivver’s full responses to Follow the Money’s questions can be read here and here.

But Van Huijstee was not convinced. “Kiteworks employs people who come from a service that specialises in decrypting files,” she said. “The takeover is taking place in an unsafe world full of geopolitical tensions, and we are dealing with data that is very valuable. In such a case, trust is not enough and more control is needed.”
...
Read the original on www.ftm.eu »
Given that there would only be one service, it made sense to move all the destination code into one repo, which meant merging all the different dependencies and tests into a single repo. We knew this was going to be messy.
For each of the 120 unique dependencies, we committed to having one version for all our destinations. As we moved destinations over, we’d check the dependencies it was using and update them to the latest versions. We fixed anything in the destinations that broke with the newer versions.
With this transition, we no longer needed to keep track of the differences between dependency versions. All our destinations were using the same version, which significantly reduced the complexity across the codebase. Maintaining destinations now became less time consuming and less risky.
We also wanted a test suite that allowed us to quickly and easily run all our destination tests. Running all the tests was one of the main blockers when making updates to the shared libraries we discussed earlier.
Fortunately, the destination tests all had a similar structure. They had basic unit tests to verify our custom transform logic was correct and would execute HTTP requests to the partner’s endpoint to verify that events showed up in the destination as expected.
Recall that the original motivation for separating each destination codebase into its own repo was to isolate test failures. However, it turned out this was a false advantage. Tests that made HTTP requests were still failing with some frequency. With destinations separated into their own repos, there was little motivation to clean up failing tests. This poor hygiene led to a constant source of frustrating technical debt. Often a small change that should have only taken an hour or two would end up requiring a couple of days to a week to complete.
The outbound HTTP requests to destination endpoints during the test run were the primary cause of failing tests. Unrelated issues like expired credentials shouldn’t fail tests. We also knew from experience that some destination endpoints were much slower than others. Some destinations took up to 5 minutes to run their tests. With over 140 destinations, our test suite could take up to an hour to run.
To solve for both of these, we created Traffic Recorder. Traffic Recorder is built on top of yakbak, and is responsible for recording and saving destinations’ test traffic. Whenever a test runs for the first time, any requests and their corresponding responses are recorded to a file. On subsequent test runs, the request and response in the file are played back instead of requesting the destination’s endpoint. These files are checked into the repo so that the tests are consistent across every change. Now that the test suite is no longer dependent on these HTTP requests over the internet, our tests became significantly more resilient, a must-have for the migration to a single repo.
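Traffic Recorder itself is built on yakbak, a Node.js library for recording and replaying HTTP interactions, but the record-and-replay idea is easy to sketch. The Python snippet below is a minimal illustration of the concept, not Segment’s actual tool; the tape directory, file naming and hashing scheme are assumptions made for the example.

import hashlib
import json
from pathlib import Path

import requests

TAPE_DIR = Path("tapes")  # recorded traffic is checked into the repo

def replayed_request(method: str, url: str, body: bytes = b"") -> dict:
    """Replay a recorded response if one exists; otherwise hit the
    network once and record the response for future test runs."""
    TAPE_DIR.mkdir(exist_ok=True)
    key = hashlib.sha256(f"{method} {url}".encode() + body).hexdigest()
    tape = TAPE_DIR / f"{key}.json"

    if tape.exists():
        # Subsequent runs: no network call, fully deterministic.
        return json.loads(tape.read_text())

    # First run: make the real request and save it as a "tape".
    resp = requests.request(method, url, data=body, timeout=30)
    recorded = {"status": resp.status_code, "body": resp.text}
    tape.write_text(json.dumps(recorded))
    return recorded

Because the recorded tapes live in the repo alongside the tests, every run replays the same responses, which is what makes the suite fast and deterministic.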
It took milliseconds to complete running the tests for all 140+ of our destinations after we integrated Traffic Recorder. In the past, just one destination could have taken a couple of minutes to complete. It felt like magic.
Once the code for all destinations lived in a single repo, they could be merged into a single service. With every destination living in one service, our developer productivity substantially improved. We no longer had to deploy 140+ services for a change to one of the shared libraries. One engineer can deploy the service in a matter of minutes.
The proof was in the improved velocity. When our microservice architecture was still in place, we made 32 improvements to our shared libraries. One year later, we’ve made 46 improvements.
The change also benefited our operational story. With every destination living in one service, we had a good mix of CPU and memory-intense destinations, which made scaling the service to meet demand significantly easier. The large worker pool can absorb spikes in load, so we no longer get paged for destinations that process small amounts of load.
Moving from our microservice architecture to a monolith was overall a huge improvement; however, there are trade-offs:
Fault isolation is difficult. With everything running in a monolith, if a bug is introduced in one destination that causes the service to crash, the service will crash for all destinations. We have comprehensive automated testing in place, but tests can only get you so far. We are currently working on a much more robust way to prevent one destination from taking down the entire service while still keeping all the destinations in a monolith.
In-memory caching is less effective. Previously, with one service per destination, our low traffic destinations only had a handful of processes, which meant their in-memory caches of control plane data would stay hot. Now that cache is spread thinly across 3000+ processes so it’s much less likely to be hit. We could use something like Redis to solve for this, but then that’s another point of scaling for which we’d have to account. In the end, we accepted this loss of efficiency given the substantial operational benefits.
Updating the version of a dependency may break multiple destinations. While moving everything to one repo solved the previous dependency mess we were in, it means that if we want to use the newest version of a library, we’ll potentially have to update other destinations to work with the newer version. In our opinion though, the simplicity of this approach is worth the trade-off. And with our comprehensive automated test suite, we can quickly see what breaks with a newer dependency version.
Our initial microservice architecture worked for a time, solving the immediate performance issues in our pipeline by isolating the destinations from each other. However, we weren’t set up to scale. We lacked the proper tooling for testing and deploying the microservices when bulk updates were needed. As a result, our developer productivity quickly declined.
Moving to a monolith allowed us to rid our pipeline of operational issues while significantly increasing developer productivity. We didn’t make this transition lightly though and knew there were things we had to consider if it was going to work.
We needed a rock solid testing suite to put everything into one repo. Without this, we would have been in the same situation as when we originally decided to break them apart. Constant failing tests hurt our productivity in the past, and we didn’t want that happening again. We accepted the trade-offs inherent in a monolithic architecture and made sure we had a good story around each. We had to be comfortable with some of the sacrifices that came with this change.
When deciding between microservices or a monolith, there are different factors to consider with each. In some parts of our infrastructure, microservices work well but our server-side destinations were a perfect example of how this popular trend can actually hurt productivity and performance. It turns out, the solution for us was a monolith.
The transition to a monolith was made possible by Stephen Mathieson, Rick Branson, Achille Roussel, Tom Holmes, and many more.
Special thanks to Rick Branson for helping review and edit this post at every stage.
...
Read the original on www.twilio.com »
Loved reading through Greg Technology’s Anthony Bourdain’s Lost Li.st’s; seeing the list of lost Anthony Bourdain li.st’s made me wonder whether we could recover at least some of them.
Having worked in the security and crawling space for the majority of my career—though I don’t have the access or permission to use proprietary storage—I thought we might be able to find something in publicly available crawl archives.
All of the code and examples link to the source git repository. This article has also been discussed on Hacker News. Also, a week before I published this, mirandom had the same idea as me and published their findings—go check them out.
If the Internet Archive had the partial list that Greg published, what about Common Crawl? Reading through their documentation, it seems straightforward enough to query the index with a URL prefix for Tony’s lists and grep for any sub-paths.
Putting something together with the help of Claude to prove my theory, we have commoncrawl_search.py, which makes a single index request to a specific dataset and, if any hits are discovered, retrieves them from the public S3 bucket. Since they are small, straight-up HTML documents, this seemed even more feasible than I had initially thought.
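The repository’s script isn’t reproduced here, but a minimal sketch of the approach might look like the following. It assumes the public CDX index API at index.commoncrawl.org and the data mirror at data.commoncrawl.org; the crawl name and URL prefix are placeholders for illustration, not the values the real script uses.

import io
import json

import requests
from warcio.archiveiterator import ArchiveIterator

CRAWL = "CC-MAIN-2017-13"               # placeholder crawl; the real script iterates datasets
PREFIX = "li.st/anthonybourdain/*"      # placeholder URL prefix for Tony's lists

def search_index(crawl: str, url_pattern: str) -> list[dict]:
    """Ask the Common Crawl CDX index for captures matching a URL prefix."""
    resp = requests.get(
        f"https://index.commoncrawl.org/{crawl}-index",
        params={"url": url_pattern, "output": "json"},
        timeout=60,
    )
    if resp.status_code == 404:  # no captures in this crawl
        return []
    resp.raise_for_status()
    return [json.loads(line) for line in resp.text.splitlines()]

def fetch_html(hit: dict) -> str:
    """Fetch one WARC record by byte range and return its HTML payload."""
    start = int(hit["offset"])
    end = start + int(hit["length"]) - 1
    resp = requests.get(
        f"https://data.commoncrawl.org/{hit['filename']}",
        headers={"Range": f"bytes={start}-{end}"},
        timeout=60,
    )
    resp.raise_for_status()
    for record in ArchiveIterator(io.BytesIO(resp.content)):
        if record.rec_type == "response":
            return record.content_stream().read().decode("utf-8", "replace")
    return ""

Each index hit carries the WARC filename plus a byte offset and length, so the HTML can be pulled straight from the public bucket with a single range request.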
Simply have a Python version around 3.14.2 and install the dependencies from requirements.txt; run the script and we are in business. Below, you’ll find the results of the command I ran and then some manual archeological effort to prettify the findings.
Images have been lost. Other avenues had struck no luck. I’ll try again later.
Any and all emphasis, missing punctuation and cool grammar are all by Anthony Bourdain. The only modifications I have made are to the layout, to represent li.st as closely as possible, with no changes to the content.
If you see these blocks, that’s me commenting if pictures have been lost.
From Greg’s page, let’s go and try each entry one by one. I’ll put up a table of what I wasn’t able to find in Common Crawl but assume exists elsewhere—I’d be happy to take another look. And no, none of the above has been written by AI, only the code, since I don’t really care about warcio encoding or writing the same Python requests method for the Nth time. Enjoy!
Things I No Longer Have Time or Patience For
Dinners where it takes the waiter longer to describe my food than it takes me to eat it.
I admit it: my life doesn’t suck. Some recent views I’ve enjoyed
Montana at sunset : There’s pheasant cooking behind the camera somewhere. To the best of my recollection some very nice bourbon. And it IS a big sky .
Puerto Rico: Thank you Jose Andres for inviting me to this beautiful beach!
Naxos: drinking ouzo and looking at this. Not a bad day at the office .
Istanbul: raki and grilled lamb and this ..
Borneo: The air is thick with hints of durian, sambal, coconut..
Chicago: up early to go train #Redzovic
If I Were Trapped on a Desert Island With Only Three Tv Series
Edge of Darkness (with Bob Peck and Joe Don Baker )
The Film Nobody Ever Made
Dreamcasting across time with the living and the dead, this untitled, yet to be written masterwork of cinema, shot, no doubt, by Christopher Doyle, lives only in my imagination.
If you bought these vinyls from an emaciated looking dude with an eager, somewhat distracted expression on his face somewhere on upper Broadway sometime in the mid 80’s, that was me . I’d like them back. In a sentimental mood.
material things I feel a strange, possibly unnatural attraction to and will buy (if I can) if I stumble across them in my travels. I am not a paid spokesperson for any of this stuff .
Vintage Persol sunglasses : This is pretty obvious. I wear them a lot. I collect them when I can. Even my production team have taken to wearing them.
19th century trepanning instruments: I don’t know what explains my fascination with these devices, designed to drill drain-sized holes into the skull often for purposes of relieving “pressure” or “bad humours”. But I can’t get enough of them. Tip: don’t get a prolonged headache around me and ask if I have anything for it. I do.
Montagnard bracelets: I only have one of these but the few that find their way onto the market have so much history. Often given to the indigenous mountain people ’s Special Forces advisors during the very early days of America’s involvement in Vietnam .
Jiu Jitsi Gi’s: Yeah. When it comes to high end BJJ wear, I am a total whore. You know those people who collect limited edition Nikes ? I’m like that but with Shoyoroll . In my defense, I don’t keep them in plastic bags in a display case. I wear that shit.
Voiture: You know those old school, silver plated (or solid silver) blimp like carts they roll out into the dining room to carve and serve your roast? No. Probably not. So few places do that anymore. House of Prime Rib does it. Danny Bowein does it at Mission Chinese. I don’t have one of these. And I likely never will. But I can dream.
Kramer knives: I don’t own one. I can’t afford one . And I’d likely have to wait for years even if I could afford one. There’s a long waiting list for these individually hand crafted beauties. But I want one. Badly. http://www.kramerknives.com/gallery/
R. CRUMB : All of it. The collected works. These Taschen volumes to start. I wanted to draw brilliant, beautiful, filthy comix like Crumb until I was 13 or 14 and it became clear that I just didn’t have that kind of talent. As a responsible father of an 8 year old girl, I just can’t have this stuff in the house. Too dark, hateful, twisted. Sigh…
THE MAGNIFICENT AMBERSONS : THE UNCUT, ORIGINAL ORSON WELLES VERSION: It doesn’t exist. Which is why I want it. The Holy Grail for film nerds, Welles’ follow up to CITIZEN KANE shoulda, coulda been an even greater masterpiece . But the studio butchered it and re-shot a bullshit ending. I want the original. I also want a magical pony.
Four Spy Novels by Real Spies and One Not by a Spy
I like good spy novels. I prefer them to be realistic . I prefer them to be written by real spies. If the main character carries a gun, I’m already losing interest. Spy novels should be about betrayal.
Ashenden–Somerset Maugham
Somerset wrote this bleak, darkly funny, deeply cynical novel in the early part of the 20th century. It was apparently close enough to the reality of his espionage career that MI6 insisted on major excisions. Remarkably ahead of its time in its atmosphere of futility and betrayal.
The Man Who Lost the War–WT Tyler
WT Tyler is a pseudonym for a former “foreign service” officer who could really really write. This one takes place in post-war Berlin and elsewhere and was, in my opinion, wildly under appreciated. See also his Ants of God.
The Human Factor–Graham Greene
Was Greene thinking of his old colleague Kim Philby when he wrote this? Maybe. Probably. See also Our Man In Havana.
The Tears of Autumn -Charles McCarry
A clever take on the JFK assassination with a Vietnamese angle. See also The Miernik Dossier and The Last Supper
Agents of Innocence–David Ignatius
Ignatius is a journalist not a spook, but this one, set in Beirut, hewed all too closely to still not officially acknowledged events. Great stuff.
I wake up in a lot of hotels, so I am fiercely loyal to the ones I love. A hotel where I know immediately wher I am when I open my eyes in the morning is a rare joy. Here are some of my favorites
CHATEAU MARMONT ( LA) : if I have to die in a hotel room, let it be here. I will work in LA just to stay at the Chateau.
CHILTERN FIREHOUSE (London): Same owner as the Chateau. An amazing Victorian firehouse turned hotel. Pretty much perfection
EDGEWATER INN (Seattle): kind of a lumber theme going on…ships slide right by your window. And the Led Zep “Mudshark incident”.
THE METROPOLE (Hanoi): there’s a theme developing: if Graham Greene stayed at a hotel, chances are I will too.
THE MURRAY (Livingston,Montana): You want the Peckinpah suite
Pictures in each have not been recovered.
5 Photos on My Phone, Chosen at Random
Shame, indeed, no pictures, there was one for each.
People I’d Like to Be for a Day
I’m Hungry and Would Be Very Happy to Eat Any of This Right Now
Spaghetti a la bottarga . I would really, really like some of this. Al dente, lots of chili flakes
A street fair sausage and pepper hero would be nice. Though shitting like a mink is an inevitable and near immediate outcome
Some uni. Fuck it. I’ll smear it on an English muffin at this point.
I wonder if that cheese is still good?
In which my Greek idyll is Suddenly invaded by professional nudists
T-shirt and no pants. Leading one to the obvious question : why bother?
The cheesy crust on the side of the bowl of Onion Soup Gratinee
Before he died, Warren Zevon dropped this wisdom bomb: “Enjoy every sandwich”. These are a few locals I’ve particularly enjoyed:
PASTRAMI QUEEN: (1125 Lexington Ave. ) Pastrami Sandwich. Also the turkey with Russian dressing is not bad. Also the brisket.
EISENBERG’S SANDWICH SHOP: ( 174 5th Ave.) Tuna salad on white with lettuce. I’d suggest drinking a lime Rickey or an Arnold Palmer with that.
THE JOHN DORY OYSTER BAR: (1196 Broadway) the Carta di Musica with Bottarga and Chili is amazing. Is it a sandwich? Yes. Yes it is.
RANDOM STREET FAIRS: (Anywhere tube socks and stale spices are sold. ) New York street fairs suck. The same dreary vendors, same bad food. But those nasty sausage and pepper hero sandwiches are a siren song, luring me, always towards the rocks. Shitting like a mink almost immediately after is guaranteed but who cares?
BARNEY GREENGRASS : ( 541 Amsterdam Ave.) Chopped Liver on rye. The best chopped liver in NYC.
SIBERIA in any of its iterations. The one on the subway being the best
LADY ANNES FULL MOON SALOON a bar so nasty I’d bring out of town visitors there just to scare them
KELLY’S on 43rd and Lex. Notable for 25 cent drafts and regularly and reliably serving me when I was 15
BILLY’S TOPLESS (later, Billy’s Stopless) an atmospheric, working class place, perfect for late afternoon drinking where nobody hustled you for money and everybody knew everybody. Great all-hair metal jukebox . Naked breasts were not really the point.
THE BAR AT HAWAII KAI. tucked away in a giant tiki themed nightclub in Times Square with a midget doorman and a floor show. Best place to drop acid EVER.
THE NURSERY after hours bar decorated like a pediatrician’s office. Only the nursery rhyme characters were punk rockers of the day.
It was surprising to see that only one page was not recoverable from the common crawl.
I’ve enjoyed this little project tremendously—a little archeology project. Can we declare victory for at least this endeavor? Hopefully we will be able to find the images too, but that’s a little tougher, since that era’s CloudFront is fully gone.
What else can we work on restoring, and can we set up some sort of public archive to store these recoveries? I made this a git repository for the sole purpose of letting anyone interested contribute their interest and passion for these kinds of projects.
Thank you and until next time! ◼︎
...
Read the original on sandyuraz.com »
Yesterday I shared a little program called Mark V. Shaney Junior at github.com/susam/mvs. It is a minimal implementation of a Markov text generator inspired by the legendary Mark V. Shaney program from the 1980s. Mark V. Shaney was a synthetic Usenet user that posted messages to various newsgroups using text generated by a Markov model. See the Wikipedia article Mark V. Shaney for more details about it. In this post, I will discuss my implementation of the model, explain how it works and share some of the results produced by it.
The program I shared yesterday has only about 30 lines of Python and favours simplicity over efficiency. Even if you have never worked with Markov models before, I am quite confident that it will take you less than 20 minutes to understand the whole program and make complete sense of it. I also offer an explanation further below in this post.
As a hobby, I often engage in exploratory programming where I write computer programs not to solve a specific problem but simply to explore a particular idea or topic for the sole purpose of recreation. I must have written small programs to explore Markov chains for various kinds of state spaces over a dozen times by now. Every time, I just pick my last experimental code and edit it to encode the new state space I am exploring. That’s usually my general approach to such one-off programs. I have hundreds of tiny little experimental programs lying on my disk at any given time.
Once in a while, I get the itch to take one of those exploratory programs, give it some finishing touches, wrap it up in a nice Git repo along with a README.md, CHANGES.md and the whole shebang and share it on github.com/susam and codeberg.org/susam. The Mark V. Shaney Junior program that I shared yesterday happened to be one such exercise.
If you scroll down the README of this project, you’ll find some nice examples of the gibberish produced by this program. The first few examples there are the result of training the model on A Christmas Carol by Charles Dickens, one of my favourite authors. It is often said that Dickens never used fewer words when more would suffice. So I thought there couldn’t be a better piece of text when it comes to testing out my tiny Markov model. I’ll not reproduce the generated text examples here for the sake of brevity. If you are interested to take a look, just head over to the Gibberish section of the README.
Soon after sharing the project, I wondered what kind of gibberish it would produce if I fed all 24 years of my blog posts and pages into the program. Well, here’s one of the results:
$ ./mvs < susam.txt
while a query replace operation is approved by the user. The above
variable defines the build job. It can be incredibly useful while
working on assembly language and machine code. In fact, all internal
resources like the result to refine the search prompt changes from
bck-i-search: to fwd-i-search:. Now type C-SPC (i.e. ctrl+space) to
set a mark causes Emacs to use 32-bit registers like EBP, ESP,
etc. Thus the behaviour is undefined. Such code may behave differently
when compiled with the readily available GNU tools like the shape
of 8. Flipping “P” horizontally makes it a proper quine: cat $0
This is the text that comes out after the program consumes over 200 posts consisting of about 200,000 words. My blog also has a comments section with over 500 comments consisting of about 40,000 words. All comments were excluded while training the model. Here is another output example:
$ ./mvs < susam.txt
enjoy asking “what happens if” and then type M-x zap-up-to-char RET
b. The buffer for this specific video, the actual fare for 8.3 km and
11 are all written from scratch. No prior knowledge is expected to
slow down in future. For now, I will add a statement like x =
0.999…, the inner corner square as discussed in the code segment
into the REPL window. Unlike Slimv, Vlime can work with and debug
executable files, it can be more convenient. M-x: Execute Extended
Command The key sequence M-q invokes the command cat and type TAB to
indent the current
Here is a particularly incoherent but amusing one:
$ ./mvs < susam.txt
Then open a new Lisp source file and the exact answer could harm
students’ self-esteem. Scientists have arbitrarily assumed that an
integral domain. However, the string and comment text. To demonstrate
how a build job can trigger itself, pass input to standard output or
standard error), Eshell automatically runs the following command in
Vim and Emacs will copy the message length limit of 512 characters,
etc. For example, while learning to play the game between normal mode
to move the point is on an old dictionary lying around our house and
that is moving to the small and supportive community
No, I have never said anywhere that opening a Lisp source file could harm anyone’s self-esteem. The text generator has picked up the ‘Lisp source file’ phrase from my Lisp in Vim post and the ‘self-esteem’ bit from the From Perl to Pi post.
By default, this program looks at trigrams (all sequences of three adjacent words) and creates a map where the first two words of the trigram are inserted as the key and the third word is appended to its list value. This map is the model. In this way, the model captures each pair of adjacent words along with the words that immediately follow each pair. The text generator first chooses a key (a pair of words) at random and selects a word that follows. If there are multiple followers, it picks one uniformly at random. It then repeats this process with the most recent pair of words, consisting of one word from the previous pair and the word that was just picked. It continues to do this until it can no longer find a follower or a fixed word limit (100 by default) is reached. That is pretty much the whole algorithm. There isn’t much more to it. It is as simple as it gets. For that reason, I often describe a simple Markov model like this as the ‘hello, world’ for language models.
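To make the description concrete, here is a hedged sketch of the same idea in Python. It is not the actual mvs source, just a minimal reimplementation of the algorithm described above, with the order fixed at 2 and the word limit at 100:

import random
from collections import defaultdict

def train(text: str) -> dict[tuple[str, str], list[str]]:
    """Build the model: map each pair of adjacent words to the words
    that follow it. Repeated followers stay duplicated in the list, so
    picking one uniformly at random reproduces their frequencies."""
    words = text.split()
    model = defaultdict(list)
    for a, b, c in zip(words, words[1:], words[2:]):
        model[(a, b)].append(c)
    return model

def generate(model, limit: int = 100) -> str:
    """Walk the chain: start at a random key, then repeatedly sample a
    follower of the most recent pair of words."""
    key = random.choice(list(model))
    out = list(key)
    while len(out) < limit and key in model:
        nxt = random.choice(model[key])
        out.append(nxt)
        key = (key[1], nxt)
    return " ".join(out)

# Example: train on a text file and print up to 100 words of gibberish.
# with open("susam.txt") as f:
#     print(generate(train(f.read())))

The real program also lets you choose the order and an initial prompt from the command line, but the core loop is essentially this.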
If the same trigram occurs multiple times in the training data, the model records the follower word (the third word) multiple times in the list associated with the key (the first two words). This representation can be optimised, of course, by keeping frequencies of the follower words rather than duplicating them in the list, but that is left as an exercise to the reader. In any case, when the text generator chooses a follower for a given pair of words, a follower that occurs more frequently after that pair has a higher probability of being chosen. In effect, the next word is sampled based only on the previous two words and not on the full history of the generated text. This memoryless dependence on the current state is what makes the generator Markov. Formally, for a discrete-time stochastic process, the Markov property can be expressed as
\[ P(X_{n+1} \mid X_n, X_{n-1}, \ldots, X_1) = P(X_{n+1} \mid X_n). \]
where \( X_n \) represents the \( n \)th state. In our case, each state \( X_n \) is a pair of words \( (w_{n-1}, w_{n}) \) but the state space could just as well consist of other objects, such as a pair of characters, pixel values or musical notes. The sequence of states \( (X_1, X_2, \dots) \) visited by the program forms a Markov chain. The left-hand side of the equation denotes the conditional distribution of the next state \( X_{n+1} \) given the entire history of states \( X_1, X_2, \dots, X_n, \) while the right-hand side conditions only on the current state. When both are equal, it means that the probability of the next state depends only on the current state and not on the earlier states. This is the Markov property. It applies to the text generation process only, not the training data. The training data is used only to estimate the transition probabilities of the model.
In 2025, given the overwhelming popularity of large language models (LLMs), Markov models like this look unimpressive. Unlike LLMs, a simple Markov model cannot capture global structure or long-range dependencies within the text. It relies entirely on local word transition statistics. Also, these days, one hardly needs a Markov model to generate gibberish; social media provides an ample supply. Nevertheless, I think the simplicity of its design and implementation serves as a good entry point into language models.
In my implementation, the number of words in the key of the map can be set via command line arguments. By default, it is 2 as described above. This value is also known as the order of the model. So by default the order is 2. If we increase it to, say, 3 or 4, the generated text becomes a little more coherent. Here is one such example:
$ ./mvs 4 < susam.txt
It is also possible to search for channels by channel names. For
example, on Libera Chat, to search for all channels with ‘python’ in
its name, enter the IRC command: /msg alis list python. Although I
have used Libera Chat in the examples above, there are plenty of
infinite fields, so they must all be integral domains too. Consider
the field of rational numbers Q. Another quick way to arrive at this
fact is to observe that when one knight is placed on a type D square,
only two positions for the second knight such that the two knights
attack
Except for a couple of abrupt and meaningless transitions, the text is mostly coherent. We need to be careful about not increasing the order too much. In fact, if we increase the order of the model to 5, the generated text becomes very dry and factual because it begins to quote large portions of the blog posts verbatim. Not much fun can be had with that.
Before I end this post, let me present one final example where I ask it to generate text from an initial prompt:
$ ./mvs 2 100 'Finally we' < susam.txt
Finally we divide this number by a feed aggregrator for Emacs-related
blogs. The following complete key sequences describe the effects of
previous evaluations shall have taken a simple and small to contain
bad content. This provides an interactive byte-compiled Lisp function
in MATLAB and GNU bash 5.1.4 on Debian is easily reproducible in
Windows XP. Older versions might be able to run that server for me it
played a significant burden on me as soon as possible. C-u F: Visit
the marked files or directories in the sense that it was already
initiated and we were to complete the proof.
Apparently, this is how I would sound if I ever took up speaking gibberish!
...
Read the original on susam.net »
Part of the Accepted! series, explaining the upcoming Go changes in simple terms.

The new runtime/secret package lets you run a function in secret mode. After the function finishes, it immediately erases (zeroes out) the registers and stack it used. Heap allocations made by the function are erased as soon as the garbage collector decides they are no longer reachable.

secret.Do(func() {
    // Generate a session key and
    // use it to encrypt the data.
})

This helps make sure sensitive information doesn’t stay in memory longer than needed, lowering the risk of attackers getting to it.

The package is experimental and is mainly for developers of cryptographic libraries, not for application developers.

Cryptographic protocols like WireGuard or TLS have a property called “forward secrecy”. This means that even if an attacker gains access to long-term secrets (like a private key in TLS), they shouldn’t be able to decrypt past communication sessions. To make this work, session keys (used to encrypt and decrypt data during a specific communication session) need to be erased from memory after they’re used. If there’s no reliable way to clear this memory, the keys could stay there indefinitely, which would break forward secrecy.

In Go, the runtime manages memory, and it doesn’t guarantee when or how memory is cleared. Sensitive data might remain in heap allocations or stack frames, potentially exposed in core dumps or through memory attacks. Developers often have to use unreliable “hacks” with reflection to try to zero out internal buffers in cryptographic libraries. Even so, some data might still stay in memory where the developer can’t reach or control it.

The solution is to provide a runtime mechanism that automatically erases all temporary storage used during sensitive operations. This will make it easier for library developers to write secure code without using workarounds.

Add the runtime/secret package with Do and Enabled functions:

// Do invokes f.
// Do ensures that any temporary storage used by f is erased in a
// timely manner. (In this context, “f” is shorthand for the
// entire call tree initiated by f.)
// - Any registers used by f are erased before Do returns.
// - Any stack used by f is erased before Do returns.
// - Any heap allocation done by f is erased as soon as the garbage
// collector realizes that it is no longer reachable.
// - Do works even if f panics or calls runtime.Goexit. As part of
// that, any panic raised by f will appear as if it originates from
// Do itself.
func Do(f func())
// Enabled reports whether Do appears anywhere on the call stack.
func Enabled() bool
The current implementation has several limitations:

- Only supported on linux/amd64 and linux/arm64. On unsupported platforms, Do invokes f directly.
- Protection does not cover any global variables that f writes to.
- Trying to start a goroutine within f causes a panic.
- If f calls runtime.Goexit, erasure is delayed until all deferred functions are executed.
- Heap allocations are only erased if ➊ the program drops all references to them, and ➋ then the garbage collector notices that those references are gone. The program controls the first part, but the second part depends on when the runtime decides to act.
- If f panics, the panicked value might reference memory allocated inside f. That memory won’t be erased until (at least) the panicked value is no longer reachable.
- Pointer addresses might leak into data buffers that the runtime uses for garbage collection. Do not put confidential information into pointers.

The last point might not be immediately obvious, so here’s an example. If an offset in an array is itself secret (you have a data array and the secret key always starts at data[100]), don’t create a pointer to that location (don’t create a pointer p to &data[100]). Otherwise, the garbage collector might store this pointer, since it needs to know about all active pointers to do its job. If someone launches an attack to access the GC’s memory, your secret offset could be exposed.

The package is mainly for developers who work on cryptographic libraries. Most apps should use higher-level libraries that use secret.Do behind the scenes.

As of Go 1.26, the runtime/secret package is experimental and can be enabled by setting GOEXPERIMENT=runtimesecret at build time.

Use secret.Do to generate a session key and encrypt a message using AES-GCM:

// Encrypt generates an ephemeral key and encrypts the message.
// It wraps the entire sensitive operation in secret.Do to ensure
// the key and internal AES state are erased from memory.
func Encrypt(message []byte) ([]byte, error) {
    var ciphertext []byte
    var encErr error
    secret.Do(func() {
        // 1. Generate an ephemeral 32-byte key.
        // This allocation is protected by secret.Do.
        key := make([]byte, 32)
        if _, err := io.ReadFull(rand.Reader, key); err != nil {
            encErr = err
            return
        }

        // 2. Create the cipher (expands key into round keys).
        // This structure is also protected.
        block, err := aes.NewCipher(key)
        if err != nil {
            encErr = err
            return
        }

        gcm, err := cipher.NewGCM(block)
        if err != nil {
            encErr = err
            return
        }

        nonce := make([]byte, gcm.NonceSize())
        if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
            encErr = err
            return
        }

        // 3. Seal the data.
        // Only the ciphertext leaves this closure.
        ciphertext = gcm.Seal(nonce, nonce, message, nil)
    })
    return ciphertext, encErr
}
Note that secret.Do protects not just the raw key, but also the cipher.Block structure (which contains the expanded key schedule) created inside the function.

This is a simplified example, of course — it only shows how memory erasure works, not a full cryptographic exchange. In real situations, the key needs to be shared securely with the receiver (for example, through key exchange) so decryption can work.
...
Read the original on antonz.org »