10 interesting stories served every morning and every evening.




1 437 shares, 22 trendiness

Should You Trust Your VPN Location?

In a large-scale analysis of 20 popular VPNs, IPinfo found that 17 of those VPNs exit traffic from different countries than they claim. Some claim 100+ countries, but many of them point to the same handful of physical data centers in the US or Europe. That means the majority of VPN providers we analyzed don't route your traffic via the countries they claim to, and they claim many more countries than they actually support. Analyzing over 150,000 exit IPs across 137 possible exit countries, and comparing what providers claim to what IPinfo measures, shows that:

- 17 in 20 providers had traffic exiting in a different country.
- 38 countries were “virtual-only” in our dataset (claimed by at least one provider, but never observed as the actual traffic exit country for any provider we tested).
- We were only able to verify all provider-announced locations for 3 providers out of the 20.
- Across ~150,000 VPN exit IPs tested, ProbeNet, our internet measurement platform, detected roughly 8,000 cases where widely used IP datasets placed the server in the wrong country, sometimes thousands of kilometers off.

This report walks through what we saw across VPN and IP data providers, provides a closer look at two particularly interesting countries, explores why measurement-based IP data matters if you care where your traffic really goes, and shares how we ran the investigation.

Which VPNs Matched Reality (And Which Didn't)

Here is the overlap between the number of listed countries each VPN provider claims to offer versus the countries with real VPN traffic that we measured; lower percentages indicate providers whose claimed lists best match our data:

It's important to note that we used the most commonly and widely supported technologies in this research, to make comparison between providers as fair as possible while giving us significant data to analyze, so this will not be the full coverage for each provider. These are some of the most visible names in the market. They also tend to have very long country lists on their websites. Notably, three well-known providers had zero mismatches across all the countries we tested: Mullvad, IVPN, and Windscribe.

Country mismatches don't automatically mean some providers offer “bad VPNs,” but it does mean that if you're choosing a VPN because it claims “100+ countries,” you should know that a significant share of those flags may be labels, or virtual locations.

What “Virtual Locations” Really Mean

When a VPN lets you connect to, for example, “Bahamas” or “Somalia,” that doesn't always mean traffic routes through there. In many cases, it's somewhere entirely different, like Miami or London, but presented as if traffic is in the country you picked. This setup is known as a virtual location:

- The IP registry data also says “Country X,” because the provider self-declared it that way.
- But the network measurements (latency and routing) show the traffic actually exits in “Country Y,” often thousands of kilometers away.

The problem? Without active network measurement, most IP datasets rely on what the IP's owner told the internet registry or published in WHOIS/geofeeds: a self-reported country tag. If that record is wrong or outdated, the mistake spreads everywhere. That's where IPinfo's ProbeNet comes in: by running live RTT tests from 1,200+ points of presence worldwide, we anchor each IP to its real-world location, not just its declared one.

Across the dataset, we found 97 countries where at least one VPN brand only ever appeared as virtual or unmeasurable in our data.
In other words, for a noticeable slice of the world map, some “locations” in VPNs never show up as true exits in our measurements. We also found 38 countries where every mention behaved this way: at least one VPN claimed them, but none ever produced a stable, measurable exit in that country in our sample.

You can think of these 38 as the “unmeasurable” countries in this study: places that exist in server lists, config files, and IP geofeeds, but never once appeared as the actual exit country in our measurements. They're not randomly scattered; they cluster in specific parts of the map. This doesn't prove there is zero VPN infrastructure in those countries globally. It does show that, across the providers and locations we measured, the dominant pattern is to serve those locations from elsewhere. Here are two of the most interesting examples of how this looks at the IP level.

Case Studies: Two Countries That Only Exist on the Map

To make this concrete, let's look at two countries where every provider in our dataset turned out to be virtual: Bahamas and Somalia.

Bahamas: All-Inclusive, Hosted in the US

In our measurements, five providers offered locations labeled as “Bahamas”: NordVPN, ExpressVPN, Private Internet Access, FastVPN, and IPVanish. For all of them, measured traffic was in the United States, usually with sub-millisecond RTT to US probes.
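The physics behind these RTT claims is easy to check: even at the vacuum speed of light, a round trip over a given distance takes a hard minimum amount of time, and real fiber paths are slower still. A minimal sketch of that sanity check (the constant and function name are illustrative, not from the report):

```javascript
// Lower bound on round-trip time for a given great-circle distance.
// Uses the vacuum speed of light, so real fiber RTTs are always higher.
const SPEED_OF_LIGHT_KM_PER_MS = 299792.458 / 1000; // ~299.79 km/ms

function minRttMs(distanceKm) {
  // Traffic must cover the distance twice: there and back.
  return (2 * distanceKm) / SPEED_OF_LIGHT_KM_PER_MS;
}

// London to Mauritius is roughly 9,700 km: even at vacuum light speed the
// round trip needs about 65 ms, so a sub-millisecond RTT from a nearby
// probe makes a Mauritius location physically impossible.
console.log(minRttMs(9700));
```

A sub-millisecond RTT, by the same arithmetic, means the server is within about 150 km of the probe, which is why it pins traffic to a country so convincingly.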

Somalia: Mogadishu, via France and the UK

Somalia appears in our sample for only two providers: NordVPN and ProtonVPN. Both label Mogadishu explicitly in their naming, but the RTTs are exactly what you'd expect for traffic in Western Europe, and completely inconsistent with traffic in East Africa. Both providers go out of their way in the labels (e.g. “SO, Mogadishu”), but the actual traffic is in Nice and London, not Somalia.

When Legacy IP Providers Agree With the Wrong VPN Locations

So far, we've talked about VPN claims versus our measurements. But other IP data providers don't run active RTT tests. They rely on self-declared IP data sources, and often assume that if an IP is tagged as “Country X,” it must actually be there. In these cases, the legacy IP datasets typically “follow” the VPN provider's story: if the VPN markets the endpoint as Country X, the legacy IP dataset also places it in Country X.

To quantify that, we looked at 736 VPN exits where ProbeNet's measured country disagreed with one or more widely used legacy IP datasets. We then compared the country IPinfo's ProbeNet measured (backed by RTT and routing) with the country reported by these other IP datasets and computed the distance between them. The gaps are large.

How Far Off Were the Other IP Datasets?

The median error between ProbeNet and the legacy datasets was roughly 3,100 km. On the ProbeNet side, we have strong latency evidence that our measured country is the right one: the median minimum RTT to a probe in the measured country was 0.27 ms, and about 90% of these locations had a sub-millisecond RTT from at least one probe. That's what you expect when traffic is genuinely in that country, not thousands of kilometers away.

An IP Example You Can Test Yourself

This behavior is much more tangible if you can see it on a single IP. Here's one VPN exit IP where ProbeNet places the server in the United Kingdom, backed by sub-millisecond RTT from local probes, while other widely used legacy IP datasets place the same IP in Mauritius, 9,691 kilometers away.

If you want to check this yourself, you can plug it into a public measurement tool like https://ping.sx/ and run pings or traceroutes from different regions. Tools like this one provide a clear visual for where latency is lowest. ProbeNet uses the same basic idea, but at a different scale: we maintain a network of 1,200+ points of presence (PoPs) around the world, so we can usually get even closer to the real physical location than public tools with smaller networks.

If you'd like to play with more real IPs (not necessarily VPNs) where ProbeNet and IPinfo get the country right and other datasets don't, you can find a fuller set of examples on our IP geolocation accuracy page.

Why This Happens and How It Impacts Trust

It's worth separating technical reasons from trust issues. There are technical reasons to use virtual or hubbed infrastructure:

- Risk & regulation. Hosting in certain countries can expose both the provider and users to local surveillance or seizure.
- Infrastructure quality.
Some regions simply don't have the same density of reliable data centers or high-capacity internet links, so running servers there is harder and riskier.
- Performance & cost. Serving “Bahamas” from Miami or “Cambodia” from Singapore can be cheaper, faster, and easier to maintain.

From this perspective, a virtual location can be a reasonable compromise: you get a regional IP and content unblocking without the downsides of hosting in a fragile environment.

Where It Becomes a Trust Problem

- Lack of disclosure. Marking something clearly as “Virtual Bahamas (US-based)” is transparent. Listing “Bahamas” alongside “Germany” without any hint that one is virtual and the other is physical blurs the line between marketing and reality.
- Scale of the mismatch. It's one thing to have a few virtual locations in hard-to-host places. It's another when dozens of countries exist only as labels across your entire footprint, or when more than half of your tested locations are actually somewhere else.
- Downstream reliance. Journalists, activists, and NGOs may pick locations based on safety assumptions. Fraud systems, compliance workflows, and geo-restricted services may treat “Somalia” vs “France” as a meaningful difference. If both the VPN UI and the IP data say “Somalia” while the traffic is physically in France, everyone is making decisions on a false premise.

That last point leads directly into the IP data problem that we are focused on solving.

So How Much Should You Trust Your VPN?

If you're a VPN user, here are some practical takeaways from this work:

- Treat “100+ countries” as a marketing number, not a guarantee. In our sample, 97 countries existed only as claims, not reality, across 17 providers.
- Check how your provider talks about locations. Do they clearly label “virtual” servers? Document where they're actually hosted?
Or do they quietly mix virtual and physical locations in one long list?
- If you rely on IP data professionally, ask where it comes from. A static “99.x% accurate worldwide” claim doesn't tell you how an IP data provider handles fast-moving, high-stakes environments like VPN infrastructure.

Ultimately, this isn't an argument against VPNs, or even against virtual locations. It's an argument for honesty and evidence. If a VPN provider wants you to trust that map of flags, they should be willing, and able, to show that it matches the real network underneath.

Most legacy IP data providers rely on regional internet registry (RIR) allocation data and heuristics around routing and address blocks. These providers will often accept self-declared data like customer feedback, corrections, and geofeeds, without a clear way to verify them.

Proprietary ProbeNet with 1,200+ points of presence

We maintain an internet measurement platform of PoPs in locations around the world.

Active measurements

For each visible IP on the internet, including both IPv4 and IPv6 addresses, we measure RTT from multiple probes.

Evidence-based geolocation

We combine these measurements with IPinfo's other signals to assign a country (and more granular location) that's grounded in how the internet actually behaves. This measurement-first approach is unique in the IP data space. Once we realized how much inaccuracy came from self-declared data, we started investing heavily in research and building ProbeNet to use active measurements at scale. Our goal is to make IP data as evidence-based as possible, verifying with observation how the internet actually behaves.

Our Methodology for This Report

We approached this VPN investigation the way a skeptical but well-equipped user would: start from the VPNs' own claims, then test them. For each of the 20 VPN providers, we pulled together three kinds of data:

- Marketing promises: The “servers in X countries” claims and country lists from their websites. When a country was clearly listed there, we treated it as the locations they actively promote.
- Configurations and location lists: Configurations from different protocols like OpenVPN or WireGuard were collected, along with location information available in provider command-line tools, mobile applications, or APIs.
- Unique provider–location entries: We ended up with over 6,000,000 data points and a list of provider + location combinations we could actually try to connect to, with multiple IPs each.

Step 2: Observing Where the Traffic Really Goes

Next, we used IPinfo infrastructure and ProbeNet to dial into those locations and watch what actually happens:

- We connected to each VPN “location” and captured the exit IP addresses.
- For each exit IP address, we used IPinfo + ProbeNet's active measurements to determine a measured country, plus the round-trip time (RTT) from that probe (often under 1 ms), which is a strong hint about physical proximity.

Now we had two views for each location:

- Expected/Claimed country: What the VPN claims in its UI/configs/website
- Measured country: Where IPinfo + ProbeNet actually see the exit IP

For each location where a country was clearly specified, we asked a very simple question: does the expected country match the measured country? If yes, we counted it as a match. If not, it became a mismatch: a location where the app says one country, but the traffic exits somewhere else.

We deliberately used a very narrow definition of “mismatch.” For a location to be counted, two things had to be true: the provider had to clearly claim a specific country (on their website, in their app, or in configs), and we had direct active measurements from ProbeNet for the exit IPs behind that location. We ignored any locations where the marketing was ambiguous, where we hadn't measured the exit directly, or where we only had weaker hints like hostname strings, registry data, or third-party IP databases.
Those signals can be useful and true, but we wanted our numbers to be as hard to argue with as possible. The result is that the mismatch rates we show here are conservative. With a looser methodology that also leaned on those additional hints, the numbers would almost certainly be higher, not lower.

...

Read the original on ipinfo.io »

2 401 shares, 15 trendiness

What Is the Nicest Thing A Stranger Has Ever Done for You?

So there I was, pedaling my bicycle as fast as I could down a long, straight stretch of road, feeling great. I'd just discovered the pleasures of riding a road bike, and I loved every minute that I could get away. Always a data geek, I tracked my mileage, average speed, heart rate, etc. It was a beautiful Indian summer Sunday afternoon in September. I was in my late 30s, still a baby. Out of nowhere, my chain came off right in the middle of the sprint I was timing. In true masculine fashion, I threw a fit, cursing and hitting the brakes as hard as I could. At this point, I found out that experienced riders don't do that, because I flew right over the handlebars, landing on the pavement amid speeding cars. I momentarily lost consciousness, and when I regained my senses, I knew I'd screwed up badly. The pain in my shoulder was nauseating. I couldn't move my arm, and I had to just roll off the road onto the shoulder. I just lay there, hurting, unable to think clearly. Within seconds, it seemed, a man materialized beside me.

He was exceptionally calm. He didn't ask me if I was OK, since I clearly wasn't. It was obvious that he knew what he was doing. He made certain I could breathe, paused long enough to dial 911, and then started pulling stuff out of a medical bag (WTF?) to clean the extensive road rash I had. In a minute, he asked for my home phone number so he could call my wife to let her know I was going to be riding in an ambulance to the hospital. He told her he was an emergency room doctor who just happened to be right behind me when I crashed. He explained that he would stay with me until the medics arrived and that he would call ahead to make sure one of the doctors on duty would “take good care of me.”

When he hung up, he asked me if I'd heard the conversation. I told him that I had and that I couldn't believe how lucky I was under the circumstances. He agreed. To keep my mind off the pain, he just kept chatting, telling me that because I was arriving by ambulance, I'd be treated immediately. He told me that I'd be getting “the good drugs” to take care of the pain. That sounded awesome.

I don't remember telling him goodbye. I certainly didn't ask him his name or find out anything about him. He briefed the EMTs when they arrived and stood there until the ambulance doors closed. The ER was indeed ready for me when the ambulance got there. They treated me like a VIP. I got some Dilaudid for the pain, and it was indeed the good stuff. They covered the road rash with Tegaderm and took x-rays, which revealed that I'd torn my collarbone away from my shoulder blade. That was going to require a couple of surgeries and lots of physical therapy. I had a concussion and was glad that I had a helmet on.

All of this happened almost 25 years ago. I've had plenty of other bike wrecks, but that remains the worst one. My daughter is a nurse, and she's like a magnet for car crashes, having stopped multiple times to render aid. She doesn't do it with a smile on her face, though; emergency medicine isn't her gig, and if anyone asks her if she's a doctor, her stock answer is “I'm a YMCA member.”

The guy who helped me that day was an absolute angel. I have no idea what I would have done without him. I didn't even have a cell phone at the time. But he was there at a time when I couldn't have needed him any more badly. He helped me and then got in his car and completed his trip. I think of that day often, especially when the American medical system makes me mad, which happens regularly these days.

I've enjoyed the kindness of a lot of strangers over the years, particularly during the long hike my wife and I did for our honeymoon (2,186 miles), when we hitchhiked to a town in NJ in the rain and got a ride from the first car to pass. Another time, in Connecticut, a man gave us a $100 bill and told us to have a nice dinner at the restaurant atop Mt. Greylock, the highest mountain in Massachusetts. In Virginia, a moth flew into my wife's ear, and I mean all the way into her ear until it was bumping into her eardrum. We hiked several miles to the road and weren't there for a minute before a man stopped and took us to urgent care, 30 miles away.

When you get down in the dumps, I hope you have some memories like that to look back on, to restore your faith in humanity. There are a lot of really good people in the world.


...

Read the original on louplummer.lol »

3 322 shares, 10 trendiness

Useful patterns for building HTML tools

I've started using the term “HTML tools” to refer to HTML applications that I've been building which combine HTML, JavaScript, and CSS in a single file and use them to provide useful functionality. I have built over 150 of these in the past two years, almost all of them written by LLMs. This article presents a collection of useful patterns I've discovered along the way.

First, some examples to show the kind of thing I'm talking about:

pypi-changelog lets you generate (and copy to clipboard) diffs between different PyPI package releases.

bluesky-thread provides a nested view of a discussion thread on Bluesky.

These are some of my recent favorites. I have dozens more like this that I use on a regular basis.

You can explore my collection on tools.simonwillison.net; the “by month” view is useful for browsing the entire collection.

If you want to see the code and prompts, almost all of the examples in this post include a link in their footer to “view source” on GitHub. The GitHub commits usually contain either the prompt itself or a link to the transcript used to create the tool.

These are the characteristics I have found to be most productive in building tools of this nature:

A single file: inline JavaScript and CSS in a single HTML file means the least hassle in hosting or distributing them, and crucially means you can copy and paste them out of an LLM response.

Avoid React, or anything with a build step. The problem with React is that JSX requires a build step, which makes everything massively less convenient. I prompt “no React” and skip that whole rabbit hole entirely.

Load dependencies from a CDN. The fewer dependencies the better, but if there's a well-known library that helps solve a problem I'm happy to load it from cdnjs or jsDelivr or similar.

Keep them small. A few hundred lines means the maintainability of the code doesn't matter too much: any good LLM can read them and understand what they're doing, and rewriting them from scratch with help from an LLM takes just a few minutes.

The end result is a few hundred lines of code that can be cleanly copied and pasted into a GitHub repository.
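To make the single-file pattern concrete, here is the skeleton such a tool tends to follow: markup, styles, and behavior all inline, no build step. This is a generic sketch of a hypothetical JSON pretty-printer, not one of the tools mentioned above:

```html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>JSON pretty-printer</title>
<style>
  body { font-family: sans-serif; max-width: 40rem; margin: 2rem auto; }
  textarea { width: 100%; height: 8rem; }
</style>
</head>
<body>
<h1>JSON pretty-printer</h1>
<textarea id="input" placeholder="Paste JSON here"></textarea>
<button id="go">Pretty-print</button>
<pre id="output"></pre>
<script>
document.getElementById("go").addEventListener("click", () => {
  const raw = document.getElementById("input").value;
  let result;
  try {
    result = JSON.stringify(JSON.parse(raw), null, 2);
  } catch (err) {
    result = "Invalid JSON: " + err.message;
  }
  document.getElementById("output").textContent = result;
});
</script>
</body>
</html>
```

Everything above can be pasted into a single file on any static host and it just works.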

The easiest way to build one of these tools is to start in ChatGPT or Claude or Gemini. All three have features where they can write a simple HTML+JavaScript application and show it to you directly.

Claude calls this “Artifacts”; ChatGPT and Gemini both call it “Canvas”. Claude has the feature enabled by default; ChatGPT and Gemini may require you to toggle it on in their “tools” menus.

Try this prompt in Gemini or ChatGPT:

Build a canvas that lets me paste in JSON and converts it to YAML. No React.

Or this prompt in Claude:

Build an artifact that lets me paste in JSON and converts it to YAML. No React.

I always add “No React” to these prompts, because otherwise they tend to build with React, resulting in a file that is harder to copy and paste out of the LLM and use elsewhere. I find that attempts which use React take longer to display (since they need to run a build step) and are more likely to contain crashing bugs for some reason, especially in ChatGPT.

All three tools have “share” links that provide a URL to the finished application.

Coding agents such as Claude Code and Codex CLI have the advantage that they can test the code themselves while they work on it using tools like Playwright. I often upgrade to one of those when I'm working on something more complicated, like my Bluesky thread viewer tool shown above.

I also frequently use asynchronous coding agents like Claude Code for web to make changes to existing tools. I shared a video about that in Building a tool to copy-paste share terminal sessions using Claude Code for web.

Claude Code for web and Codex Cloud run directly against my simonw/tools repo, which means they can publish or upgrade tools via Pull Requests (here are dozens of examples) without me needing to copy and paste anything myself.

Any time I use an additional JavaScript library as part of my tool I like to load it from a CDN.

The three major LLM platforms support specific CDNs as part of their Artifacts or Canvas features, so often if you tell them “Use PDF.js” or similar they'll be able to compose a URL to a CDN that's on their allow-list.

Sometimes you'll need to go and look up the URL on cdnjs or jsDelivr and paste it into the chat.

CDNs like these have been around for long enough that I've grown to trust them, especially for URLs that include the package version.

The alternative to CDNs is to use npm and have a build step for your projects. I find this reduces my productivity at hacking on individual tools and makes it harder to self-host them.

I don't like leaving my HTML tools hosted by the LLM platforms themselves for a couple of reasons. First, LLM platforms tend to run the tools inside a tight sandbox with a lot of restrictions. They're often unable to load data or images from external URLs, and sometimes even features like linking out to other sites are disabled.

The end-user experience often isn't great either. They show warning messages to new users, often take additional time to load, and delight in showing promotions for the platform that was used to create the tool.

They're also not as reliable as other forms of static hosting. If ChatGPT or Claude are having an outage I'd like to still be able to access the tools I've created in the past.

Being able to easily self-host is the main reason I like insisting on “no React” and using CDNs for dependencies: the absence of a build step makes hosting tools elsewhere a simple case of copying and pasting them out to some other provider.

My preferred provider here is GitHub Pages because I can paste a block of HTML into a file on github.com and have it hosted on a permanent URL a few seconds later. Most of my tools end up in my simonw/tools repository which is configured to serve static files at tools.simonwillison.net.

One of the most useful input/output mechanisms for HTML tools comes in the form of copy and paste.

I frequently build tools that accept pasted content, transform it in some way, and let the user copy it back to their clipboard to paste somewhere else.

Copy and paste on mobile phones is fiddly, so I frequently include “Copy to clipboard” buttons that populate the clipboard with a single touch.
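A sketch of that button pattern, using the async Clipboard API with feature detection so it degrades gracefully on browsers that don't expose it (the function names here are illustrative, not from any specific tool):

```javascript
// Feature-detect the async Clipboard API; it is absent on some mobile
// browsers and on insecure (non-HTTPS) origins.
function clipboardSupported(nav) {
  return Boolean(nav && nav.clipboard && typeof nav.clipboard.writeText === "function");
}

// Wire a "Copy to clipboard" button up to a function that produces the text.
function attachCopyButton(button, getText) {
  button.addEventListener("click", async () => {
    if (!clipboardSupported(navigator)) {
      button.textContent = "Clipboard unavailable";
      return;
    }
    await navigator.clipboard.writeText(getText());
    button.textContent = "Copied!";
    setTimeout(() => { button.textContent = "Copy to clipboard"; }, 1500);
  });
}
```

Changing the button label to “Copied!” is the cheap feedback that makes the one-touch flow usable on phones.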

Most operating system clipboards can carry multiple formats of the same copied data. That's why you can paste content from a word processor in a way that preserves formatting, but if you paste the same thing into a text editor you'll get the content with formatting stripped.

These rich copy operations are available in JavaScript paste events as well, which opens up all sorts of opportunities for HTML tools.

hacker-news-thread-export lets you paste in a URL to a Hacker News thread and gives you a copyable condensed version of the entire thread, suitable for pasting into an LLM to get a useful summary.

paste-rich-text lets you copy from a page and paste to get the HTML; particularly useful on mobile where view-source isn't available.

alt-text-extractor lets you paste in images and then copy out their alt text.

The key to building interesting HTML tools is understanding what's possible. Building custom debugging tools is a great way to explore these options.

clipboard-viewer is one of my most useful. You can paste anything into it (text, rich text, images, files) and it will loop through and show you every type of paste data that's available on the clipboard.

This was key to building many of my other tools, because it showed me the invisible data that I could use to bootstrap other interesting pieces of functionality.

keyboard-debug shows the keys (and KeyCode values) currently being held down.

cors-fetch reveals if a URL can be accessed via CORS.

HTML tools may not have access to server-side databases for storage, but it turns out you can store a lot of state directly in the URL.

I like this for tools I may want to bookmark or share with other people.
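One way to sketch the state-in-the-URL pattern, assuming a simple flat state object (the helper names are mine, not from any particular tool):

```javascript
// Serialize a flat state object into the URL fragment and read it back.
// Using the fragment (location.hash) keeps the state out of server logs.
function stateToHash(state) {
  return "#" + new URLSearchParams(state).toString();
}

function stateFromHash(hash) {
  const params = new URLSearchParams(hash.replace(/^#/, ""));
  return Object.fromEntries(params.entries());
}

// In a browser you would call history.replaceState(null, "", stateToHash(state))
// on every edit, and stateFromHash(location.hash) on page load.
```

Using `history.replaceState` rather than assigning to `location.hash` avoids flooding the back button with one entry per keystroke.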

icon-editor is a custom 24x24 icon editor I built to help hack on icons for the GitHub Universe badge. It persists your in-progress icon design in the URL so you can easily bookmark and share it.

The localStorage browser API lets HTML tools store data persistently on the user's device, without exposing that data to the server.

I use this for larger pieces of state that don't fit comfortably in a URL, or for secrets like API keys which I really don't want anywhere near my server; even static hosts might have server logs that are outside of my influence.
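The save-as-you-type pattern boils down to a few lines. Here's a sketch with the storage object passed in as a parameter so the logic can be exercised outside a browser (the key prefix and function names are mine):

```javascript
// Persist a draft under a namespaced key; restore it on load.
// `storage` is window.localStorage in a browser.
function saveDraft(key, text, storage) {
  storage.setItem("draft:" + key, text);
}

function loadDraft(key, storage) {
  return storage.getItem("draft:" + key) || "";
}

// Typical browser wiring, guarded so the sketch also loads outside a browser:
if (typeof document !== "undefined") {
  const textarea = document.querySelector("textarea");
  if (textarea) {
    textarea.value = loadDraft("word-counter", localStorage);
    textarea.addEventListener("input", () => {
      saveDraft("word-counter", textarea.value, localStorage);
    });
  }
}
```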

word-counter is a simple tool I built to help me write to specific word counts, for things like conference abstract submissions. It uses localStorage to save as you type, so your work isn't lost if you accidentally close the tab.

render-markdown uses the same trick; I sometimes use this one to craft blog posts and I don't want to lose them.

haiku is one of a number of LLM demos I've built that request an API key from the user (via the prompt() function) and then store that in localStorage. This one uses Claude Haiku to write haikus about what it can see through the user's webcam.

CORS stands for Cross-Origin Resource Sharing. It's a relatively low-level detail which controls whether JavaScript running on one site is able to fetch data from APIs hosted on other domains.

APIs that provide open CORS headers are a goldmine for HTML tools. It's worth building a collection of these over time.

Here are some I like:

* iNaturalist for fetching sightings of animals, including URLs to photos

* GitHub, because anything in a public repository on GitHub has a CORS-enabled anonymous API for fetching that content from the raw.githubusercontent.com domain, which is behind a caching CDN so you don't need to worry too much about rate limits or feel guilty about adding load to their infrastructure.

* Bluesky for all sorts of operations

* Mastodon has generous CORS policies too, as used by applications like phanpy.social

GitHub Gists are a personal favorite here, because they let you build apps that can persist state to a permanent Gist through making a cross-origin API call.
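The Gist write path is an ordinary cross-origin POST to the GitHub API. A sketch of what that looks like, assuming the user has supplied a personal access token with the gist scope (the helper names are mine):

```javascript
// Build the request body for creating a Gist via the CORS-enabled GitHub API.
function gistPayload(filename, content, description) {
  return {
    description,
    public: false,
    files: { [filename]: { content } },
  };
}

// Browser usage: POST the payload and return the new Gist's URL.
async function createGist(token, filename, content, description) {
  const response = await fetch("https://api.github.com/gists", {
    method: "POST",
    headers: {
      Authorization: "Bearer " + token,
      Accept: "application/vnd.github+json",
    },
    body: JSON.stringify(gistPayload(filename, content, description)),
  });
  if (!response.ok) throw new Error("Gist creation failed: " + response.status);
  return (await response.json()).html_url;
}
```

The token would typically come from the localStorage secrets pattern described above, so it never touches the static host serving the tool.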

species-observation-map uses iNaturalist to show a map of recent sightings of a particular species.

zip-wheel-explorer fetches a .whl file for a Python package from PyPI, unzips it (in browser memory) and lets you navigate the files.

github-issue-to-markdown fetches issue details and comments from the GitHub API (including expanding any permanent code links) and turns them into copyable Markdown.

terminal-to-html can optionally save the user's converted terminal session to a Gist.

bluesky-quote-finder displays quotes of a specified Bluesky post, which can then be sorted by likes or by time.

All three of OpenAI, Anthropic and Gemini offer JSON APIs that can be accessed via CORS directly from HTML tools.

Unfortunately you still need an API key, and if you bake that key into your visible HTML anyone can steal it and use it to rack up charges on your account.

I use the localStorage secrets pattern to store API keys for these services. This sucks from a user experience perspective (telling users to go and create an API key and paste it into a tool is a lot of friction) but it does work.
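The pattern itself is tiny. A sketch, assuming the browser's prompt() and localStorage (the key naming is illustrative):

```javascript
// Return a stored API key, prompting the user once and caching the answer.
// The key never leaves the browser: it lives in localStorage only.
function getApiKey(name, storage, ask) {
  let key = storage.getItem("api-key:" + name);
  if (!key) {
    key = ask("Enter your " + name + " API key:");
    if (key) storage.setItem("api-key:" + name, key);
  }
  return key;
}

// In a browser: getApiKey("anthropic", localStorage, prompt)
```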

haiku uses the Claude API to write a haiku about an im­age from the user’s we­b­cam.

gem­ini-bbox demon­strates Gemini 2.5’s abil­ity to re­turn com­plex shaped im­age masks for ob­jects in im­ages, see Image seg­men­ta­tion us­ing Gemini 2.5.

You don’t need to up­load a file to a server in or­der to make use of the el­e­ment. JavaScript can ac­cess the con­tent of that file di­rectly, which opens up a wealth of op­por­tu­ni­ties for use­ful func­tion­al­ity.

ocr is the first tool I built for my collection, described in Running OCR against PDFs and images directly in your browser. It uses PDF.js and Tesseract.js to allow users to open a PDF in their browser, which it then converts to an image per page and runs through OCR.

social-media-cropper lets you open (or paste in) an existing image and then crop it to common dimensions needed for different social media platforms—2:1 for Twitter and LinkedIn, 1.4:1 for Substack etc.

ffmpeg-crop lets you open and preview a video file in your browser, drag a crop box within it and then copy out the ffmpeg command needed to produce a cropped copy on your own machine.

An HTML tool can generate a file for download without needing help from a server.

The JavaScript library ecosystem has a huge range of packages for generating files in all kinds of useful formats.

Pyodide is a distribution of Python that’s compiled to WebAssembly and designed to run directly in browsers. It’s an engineering marvel and one of the most underrated corners of the Python world.

It also cleanly loads from a CDN, which means there’s no reason not to use it in HTML tools!

Even better, the Pyodide project includes micropip—a mechanism that can load extra pure-Python packages from PyPI via CORS.

pyodide-bar-chart demonstrates running Pyodide, Pandas and matplotlib to render a bar chart directly in the browser.

numpy-pyodide-lab is an experimental interactive tutorial for Numpy.

apsw-query demonstrates the APSW SQLite library running in a browser, using it to show EXPLAIN QUERY PLAN output for SQLite queries.

Pyodide is possible thanks to WebAssembly. WebAssembly means that a vast collection of software originally written in other languages can now be loaded in HTML tools as well.

Squoosh.app was the first example I saw that convinced me of the power of this pattern—it makes several best-in-class image compression libraries available directly in the browser.

I’ve used WebAssembly for a few of my own tools:

The biggest advantage of having a single public collection of 100+ tools is that it’s easy for my LLM assistants to recombine them in interesting ways.

Sometimes I’ll copy and paste a previous tool into the context, but when I’m working with a coding agent I can reference them by name—or tell the agent to search for relevant examples before it starts work.

The source code of any working tool doubles as clear documentation of how something can be done, including patterns for using editing libraries. An LLM with one or two existing tools in its context is much more likely to produce working code.

And then, after it had found and read the source code for zip-wheel-explorer:

Build a new tool pypi-changelog.html which uses the PyPI API to get the wheel URLs of all available versions of a package, then it displays them in a list where each pair has a “Show changes” clickable in between them - clicking on that fetches the full contents of the wheels and displays a nicely rendered diff representing the difference between the two, as close to a standard diff format as you can get with JS libraries from CDNs, and when that is displayed there is a “Copy” button which copies that diff to the clipboard

See Running OCR against PDFs and images directly in your browser for another detailed example of remixing tools to create something new.

I like keeping (and publishing) records of everything I do with LLMs, to help me grow my skills at using them over time.

For HTML tools I built by chatting with an LLM platform directly I use the “share” feature for those platforms.

For Claude Code or Codex CLI or other coding agents I copy and paste the full transcript from the terminal into my terminal-to-html tool and share that using a Gist.

In either case I include links to those transcripts in the commit message when I save the finished tool to my repository. You can see those in my tools.simonwillison.net colophon.

I’ve had so much fun exploring the capabilities of LLMs in this way over the past year and a half. Building these tools has been invaluable in helping me understand both the potential of HTML tools and the capabilities of the LLMs I’m building them with.

If you’re interested in starting your own collection I highly recommend it! All you need to get started is a free GitHub repository with GitHub Pages enabled (Settings -> Pages -> Source -> Deploy from a branch -> main) and you can start copying in .html pages generated in whatever manner you like.

...

Read the original on simonwillison.net »

4 316 shares, 11 trendiness

I Tried Gleam for Advent of Code, and I Get the Hype

I do Advent of Code every year.

For the last seven years, including this one, I have managed to get all the stars. I do not say that to brag. I say it because it explains why I keep coming back.

It is one of the few tech traditions I never get bored of, even after doing it for a long time. I like the time pressure. I like the community vibe. I like that every December I can pick one language and go all in.

Advent of Code is usually 25 days. This year Eric decided to do 12 days instead.

So instead of 50 parts, it was 24.

That sounds like a relaxed year. It was not, but not in a bad way.

The easier days were harder than the easy days in past years, but they were also really engaging and fun to work through. The hard days were hard, especially the last three, but they were still the good kind of hard. They were problems I actually wanted to wrestle with.

It also changes the pacing in a funny way. In a normal year, by day 10 you have a pretty comfy toolbox. This year it felt like the puzzles were already demanding that toolbox while I was still building it.

That turned out to be a perfect setup for learning a new language.

Gleam is easy to like quickly.

The syntax is clean. The compiler is helpful, and the error messages are super duper good. Rust good.

Most importantly, the language strongly nudges you into a style that fits Advent of Code really well. Parse some text. Transform it a few times. Fold. Repeat.

One thing I did not expect was how good the editor experience would be. The LSP worked much better than I expected. It basically worked perfectly the whole time. I used the Gleam extension for IntelliJ and it was great.

I also just like FP.

FP is not always easier, but it is often easier. When it clicks, you stop writing instructions and you start describing the solution.

The first thing I fell in love with was echo.

It is basically a print statement that does not make you earn it. You can echo any value. You do not have to format anything. You do not have to build a string. You can just drop it into a pipeline and keep going.

This is the kind of thing I mean:

You can quickly inspect values at multiple points without breaking the flow.

I did miss string interpolation, especially early on. echo made up for a lot of that.

It mostly hit when I needed to generate text, not when I needed to inspect values. The day where I generated an LP file for glpsol is the best example. It is not hard code, but it is a lot of string building. Without interpolation it turns into a bit of a mess of <>s.

This is a small excerpt from my LP generator:

It works. It is just the kind of code where you really feel missing interpolation.

Grids are where you normally either crash into out of bounds bugs, or you litter your code with bounds checks you do not care about.

In my day 4 solution I used a dict as a grid. The key ergonomic part is that dict.get gives you an option-like result, which makes neighbour checking safe by default.

This is the neighbour function from my solution:

That last line is the whole point.

No bounds checks. No sentinel values. Out of bounds just disappears.

I expected to write parsers and helpers, and I did. What I did not expect was how often Gleam already had the exact list function I needed.

I read the input, chunked it into rows, transposed it, and suddenly the rest of the puzzle became obvious.

In a lot of languages you end up writing your own transpose yet again. In Gleam it is already there.

Another example is list.combination_pairs.

In day 8 I needed all pairs of 3D points. In an imperative language you would probably write nested loops and then question your off by one logic.

In Gleam it is a one liner:

Sometimes FP is not about being clever. It is about having the right function name.

If I had to pick one feature that made me want to keep writing Gleam after AoC, it is fold_until.

Early exit without hacks is fantastic in puzzles.

In day 8 part 2 I kept merging sets until the first set in the list contained all boxes. When that happens, I stop.

The core shape looks like this:

It is small, explicit, and it reads like intent.

I also used fold_until in day 10 part 1 to find the smallest combination size that works.

Even though I enjoyed Gleam a lot, I did hit a few recurring friction points.

None of these are deal breakers. They are just the kind of things you notice when you do 24 parts in a row.

This one surprised me on day 1.

For AoC you read a file every day. In this repo I used simplifile everywhere because you need something. It is fine, I just did not expect basic file IO to be outside the standard library.

Day 2 part 2 pushed me into regex and I had to add gleam_regexp.

This is the style I used, building a regex from a substring:

Again, totally fine. It just surprised me.

You can do [first, ..rest] and you can do [first, second].

But you cannot do [first, ..middle, last].

It is not the end of the world, but it would have made some parsing cleaner.

In Gleam a lot of comparisons are not booleans. You get an order value.

This is great for sorting. It is also very explicit. It can be a bit verbose when you just want a simple check.

In day 5 I ended up writing patterns like this:

I used bigi a few times this year.

On the Erlang VM, integers are arbitrary precision, so you usually do not care about overflow. That is one of the nicest things about the BEAM.

If you want your Gleam code to also target JavaScript, you do care. JavaScript has limits, and suddenly using bigi becomes necessary for some puzzles.

I wish that was just part of Int, with a single consistent story across targets.

Day 10 part 1 was my favorite part of the whole event.

The moment I saw the toggling behavior, it clicked as XOR. Represent the lights as a number. Represent each button as a bitmask. Find the smallest combination of bitmasks that XOR to the target.
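The same representation can be sketched outside Gleam. Below is an illustrative brute-force version in C (not the author’s Gleam solution; the button masks used in any example values are invented): each button is a bitmask, and a subset of presses is evaluated by XORing the chosen masks together.

```c
/* Illustrative sketch: the lights are bits of an unsigned int and each
 * button is a bitmask of the lights it toggles. Pressing a button XORs
 * its mask into the state, so pressing it twice cancels out. For a
 * small number of buttons we can brute-force every subset and keep the
 * one with the fewest presses that reaches the target pattern. */
static int min_presses(const unsigned *buttons, int n, unsigned target) {
    int best = -1;
    for (unsigned subset = 0; subset < (1u << n); subset++) {
        unsigned lights = 0;
        for (int i = 0; i < n; i++)
            if (subset & (1u << i))
                lights ^= buttons[i];
        if (lights == target) {
            int presses = __builtin_popcount(subset);
            if (best < 0 || presses < best)
                best = presses;
        }
    }
    return best; /* -1 if no combination of presses works */
}
```

The author’s fold_until over increasing combination sizes performs the same search declaratively, stopping at the first size that works.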

This is the fold from my solution:

It felt clean, it felt fast, and it felt like the representation did most of the work.

I knew brute force was out. It was clearly a system of linear equations.

In previous years I would reach for Z3, but there are no Z3 bindings for Gleam. I tried to stay in Gleam, and I ended up generating an LP file and shelling out to glpsol using shellout.

It worked, and honestly the LP format is beautiful.

Here is the call:

It is a hack, but it is a pragmatic hack, and that is also part of Advent of Code.

Day 11 part 2 is where I was happy I was writing Gleam.

The important detail was that the memo key is not just the node. It is the node plus your state.

In my case the key was:

Once I got the memo threading right, it ran instantly.

The last day was the only puzzle I did not fully enjoy.

Not because it was bad. It just felt like it relied on assumptions about the input, and I am one of those people that does not love doing that.

I overthought it for a bit, then I learned it was more of a troll problem. The “do the areas of the pieces, when fully interlocked, fit on the board” heuristic was enough.

In my solution it is literally this:

Sometimes you build a beautiful mental model and then the right answer is a single inequality.

I am very happy I picked Gleam this year.

It has sharp edges, mostly around where the standard library draws the line and a few language constraints that show up in puzzle code. But it also has real strengths.

Pipelines feel good. Options and Results make unsafe problems feel safe. The list toolbox is better than I expected. fold_until is incredible. Once you stop trying to write loops and you let it be functional, the solutions start to feel clearer.

I cannot wait to try Gleam in a real project. I have been thinking about using it to write a webserver, and I am genuinely excited to give it a go.

And of course, I cannot wait for next year’s Advent of Code.

If you want to look at the source for all 12 days, it is here:

...

Read the original on blog.tymscar.com »

5 291 shares, 15 trendiness

Linux Sandboxes And Fil-C

Memory safety and sandboxing are two different things. It’s reasonable to think of them as orthogonal: you could have memory safety but not be sandboxed, or you could be sandboxed but not memory safe.

* Example of memory safe but not sandboxed: a pure Java program that opens files on the filesystem for reading and writing and accepts filenames from the user. The OS will allow this program to overwrite any file that the user has access to. This program can be quite dangerous even if it is memory safe. Worse, imagine that the program didn’t have any code to open files for reading and writing, but also had no sandbox to prevent those syscalls from working. If there was a bug in the memory safety enforcement of this program (say, because of a bug in the Java implementation), then an attacker could cause this program to overwrite any file if they succeeded at achieving code execution via weird state.

* Example of sandboxed but not memory safe: a program written in assembly that starts by requesting that the OS revoke all of its capabilities beyond just pure compute. If the program did want to open a file or write to it, then the kernel will kill the process, based on the earlier request to have this capability revoked. This program could have lots of memory safety bugs (because it’s written in assembly), but even if it did, then the attacker cannot make this program overwrite any file unless they find some way to bypass the sandbox.

In practice, sandboxes have holes by design. A typical sandbox allows the program to send and receive messages to broker processes that have higher privileges. So, an attacker may first use a memory safety bug to make the sandboxed process send malicious messages, and then use those malicious messages to break into the brokers.

The best kind of defense is to have both a sandbox and memory safety. This document describes how to combine sandboxing and Fil-C’s memory safety by explaining what it takes to port OpenSSH’s seccomp-based Linux sandbox code to Fil-C.

Fil-C is a memory safe implementation of C and C++ and this site has a lot of documentation about it. Unlike most memory safe languages, Fil-C enforces safety down to where your code meets Linux syscalls, and the Fil-C runtime is robust enough that it’s possible to use it in low-level system components like init and udevd. Lots of programs work in Fil-C, including OpenSSH, which makes use of seccomp-BPF sandboxing.

This document focuses on how OpenSSH uses seccomp and other technologies on Linux to build a sandbox around its unprivileged sshd-session process. Let’s review what tools Linux gives us that OpenSSH uses:

* chroot to restrict the process’s view of the filesystem.

* Running the process with the sshd user and group, and giving that user/group no privileges.

* setrlimit to prevent opening files, starting processes, or writing to files.

* seccomp-BPF syscall filter to reduce the attack surface by allowlisting only the set of syscalls that are legitimate for the unprivileged process. Syscalls not in the allowlist will crash the process with SIGSYS.

The Chromium developers and the Mozilla developers both have excellent notes about how to do sandboxing on Linux using seccomp. Seccomp-BPF is a well-documented kernel feature that can be used as part of a larger sandboxing story.
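For reference, a minimal allowlist filter has roughly this shape. This is an illustrative sketch rather than OpenSSH’s actual filter, and a production filter should also check seccomp_data.arch before trusting the syscall number:

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>        /* offsetof */
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>      /* used by the demo */
#include <unistd.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* If the syscall number equals nr, allow it; otherwise fall through
 * to the next rule. */
#define SC_ALLOW(nr) \
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (nr), 0, 1), \
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW)

static struct sock_filter allowlist[] = {
    /* Load the syscall number from the seccomp_data argument. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    SC_ALLOW(__NR_read),
    SC_ALLOW(__NR_write),
    SC_ALLOW(__NR_exit),
    SC_ALLOW(__NR_exit_group),
    SC_ALLOW(__NR_rt_sigreturn),
    /* Anything else kills every thread in the process with SIGSYS. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
};

static int install_allowlist(void) {
    struct sock_fprog prog = {
        .len = sizeof(allowlist) / sizeof(allowlist[0]),
        .filter = allowlist,
    };
    /* Unprivileged processes must set no_new_privs before filtering. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

Once the filter is installed, invoking any syscall outside the list (even something as innocuous as getpid) terminates the process, which is what gives the allowlist its small attack surface.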

Fil-C makes it easy to use chroot and different users and groups. The syscalls that are used for that part of the sandbox are trivially allowed by Fil-C and no special care is required to use them.

Both setrlimit and seccomp-BPF require special care because the Fil-C runtime starts threads, allocates memory, and performs synchronization. This document describes what you need to know to make effective use of those sandboxing technologies in Fil-C. First, I describe how to build a sandbox that prevents thread creation without breaking Fil-C’s use of threads. Then, I describe what tweaks I had to make to OpenSSH’s seccomp filter. Finally, I describe how the Fil-C runtime implements the syscalls used to install seccomp filters.

The Fil-C runtime uses multiple background threads for garbage collection and has the ability to automatically shut those threads down when they are not in use. If the program wakes up and starts allocating memory again, then those threads are automatically restarted.

Starting threads violates the “no new processes” rule that OpenSSH’s setrlimit sandbox tries to achieve (since threads are just lightweight processes on Linux). It also relies on syscalls like clone3 that are not part of OpenSSH’s seccomp filter allowlist.

It would be a regression to the sandbox to allow process creation just because the Fil-C runtime relies on it. Instead, I added a new API:

void zlock_runtime_threads(void);

This forces the runtime to immediately create whatever threads it needs, and to disable shutting them down on demand. Then, I added a call to zlock_runtime_threads() in OpenSSH’s ssh_sandbox_child function before either the setrlimit or seccomp-BPF sandbox calls happen.

Because the use of zlock_runtime_threads() prevents subsequent thread creation from happening, most of the OpenSSH sandbox just works. I did not have to change how OpenSSH uses setrlimit. I did change the following about the seccomp filter:

* Failure results in SECCOMP_RET_KILL_PROCESS rather than SECCOMP_RET_KILL. This ensures that Fil-C’s background threads are also killed if a sandbox violation occurs.

* MAP_NORESERVE is added to the mmap allowlist, since the Fil-C allocator uses it. This is not a meaningful regression to the filter, since MAP_NORESERVE is not a meaningful capability for an attacker to have.

* sched_yield is allowed. This is not a dangerous syscall (it’s semantically a no-op). The Fil-C runtime uses it as part of its lock implementation.

Nothing else had to change, since the filter already allowed all of the futex syscalls that Fil-C uses for synchronization.

The OpenSSH seccomp filter is installed using two prctl calls. First, we PR_SET_NO_NEW_PRIVS:

if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1) {
        debug("%s: prctl(PR_SET_NO_NEW_PRIVS): %s",
            __func__, strerror(errno));
        nnp_failed = 1;
}

This prevents additional privileges from being acquired via execve. It’s required that unprivileged processes that install seccomp filters first set the no_new_privs bit.

if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &preauth_program) == -1)
        debug("%s: prctl(PR_SET_SECCOMP): %s",
            __func__, strerror(errno));
else if (nnp_failed)
        fatal("%s: SECCOMP_MODE_FILTER activated but "
            "PR_SET_NO_NEW_PRIVS failed", __func__);

This installs the seccomp filter in preauth_program. Note that this will fail in the kernel if the no_new_privs bit is not set, so the fact that OpenSSH reports a fatal error if the filter is installed without no_new_privs is just healthy paranoia on the part of the OpenSSH authors.

The trouble with both syscalls is that they affect the calling thread, not all threads in the process. Without special care, the Fil-C runtime’s background threads would not have the no_new_privs bit set and would not have the filter installed. This would mean that if an attacker busted through Fil-C’s memory safety protections (in the unlikely event that they found a bug in Fil-C itself!), then they could use those other threads to execute syscalls that bypass the filter!

To prevent even this unlikely escape, the Fil-C runtime’s wrapper for prctl implements PR_SET_NO_NEW_PRIVS and PR_SET_SECCOMP by handshaking all runtime threads using this internal API:

/* Calls the callback from every runtime thread. */
PAS_API void filc_runtime_threads_handshake(void (*callback)(void* arg), void* arg);

The callback performs the requested prctl from each runtime thread. This ensures that the no_new_privs bit and the filter are installed on all threads in the Fil-C process.
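The handshake pattern itself is easy to picture. The sketch below is not Fil-C’s implementation (the names, the fixed worker count, and the condition-variable plumbing are invented for illustration); it just shows how a callback can be run once on every background thread before the requesting call returns:

```c
#include <pthread.h>

#define NWORKERS 4

/* Shared handshake state, protected by a single lock. */
static struct {
    void (*callback)(void *arg);
    void *arg;
    int generation;   /* bumped once per handshake request */
    int acks;         /* workers that have run the callback */
    int shutdown;
    pthread_mutex_t lock;
    pthread_cond_t cond;
} hs = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .cond = PTHREAD_COND_INITIALIZER,
};

static void *worker(void *unused) {
    (void)unused;
    int seen = 0;
    pthread_mutex_lock(&hs.lock);
    for (;;) {
        while (hs.generation == seen && !hs.shutdown)
            pthread_cond_wait(&hs.cond, &hs.lock);
        if (hs.shutdown)
            break;
        seen = hs.generation;
        hs.callback(hs.arg);  /* e.g. the requested prctl */
        hs.acks++;
        pthread_cond_broadcast(&hs.cond);
    }
    pthread_mutex_unlock(&hs.lock);
    return NULL;
}

/* Calls the callback from every worker thread, then returns. */
static void runtime_threads_handshake(void (*callback)(void *), void *arg) {
    pthread_mutex_lock(&hs.lock);
    hs.callback = callback;
    hs.arg = arg;
    hs.acks = 0;
    hs.generation++;
    pthread_cond_broadcast(&hs.cond);
    while (hs.acks < NWORKERS)
        pthread_cond_wait(&hs.cond, &hs.lock);
    pthread_mutex_unlock(&hs.lock);
}

/* Demo callback: counts how many threads ran it (under hs.lock). */
static int handshake_hits;
static void note_handshake(void *arg) {
    (void)arg;
    handshake_hits++;
}
```

In the real runtime the callback would perform the prctl, so that the no_new_privs bit and the filter land on every thread before the handshake call returns.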

Additionally, because of ambiguity about what to do if the process has multiple user threads, these two prctl commands will trigger a Fil-C safety error if the program has multiple user threads.

The best kind of protection if you’re serious about security is to combine memory safety with sandboxing. This document shows how to achieve this using Fil-C and the sandbox technologies available on Linux, all without regressing the level of protection that those sandboxes enforce or the memory safety guarantees of Fil-C.

...

Read the original on fil-c.org »

6 278 shares, 79 trendiness

Europeans’ health data sold to U.S. firm run by ex-Israeli spies


The European messaging service Zivver — which is used for confidential communication by governments and hospitals in the EU and the U.K. — has been sold to Kiteworks, an American company with strong links to Israeli intelligence. Experts have expressed deep concerns over the deal.

With the sale of Amsterdam-based data security company Zivver, sensitive information about European citizens is now in the hands of Kiteworks. The CEO of the American tech company is a former cyber specialist from an elite unit of the Israeli army, as are several other members of its top management. Various institutions in Europe and the U.K. — from hospitals to courts and immigration services — use Zivver to send confidential documents. While Zivver says these documents are encrypted, an investigation by Follow the Money shows that the company is able to read their contents.

Why does this matter?

Cybersecurity and intelligence experts told Follow the Money that the takeover should either have been prevented or properly assessed in advance. Zivver processes information that could be extremely valuable to third parties, such as criminals or foreign intelligence services. That information is now subject to invasive U.S. law, and overseen by a company with well-documented links to Israeli intelligence.

How was this investigated?

Follow the Money investigated the acquisition of Zivver and the management of Kiteworks, and spoke to experts in intelligence services and cyber security.

This ar­ti­cle is part of an on­go­ing se­ries.

When the American data security company Kiteworks bought out its Dutch industry peer Zivver in June, CEO Jonathan Yaron described it as “a proud moment for all of us”. The purchase was “a significant milestone in Kiteworks’ continued mission to safeguard sensitive data across all communication channels”, he added in a LinkedIn post.

But what Yaron did not mention was that this acquisition — coming at a politically charged moment between the U.S. and the EU — put highly sensitive, personal data belonging to European and British citizens directly into American hands. Zivver is used by institutions including hospitals, health insurers, government services and immigration authorities in countries including the Netherlands, Germany, Belgium and the U.K.

Neither did Yaron mention that much of Kiteworks’ top management — himself included — are former members of an elite Israeli Defence Force unit that specialised in eavesdropping and breaking encrypted communications.


In addition to this, an investigation by Follow the Money shows that data processed by Zivver is less secure than the service leads its customers to believe. Research found that emails and documents sent by Zivver can be read by the company itself. This was later confirmed by Zivver to Follow the Money.

Zivver maintained, however, that it does not have access to the encryption keys used by customers, and therefore cannot hand over data to U.S. authorities. This is despite independent researchers confirming that the data was — for a brief period — accessible to the company. If U.S. officials wanted access to such communication, Zivver would be legally obligated to provide it.

Cybersecurity experts now point to serious security concerns, and ask why this sale seems to have gone through without scrutiny from European authorities.

“All of the red flags should have been raised during this acquisition,” said intelligence expert Hugo Vijver, a former long-term officer in AIVD, the Dutch security service.

Amsterdam-based Zivver — which was founded in 2015 by Wouter Klinkhamer and Rick Goud — provides systems for the encrypted exchange of information via email, chat and video, among other means. Dutch courts, for example, work with Zivver to send classified documents, and solicitors use the service to send confidential information to the courts. Other government agencies in the Netherlands also use Zivver. So do vital infrastructure operators such as the Port of Rotterdam and The Hague Airport.

In the U.K., a number of NHS hospitals and local councils use Zivver, and it is used in major hospitals elsewhere. The information that Zivver secures for its customers is therefore confidential and sensitive by nature.

When approached by Follow the Money, a number of governmental agencies said the company’s Dutch origins were a big factor in their decision to use Zivver. Additionally, the fact that the data transferred via Zivver was stored on servers in Europe also played a role in their decisions. Now that Zivver has been acquired by a company in the United States, that data is subject to U.S. law. This means that the U.S. government can request access to this information if it wishes, regardless of where the data is stored.

These laws are not new, but they have become even more draconian since U.S. President Donald Trump’s return to office, according to experts.

Bert Hubert, a former regulator of the Dutch intelligence services, warned: “America is deteriorating so rapidly, both legally and democratically, that it would be very naive to hand over your courts and hospitals to their services.” “Trump recently called on Big Tech to ignore European legislation. And that is what they are going to do. We have no control over it,” he added.

In Europe, Hubert said: “We communicate almost exclusively via American platforms. And that means that the U.S. can read our communications and disrupt our entire society if they decide that they no longer like us.”

Zivver had offered an alternative — a European platform governed by EU law. “We are now throwing that away. If you want to share something confidential with a court or government, consider using a typewriter. That’s about all we have left,” Hubert said.

Beyond American jurisdiction, Kiteworks’ management raises another layer of concern: its links to Israeli intelligence.

Several of the company’s top executives, including CEO Yaron, are veterans of Unit 8200, the elite cyber unit of the Israel Defence Force (IDF). The unit is renowned for its code-breaking abilities and feared for its surveillance operations.


Unit 8200 has been linked to major cyber operations, including the Stuxnet attack on Iranian nuclear facilities in 2007. More recently, it was accused of orchestrating the detonation of thousands of pagers in Lebanon, an incident the United Nations said violated international law and killed at least two children.

The unit employs thousands of young recruits identified for their digital skills. It is able to intercept global telephone and internet traffic. International media have reported that Unit 8200 intercepts and stores an average of one million Palestinian phone calls every hour.

Some veterans themselves have also objected to the work of the unit. In 2014, dozens of reservists signed a letter to Israeli leaders saying they no longer wanted to participate in surveillance of the occupied territories.

“The lines of communication between the Israeli defence apparatus and the business community have traditionally been very short,” said Dutch intelligence expert Vijver. “In Israel, there is a revolving door between the army, lobby, business and politics.”

That revolving door is clearly visible in big U.S. tech companies — and Kiteworks is no exception. Aside from Yaron, both Chief Business Officer Yaron Galant and the Chief Product Officer served in Unit 8200, according to publicly available information. They played a direct role in negotiating the acquisition of Zivver. Their background was known to Zivver’s directors Goud and Klinkhamer at the time.


Other se­nior fig­ures also have mil­i­tary in­tel­li­gence back­grounds. Product di­rec­tor Ron Margalit worked in Unit 8200 be­fore serv­ing in the of­fice of Israeli Prime Minister Benjamin Netanyahu. Mergers and ac­qui­si­tions di­rec­tor Uri Kedem is a for­mer Israeli naval cap­tain.Kite­works is not unique in this re­spect. In­creas­ing num­bers of U.S. cy­ber­se­cu­rity firms now em­ploy for­mer Israeli in­tel­li­gence of­fi­cers. This trend, ex­perts say, cre­ates vul­ner­a­bil­i­ties that are rarely dis­cussed.An in­de­pen­dent re­searcher quoted by U.S. Drop Site News said: Not all of these vet­er­ans will send clas­si­fied data to Tel Aviv. But the fact that so many for­mer spies work for these com­pa­nies does cre­ate a se­ri­ous vul­ner­a­bil­ity: no other coun­try has such ac­cess to the Or, as the ex-in­tel­li­gence reg­u­la­tor Hubert put it: Gaining ac­cess to com­mu­ni­ca­tion flows is part of Israel’s long-term strat­egy. A com­pany like Zivver fits per­fectly into that strat­egy.”The in­for­ma­tion han­dled by Zivver — con­fi­den­tial com­mu­ni­ca­tions be­tween gov­ern­ments, hos­pi­tals and cit­i­zens — is a po­ten­tial gold­mine for in­tel­li­gence ser­vices.Ac­cord­ing to in­tel­li­gence ex­pert Vijver, ac­cess to this kind of ma­te­r­ial makes it eas­ier to pres­sure in­di­vid­u­als into co­op­er­at­ing with in­tel­li­gence agen­cies. Once an in­tel­li­gence ser­vice has ac­cess to med­ical, fi­nan­cial and per­sonal data, it can more eas­ily pres­sure peo­ple into spy­ing for it, he said.But the gain for in­tel­li­gence ser­vices lies not just in sen­si­tive in­for­ma­tion, said Hubert: Any data that al­lows an agency to tie tele­phone num­bers, ad­dresses or pay­ment data to an in­di­vid­ual is of great in­ter­est to them.” He added: It is ex­actly this type of data that is abun­dantly pre­sent in com­mu­ni­ca­tions be­tween civil­ians, gov­ern­ments and care in­sti­tu­tions. 
In other words, the information that flows through a company like Zivver is extremely valuable for intelligence services."

These geopolitical concerns become more pronounced when combined with technical worries about Zivver's encryption. For years, Zivver presented itself as a European alternative that guaranteed privacy. Its marketing materials claimed that messages were encrypted on the sender's device and that the company had "zero access" to content. But an investigation by two cybersecurity experts at a Dutch government agency, at the request of Follow the Money, undermines this claim.

The experts, who participated in the investigation on condition of anonymity, explored what happened when that government agency logged into Zivver's web application to send information.

Tests showed that when government users sent messages through Zivver's web application, the content — including attachments — was uploaded to Zivver's servers as readable text before being encrypted. The same process applied to email addresses of senders and recipients.

"In these specific cases, Zivver processed the messages in readable form," said independent cybersecurity researcher Matthijs Koot, who verified the findings. "Even if only briefly, technically speaking it is possible that Zivver was able to view these messages," he said.

He added: "Whether a message is encrypted at a later stage makes little difference. It may help against hackers, but it no longer matters in terms of protection against Zivver."

Despite these findings, Zivver continues to insist on its website and in promotional material elsewhere — including on the U.K. government's Digital Marketplace — that "contents of secure messages are inaccessible to Zivver and third parties".

So far, no evidence has surfaced that Zivver misused its technical access. But now that the company is owned by Kiteworks, experts see a heightened risk.

Former intelligence officer Vijver puts it bluntly: "Given the links between Zivver, Kiteworks and Unit 8200, I believe there is zero chance that no data is going to Israel. To think otherwise is completely naive."

The sale of Zivver could technically have been blocked or investigated under Dutch law. According to the Security Assessment of Investments, Mergers and Acquisitions Act, such sensitive takeovers are supposed to be reviewed by a specialised agency.
But the Dutch interior ministry declared that Zivver was not part of the country's "critical infrastructure," meaning that no review was carried out. That, in Hubert's view, was "a huge blunder".

"It's bad enough that a company that plays such an important role in government communications is falling into American hands, but the fact that there are all kinds of Israeli spies there is very serious," he said.

"The takeover is taking place in an unsafe world full of geopolitical tensions"

Experts say the Zivver case highlights Europe's lack of strategic control over its digital infrastructure.

Mariëtte van Huijstee of the Netherlands-based said: "I doubt whether the security of sensitive emails and files ... should be left to the private sector. And if you think that is acceptable, should we leave it to non-European parties over whom we have no control?"

"We need to think much more strategically about our digital infrastructure and regulate these kinds of issues much better, for example by designating encryption services as vital infrastructure," she added.

Zivver, for its part, claimed that security will improve under Kiteworks. Zivver's full responses to Follow the Money's questions can be read here and here.

But Van Huijstee was not convinced.

"Kiteworks employs people who come from a service that specialises in decrypting files," she said. "The takeover is taking place in an unsafe world full of geopolitical tensions, and we are dealing with data that is very valuable. In such a case, trust is not enough and more control is needed."

...

Read the original on www.ftm.eu »

7 252 shares, 11 trendiness

Goodbye Microservices

Given that there would only be one ser­vice, it made sense to move all the des­ti­na­tion code into one repo, which meant merg­ing all the dif­fer­ent de­pen­den­cies and tests into a sin­gle repo. We knew this was go­ing to be messy.

For each of the 120 unique de­pen­den­cies, we com­mit­ted to hav­ing one ver­sion for all our des­ti­na­tions. As we moved des­ti­na­tions over, we’d check the de­pen­den­cies it was us­ing and up­date them to the lat­est ver­sions. We fixed any­thing in the des­ti­na­tions that broke with the newer ver­sions.

With this tran­si­tion, we no longer needed to keep track of the dif­fer­ences be­tween de­pen­dency ver­sions. All our des­ti­na­tions were us­ing the same ver­sion, which sig­nif­i­cantly re­duced the com­plex­ity across the code­base. Maintaining des­ti­na­tions now be­came less time con­sum­ing and less risky.

We also wanted a test suite that al­lowed us to quickly and eas­ily run all our des­ti­na­tion tests. Running all the tests was one of the main block­ers when mak­ing up­dates to the shared li­braries we dis­cussed ear­lier.

Fortunately, the des­ti­na­tion tests all had a sim­i­lar struc­ture. They had ba­sic unit tests to ver­ify our cus­tom trans­form logic was cor­rect and would ex­e­cute HTTP re­quests to the part­ner’s end­point to ver­ify that events showed up in the des­ti­na­tion as ex­pected.

Recall that the orig­i­nal mo­ti­va­tion for sep­a­rat­ing each des­ti­na­tion code­base into its own repo was to iso­late test fail­ures. However, it turned out this was a false ad­van­tage. Tests that made HTTP re­quests were still fail­ing with some fre­quency. With des­ti­na­tions sep­a­rated into their own re­pos, there was lit­tle mo­ti­va­tion to clean up fail­ing tests. This poor hy­giene led to a con­stant source of frus­trat­ing tech­ni­cal debt. Often a small change that should have only taken an hour or two would end up re­quir­ing a cou­ple of days to a week to com­plete.

The out­bound HTTP re­quests to des­ti­na­tion end­points dur­ing the test run was the pri­mary cause of fail­ing tests. Unrelated is­sues like ex­pired cre­den­tials should­n’t fail tests. We also knew from ex­pe­ri­ence that some des­ti­na­tion end­points were much slower than oth­ers. Some des­ti­na­tions took up to 5 min­utes to run their tests. With over 140 des­ti­na­tions, our test suite could take up to an hour to run.

To solve for both of these, we created Traffic Recorder. Traffic Recorder is built on top of yakbak, and is responsible for recording and saving destinations' test traffic. Whenever a test runs for the first time, any requests and their corresponding responses are recorded to a file. On subsequent test runs, the request and response in the file are played back instead of requesting the destination's endpoint. These files are checked into the repo so that the tests are consistent across every change. Now that the test suite is no longer dependent on HTTP requests over the internet, our tests became significantly more resilient, a must-have for the migration to a single repo.

After we integrated Traffic Recorder, running the tests for all 140+ of our destinations took milliseconds. In the past, just one destination could have taken a couple of minutes to complete. It felt like magic.

Once the code for all des­ti­na­tions lived in a sin­gle repo, they could be merged into a sin­gle ser­vice. With every des­ti­na­tion liv­ing in one ser­vice, our de­vel­oper pro­duc­tiv­ity sub­stan­tially im­proved. We no longer had to de­ploy 140+ ser­vices for a change to one of the shared li­braries. One en­gi­neer can de­ploy the ser­vice in a mat­ter of min­utes.

The proof was in the im­proved ve­loc­ity. When our mi­croser­vice ar­chi­tec­ture was still in place, we made 32 im­prove­ments to our shared li­braries. One year later,  we’ve made 46 im­prove­ments.

The change also ben­e­fited our op­er­a­tional story. With every des­ti­na­tion liv­ing in one ser­vice, we had a good mix of CPU and mem­ory-in­tense des­ti­na­tions, which made scal­ing the ser­vice to meet de­mand sig­nif­i­cantly eas­ier. The large worker pool can ab­sorb spikes in load, so we no longer get paged for des­ti­na­tions that process small amounts of load.

Moving from our microservice architecture to a monolith was overall a huge improvement; however, there are trade-offs:

Fault iso­la­tion is dif­fi­cult. With every­thing run­ning in a mono­lith, if a bug is in­tro­duced in one des­ti­na­tion that causes the ser­vice to crash, the ser­vice will crash for all des­ti­na­tions. We have com­pre­hen­sive au­to­mated test­ing in place, but tests can only get you so far. We are cur­rently work­ing on a much more ro­bust way to pre­vent one des­ti­na­tion from tak­ing down the en­tire ser­vice while still keep­ing all the des­ti­na­tions in a mono­lith.

In-memory caching is less ef­fec­tive. Previously, with one ser­vice per des­ti­na­tion, our low traf­fic des­ti­na­tions only had a hand­ful of processes, which meant their in-mem­ory caches of con­trol plane data would stay hot. Now that cache is spread thinly across 3000+ processes so it’s much less likely to be hit. We could use some­thing like Redis to solve for this, but then that’s an­other point of scal­ing for which we’d have to ac­count. In the end, we ac­cepted this loss of ef­fi­ciency given the sub­stan­tial op­er­a­tional ben­e­fits.

Updating the ver­sion of a de­pen­dency may break mul­ti­ple des­ti­na­tions. While mov­ing every­thing to one repo solved the pre­vi­ous de­pen­dency mess we were in, it means that if we want to use the newest ver­sion of a li­brary, we’ll po­ten­tially have to up­date other des­ti­na­tions to work with the newer ver­sion. In our opin­ion though, the sim­plic­ity of this ap­proach is worth the trade-off. And with our com­pre­hen­sive au­to­mated test suite, we can quickly see what breaks with a newer de­pen­dency ver­sion.

Our ini­tial mi­croser­vice ar­chi­tec­ture worked for a time, solv­ing the im­me­di­ate per­for­mance is­sues in our pipeline by iso­lat­ing the des­ti­na­tions from each other. However, we weren’t set up to scale. We lacked the proper tool­ing for test­ing and de­ploy­ing the mi­croser­vices when bulk up­dates were needed. As a re­sult, our de­vel­oper pro­duc­tiv­ity quickly de­clined.

Moving to a mono­lith al­lowed us to rid our pipeline of op­er­a­tional is­sues while sig­nif­i­cantly in­creas­ing de­vel­oper pro­duc­tiv­ity. We did­n’t make this tran­si­tion lightly though and knew there were things we had to con­sider if it was go­ing to work.

We needed a rock solid test­ing suite to put every­thing into one repo. Without this, we would have been in the same sit­u­a­tion as when we orig­i­nally de­cided to break them apart. Constant fail­ing tests hurt our pro­duc­tiv­ity in the past, and we did­n’t want that hap­pen­ing again. We ac­cepted the trade-offs in­her­ent in a mono­lithic ar­chi­tec­ture and made sure we had a good story around each. We had to be com­fort­able with some of the sac­ri­fices that came with this change.

When de­cid­ing be­tween mi­croser­vices or a mono­lith, there are dif­fer­ent fac­tors to con­sider with each. In some parts of our in­fra­struc­ture, mi­croser­vices work well but our server-side des­ti­na­tions were a per­fect ex­am­ple of how this pop­u­lar trend can ac­tu­ally hurt pro­duc­tiv­ity and per­for­mance. It turns out, the so­lu­tion for us was a mono­lith.

The tran­si­tion to a mono­lith was made pos­si­ble by Stephen Mathieson, Rick Branson, Achille Roussel, Tom Holmes, and many more.

Special thanks to Rick Branson for help­ing re­view and edit this post at every stage.

...

Read the original on www.twilio.com »

8 251 shares, 12 trendiness

Recovering Anthony Bourdain’s (really) lost Li.st’s

Loved reading through GReg TeChnoLogY's Anthony Bourdain's Lost Li.st's, and seeing the list of lost Anthony Bourdain li.st's made me wonder whether at least some of them could be recovered.

Having worked in the security and crawling space for the majority of my career—though I don't have the access or permission to use proprietary storage—I thought we might be able to find something in publicly available crawl archives.

All of the code and examples link to the source git repository. This article has also been discussed on Hacker News. Also, a week before I published this, mirandom had the same idea and published their findings—go check them out.

If the Internet Archive had the partial list that Greg published, what about Common Crawl? Reading through their documentation, it seems straightforward enough to get a prefix index for Tony's lists and grep for any sub-paths.

Putting something together with the help of Claude to prove my theory, we have commoncrawl_search.py, which makes a single index request to a specific dataset and, if any hits are discovered, retrieves them from the public S3 bucket. Since they are small, straight-up HTML documents, this seemed even more feasible than I had initially thought.

Simply have a Python version around 3.14.2 and install the dependencies from requirements.txt. Run the below and we are in business. Below, you'll find the command I ran, followed by some manual archaeological effort to prettify the findings.
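For the curious, a Common Crawl index lookup has roughly the following shape. This is a simplified sketch rather than the repo's actual commoncrawl_search.py: the dataset name in the comment is only an example, and the helper function names are mine.

```python
import json

# A CDX index query looks like (dataset name is illustrative):
#   https://index.commoncrawl.org/CC-MAIN-2017-30-index?url=li.st/bourdain/*&output=json
# The server answers with newline-delimited JSON, one record per hit.


def parse_index_hits(raw):
    """Parse the newline-delimited JSON returned by a CDX index query."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


def byte_range(hit):
    """HTTP Range header selecting one gzipped WARC record from the bucket."""
    start = int(hit["offset"])
    end = start + int(hit["length"]) - 1
    return f"bytes={start}-{end}"
```

Each hit's filename field names an object in the public data bucket; fetching it with the Range header above pulls down just that page's WARC record instead of a multi-gigabyte archive file.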

Images have been lost. Other av­enues had struck no luck. I’ll try again later.

Any and all emphasis, missing punctuation and cool grammar is all by Anthony Bourdain. The only modifications I have made are to the layout, to represent li.st as closely as possible, with no changes to the content.

If you see these blocks, that’s me com­ment­ing if pic­tures have been lost.

From Greg's page, let's go and try each entry one by one. I'll put up a table of what I wasn't able to find in Common Crawl but would assume exists elsewhere—I'd be happy to take another look. And no, none of the above has been written by AI—only the code, since I don't really care about warcio encoding or writing the same Python requests method for the Nth time. Enjoy!

Things I No Longer Have Time or Patience For

Dinners where it takes the waiter longer to de­scribe my food than it takes me to eat it.

I ad­mit it: my life does­n’t suck. Some re­cent views I’ve en­joyed

Montana at sun­set : There’s pheas­ant cook­ing be­hind the cam­era some­where. To the best of my rec­ol­lec­tion some very nice bour­bon. And it IS a big sky .

Puerto Rico: Thank you Jose Andres for invit­ing me to this beau­ti­ful beach!

Naxos: drink­ing ouzo and look­ing at this. Not a bad day at the of­fice .

Istanbul: raki and grilled lamb and this ..

Borneo: The air is thick with hints of durian, sam­bal, co­conut..

Chicago: up early to go train #Redzovic

If I Were Trapped on a Desert Island With Only Three Tv Series

Edge of Darkness (with Bob Peck and Joe Don Baker )

The Film Nobody Ever Made

Dreamcasting across time with the liv­ing and the dead, this un­ti­tled, yet to be writ­ten mas­ter­work of cin­ema, shot, no doubt, by Christopher Doyle, lives only in my imag­i­na­tion.

If you bought these vinyls from an ema­ci­ated look­ing dude with an ea­ger, some­what dis­tracted ex­pres­sion on his face some­where on up­per Broadway some­time in the mid 80’s, that was me . I’d like them back. In a sen­ti­men­tal mood.

ma­te­r­ial things I feel a strange, pos­si­bly un­nat­ural at­trac­tion to and will buy (if I can) if I stum­ble across them in my trav­els. I am not a paid spokesper­son for any of this stuff .

Vintage Persol sun­glasses : This is pretty ob­vi­ous. I wear them a lot. I col­lect them when I can. Even my pro­duc­tion team have taken to wear­ing them.

19th cen­tury trepan­ning in­stru­ments: I don’t know what ex­plains my fas­ci­na­tion with these de­vices, de­signed to drill drain-sized holes into the skull of­ten for pur­poses of re­liev­ing pressure” or bad hu­mours”. But I can’t get enough of them. Tip: don’t get a pro­longed headache around me and ask if I have any­thing for it. I do.

Montagnard bracelets: I only have one of these but the few that find their way onto the mar­ket have so much his­tory. Often given to the in­dige­nous moun­tain peo­ple s Special Forces ad­vi­sors dur­ing the very early days of America’s in­volve­ment in Vietnam .

Jiu Jitsi Gi’s: Yeah. When it comes to high end BJJ wear, I am a to­tal whore. You know those peo­ple who col­lect lim­ited edi­tion Nikes ? I’m like that but with Shoyoroll . In my de­fense, I don’t keep them in plas­tic bags in a dis­play case. I wear that shit.

Voiture: You know those old school, sil­ver plated (or solid sil­ver) blimp like carts they roll out into the din­ing room to carve and serve your roast? No. Probably not. So few places do that any­more. House of Prime Rib does it. Danny Bowein does it at Mission Chinese. I don’t have one of these. And I likely never will. But I can dream.

Kramer knives: I don’t own one. I can’t af­ford one . And I’d likely have to wait for years even if I could af­ford one. There’s a long wait­ing list for these in­di­vid­u­ally hand crafted beau­ties. But I want one. Badly. http://​www.kramerknives.com/​gallery/

R. CRUMB : All of it. The col­lected works. These Taschen vol­umes to start. I wanted to draw bril­liant, beau­ti­ful, filthy comix like Crumb un­til I was 13 or 14 and it be­came clear that I just did­n’t have that kind of tal­ent. As a re­spon­si­ble fa­ther of an 8 year old girl, I just can’t have this stuff in the house. Too dark, hate­ful, twisted. Sigh…

THE MAGNIFICENT AMBERSONS : THE UNCUT, ORIGINAL ORSON WELLES VERSION: It does­n’t ex­ist. Which is why I want it. The Holy Grail for film nerds, Welles’ fol­low up to CITIZEN KANE shoulda, coulda been an even greater mas­ter­piece . But the stu­dio butchered it and re-shot a bull­shit end­ing. I want the orig­i­nal. I also want a mag­i­cal pony.

Four Spy Novels by Real Spies and One Not by a Spy

I like good spy nov­els. I pre­fer them to be re­al­is­tic . I pre­fer them to be writ­ten by real spies. If the main char­ac­ter car­ries a gun, I’m al­ready los­ing in­ter­est. Spy nov­els should be about be­trayal.

Ashenden–Somerset Maugham

Somerset wrote this bleak, darkly funny, deeply cyn­i­cal novel in the early part of the 20th cen­tury. It was ap­par­ently close enough to the re­al­ity of his es­pi­onage ca­reer that MI6 in­sisted on ma­jor ex­ci­sions. Remarkably ahead of its time in its at­mos­phere of fu­til­ity and be­trayal.

The Man Who Lost the War–WT Tyler

WT Tyler is a pseu­do­nym for a for­mer foreign ser­vice” of­fi­cer who could re­ally re­ally write. This one takes place in post-war Berlin and else­where and was, in my opin­ion, wildly un­der ap­pre­ci­ated. See also his Ants of God.

The Human Factor–Graham Greene

Was Greene think­ing of his old col­league Kim Philby when he wrote this? Maybe. Probably. See also Our Man In Havana.

The Tears of Autumn -Charles McCarry

A clever take on the JFK as­sas­si­na­tion with a Vietnamese an­gle. See also The Miernik Dossier and The Last Supper

Agents of Innocence–David Ignatius

Ignatius is a jour­nal­ist not a spook, but this one, set in Beirut, hewed all too closely to still not of­fi­cially ac­knowl­edged events. Great stuff.

I wake up in a lot of ho­tels, so I am fiercely loyal to the ones I love. A ho­tel where I know im­me­di­ately wher I am when I open my eyes in the morn­ing is a rare joy. Here are some of my fa­vorites

CHATEAU MARMONT ( LA) : if I have to die in a ho­tel room, let it be here. I will work in LA just to stay at the Chateau.

CHILTERN FIREHOUSE (London): Same owner as the Chateau. An amaz­ing Victorian fire­house turned ho­tel. Pretty much per­fec­tion

EDGEWATER INN (Seattle): kind of a lum­ber theme go­ing on…ships slide right by your win­dow. And the Led Zep Mudshark in­ci­dent”.

THE METROPOLE (Hanoi): there’s a theme de­vel­op­ing: if Graham Greene stayed at a ho­tel, chances are I will too.

THE MURRAY (Livingston,Montana): You want the Peckinpah suite

Pictures in each have not been re­cov­ered.

5 Photos on My Phone, Chosen at Random

Shame, in­deed, no pic­tures, there was one for each.

People I’d Like to Be for a Day

I’m Hungry and Would Be Very Happy to Eat Any of This Right Now

Spaghetti a la bot­targa . I would re­ally, re­ally like some of this. Al dente, lots of chili flakes

A street fair sausage and pep­per hero would be nice. Though shit­ting like a mink is an in­evitable and near im­me­di­ate out­come

Some uni. Fuck it. I’ll smear it on an English muf­fin at this point.

I won­der if that cheese is still good?

In which my Greek idyll is Suddenly in­vaded by pro­fes­sional nud­ists

T-shirt and no pants. Leading one to the ob­vi­ous ques­tion : why bother?

The cheesy crust on the side of the bowl of Onion Soup Gratinee

Before he died, Warren Zevon dropped this wis­dom bomb: Enjoy every sand­wich”. These are a few lo­cals I’ve par­tic­u­larly en­joyed:

PASTRAMI QUEEN: (1125 Lexington Ave. ) Pastrami Sandwich. Also the turkey with Russian dress­ing is not bad. Also the brisket.

EISENBERG’S SANDWICH SHOP: ( 174 5th Ave.) Tuna salad on white with let­tuce. I’d sug­gest drink­ing a lime Rickey or an Arnold Palmer with that.

THE JOHN DORY OYSTER BAR: (1196 Broadway) the Carta di Musica with Bottarga and Chili is amaz­ing. Is it a sand­wich? Yes. Yes it is.

RANDOM STREET FAIRS: (Anywhere tube socks and stale spices are sold. ) New York street fairs suck. The same dreary ven­dors, same bad food. But those nasty sausage and pep­per hero sand­wiches are a siren song, lur­ing me, al­ways to­wards the rocks. Shitting like a mink al­most im­me­di­ately af­ter is guar­an­teed but who cares?

BARNEY GREENGRASS : ( 541 Amsterdam Ave.) Chopped Liver on rye. The best chopped liver in NYC.

SIBERIA in any of its it­er­a­tions. The one on the sub­way be­ing the best

LADY ANNES FULL MOON SALOON a bar so nasty I’d bring out of town vis­i­tors there just to scare them

KELLY’S on 43rd and Lex. Notable for 25 cent drafts and reg­u­larly and re­li­ably serv­ing me when I was 15

BILLY’S TOPLESS (later, Billy’s Stopless) an at­mos­pheric, work­ing class place, per­fect for late af­ter­noon drink­ing where no­body hus­tled you for money and every­body knew every­body. Great all-hair metal juke­box . Naked breasts were not re­ally the point.

THE BAR AT HAWAII KAI. tucked away in a gi­ant tiki themed night­club in Times Square with a midget door­man and a floor show. Best place to drop acid EVER.

THE NURSERY af­ter hours bar dec­o­rated like a pe­di­a­tri­cian’s of­fice. Only the nurs­ery rhyme char­ac­ters were punk rock­ers of the day.

It was surprising to see that only one page was not recoverable from Common Crawl.

I've enjoyed this little archaeology project tremendously. Can we declare victory for at least this endeavor? Hopefully we'll be able to find the images too, but that's a little tougher, since that era's CloudFront is fully gone.

What else can we work on restoring, and can we set up some sort of public archive to store it? I made this a git repository for the sole purpose of letting anyone interested contribute their interest and passion to these kinds of projects.

Thank you and un­til next time! ◼︎

...

Read the original on sandyuraz.com »

9 246 shares, 12 trendiness

I Fed 24 Years of My Blog Posts to a Markov Model

Yesterday I shared a little program called Mark V. Shaney Junior at github.com/susam/mvs. It is a minimal implementation of a Markov text generator inspired by the legendary Mark V. Shaney program from the 1980s. Mark V. Shaney was a synthetic Usenet user that posted messages to various newsgroups using text generated by a Markov model. See the Wikipedia article Mark V. Shaney for more details about it. In this post, I will discuss my implementation of the model, explain how it works and share some of the results produced by it.

The program I shared yesterday has only about 30 lines of Python and favours simplicity over efficiency. Even if you have never worked with Markov models before, I am quite confident that it will take you less than 20 minutes to understand the whole program and make complete sense of it. I also offer an explanation further below in this post.

As a hobby, I of­ten en­gage in ex­ploratory pro­gram­ming where I write com­puter pro­grams not to solve a spe­cific prob­lem but sim­ply to ex­plore a par­tic­u­lar idea or topic for the sole pur­pose of recre­ation. I must have writ­ten small pro­grams to ex­plore Markov chains for var­i­ous kinds of state spaces over a dozen times by now. Every time, I just pick my last ex­per­i­men­tal code and edit it to en­code the new state space I am ex­plor­ing. That’s usu­ally my gen­eral ap­proach to such one-off pro­grams. I have hun­dreds of tiny lit­tle ex­per­i­men­tal pro­grams ly­ing on my disk at any given time.

Once in a while, I get the itch to take one of those exploratory programs, give it some finishing touches, wrap it up in a nice Git repo along with a README.md, CHANGES.md and the whole shebang, and share it on github.com/susam and codeberg.org/susam. The Mark V. Shaney Junior program that I shared yesterday happened to be one such exercise.

If you scroll down the README of this project, you'll find some nice examples of the gibberish produced by this program. The first few examples there are the result of training the model on A Christmas Carol by Charles Dickens, one of my favourite authors. It is often said that Dickens never used fewer words when more would suffice. So I thought there couldn't be a better piece of text when it comes to testing out my tiny Markov model. I'll not reproduce the generated text examples here for the sake of brevity. If you are interested in taking a look, just head over to the Gibberish section of the README.

Soon af­ter shar­ing the pro­ject, I won­dered what kind of gib­ber­ish it would pro­duce if I fed all 24 years of my blog posts and pages into the pro­gram. Well, here’s one of the re­sults:

$ ./mvs < susam.txt
while a query replace operation is approved by the user. The above
variable defines the build job. It can be incredibly useful while
working on assembly language and machine code. In fact, all internal
resources like the result to refine the search prompt changes from
bck-i-search: to fwd-i-search:. Now type C-SPC (i.e. ctrl+space) to
set a mark causes Emacs to use 32-bit registers like EBP, ESP,
etc. Thus the behaviour is undefined. Such code may behave differently
when compiled with the readily available GNU tools like the shape
of 8. Flipping "P" horizontally makes it a proper quine: cat $0

This is the text that comes out af­ter the pro­gram con­sumes over 200 posts con­sist­ing of about 200,000 words. My blog also has a com­ments sec­tion with over 500 com­ments con­sist­ing of about 40,000 words. All com­ments were ex­cluded while train­ing the model. Here is an­other out­put ex­am­ple:

$ ./mvs < susam.txt
enjoy asking "what happens if" and then type M-x zap-up-to-char RET
b. The buffer for this specific video, the actual fare for 8.3 km and
11 are all written from scratch. No prior knowledge is expected to
slow down in future. For now, I will add a statement like x =
0.999…, the inner corner square as discussed in the code segment
into the REPL window. Unlike Slimv, Vlime can work with and debug
executable files, it can be more convenient. M-x: Execute Extended
Command The key sequence M-q invokes the command cat and type TAB to
indent the current

Here is a par­tic­u­larly in­co­her­ent but amus­ing one:

$ ./mvs < susam.txt
Then open a new Lisp source file and the exact answer could harm
students' self-esteem. Scientists have arbitrarily assumed that an
integral domain. However, the string and comment text. To demonstrate
how a build job can trigger itself, pass input to standard output or
standard error), Eshell automatically runs the following command in
Vim and Emacs will copy the message length limit of 512 characters,
etc. For example, while learning to play the game between normal mode
to move the point is on an old dictionary lying around our house and
that is moving to the small and supportive community

No, I have never said anywhere that opening a Lisp source file could harm anyone's self-esteem. The text generator has picked up the 'Lisp source file' phrase from my Lisp in Vim post and the 'self-esteem' bit from the From Perl to Pi post.

By default, this program looks at trigrams (all sequences of three adjacent words) and creates a map where the first two words of the trigram are inserted as the key and the third word is appended to its list value. This map is the model. In this way, the model captures each pair of adjacent words along with the words that immediately follow each pair. The text generator first chooses a key (a pair of words) at random and selects a word that follows. If there are multiple followers, it picks one uniformly at random. It then repeats this process with the most recent pair of words, consisting of one word from the previous pair and the word that was just picked. It continues to do this until it can no longer find a follower or a fixed word limit (100 by default) is reached. That is pretty much the whole algorithm. There isn't much more to it. It is as simple as it gets. For that reason, I often describe a simple Markov model like this as the 'hello, world' for language models.
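The whole algorithm can be paraphrased in a dozen lines of Python. The sketch below is my restatement of the idea just described, not the exact code from the mvs repository, and the function names are mine.

```python
import random
from collections import defaultdict


def train(words, order=2):
    """Map each `order`-word window to the list of words observed after it."""
    model = defaultdict(list)
    for i in range(len(words) - order):
        model[tuple(words[i:i + order])].append(words[i + order])
    return model


def generate(model, limit=100):
    """Random-walk the model until no follower exists or `limit` is hit."""
    key = random.choice(list(model))
    out = list(key)
    while len(out) < limit and key in model:
        follower = random.choice(model[key])  # uniform over recorded followers
        out.append(follower)
        key = key[1:] + (follower,)           # slide the window by one word
    return " ".join(out)
```

Because the keys come from training windows, every adjacent pair of words in the generated text is a pair that actually occurred somewhere in the training data, which is exactly why short runs of the output sound locally plausible.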

If the same tri­gram oc­curs mul­ti­ple times in the train­ing data, the model records the fol­lower word (the third word) mul­ti­ple times in the list as­so­ci­ated with the key (the first two words). This rep­re­sen­ta­tion can be op­ti­mised, of course, by keep­ing fre­quen­cies of the fol­lower words rather than du­pli­cat­ing them in the list, but that is left as an ex­er­cise to the reader. In any case, when the text gen­er­a­tor chooses a fol­lower for a given pair of words, a fol­lower that oc­curs more fre­quently af­ter that pair has a higher prob­a­bil­ity of be­ing cho­sen. In ef­fect, the next word is sam­pled based only on the pre­vi­ous two words and not on the full his­tory of the gen­er­ated text. This mem­o­ry­less de­pen­dence on the cur­rent state is what makes the gen­er­a­tor Markov. Formally, for a dis­crete-time sto­chas­tic process, the Markov prop­erty can be ex­pressed as
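One way to do that exercise is to keep a Counter per key and sample with frequency weights. This is my sketch of the optimisation, not part of the published program.

```python
import random
from collections import Counter, defaultdict


def train_counts(words, order=2):
    """Like the list-based model, but store follower frequencies
    instead of duplicating followers in a list."""
    model = defaultdict(Counter)
    for i in range(len(words) - order):
        model[tuple(words[i:i + order])][words[i + order]] += 1
    return model


def pick_follower(counter):
    """Sample a follower with probability proportional to its frequency."""
    followers, weights = zip(*counter.items())
    return random.choices(followers, weights=weights, k=1)[0]
```

The sampling distribution is unchanged: a follower seen twice after a pair is still twice as likely to be chosen as one seen once, but each follower is now stored only once.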

\[ P(X_{n+1} \mid X_n, X_{n-1}, \ldots, X_1) = P(X_{n+1} \mid X_n). \]

where \( X_n \) rep­re­sents the \( n \)th state. In our case, each state \( X_n \) is a pair of words \( (w_{n-1}, w_{n}) \) but the state space could just as well con­sist of other ob­jects, such as a pair of char­ac­ters, pixel val­ues or mu­si­cal notes. The se­quence of states \( (X_1, X_2, \dots) \) vis­ited by the pro­gram forms a Markov chain. The left-hand side of the equa­tion de­notes the con­di­tional dis­tri­b­u­tion of the next state \( X_{n+1} \) given the en­tire his­tory of states \( X_1, X_2, \dots, X_n, \) while the right-hand side con­di­tions only on the cur­rent state. When both are equal, it means that the prob­a­bil­ity of the next state de­pends only on the cur­rent state and not on the ear­lier states. This is the Markov prop­erty. It ap­plies to the text gen­er­a­tion process only, not the train­ing data. The train­ing data is used only to es­ti­mate the tran­si­tion prob­a­bil­i­ties of the model.

In 2025, given the overwhelming popularity of large language models (LLMs), Markov models like this look unimpressive. Unlike LLMs, a simple Markov model cannot capture global structure or long-range dependencies within the text. It relies entirely on local word transition statistics. Also, these days, one hardly needs a Markov model to generate gibberish; social media provides an ample supply. Nevertheless, I think the simplicity of its design and implementation serves as a good entry point into language models.

In my implementation, the number of words in the key of the map can be set via command line arguments. This value is also known as the order of the model, and it defaults to 2 as described above. If we increase it to, say, 3 or 4, the generated text becomes a little more coherent. Here is one such example:

$ ./mvs 4 < susam.txt
It is also possible to search for channels by channel names. For
example, on Libera Chat, to search for all channels with ‘python’ in
its name, enter the IRC command: /msg alis list python. Although I
have used Libera Chat in the examples above, there are plenty of
infinite fields, so they must all be integral domains too. Consider
the field of rational numbers Q. Another quick way to arrive at this
fact is to observe that when one knight is placed on a type D square,
only two positions for the second knight such that the two knights
attack

Except for a couple of abrupt and meaningless transitions, the text is mostly coherent. We need to be careful not to increase the order too much, though. In fact, if we increase the order of the model to 5, the generated text becomes very dry and factual because it begins to quote large portions of the blog posts verbatim. Not much fun can be had with that.
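Generalising the trigram construction to an arbitrary order is a small change: keys become tuples of `order` consecutive words (again a hypothetical sketch, not the author’s code):

```python
from collections import defaultdict

def build_ngram_model(words, order=2):
    """Keys are tuples of `order` consecutive words; values list the followers."""
    model = defaultdict(list)
    for i in range(len(words) - order):
        model[tuple(words[i:i + order])].append(words[i + order])
    return model
```

With `order=2` this reduces to the trigram model described earlier; higher orders make each key more specific, which is exactly why the output drifts towards verbatim quotation.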

Before I end this post, let me present one final example where I ask the program to generate text from an initial prompt:

$ ./mvs 2 100 ‘Finally we’ < susam.txt
Finally we divide this number by a feed aggregrator for Emacs-related
blogs. The following complete key sequences describe the effects of
previous evaluations shall have taken a simple and small to contain
bad content. This provides an interactive byte-compiled Lisp function
in MATLAB and GNU bash 5.1.4 on Debian is easily reproducible in
Windows XP. Older versions might be able to run that server for me it
played a significant burden on me as soon as possible. C-u F: Visit
the marked files or directories in the sense that it was already
initiated and we were to complete the proof.

Apparently, this is how I would sound if I ever took up speaking gibberish!
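Prompt seeding like this can be sketched by starting the chain from the last `order` words of the prompt instead of a random key (my own illustration; the function names are hypothetical):

```python
import random
from collections import defaultdict

def build_model(words, order=2):
    """Map tuples of `order` consecutive words to their followers."""
    model = defaultdict(list)
    for i in range(len(words) - order):
        model[tuple(words[i:i + order])].append(words[i + order])
    return model

def generate_from_prompt(model, prompt, order=2, limit=100):
    """Emit the prompt, then continue the chain from its trailing words."""
    out = prompt.split()
    key = tuple(out[-order:])
    while len(out) < limit and key in model:
        follower = random.choice(model[key])
        out.append(follower)
        key = tuple(out[-order:])
    return " ".join(out)
```

If the prompt’s trailing words never appeared in the training data, the loop exits immediately and the output is just the prompt itself.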

...

Read the original on susam.net »

10 219 shares, 7 trendiness

Secret mode

Part of the Accepted! series, explaining the upcoming Go changes in simple terms.

The new runtime/secret package lets you run a function in secret mode. After the function finishes, it immediately erases (zeroes out) the registers and stack it used. Heap allocations made by the function are erased as soon as the garbage collector decides they are no longer reachable.

secret.Do(func() {
    // Generate a session key and
    // use it to encrypt the data.
})

This helps make sure sensitive information doesn’t stay in memory longer than needed, lowering the risk of attackers getting to it. The package is experimental and is mainly for developers of cryptographic libraries, not for application developers.

Cryptographic protocols like WireGuard or TLS have a property called “forward secrecy”. This means that even if an attacker gains access to long-term secrets (like a private key in TLS), they shouldn’t be able to decrypt past communication sessions. To make this work, session keys (used to encrypt and decrypt data during a specific communication session) need to be erased from memory after they’re used. If there’s no reliable way to clear this memory, the keys could stay there indefinitely, which would break forward secrecy.

In Go, the runtime manages memory, and it doesn’t guarantee when or how memory is cleared. Sensitive data might remain in heap allocations or stack frames, potentially exposed in core dumps or through memory attacks. Developers often have to use unreliable “hacks” with reflection to try to zero out internal buffers in cryptographic libraries. Even so, some data might still stay in memory where the developer can’t reach or control it.

The solution is to provide a runtime mechanism that automatically erases all temporary storage used during sensitive operations. This will make it easier for library developers to write secure code without using workarounds.

Add the runtime/secret package with Do and Enabled functions:

// Do invokes f.
// Do ensures that any temporary storage used by f is erased in a
// timely manner. (In this context, “f” is shorthand for the
// entire call tree initiated by f.)
// - Any registers used by f are erased before Do returns.
// - Any stack used by f is erased before Do returns.
// - Any heap allocation done by f is erased as soon as the garbage
//   collector realizes that it is no longer reachable.
// - Do works even if f panics or calls runtime.Goexit. As part of
//   that, any panic raised by f will appear as if it originates from
//   Do itself.
func Do(f func())

// Enabled reports whether Do appears anywhere on the call stack.
func Enabled() bool

The current implementation has several limitations:

- Only supported on linux/amd64 and linux/arm64. On unsupported platforms, Do invokes f directly.
- Protection does not cover any global variables that f writes to.
- Trying to start a goroutine within f causes a panic.
- If f calls runtime.Goexit, erasure is delayed until all deferred functions are executed.
- Heap allocations are only erased if ➊ the program drops all references to them, and ➋ then the garbage collector notices that those references are gone. The program controls the first part, but the second part depends on when the runtime decides to act.
- If f panics, the panicked value might reference memory allocated inside f. That memory won’t be erased until (at least) the panicked value is no longer reachable.
- Pointer addresses might leak into data buffers that the runtime uses for garbage collection. Do not put confidential information into pointers.

The last point might not be immediately obvious, so here’s an example. If an offset in an array is itself secret (you have a data array and the secret key always starts at data[100]), don’t create a pointer to that location (don’t create a pointer p to &data[100]). Otherwise, the garbage collector might store this pointer, since it needs to know about all active pointers to do its job. If someone launches an attack to access the GC’s memory, your secret offset could be exposed.

The package is mainly for developers who work on cryptographic libraries. Most apps should use higher-level libraries that use secret.Do behind the scenes.

As of Go 1.26, the runtime/secret package is experimental and can be enabled by setting GOEXPERIMENT=runtimesecret at build time.

Use secret.Do to generate a session key and encrypt a message using AES-GCM:

// Encrypt generates an ephemeral key and encrypts the message.
// It wraps the entire sensitive operation in secret.Do to ensure
// the key and internal AES state are erased from memory.
func Encrypt(message []byte) ([]byte, error) {
	var ciphertext []byte
	var encErr error
	secret.Do(func() {
		// 1. Generate an ephemeral 32-byte key.
		// This allocation is protected by secret.Do.
		key := make([]byte, 32)
		if _, err := io.ReadFull(rand.Reader, key); err != nil {
			encErr = err
			return
		}

		// 2. Create the cipher (expands key into round keys).
		// This structure is also protected.
		block, err := aes.NewCipher(key)
		if err != nil {
			encErr = err
			return
		}
		gcm, err := cipher.NewGCM(block)
		if err != nil {
			encErr = err
			return
		}

		nonce := make([]byte, gcm.NonceSize())
		if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
			encErr = err
			return
		}

		// 3. Seal the data.
		// Only the ciphertext leaves this closure.
		ciphertext = gcm.Seal(nonce, nonce, message, nil)
	})
	return ciphertext, encErr
}

Note that secret.Do protects not just the raw key, but also the cipher.Block structure (which contains the expanded key schedule) created inside the function.

This is, of course, a simplified example: it only shows how memory erasure works, not a full cryptographic exchange. In real situations, the key needs to be shared securely with the receiver (for example, through key exchange) so decryption can work.

...

Read the original on antonz.org »
