10 interesting stories served every morning and every evening.

Hyundai takes full control of Boston Dynamics as SoftBank exits for $325 million

startupfortune.com

Hyundai’s move to buy SoftBank’s re­main­ing 9.65% stake in Boston Dynamics for $325 mil­lion is not just cleanup from an old deal. It gives Hyundai full con­trol of one of the few hu­manoid ro­bot­ics com­pa­nies with real fac­tory work in sight.

Hyundai Motor Group is ex­pected to ap­prove the pur­chase on June 22, clos­ing out SoftBank’s last piece of Boston Dynamics and turn­ing the Waltham, Massachusetts ro­bot­ics com­pany into a wholly owned Hyundai busi­ness. The price is $325 mil­lion for the re­main­ing stake, ac­cord­ing to the deal terms, and it fol­lows the put op­tion SoftBank re­tained when Hyundai bought con­trol of Boston Dynamics in 2021.

You should read that as a sig­nal, not a foot­note. Hyundai paid about $880 mil­lion for an 80% stake in Boston Dynamics in the 2021 trans­ac­tion, valu­ing the com­pany at roughly $1.1 bil­lion at the time. SoftBank had bought Boston Dynamics from Alphabet in 2017, af­ter Google had ac­quired the ro­bot­ics lab in 2013. It was a strange own­er­ship path for a com­pany whose ro­bots be­came fa­mous on YouTube long be­fore they be­came ob­vi­ous com­mer­cial prod­ucts.

That part is chang­ing. At CES in Las Vegas on January 5, 2026, Hyundai and Boston Dynamics showed the elec­tric Atlas hu­manoid ro­bot in pub­lic, with the Associated Press re­port­ing that the life-sized ro­bot stood up, walked around the stage and was re­motely pi­loted for the demon­stra­tion. The use­ful de­tail was not the stage­craft. It was the de­ploy­ment plan. A pro­duc­tion ver­sion of Atlas is ex­pected to be­gin work at Hyundai’s elec­tric ve­hi­cle plant near Savannah, Georgia, by 2028.

Boston Dynamics has spent years mak­ing ro­bots that looked too good to be busi­nesses. Spot, its four-legged ro­bot, be­came the first ob­vi­ous com­mer­cial suc­cess. Atlas is the harder test be­cause hu­manoid ro­bots have to jus­tify them­selves in places where tra­di­tional au­toma­tion al­ready ex­ists. Business Insider re­ported in January that Boston Dynamics CEO Robert Playter said Atlas would need to learn new fac­tory tasks in a day or two and reach 99.9% re­li­a­bil­ity be­fore it could be truly use­ful on the floor. That’s a high bar. It’s also the right one.

Hyundai’s ad­van­tage is that it does­n’t have to imag­ine the first cus­tomer. It owns the fac­to­ries, the ve­hi­cle pro­grams and now the whole ro­bot­ics com­pany. The Verge re­ported from CES that Hyundai plans to start Atlas with parts se­quenc­ing at its Metaplant in Georgia, then move to­ward heav­ier and more com­plex op­er­a­tions by 2030. If you’re build­ing ro­bots for the phys­i­cal world, that kind of con­trolled de­ploy­ment mat­ters more than a per­fect demo video.

The sup­ply chain is part of the story too. Hyundai Mobis, the group’s com­po­nents arm, has been tied to ac­tu­a­tor pro­duc­tion for Atlas, which keeps one of the ro­bot’s most im­por­tant hard­ware sys­tems closer to Hyundai’s own in­dus­trial base. Frankly, that is the dif­fer­ence be­tween treat­ing ro­bot­ics as a side bet and treat­ing it as a man­u­fac­tur­ing ca­pa­bil­ity. A hu­manoid ro­bot is only as use­ful as the parts, ser­vice net­work and pro­duc­tion dis­ci­pline be­hind it.

The field around Boston Dynamics is no longer sleepy. Tesla has shifted part of its Fremont fac­tory story to­ward Optimus af­ter end­ing Model S and Model X pro­duc­tion, a move re­ported by Axios and The Verge ear­lier this year. Figure AI has pushed hu­manoid ro­bots into BMW fac­tory tri­als. Unitree has made lower-cost hu­manoids im­pos­si­ble to ig­nore. None of those com­pa­nies has Boston Dynamics’ long record in lo­co­mo­tion, but they don’t need to. They need to make ro­bots cheap enough, use­ful enough and re­li­able enough to win spe­cific jobs.

That is why full own­er­ship mat­ters for Hyundai. Boston Dynamics does­n’t have to beat every hu­manoid ri­val in every mar­ket. It has to make Atlas work in­side Hyundai plants first, where the tasks are known, the lay­out is con­trolled and the pay­off can be mea­sured in pro­duc­tion up­time rather than con­fer­ence ap­plause. If it works there, Hyundai gets a ro­bot­ics plat­form and a proof point at the same time.

SoftBank has moved on to a big­ger AI bet

For Masayoshi Son, the Boston Dynamics exit looks small be­side SoftBank’s cur­rent AI in­fra­struc­ture cam­paign. The Wall Street Journal re­ported in April that SoftBank is form­ing Roze AI, a new ven­ture meant to use ar­ti­fi­cial in­tel­li­gence and ro­bot­ics to build phys­i­cal in­fra­struc­ture, in­clud­ing data cen­ters. Tom’s Hardware, cit­ing the Financial Times, re­ported that Son is aim­ing for a $100 bil­lion val­u­a­tion for Roze and a pub­lic list­ing as soon as this year.

That puts the $325 mil­lion Boston Dynamics pro­ceeds in per­spec­tive. SoftBank is not walk­ing away from ro­bot­ics as an idea. It is mov­ing to­ward ro­bots as part of the AI build­out, tied to data cen­ters, en­ergy, land and con­struc­tion. Boston Dynamics is a prod­uct com­pany with hard en­gi­neer­ing prob­lems and a slower rev­enue curve. Son now wants the in­fra­struc­ture layer.

Hyundai wants the ro­bot on the fac­tory floor. That is a nar­rower bet, but it is eas­ier to judge. By 2028, Atlas is sup­posed to be do­ing real work in Georgia, not just walk­ing across a stage in Las Vegas. If Hyundai can turn that into re­peat­able man­u­fac­tur­ing value, the SoftBank exit will look less like a tidy cleanup and more like the mo­ment Hyundai stopped bor­row­ing a ro­bot­ics fu­ture and de­cided to own it out­right.

Project Valhalla, Explained: How a Decade of Work Arrives in JDK 28 - JVM Weekly vol. 180

www.jvm-weekly.com

On June 15, Oracle en­gi­neer Lois Foltan con­firmed what a good chunk of the in­dus­try had stopped be­liev­ing: JEP 401: Value Classes and Objects will be in­te­grated into the main OpenJDK repos­i­tory and is tar­get­ing JDK 28.

The change is so large that the re­main­ing com­mit­ters were asked to hold off on big­ger com­mits dur­ing the in­te­gra­tion. The pull re­quest alone adds over 197 thou­sand lines of code across 1,816 files.

Before we pop the champagne, though: this is preview, disabled by default, and, as Brian Goetz was quick to cool everyone down, “only the first part of Valhalla.” Goetz added a great observation that the “they’ll never ship it” crowd will now smoothly switch over to “but they didn’t ship the most important part” (and a joke has been going around the community for years that we’ll sooner end up in Valhalla ourselves, the Norse-afterlife one, than the project ships).

You have to earn your own haters.

So this is a good mo­ment to tell the whole story. This is­sue is one big deep-dive, writ­ten on the as­sump­tion that you’ve never fol­lowed the work on Valhalla be­fore: from the 2014 prob­lem, through the evo­lu­tion of ideas (a fair num­ber of which ended up in the trash), all the way to what ex­actly we’ll be get­ting our hands on in JDK 28. Brew your­self a cof­fee. I’ve been sit­ting on this edi­tion for a long time, sav­ing it for ex­actly this oc­ca­sion.

The slogan Valhalla has carried from the start is: “codes like a class, works like an int.” In a single sentence it captures the whole point of the project: we want to write normal, readable classes with methods, constructor validation, and sensible field names, but we want the JVM to be able to treat them as efficiently as primitives.

To understand why this is a problem, you have to go back to Java’s foundation. In this language, with the exception of the eight primitives (int, long, double, boolean, and the rest), everything is a reference type. When you write Point p = new Point(1, 2), the variable p isn’t a point. The variable p is a pointer, a coat-check number: somewhere on the heap sits an object, and you’re holding a slip of paper with its address. Every time you want to read a field, the JVM has to “go to the coat check,” performing a hop through the pointer (pointer indirection).

For a sin­gle ob­ject, that’s noth­ing. The prob­lem starts at scale. Every ob­ject on the heap has its own header (a dozen-or-so bytes of meta­data: among other things, so the JVM knows what type it is and whether any­one is syn­chro­niz­ing on it). Incidentally, this is ex­actly the prob­lem Project Lilliput has been tack­ling lately, help­ing to shrink ob­ject header sizes. But header size is­n’t every­thing. Every ob­ject has to be al­lo­cated, and later garbage col­lected. And since ob­jects are scat­tered across the heap, an ar­ray of a mil­lion Points is in prac­tice a mil­lion slips of pa­per point­ing at a mil­lion boxes strewn across the whole ware­house.

Brian Goetz, in his “State of Valhalla” documents, calls such a memory layout “fluffy”: puffed up, bloated. What we dream of is a dense layout, one where the data lies side by side.

Why does den­sity mat­ter? Because the hard­ware changed faster than Java did. In 1995, a mem­ory ac­cess cost roughly the same as a CPU op­er­a­tion. Today the CPU is two or­ders of mag­ni­tude faster than main mem­ory, and the whole gap is bridged by the cache. The proces­sor reads mem­ory in chunks called cache lines (usually 64 bytes). If the data lies densely and in or­der, one such chunk brings in a ton of use­ful val­ues at once. If we’re hop­ping across point­ers, every ac­cess risks a cache miss, and that can be a hun­dred times slower than a hit. This is lo­cal­ity of ref­er­ence, and it’s the real stake in this whole game.

“But the JVM has escape analysis,” someone sharp will say. True: the virtual machine can recognize that some object never “escapes” beyond a local fragment of code, and then it doesn’t allocate it at all. From the programmer’s point of view it looks as if the object exists, but in reality its fields get spread out into ordinary variables or CPU registers. In the best case, the cost of allocation and the later cleanup by the garbage collector drops to practically zero.

The trou­ble is that this op­ti­miza­tion is un­pre­dictable and frag­ile. It works only when the JIT com­piler can trace the ob­jec­t’s en­tire flow with high con­fi­dence. But all it takes is for the ob­ject to land in a field of an­other class, get stored in an ar­ray, get passed into a more com­plex method, or ap­pear be­yond the bound­ary of code the JIT can an­a­lyze, and the whole trick stops work­ing. The source code stays iden­ti­cal, but the per­for­mance be­hav­ior can change dra­mat­i­cally.

This is pre­cisely why ex­pe­ri­enced JVM pro­gram­mers treat es­cape analy­sis as a nice bonus, not a pro­jec­t’s foun­da­tion. If an ap­pli­ca­tion’s per­for­mance de­pends on whether a par­tic­u­lar JIT ver­sion man­ages to ap­ply this op­ti­miza­tion, it’s very easy to fall into the trap of hard-to-pre­dict re­gres­sions. A mi­nor refac­tor, a JDK up­date, or a change in code struc­ture can send ob­jects back onto the heap, and the costs of al­lo­ca­tion and garbage-col­lec­tor work re­turn in full force.
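The fragility is easier to see side by side. Below is a minimal sketch with hypothetical names (Vec, EscapeDemo, lengthSquared are illustrations, not from any JDK API). The comments describe what escape analysis may do; whether a given JIT actually removes the allocation is not guaranteed.

```java
// Hypothetical illustration: a tiny immutable pair of ints.
final class Vec {
    final int x;
    final int y;

    Vec(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

final class EscapeDemo {
    // Anything stored in a static field is reachable from everywhere.
    static Vec last;

    // The Vec never leaves this method, so the JIT may replace it
    // with two local ints and skip the heap allocation.
    static int lengthSquared(int x, int y) {
        Vec v = new Vec(x, y);
        return v.x * v.x + v.y * v.y;
    }

    // Same arithmetic, but the Vec is stored in a field first.
    // It now escapes the method, so it must exist as a real heap object.
    static int lengthSquaredAndRemember(int x, int y) {
        Vec v = new Vec(x, y);
        last = v;
        return v.x * v.x + v.y * v.y;
    }
}
```

Both methods return the same result; only the second one forces an allocation, and nothing in the method signature tells the caller which is which.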

That leaves the brute-force option: give up on objects and encode the data by hand. Instead of a Color class, hold three bytes r, g, b. This isn’t just an academic example. The approach has been used for years in game engines, graphics libraries, image-processing systems, databases, analytics engines, and HPC code, where every byte of memory and every allocation matters. The trouble is that the speed comes at the cost of safety and readability. We lose names, private state, validation, and methods. JEP 401 gives a simple example: a developer working on “raw” color bytes might mistakenly interpret them as BGR instead of RGB, swap red with blue, and quietly corrupt the entire image. A class wouldn’t have allowed it. A bare int? Sure it would.
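Here is a small sketch of that failure mode, assuming the three channels are packed into a single int as 0x00RRGGBB. The class and method names (PackedColor, packRgb, redOfRgb, redOfBgr) are hypothetical and exist only for this illustration.

```java
final class PackedColor {
    // Pack three 0..255 channels into one int laid out as 0x00RRGGBB.
    static int packRgb(int r, int g, int b) {
        return (r & 0xFF) << 16 | (g & 0xFF) << 8 | (b & 0xFF);
    }

    // Correct reader: red lives in bits 16..23.
    static int redOfRgb(int packed) {
        return (packed >> 16) & 0xFF;
    }

    // Mistaken reader that assumes a 0x00BBGGRR layout. Given an
    // RGB-packed value it returns the blue channel, and the compiler
    // has no way to object: both layouts are just int.
    static int redOfBgr(int packed) {
        return packed & 0xFF;
    }
}
```

For an orange pixel built with packRgb(255, 128, 0), the correct reader reports red as 255 while the mistaken one reports 0. The type system sees nothing wrong in either call.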

And it’s ex­actly this di­chotomy, ei­ther con­ve­nient classes, or fast prim­i­tives, that Valhalla is try­ing to erase.

Officially, Project Valhalla started in 2014. James Gosling described it at the time as “six PhDs tied into a single knot,” and that was no exaggeration. Interestingly, the idea is older than the project itself: Java’s creators wanted value types as early as the first version of the language, but in 1995 they gave up, because the problem was too hard.

The goal was set am­bi­tiously: to re­store align­ment be­tween the pro­gram­ming model and the per­for­mance char­ac­ter­is­tics of mod­ern hard­ware. In other words, to let pro­gram­mers de­clare their own types that are flat and dense in mem­ory like prim­i­tives, but look and be­have like nor­mal classes.

Easier said than done. Over the fol­low­ing years the team built five dif­fer­ent pro­to­types, each prob­ing a dif­fer­ent as­pect of the prob­lem. And this is where the most in­ter­est­ing part of the story be­gins, be­cause to ap­pre­ci­ate Valhalla’s cur­rent shape, you have to see how many ideas died along the way.

The early prototypes went in a direction we now call “Q World.” It assumed that the new value types were a fundamentally different beast from objects, with separate type descriptors, separate bytecodes, and separate top types, exactly like primitives. Sounds logical: if they’re supposed to work like int, let them be represented like int. The trouble is that such a separation flooded the entire JVM type system with extra complexity: everything had to be done in two variants.

The breakthrough came with a prototype christened “L World” (roughly around 2019). The name comes from the fact that value types started sharing the same “L carrier” (the L descriptor, the same one the JVM uses for ordinary references) with object references. The team expected such a unification to be too hard, and yet, to their own surprise, it worked without major compromises and incidentally solved a whole pile of problems from the earlier rounds.

L World produced one more fundamental “aha” that shaped everything that came after: the language model and the JVM model don’t have to overlap one hundred percent. L World is the right model for the virtual machine, but you can treat it as a translation target and offer the programmer something more convenient in the language. This separation of layers turned out to be the key to the rest of the project.

That’s also when the plan to split the work into two phases crys­tal­lized: first value classes (still called some­thing else at the time, more on that shortly), and only then, spe­cial­ized gener­ics. We’ll come back to gener­ics in sec­tion 6, be­cause that’s a sep­a­rate, longer trea­tise.

If you’ve ever tried to read about Valhalla and bounced off a wall of con­tra­dic­tory terms, it’s not your fault. The nam­ing changed sev­eral times here, and not cos­met­i­cally: be­hind each name change stood a change in the model. Let’s trace it, be­cause it’s the best il­lus­tra­tion of how this fea­ture was de­signed.

Stage 1: value types. The earliest term. Vague, because it wasn’t yet clear what exactly these things were supposed to be.

Stage 2: inline classes. Around 2019–2020 a distinction settled in that has survived to this day in its essence: classes split into identity classes (the ones with identity, that is, everything we’ve known until now) and the new inline classes (without identity). That’s when the slogan “codes like a class, works like an int” was coined, and the basic constraints were set: inline classes are final by default, their fields are final, you can’t synchronize on them.

Stage 3: “primitive classes” and the two-projection model. And here it gets interesting, because this is exactly the idea that got significantly cut down. In the 2021 “State of Valhalla” documents, Valhalla promised three things: value objects, primitive classes, and specialized generics. The idea for a “primitive class” was that a single type would have two projections: a value variant (flat, never null, behaving like a primitive) and a reference variant (a box that allows null). Across various iterations this was written as Point.val/Point.ref, and later they experimented with the Point! and Point? syntax.

The model was powerful, but also mentally heavy. A programmer would have to juggle two forms of the same type day to day and understand when a conversion between them happens. The team, faithful to the lesson “simplify the model for the user, even at the cost of the performance ceiling,” ultimately dismantled this dualism.

Stage 4 (today): “value classes” and “value objects.” The current JEP 401, authored by Dan Smith (reviewer: Brian Goetz), puts it simply. There’s one new thing: a value class, declared with the value modifier. Its instances are value objects: objects without identity. And (this is key) a value class is still a reference type. The whole tricky business of non-nullability has been split off into a separate, optional JEP (Null-Restricted Value Class Types), which we’ll get to. So instead of one complicated concept we have two simple, orthogonal ones: “does it have identity?” and, separately, for later, “does it allow null?”

Worth remembering, because if you come across an older article (or Baeldung describing “primitive classes” as a separate mechanism), you’re reading about an outdated model. In the OpenJDK canon, “primitive classes” in that sense no longer exist.

More things fell along the way. The original “Value Objects” JEP draft was withdrawn and replaced by JEP 401. The original “Universal Generics” draft also went back for rework. JEP 401 is accompanied by JEP 402: Enhanced Primitive Boxing (also preview), plus a whole series of early-access builds (LW1, LW2, LW3…) and talks from the JVM Language Summit, among them Frédéric Parain on heap flattening and Daniel Smith on the new object-initialization model.

The moral of this section is this: twelve years wasn’t twelve years of “writing code.” It was twelve years of rejecting ideas, until the one that can actually be maintained was left.

Let’s get to specifics. Here’s ex­actly what we get.

Declaration. You cre­ate a value class by adding the value mod­i­fier:

    value class USDCurrency implements Comparable<USDCurrency> {
        private int cents; // implicitly final

        public USDCurrency(int dollars, int cents) {
            this.cents = dollars * 100 + cents;
        }

        public USDCurrency plus(USDCurrency that) {
            return new USDCurrency(0, this.cents + that.cents);
        }

        // dollars(), cents(), compareTo(), toString()…
    }

It can also be a value record. The rules: all in­stance fields are im­plic­itly fi­nal, meth­ods may not be syn­chro­nized, the class is fi­nal by de­fault (or it can form a hi­er­ar­chy com­posed of value classes and ab­stract value classes), it can’t in­herit from a class with iden­tity, but it hap­pily im­ple­ments in­ter­faces. Beyond these con­straints, it’s an or­di­nary class.

The defining trait: no identity. This is the crux. An ordinary object has identity: two separately created new Point(1,2) are two different objects, even if they have identical contents. A value object has no identity, just as there aren’t two “different” fours of type int. From this flow all the consequences:

== changes meaning. Until now == compared identity (whether it’s the same address). For value objects, == checks substitutability: whether both values are the same class with the same fields, compared recursively (primitive fields bit by bit, object fields again via ==). That’s why new USDCurrency(3,95) == new USDCurrency(3,95) returns true. That’s good news: it ends the famous confusion with == on Integers. But careful: == looks at internal state, which isn’t always what the object represents, so for “is this the same data” comparisons keep using equals.

syn­chro­nized throws. There’s noth­ing to syn­chro­nize on. An at­tempt ends in IdentityException. When you need to force iden­tity, you have the new helpers Objects.requireIdentity and Objects.hasIdentity.

And now the most important conceptual trap: value objects can STILL be null. This surprises everyone who thinks “value = like a primitive = never null.” In the JDK 28 model, a value class is a reference type, so USDCurrency d = null; is perfectly legal. Non-nullable types (with a null restriction) are a separate, future JEP. They’re not in JDK 28. We’ll come back to this, because it’s not a detail: it’s the key to full performance.
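For contrast, here is the baseline these rules change: a minimal sketch of how an ordinary identity class behaves on a current JDK without preview features. Money and IdentityToday are hypothetical names chosen for this illustration, and the record needs Java 16 or later.

```java
// An ordinary record: without the value modifier it is an identity class.
record Money(int cents) {}

final class IdentityToday {
    // Returns { a == b, a.equals(b) } for two separately created objects.
    static boolean[] compare() {
        Money a = new Money(395);
        Money b = new Money(395);
        return new boolean[] {
            a == b,      // false: two allocations, two identities
            a.equals(b)  // true: records compare their components
        };
    }

    // Locking works because an identity object has a monitor to lock on.
    static boolean canLock() {
        Money m = new Money(1);
        synchronized (m) {
            return true;
        }
    }

    // A reference-typed variable can hold null; per the text above,
    // that stays true for value classes in the JDK 28 model.
    static Money none() {
        return null;
    }
}
```

According to the rules described above, declaring Money as a value record under preview would flip the first comparison to true and turn the synchronized block into an IdentityException, while none() would remain legal.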

JEP 401 gives the JVM free­dom thanks to which value ob­jects can be op­ti­mized in two main ways.

Scalarization is a JIT compiler technique. A reference to a value object gets “broken down into its prime factors,” reduced to its essence, the set of fields, with no wrapping. Instead of passing a pointer to Color, the JIT simply passes three bytes r, g, b (plus one flag bit indicating whether the reference isn’t null). Such an object is in practice free: no allocation, no work for the GC. It’s a bit like escape analysis, but much more predictable and far-reaching: it works even across the boundaries of method calls the JIT didn’t inline. The limitation: scalarization usually won’t work when a variable has a type that is a supertype of the value class (e.g. Object or, importantly, an erased generic parameter). Then the object has to be materialized on the heap.
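To make the idea concrete, here is a hand-written approximation of what a scalarized call could look like. This is only a sketch: the real transformation happens inside the JIT and is not visible in source code, and the names Rgb, ScalarizationSketch, brightness, and brightnessScalarized are hypothetical.

```java
// Stand-in for a small color type (an ordinary class here, so it runs today).
final class Rgb {
    final byte r;
    final byte g;
    final byte b;

    Rgb(byte r, byte g, byte b) {
        this.r = r;
        this.g = g;
        this.b = b;
    }
}

final class ScalarizationSketch {
    // What the source code says: one reference parameter.
    static int brightness(Rgb c) {
        return ((c.r & 0xFF) + (c.g & 0xFF) + (c.b & 0xFF)) / 3;
    }

    // Roughly the shape after scalarization: the fields travel directly,
    // plus a flag standing in for "the reference was not null".
    static int brightnessScalarized(boolean nonNull, byte r, byte g, byte b) {
        if (!nonNull) {
            throw new NullPointerException();
        }
        return ((r & 0xFF) + (g & 0xFF) + (b & 0xFF)) / 3;
    }
}
```

The second method never touches the heap, which is the whole point: the object exists in the source but not at run time.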

Heap flat­ten­ing is the sec­ond mech­a­nism. The ob­jec­t’s essence gets en­coded as a com­pact bit vec­tor and writ­ten di­rectly into a field or an ar­ray cell, with­out a pointer to an­other place in mem­ory. This is ex­actly where den­sity and lo­cal­ity are born.

There’s a catch worth knowing about here, though: flattened data has to be readable and writable atomically (otherwise it risks “tearing” under concurrent access). On typical platforms, “small enough” today means as little as 64 bits, including the null flag. That’s why many small value classes will flatten beautifully, but a class with, say, two int fields or one double may not fit in an atomic write and end up as an ordinary object on the heap anyway. In the future, 128-bit encodings will arrive, and the aforementioned JEP about null-restricted types will allow flattening larger classes in exchange for giving up the atomicity guarantee. This is exactly the moment where non-nullability stops being cosmetics and becomes a lever for performance.
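The arithmetic behind that limit fits in a few lines. The sketch below assumes exactly the numbers from the paragraph above (a 64-bit atomic budget and one bit for the null flag). FlatteningBudget and fitsAtomically are hypothetical names, and a real JVM's layout decisions depend on the platform and implementation.

```java
final class FlatteningBudget {
    // Assumption taken from the text: 64 bits can be written atomically.
    static final int ATOMIC_BITS = 64;
    // Assumption taken from the text: one extra bit records "is null".
    static final int NULL_FLAG_BITS = 1;

    // True when the fields plus the null flag fit into one atomic write.
    static boolean fitsAtomically(int payloadBits) {
        return payloadBits + NULL_FLAG_BITS <= ATOMIC_BITS;
    }
}
```

Three byte fields need 24 + 1 = 25 bits and fit. Two int fields, or a single double, need 64 + 1 = 65 bits and miss the budget by exactly the null flag, which is why removing the need for that flag matters.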

Remember the age-old cost of box­ing, wrap­ping int in Integer? In the new model, the wrap­per classes them­selves be­come value classes (when pre­view is on, Integer, Long, Double, and com­pany lose their iden­tity). Since the box no longer has iden­tity, the JVM can scalar­ize and flat­ten it. The ef­fect: Integer[] starts ap­proach­ing the ef­fi­ciency of int[], and the box­ing over­head, to quote JEP 401, shrinks dra­mat­i­cally. The ac­com­pa­ny­ing JEP 402 (Enhanced Primitive Boxing) goes fur­ther and smooths out con­ver­sions be­tween prim­i­tives and their boxes, open­ing the road to writ­ing things like List<int>. But that’s a sep­a­rate, still-ma­tur­ing piece, so don’t as­sume it’ll roll in com­plete along­side 401.

This is where the ef­fect shows best. Instead of hold­ing a mil­lion point­ers to a mil­lion scat­tered ob­jects, a Color[] ar­ray can store di­rectly flat­tened, 32-bit en­cod­ings of suc­ces­sive col­ors (again: plus a null flag). From a mem­ory stand­point, such an ar­ray starts to look and act like a plain int[]: a con­tigu­ous block of data the proces­sor sweeps through se­quen­tially, cache line by cache line.

For all of this to work, some really deep foundations were moved: the new value modifier; strict construction rules (all fields must be set before anything gets to see the new object, in practice before the super() call, so that a “mutation” of final fields can never be observed); the redefinition of == as a substitutability test; adding a value-object check to the reference-comparison bytecode (acmp); the scalarization and flattening machinery; IdentityException; and the migration of existing “value-based” classes. In short, this isn’t syntactic sugar. It’s a rebuild of an assumption that had been true in Java since 1995: that every object has identity.

Let’s take the simplest possible case and trace it so it’s clear even without knowing the JVM’s internals.

Before Valhalla:

    final class Point { // an ordinary class with identity
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

Point[] points = new Point[1_000_000];

What’s happening here in memory? The points array is a million pointers. Each pointer leads to a separate Point object lying somewhere on the heap. And each such object is not just its two ints (8 bytes), but also a header (another dozen-or-so bytes of metadata). These objects are scattered: the allocator created them at different moments, in different places. When you iterate over the array and sum the coordinates, for each point the processor has to: read the pointer from the array, jump to the indicated address (risk of a cache miss), read the fields. A million times. This is exactly that “fluffy” layout from section 1.

After Valhalla:

    value class Point { // a value class without identity
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

Point[] points = new Point[1_000_000];

The dif­fer­ence in the code is ex­actly one word: value. But the dif­fer­ence in mem­ory is fun­da­men­tal. The JVM can now store the val­ues them­selves in the ar­ray, laid out densely one af­ter an­other: 8 bytes per point (plus a pos­si­ble null flag), in a con­tigu­ous block. No head­ers per el­e­ment. No point­ers. No jump­ing around the heap.

[IMAGE: the same Point[] array in two variants: “before” (an array of arrows → scattered boxes with headers) and “after” (a uniform strip of number pairs)]

When you now iterate over the array, the processor reads the data sequentially. Each 64-byte cache line immediately brings in several complete points. Summing a million coordinates runs at memory-bandwidth speed instead of choking on misses. On data-intensive code that can be a difference of multiples, not percentages.

And, most im­por­tant for main­tain­abil­ity, you did­n’t pay for it with ab­strac­tion. Point is still a class: it has a name, it has a con­struc­tor, it could have val­i­da­tion (if (x < 0) throw …), it could have meth­ods. You don’t have to, like be­fore, split points into two raw int[] xs and int[] ys ar­rays and pray you never mix up the in­dices. You got the den­sity of a prim­i­tive and the read­abil­ity of a class. That’s the whole of Project Valhalla in a sin­gle ex­am­ple.
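For reference, this is the kind of manual workaround the paragraph is talking about, next to the object version it replaces. It is a sketch that runs on a current JDK; Pt and PointLayouts are hypothetical names, and Pt is an ordinary identity class standing in for Point.

```java
// Ordinary identity class standing in for Point.
final class Pt {
    final int x;
    final int y;

    Pt(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

final class PointLayouts {
    // Dense but fragile: two parallel arrays that must always be
    // indexed together. Nothing stops a caller from mixing them up.
    static long sumParallel(int[] xs, int[] ys) {
        long sum = 0;
        for (int i = 0; i < xs.length; i++) {
            sum += xs[i];
            sum += ys[i];
        }
        return sum;
    }

    // Readable but pointer-chasing today: each element is a reference
    // to a separate heap object.
    static long sumObjects(Pt[] points) {
        long sum = 0;
        for (Pt p : points) {
            sum += p.x;
            sum += p.y;
        }
        return sum;
    }
}
```

Both methods compute the same sum. The promise described above is that the second form can get the memory layout of the first once Pt is declared as a value class.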

This is the sec­ond half of Valhalla, and hon­estly the harder one. Let’s start with the source of the prob­lem.

Java implements generics through type erasure. In practice: List<String> and List<Integer> are, at runtime, the same ordinary List, and the type parameter T is erased to Object. This often gets mocked, but it’s worth knowing it was a deliberate, defensible decision, not laziness. Erasure gave Java gradual migration compatibility: you could take an existing, non-generic class and make it generic without breaking a single existing source file or compiled class, and clients could migrate right away, later, or never. In 2004, when Java already had a huge codebase, the alternative (“here are generics, but throw out all your libraries”) would have been a terrible deal. Today it would be even worse.

The trou­ble is that era­sure clashes with Valhalla ex­actly where we’d care most about per­for­mance. Since T erases to Object, a value ob­ject put into a List<Point> has to be ma­te­ri­al­ized as an or­di­nary ob­ject on the heap. In other words: your beau­ti­ful, flat­ten­able Point in a generic col­lec­tion loses its flat­ten­ing: the con­tainer holds ref­er­ences, not flat val­ues. All the den­sity you gained in Point[] evap­o­rates in ArrayList<Point>.
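Erasure is easy to observe on any current JDK. The sketch below uses only java.util collections; ErasureDemo and sameRuntimeClass are hypothetical names for this illustration.

```java
final class ErasureDemo {
    // Two lists with different type arguments share one runtime class,
    // because the type argument is erased at compile time.
    static boolean sameRuntimeClass() {
        java.util.List<String> names = new java.util.ArrayList<>();
        java.util.List<Integer> numbers = new java.util.ArrayList<>();
        return names.getClass() == numbers.getClass();
    }

    // Inside the list, every element is stored as an Object reference,
    // so an int has to be boxed before it can go in.
    static Object firstElement() {
        java.util.List<Integer> numbers = new java.util.ArrayList<>();
        numbers.add(42); // autoboxed to Integer
        return numbers.get(0);
    }
}
```

Because there is only one ArrayList class at run time, it has no place to keep a Point-specific flat layout, which is the gap specialization is meant to close.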

The re­pair plan is, like all of Valhalla, two-phase:

Phase 1: Universal Generics. This is a change at the language level: it lets type variables also cover value types, that is, so you can even express something like ArrayList<Point> or List<int>. For now still through erasure. The programmer will feel it mainly as new compiler warnings about “null pollution,” because a field of type T starts out as null by default, even if T is a value type. Addressing these warnings makes APIs “specialization-ready.”

Phase 2: Specialized Generics. These are the fu­ture JVM ex­ten­sions that will gen­er­ate het­ero­ge­neous, spe­cial­ized class lay­outs for con­crete type ar­gu­ments (in the pro­jec­t’s jar­gon: species and type re­stric­tions). Only then will ArrayList<Point> re­ally be backed by flat mem­ory. This is the piece that’s still largely re­search work.

The con­se­quences for li­braries and frame­works are enor­mous, and that’s ex­actly why it’s hap­pen­ing grad­u­ally. Ultimately, col­lec­tions, streams, and en­tire APIs can be­come flat and al­lo­ca­tion-free over value types. But li­brary au­thors will have to ad­dress the new warn­ings and de­sign with spe­cial­iza­tion in mind. Let’s be hon­est: the orig­i­nal Universal Generics draft went through re­work, and the full re­ward from spe­cial­iza­tion is a mat­ter of fu­ture re­leases. JDK 28 does­n’t bring it.

Let’s gather this in one place, because it’s easy to get lost between “it’s here already!” and “it’s not here yet.”

What got accepted: JEP 401 (Value Classes and Objects) as a preview feature, targeting JDK 28 (release in March 2027), with integration into mainline planned for roughly July 2026. 197 thousand lines, 1,816 files, coordination on Lois Foltan’s side, the request to other committers to hold off on large changes. Disabled by default: to play with the syntax, you have to flip on --enable-preview.

What actually reaches users: the ability to declare value class and value record; migration of existing “value-based” classes in the JDK (among them the primitive wrappers like Integer) to value classes under preview; scalarization and flattening for qualifying classes; cheaper boxing.

What can still evolve, and what’s NOT in 28: null-restricted types (non-nullable); full specialized generics; 128-bit encodings; a fully mature JEP 402. And the syntax itself, because this is preview, and that’s exactly what’s expected of it: that it can change from release to release in response to feedback. Hence Goetz’s quote about “only the first part.”

How it might af­fect the ecosys­tem: for high-per­for­mance Java (data, vec­tor com­pu­ta­tion, ML, gamedev, fi­nance, codecs) this is the path to dense data with­out giv­ing up ab­strac­tion, which is ex­actly what some of these do­mains have been wait­ing years for. Frameworks and li­braries will start mi­grat­ing their value-based classes. You’ll also have to watch out for a long tail of be­hav­ioral sur­prises around == and syn­chro­nized in code that (knowingly or not) re­lied on iden­tity. And one more thing worth keep­ing in mind when plan­ning: JDK 28 is not an LTS re­lease: the next LTS will prob­a­bly be JDK 29 in September 2027. So most com­pa­nies will meet a sta­bi­lized Valhalla only at the LTS, but it’s pre­cisely the pre­view in 28 that kicks off the real feed­back loop with ac­tual code. If you’re work­ing on some­thing that could ben­e­fit from this, now is the mo­ment to start ex­per­i­ment­ing and sub­mit­ting feed­back.

Why do I call this one of the biggest changes in the platform’s history? Because Valhalla doesn’t bolt yet another feature onto the language; it moves its deepest assumption. “Every object has identity” had been true in Java since 1995; it’s the foundation everything else stood on. Letting the programmer opt out of that assumption (choose which objects need identity and which don’t) isn’t a refactor, it’s a shift of the foundation. And that’s exactly why it unlocks a whole decade of further work: unifying primitives and objects, specializing generics, denser collections, faster numerics.

At the same time, and this is the honest version of the headline, “Valhalla rolls into JDK 28” is a half-truth. It’s the first, preview step of a multi-stage rollout. But it’s precisely this team’s discipline (simplify the model for the human, do the hard performance things as optional) that’s the reason it took twelve years, and the reason it can be shipped at all now.

For us, as programmers, one thing to take away matters more than the syntax: internalize the distinction between identity and value. The rest (==, flattening, generics) are consequences of that one distinction. And the early-access builds are already here: you can touch this on your own code before your competitor does.

1. Is value class just a record? No, they’re two orthogonal decisions. record means “I give up separate internal state” (content = components). value means “I give up identity.” You can have any combination: an ordinary class, a record, a value class, and a value record.

2. Can I compare value objects with ==? Yes, but == now means something different: substitutability, i.e. a comparison of all fields (recursively), not the address in memory. For the question “do they represent the same data,” it’s still usually better to use equals, because == looks at internal state, which isn’t always equal to the represented state.

3. Can a value class be null? In the JDK 28 model, yes. value class is still a ref­er­ence type. Non-nullable types (with a null re­stric­tion) are a sep­a­rate, fu­ture JEP, and they’re the ones that will un­lock flat­ten­ing of larger value classes. They’re not in JDK 28.

4. Integer be­comes a value class, won’t that break my code? In most cases, no. Binaries still link, and the only new com­pi­la­tion er­rors are at­tempts to syn­chro­nize on such a type. The changes you might no­tice con­cern code that de­pends on iden­tity: == on Integers will start com­par­ing by value, and syn­chro­nized (someInteger) will stop work­ing. If you re­lied on ei­ther of those, it was frag­ile code any­way.

5. Will I get a fast, flat ArrayList<Point>? Not yet. Because of type era­sure, ob­jects in a generic col­lec­tion are ma­te­ri­al­ized on the heap. Flat generic col­lec­tions re­quire uni­ver­sal and spe­cial­ized gener­ics: that’s the fu­ture. In JDK 28, flat­ten­ing works di­rectly for fields and ar­rays of a value type, e.g. for Point[].

6. How is this different from struct in C#? A struct in C# has identity and mutation, so the semantics of copying on assignment or passing have to be precisely defined, which gives a heavier model for the programmer and less freedom for the runtime. Value objects in Valhalla have no identity, and the way they’re laid out in memory is left to the JVM’s discretion. A simpler model for the human, more freedom for the machine.

7. Wasn’t escape analysis doing all of this already? As I already mentioned, partly. Escape analysis can avoid allocating an object when it proves the object doesn’t depend on identity, but it’s unpredictable and doesn’t help when the object lands in a field, in an array, or “escapes” beyond the optimization’s reach. Scalarization of value objects is predictable and reaches much further, including across method-call boundaries.

8. Do I have to rewrite code to benefit? For your own classes, it’s usually enough to add the value modifier to those that represent “simple domain values” and don’t rely on identity; the migration is mostly compatible. Some of the gains you’ll even get for free, because it’s the JDK migrating its own classes (like the primitive wrappers).

10. When will I see full Valhalla, with gener­ics, non-null types, and the whole rest? In fu­ture re­leases. The team ships it in­cre­men­tally: JDK 28 is the first pre­view of value classes. The full story (specialized gener­ics, null-re­stricted types, 128-bit en­cod­ings) will spread across many re­leases and will most likely sta­bi­lize only around the next LTS.

PS: You’ll find the early-ac­cess builds at jdk.java.net/​val­halla, and that’s prob­a­bly the best way to form your own opin­ion faster than I can write an­other is­sue on the sub­ject.

DuckDB Internals: Why is DuckDB Fast? (Part 1)

www.greybeam.ai

DuckDB has gone from a re­search pro­ject at CWI Amsterdam in 2019 to one of the most widely adopted data­bases of the past decade. The list of places it shows up is long: note­books, ETL pipelines, dash­boards, CI test run­ners, em­bed­ded an­a­lyt­ics in­side SaaS prod­ucts, even an iPhone run­ning TPC-H at scale fac­tor 100.

Companies have started build­ing real prod­ucts around it. MotherDuck is wrap­ping DuckDB into a cloud data ware­house. BI and data app plat­forms like Hex, Omni, and Evidence use it as an in-app ex­e­cu­tion en­gine and cache. Fivetran’s Managed Data Lake Service uses DuckDB in­side its data-lake writer for merg­ing and com­paction. Rill builds an open-source BI tool on top of it. We use it at Greybeam too, pow­er­ing mil­lions of queries for BI and an­a­lyt­ics work­loads.

What is DuckDB?

DuckDB is an in-process an­a­lyt­i­cal SQL data­base. Analytical means it’s op­ti­mized for the kind of queries that scan mil­lions of rows to fil­ter, ag­gre­gate, and join — not the kind that look up a sin­gle record by pri­mary key. In-process means there’s no server. You don’t con­nect to DuckDB; you load it as a li­brary in­side your pro­gram, the same way you’d load NumPy or Polars.

DuckDB has re­ceived wide­spread adop­tion be­cause it’s just so damn easy to use. It ships as a sin­gle bi­nary un­der 20 MB with no ex­ter­nal de­pen­den­cies. You in­stall it with pip in­stall duckdb, brew in­stall duckdb, or by link­ing lib­duckdb into a C++ pro­ject. It opens any di­rec­tory of Parquet, CSV, or JSON files like they were al­ready a SQL data­base.

DuckDB also hap­pens to be one of the fastest sin­gle-node an­a­lyt­i­cal en­gines avail­able, reg­u­larly hold­ing its own against en­tire clus­ters that cost mil­lions of dol­lars per year.

This is the first post in a three-part deep dive into DuckDB in­ter­nals. We’ll fol­low a query from the mo­ment it en­ters the en­gine to the mo­ment the re­sult is re­turned, and at each stage we’ll look at the de­sign choice that makes it fast.

DuckDB’s speed comes from a handful of design choices:

In-process ex­e­cu­tion

Columnar, com­pressed stor­age with zonemaps

Vectorized ex­e­cu­tion

Morsel-driven par­al­lelism

Snapshot iso­la­tion with op­ti­mistic MVCC

And much more!

This post cov­ers the path from your SQL to the mo­ment the en­gine is ready to run the query, plus the stor­age layer the query will read from. By the end you’ll have a clear men­tal model of DuckDB’s setup work and stor­age lay­out. Query ex­e­cu­tion is cov­ered in Part 2 so make sure to sub­scribe!

Queries Run In-Process

You point DuckDB at a 6 GB Parquet file on your lap­top. The re­sults come back in un­der a sec­ond. No clus­ter, no setup, no mi­gra­tion, no CREATE TABLE. How does that work?

SELECT * FROM 'orders.parquet';

Most an­a­lyt­i­cal data­bases are servers. Snowflake, Postgres, BigQuery, Redshift. You open a con­nec­tion, send SQL over TCP (a pro­to­col to send data over a net­work), and wait for re­sults to come back. Along the way, every record in the re­sult is se­ri­al­ized into a wire pro­to­col, trans­mit­ted across the net­work, and de­se­ri­al­ized on the other end.

Serializing and Deserializing

Inside a data­base, a query re­sult lives as typed val­ues at spe­cific mem­ory ad­dresses. A 64-bit in­te­ger here, a pointer to a string there. Those ad­dresses only ex­ist in that process. To send the re­sult to a client on an­other ma­chine, the data­base has to rewrite every value into an agreed byte for­mat (Postgres has its own, MySQL has an­other, with ODBC and JDBC as client-side APIs that dri­vers ex­pose on top) so it can be pushed through a TCP socket. The client then parses those bytes back into its own na­tive types. Every value may be touched mul­ti­ple times, once to en­code and once to de­code, and on a large re­sult set, that work of­ten takes longer than the query it­self.
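To make that encode/decode round trip concrete, here is a minimal sketch using only Python's standard `struct` module. The byte layout (a 4-byte count followed by big-endian 64-bit integers) is an invented stand-in for a wire format, not the actual Postgres or MySQL protocol, and the sample values are hypothetical.

```python
import struct

# Hypothetical result column: 64-bit integers living as native values
# inside the "server" process.
values = [7, 42, 1_000_000]

# Server side: rewrite every value into an agreed byte format.
# Assumed layout: 4-byte unsigned count, then big-endian signed 64-bit ints.
wire = struct.pack(">I", len(values)) + b"".join(
    struct.pack(">q", v) for v in values
)

# Client side: parse those bytes back into native values, one at a time.
(count,) = struct.unpack_from(">I", wire, 0)
decoded = [struct.unpack_from(">q", wire, 4 + 8 * i)[0] for i in range(count)]

print(len(wire), decoded)
```

Every value is handled once on the way out and once on the way in, which is the work an in-process engine gets to skip.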

DuckDB is not a server. It’s a li­brary. There is no DuckDB dae­mon, no port, no clus­ter. You load lib­duckdb into your pro­gram and call func­tions di­rectly against it.

In 2017, Mark Raasveldt and Hannes Mühleisen pub­lished Don’t Hold My Data Hostage, a pa­per mea­sur­ing what ac­tu­ally hap­pens when you pull a re­sult set out of a ware­house. They found that the client pro­to­col it­self — ODBC, JDBC, and sim­i­lar row-by-row value APIs — was of­ten the slow­est sin­gle step in the en­tire query, some­times dwarf­ing the time the data­base spent com­put­ing the an­swer.

Two costs drive this. The first is raw band­width: a typ­i­cal gi­ga­bit Ethernet link caps you at around 125 MB/s, and a large re­sult set can take longer to trans­mit than it took to com­pute. The sec­ond is per-value over­head. ODBC and JDBC hand back re­sults one row and one value at a time, which means the client makes a sep­a­rate func­tion call for every field in every row. On a 100-million-row re­sult, that’s hun­dreds of mil­lions of func­tion calls, each one do­ing its own lit­tle mem­ory copy, type check, and string al­lo­ca­tion.
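A quick back-of-the-envelope version of those two costs, as a sketch. The row count and link speed come from the text above; the column count and bytes per value are assumptions picked only to make the arithmetic concrete.

```python
ROWS = 100_000_000            # from the text: a 100-million-row result
LINK_BYTES_PER_SEC = 125e6    # from the text: ~125 MB/s on gigabit Ethernet
COLUMNS = 5                   # assumption: five columns in the result
BYTES_PER_VALUE = 8           # assumption: every value is a 64-bit number

# Cost 1: raw bandwidth.
result_bytes = ROWS * COLUMNS * BYTES_PER_VALUE
transfer_seconds = result_bytes / LINK_BYTES_PER_SEC

# Cost 2: per-value overhead, one client call per field per row.
value_calls = ROWS * COLUMNS

print(f"{result_bytes / 1e9:.0f} GB on the wire, {transfer_seconds:.0f} s to transmit")
print(f"{value_calls:,} per-value calls on the client")
```

Under these assumptions the result is 4 GB, which needs about half a minute of pure transfer time before the client has made any of its 500 million per-value calls.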

ADBC trans­fers data be­tween sys­tems in colum­nar Arrow for­mat, which avoids the row-by-row se­ri­al­iza­tion/​de­se­ri­al­iza­tion that ODBC and JDBC re­quire. Our friends at Columnar are mak­ing this com­mon­place.

DuckDB side­steps both bot­tle­necks by liv­ing in the same process as the client.

When a Python script runs con.sql("SELECT … FROM my_df") against a pandas dataframe, DuckDB can use a feature called a replacement scan. Instead of copying the dataframe into an internal table first, DuckDB replaces the table reference with a function that reads from the dataframe when the query runs.

In the best case, DuckDB can read the same underlying buffers the Python process already owns, so it avoids materializing a second full copy of the data. This is zero-copy! If NumPy says “here’s a buffer (contiguous chunk of memory) of 1 million int64 values,” DuckDB can often read that same buffer directly because it understands the same physical layout.

In prac­tice, whether the path is truly zero-copy de­pends on the dataframe’s phys­i­cal lay­out, col­umn types, null rep­re­sen­ta­tion, and string stor­age. If the types or lay­outs do not line up, DuckDB may al­lo­cate con­verted buffers for some columns.
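The difference between sharing a buffer and copying it can be shown with nothing but the standard library. This is an analogy, not DuckDB's actual code path: `array` stands in for a NumPy column, `memoryview` stands in for a zero-copy reader, and the second `array` stands in for a converted buffer.

```python
from array import array

# A producer-owned, contiguous buffer of 1 million int64 values.
column = array("q", range(1_000_000))

# Zero-copy: a second window onto the very same memory.
view = memoryview(column)

# Copy: a freshly allocated buffer with the same contents.
copied = array("q", column)

# Mutating the original shows which reader shares memory with it.
column[0] = 99
print(view[0], copied[0])   # the view sees 99, the copy still holds 0
print(view.nbytes)          # 8,000,000 bytes described, none duplicated
```

The view costs almost nothing to create no matter how large the column is, while the copy pays for every byte.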

Arrow is the clean­est ver­sion of this story be­cause Arrow is al­ready a colum­nar, typed mem­ory for­mat de­signed for shar­ing data be­tween sys­tems. That is why re­turn­ing DuckDB re­sults as Arrow, or query­ing Arrow-backed data, can avoid much of the row-by-row con­ver­sion over­head that tra­di­tional APIs im­pose.

From SQL to Logical Plan

Once your SQL reaches DuckDB, it goes through the usual stages: parse, bind, plan, op­ti­mize.

Parsing

The first step is to parse SQL into an ab­stract syn­tax tree (AST). DuckDB uses a fork of the Postgres parser, which is part of why DuckDB’s di­alect feels so fa­mil­iar.

An AST is a tree representation of your query where each node is a syntactic construct: a SELECT statement, a column reference, a function call, a join, a literal. Parsing turns the flat string SELECT sum(l_quantity) FROM lineitem WHERE l_shipdate > '2024-01-01' into a structured object the engine can actually reason about.

Select(
  expressions=[
    Sum(
      this=Column(
        this=Identifier(this=l_quantity, quoted=False)))],
  from_=From(
    this=Table(
      this=Identifier(this=lineitem, quoted=False))),
  where=Where(
    this=GT(
      this=Column(
        this=Identifier(this=l_shipdate, quoted=False)),
      expression=Literal(this='2024-01-01', is_string=True))))

AST from the SQLGlot li­brary.

A tree struc­ture is what lets the rest of the en­gine do its job. The binder walks the nodes to re­solve l_quan­tity to a spe­cific col­umn in a spe­cific table. The op­ti­mizer pat­tern-matches sub­trees to rec­og­nize that the WHERE pred­i­cate can be pushed down into the scan. The phys­i­cal plan­ner maps func­tion call nodes to ex­e­cutable op­er­a­tors. None of these passes can op­er­ate on raw SQL. They need to tra­verse, pat­tern-match, and rewrite a typed struc­ture.

Binding

The next step is binding, which resolves every name in the AST against the catalog. lineitem becomes a specific table with a known schema. l_quantity becomes a specific column with a known type. sum becomes a specific aggregate function whose input type matches that column. Type checking happens here too: comparing l_shipdate to the string '2024-01-01' works because the binder coerces the literal to a date.

The out­put is a bound tree where every node knows what it refers to and what type it pro­duces. Errors like un­re­solved columns, am­bigu­ous ref­er­ences, and type mis­matches sur­face at this stage.

At this point, DuckDB has turned raw SQL text into a typed tree. The en­gine no longer sees l_quan­tity as just a string in a query; it sees a spe­cific col­umn with a spe­cific type from a spe­cific table.
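A toy binder makes the idea tangible. This is a minimal sketch, not DuckDB's binder: the catalog dictionary, the `BoundColumn` class, and both helper functions are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical catalog: table name -> column name -> type name.
CATALOG = {"lineitem": {"l_quantity": "DECIMAL", "l_shipdate": "DATE"}}


@dataclass(frozen=True)
class BoundColumn:
    table: str
    name: str
    type: str


def bind_column(name, tables):
    """Resolve a bare column name against the tables in scope."""
    matches = [t for t in tables if name in CATALOG[t]]
    if not matches:
        raise LookupError(f"column {name!r} not found")
    if len(matches) > 1:
        raise LookupError(f"column {name!r} is ambiguous")
    table = matches[0]
    return BoundColumn(table, name, CATALOG[table][name])


def coerce_literal(text, target_type):
    """Coerce a string literal to the type it is compared against."""
    if target_type == "DATE":
        return date.fromisoformat(text)
    return text


shipdate = bind_column("l_shipdate", ["lineitem"])
cutoff = coerce_literal("2024-01-01", shipdate.type)
print(shipdate, cutoff)
```

After this step the comparison is between a DATE column and a DATE value, and a misspelled column name fails here rather than at execution time.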

The Optimizer

In DuckDB, the op­ti­mizer con­sists of a se­quence of small, fo­cused trans­for­ma­tions that you can, in fact, in­spect and dis­able in­di­vid­u­ally.

D SELECT * FROM duckdb_optimizers();
┌────────────────────────────┐
│            name            │
│          varchar           │
├────────────────────────────┤
│ expression_rewriter        │
│ filter_pullup              │
│ filter_pushdown            │
│ empty_result_pullup        │
│ cte_filter_pusher          │
│ regex_range                │
│ in_clause                  │
│ join_order                 │
│ deliminator                │
│ unnest_rewriter            │
│ unused_columns             │
│ statistics_propagation     │
│ common_subexpressions      │
│ common_aggregate           │
│ column_lifetime            │
│ limit_pushdown             │
│ row_group_pruner           │
│ top_n                      │
│ top_n_window_elimination   │
│ build_side_probe_side      │
│ compressed_materialization │
│ duplicate_groups           │
│ reorder_filter             │
│ sampling_pushdown          │
│ join_filter_pushdown       │
│ extension                  │
│ materialized_cte           │
│ sum_rewriter               │
│ late_materialization       │
│ cte_inlining               │
│ common_subplan             │
│ join_elimination           │
│ window_self_join           │
└────────────────────────────┘
33 rows

Running SET disabled_optimizers = 'filter_pullup, join_order' turns specific passes off so you can see what they were doing.

Here are a few in­ter­est­ing op­ti­miz­ers:

Filter push­down

This is a clas­sic data­base op­ti­miza­tion: move WHERE pred­i­cates as close to the scan as pos­si­ble so you prune data as early as pos­si­ble. DuckDB first pulls fil­ters up to the top of the plan so they can be com­bined and re­or­ga­nized, then pushes them back down as far as pos­si­ble.

Subquery unnest­ing

Correlated sub­queries tra­di­tion­ally force a data­base to run the in­ner query once per outer row, which is slow. DuckDB im­ple­ments tech­niques from the Unnesting Arbitrary Queries pa­per to rewrite these as joins, which are dra­mat­i­cally faster.

Dynamic join-fil­ter push­down

During a hash join (more on hash joins here), the build side has to be fully read be­fore the probe side starts. DuckDB takes ad­van­tage of that or­der­ing: once the build side is in mem­ory, it com­putes the min and max of the join key val­ues it ac­tu­ally con­tains, then pushes those bounds back into the probe-side scan as a run­time fil­ter. If the build side turned out to con­tain val­ues only be­tween 100 and 200, the probe scan can use the table’s zonemaps to skip any row groups out­side that range be­fore read­ing them.

When the build side has fewer than 50 dis­tinct join key val­ues, the fil­ter be­comes an IN list in­stead of a min-max range, which is more pre­cise and skips even more rows.
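Here is a small sketch of that runtime filter, assuming each probe-side row group exposes a min/max zonemap. The `RowGroup` class and helper names are hypothetical; the 50-value threshold is the one quoted above.

```python
from dataclasses import dataclass

IN_LIST_THRESHOLD = 50  # from the text: fewer than 50 distinct keys -> IN list


@dataclass
class RowGroup:
    keys: list

    @property
    def zonemap(self):
        # Min and max of the join key stored in this row group.
        return min(self.keys), max(self.keys)


def runtime_filter(build_keys):
    """Derive a probe-side filter from the fully read build side."""
    distinct = set(build_keys)
    if len(distinct) < IN_LIST_THRESHOLD:
        return ("in", distinct)
    return ("range", (min(distinct), max(distinct)))


def may_match(group, flt):
    """Decide from the zonemap alone whether a row group must be read."""
    lo, hi = group.zonemap
    kind, arg = flt
    if kind == "in":
        return any(lo <= key <= hi for key in arg)
    fmin, fmax = arg
    return not (hi < fmin or lo > fmax)


def surviving_groups(probe, build_keys):
    flt = runtime_filter(build_keys)
    return [i for i, group in enumerate(probe) if may_match(group, flt)]


probe = [
    RowGroup(list(range(0, 100))),
    RowGroup(list(range(100, 251))),
    RowGroup(list(range(300, 400))),
]

# Build side holds every key from 100 to 200: a min-max range filter.
print(surviving_groups(probe, list(range(100, 201))))
# Build side holds only two keys: an IN list, which skips the middle group
# that a 5..350 range filter would have kept.
print(surviving_groups(probe, [5, 350]))
```

The second call shows why the IN list is more precise: a range from 5 to 350 overlaps all three row groups, while the two actual keys touch only the first and last.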

Join or­der op­ti­miza­tion

Join or­der is the most con­se­quen­tial de­ci­sion the op­ti­mizer makes. The or­der in which joins run de­ter­mines how big each in­ter­me­di­ate re­sult is. A query join­ing six ta­bles has 30,240 pos­si­ble tree shapes, and the dif­fer­ence be­tween best and worst can be or­ders of mag­ni­tude in run­time. Picking well re­quires es­ti­mat­ing how many rows each can­di­date join will pro­duce, which de­pends on table sizes, pred­i­cate se­lec­tiv­ity, and the or­der of joins that came be­fore.
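The 30,240 figure matches the standard count of binary join trees over n tables when left and right inputs are distinguished and any pair of tables may be joined: (2(n-1))! / (n-1)!. A quick check, with `join_tree_count` as a hypothetical helper name:

```python
from math import factorial


def join_tree_count(n):
    """Number of ordered binary join trees over n distinct tables."""
    return factorial(2 * (n - 1)) // factorial(n - 1)


for n in range(2, 8):
    print(n, join_tree_count(n))
```

For six tables this gives 10! / 5! = 30,240, and the count grows quickly from there, which is why the optimizer cannot simply try everything.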

DuckDB models the query as a graph. Each table is a node, and each join predicate is an edge connecting the tables it references. The optimizer’s job is to pick an order to combine the nodes into a single tree, where each combination is a join. For example, if we have a query joining a to b, b to c, and c to d, the graph might look like this:

a ── b ── c ── d

To find the best tree, DuckDB uses dynamic-programming algorithms such as DPhyp or DPccp. Dynamic programming is a fancy name for a simple idea: if you’ve already figured out the best way to join {a, b, c}, you can reuse that answer when figuring out the best way to join {a, b, c, d}. You don’t need to re-explore all the orderings inside {a, b, c}. The algorithm does this for every connected pair, then triplet, then quadruplet, etc.
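The reuse idea can be sketched in a few lines. This is a deliberately naive dynamic program over connected subsets of the a ── b ── c ── d chain, not DPhyp or DPccp, and the table sizes, selectivities, and cost model (sum of intermediate row counts) are all assumptions made up for the example.

```python
from itertools import combinations

# Hypothetical table sizes and join selectivities for the chain a-b-c-d.
CARD = {"a": 1000, "b": 100, "c": 10, "d": 10_000}
SEL = {frozenset("ab"): 0.01, frozenset("bc"): 0.1, frozenset("cd"): 0.001}


def edges_between(left, right):
    """Join predicates with one endpoint on each side."""
    return [e for e in SEL if len(e & left) == 1 and len(e & right) == 1]


def best_plan(tables):
    # best[S] = (cost, estimated rows, join tree) for each connected subset S.
    best = {frozenset([t]): (0.0, float(CARD[t]), t) for t in tables}
    for size in range(2, len(tables) + 1):
        for subset in map(frozenset, combinations(tables, size)):
            for k in range(1, size):
                for left in map(frozenset, combinations(sorted(subset), k)):
                    right = subset - left
                    if left not in best or right not in best:
                        continue  # one side is not a connected sub-plan
                    joining = edges_between(left, right)
                    if not joining:
                        continue  # would be a cross product
                    lcost, lrows, ltree = best[left]
                    rcost, rrows, rtree = best[right]
                    rows = lrows * rrows
                    for edge in joining:
                        rows *= SEL[edge]
                    # Reuse the sub-plan costs instead of re-deriving them.
                    cost = lcost + rcost + rows
                    if subset not in best or cost < best[subset][0]:
                        best[subset] = (cost, rows, (ltree, rtree))
    return best[frozenset(tables)]


cost, rows, tree = best_plan(["a", "b", "c", "d"])
print(cost, rows, tree)
```

Each entry in `best` is computed once and then looked up by every larger subset that contains it, which is the whole trick. With these made-up numbers, joining a to b first costs more than starting from b and c, and the table of sub-plans lets the search find that out without enumerating every full tree.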

There are dozens more op­ti­miza­tions to ex­plore and the en­tire op­ti­miza­tion phase usu­ally fin­ishes in about a mil­lisec­ond. After op­ti­miza­tion, DuckDB has a log­i­cal plan. The next step is to trans­late that plan into some­thing the en­gine can ac­tu­ally ex­e­cute.

If you’ve en­joyed read­ing this so far, con­sider sub­scrib­ing. We’ll con­tinue shar­ing more about the in­tri­ca­cies of DuckDB and many other query en­gines.

The Physical Plan

Imagine the op­ti­mizer hands the en­gine this plan, writ­ten in plain English:

Read events from disk. Drop the rows where event_date is on or before 2026-01-01. Group what’s left by customer_id and add up amount. Sort the result by total descending. Return the top 10.

The en­gine now has to de­cide how to ac­tu­ally run those steps in a way that uses the CPU well and par­al­lelizes across cores.

Mapping Logical Steps to Physical Operators

The op­ti­miz­er’s out­put is still a log­i­cal plan. It says what each step needs to com­pute but not which al­go­rithm should do the com­put­ing. Most log­i­cal steps have sev­eral phys­i­cal im­ple­men­ta­tions.

Take a join. The same log­i­cal join can be turned into any of: hash join, in­dex join, piece­wise merge join, carte­sian join.

DuckDB walks the log­i­cal plan and picks a phys­i­cal op­er­a­tor for each node based on the shape of its in­puts and pred­i­cates. The out­put is a phys­i­cal plan — a tree of phys­i­cal op­er­a­tors the ex­ecu­tor knows how to run.

We will save the de­tails of vec­tor­ized ex­e­cu­tion for Part 2, but one ex­e­cu­tion con­cept is use­ful now: the phys­i­cal plan is not run as one gi­ant tree walk. DuckDB breaks it into pipelines.

Pipelines

Think of a pipeline as an assembly line. Data enters at one end and passes through a chain of stations. Each station does one thing (drop a row, transform a column, look up a value in a hash table) and hands the result to the next station. As long as each station can decide what to do with a row using only that row, the line keeps moving. Examples of stations that can stream like this:

WHERE: it ei­ther passes the row through or drops it. No state needed.

A Projection: it com­putes new col­umn val­ues and emits them.

Probe side of hash join: once the hash table has been built, it looks up the row’s key in the hash table and emits the joined row or noth­ing if no match.

In DuckDB, a con­nected chain of stream­ing sta­tions like this is called a pipeline. Pipelines par­al­lelize cleanly since every CPU core can run its own copy of the as­sem­bly line on its own slice of the in­put.

Pipeline break­ers

Some op­er­a­tors can’t work this way. They need to see the en­tire in­put be­fore they can pro­duce an out­put.

ORDER BY can’t emit a single sorted row until it has seen every row because it doesn’t know which row belongs first.

GROUP BY can’t emit the fi­nal sum un­til it has ac­counted for every row in a group­ing.

Build side of a hash join has to build the hash table be­fore it can start look­ing any­thing up.

These op­er­a­tors are called pipeline break­ers or sinks. They mark the end of one pipeline and the be­gin­ning of the next. The phys­i­cal plan is ef­fec­tively a se­quence of pipelines stitched to­gether by sinks.

Going back to our orig­i­nal query, the phys­i­cal plan may look some­thing like this:

Pipeline 1, ending at the GROUP BY sink: scan events → filter event_date > '2026-01-01' → write into GROUP BY’s hash table

Pipeline 2, ending at the ORDER BY sink: read groups out of the hash table → write them into the sorted run

Pipeline 3, the final assembly line: read sorted runs → take the first 10 rows → return results

Each pipeline runs in parallel internally. Multiple threads run the entire assembly line at once, each on its own morsel of input. Pipelines that depend on each other run in sequence, because pipeline 2 can’t start reading until pipeline 1’s GROUP BY is done writing.
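The three pipelines can be mimicked with generators for the streaming stations and plain data structures for the sinks. This is a single-threaded sketch with made-up event rows, so it shows the pipeline boundaries but none of the parallelism.

```python
from collections import defaultdict
from datetime import date

# Hypothetical events: (customer_id, event_date, amount).
EVENTS = [
    (1, date(2026, 1, 5), 10.0),
    (2, date(2025, 12, 31), 99.0),
    (1, date(2026, 2, 1), 5.0),
    (3, date(2026, 1, 2), 7.5),
]


def scan(rows):
    # Streaming source: hands rows downstream one at a time.
    yield from rows


def keep_after(rows, cutoff):
    # Streaming filter: decides per row, needs no state.
    for row in rows:
        if row[1] > cutoff:
            yield row


# Pipeline 1: scan -> filter -> GROUP BY sink (must see every row).
totals = defaultdict(float)
for customer_id, _, amount in keep_after(scan(EVENTS), date(2026, 1, 1)):
    totals[customer_id] += amount

# Pipeline 2: read groups -> ORDER BY sink (must see every group).
ordered = sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Pipeline 3: read the sorted run -> take the first 10 rows -> return.
top10 = ordered[:10]
print(top10)
```

Nothing in pipeline 2 can start until the loop in pipeline 1 has drained its input, which is exactly the dependency a pipeline breaker introduces.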

What Happens in a Sink

A sink runs in three phases: sink, com­bine, and fi­nal­ize.

Sink

Every thread accepts chunks (DuckDB’s 2,048-row batches) and writes them into its own local state: for example, its own hash table for a HASH_GROUP_BY, its own sorted run for ORDER_BY, its own partial aggregate for UNGROUPED_AGGREGATE, its own hash table for the build side of HASH_JOIN. Threads do not share state. If every thread wrote into one shared hash table, they’d be fighting for a lock on every insert. Local state lets each thread sink at full speed with no coordination.

Combine

Once every thread fin­ishes writ­ing to its lo­cal space, the re­sults have to merge into a sin­gle global state. For a GROUP BY, that means com­bin­ing the par­tial sums and counts for each group across all the thread-lo­cal hash ta­bles. DuckDB de­signs the sink so the com­bine step it­self runs across all cores, rather than as a sin­gle-threaded merge at the end (covered in Part 3).

Finalize

The merged global state is read out as the input to the next pipeline. For our GROUP BY, that’ll be a stream of (customer_id, total) rows.
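The three phases map neatly onto a small thread-pool sketch. The function names mirror the phases but are otherwise hypothetical, the input rows are synthetic, and the combine step here is a simple single-threaded merge, unlike the parallel combine described above.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

CHUNK_ROWS = 2048  # from the text: DuckDB's batch size


def sink(chunks):
    """Each thread aggregates its chunks into its own local hash table."""
    local = Counter()
    for chunk in chunks:
        for customer_id, amount in chunk:
            local[customer_id] += amount
    return local


def combine(local_states):
    """Merge the thread-local tables into one global state."""
    merged = Counter()
    for local in local_states:
        merged.update(local)  # adds the partial sums per group
    return merged


def finalize(merged):
    """Read the global state out as (customer_id, total) rows."""
    return sorted(merged.items())


# Synthetic input: 10,000 rows spread over 7 customers, amount 1 each.
rows = [(i % 7, 1) for i in range(10_000)]
chunks = [rows[i:i + CHUNK_ROWS] for i in range(0, len(rows), CHUNK_ROWS)]

workers = 4
per_thread = [chunks[w::workers] for w in range(workers)]

with ThreadPoolExecutor(max_workers=workers) as pool:
    local_states = list(pool.map(sink, per_thread))

result = finalize(combine(local_states))
print(result)
```

No lock appears anywhere in `sink`, because no two threads ever touch the same `Counter`; all coordination is deferred to the merge.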

Parallelism is Local

A pipeline runs across all cores by giving each thread its own morsel of input. A sink runs across all cores by giving each thread its own local state and merging in parallel. DuckDB does not try to plan global parallelism for the whole query; it parallelizes one pipeline at a time. This is a part of what makes morsel-driven parallelism (covered in Part 3) and vectorized execution (covered in Part 2) work.

The Storage Layer

The amaz­ing thing about DuckDB is that it can turn most files into a SQL data­base, and in fact is of­ten used to di­rectly query file for­mats like Parquet, CSV, JSON, XLSX, etc.

AI Engineer Claims to Have Cracked Linear A

aiclambake.com

Tom Di Mino, a self-taught AI en­gi­neer and an am­a­teur lin­guist, claims to have ac­com­plished a feat that has eluded lin­guis­tics ex­perts for over a cen­tury: de­ci­pher­ing a Bronze-age Minoan writ­ing sys­tem known as Linear A.

His claims are cur­rently be­ing re­viewed by lin­guis­tics ex­perts at Rutgers and Cambridge. While I’m caveat­ing, I will also men­tion that I know Tom so­cially.

Di Mino, who is based in the Hudson Valley, has stud­ied clas­si­cal his­tory, lin­guis­tics, and lan­guages since he was 18. He has been read­ing up on Linear A for 7 years, and has vis­ited Crete twice. He be­gan to work on de­ci­pher­ing Linear A in January this year, and says the ma­jor in­sight came to him on May 22.

If Tom Di Mino has de­ci­phered Linear A, it would be an earth­quake in the field of lin­guis­tics. When a re­lated Minoan script, Linear B, was de­ci­phered in 1952, it made the front page of the New York Times.

Linear A maps to an ex­tinct Semitic lan­guage

Di Mino be­lieves that Linear A be­longs to an ex­tinct Semitic lan­guage that was a pre­cur­sor to bib­li­cal Hebrew, the way that Latin is a pre­cur­sor to Italian.

Di Mino is not the first to ar­gue that Linear A was Semitic. Prior at­tempts to prove it, how­ever, in­clud­ing a 1957 ar­ti­cle pub­lished by Cyrus Gordon in the jour­nal Antiquity, did not un­lock trans­la­tions the way that Di Mino’s so­lu­tion ap­pears to, and Gordon’s work did not gain wide­spread ac­cep­tance in the field.

Some back­ground on Linear A and Linear B

Linear A is a Minoan script that appeared sometime around 1800 BC and was used until 1450 BC, when Crete was conquered by Mycenaean Greeks. The Mycenaeans adopted the Minoan symbols as their own, with some minor revisions. The Mycenaean-Greek version of the symbols is known as Linear B. Both scripts were found on various tablets, vases, and other artifacts from the era.

Both scripts use syl­la­bles, not let­ters, as their core el­e­ments. The syl­la­bles are gen­er­ally con­so­nant-vowel pairs.

The two sys­tems have 60 core syl­la­bles in com­mon, and they both also use lo­gograms — sym­bols that rep­re­sent a whole word (“cow”), not just a syl­la­ble.

Linear B was de­ci­phered and iden­ti­fied as Greek in 1952 by Michael Ventris, a British ar­chi­tect, cryp­tog­ra­pher, and am­a­teur lin­guist, like Di Mino. Ventris’s break­through may not have hap­pened with­out prior work on Linear B by Alice Kober, a pro­fes­sor at Brooklyn College.

Kober and Ventris used gram­mat­i­cal and sta­tis­ti­cal analy­ses to look for pat­terns in the lo­ca­tion of the sym­bols (e.g. the first syl­la­ble was more likely to be a vowel) and how the sym­bols shifted.

There are many more in­scrip­tions as­so­ci­ated with Linear B than Linear A, how­ever, which made it eas­ier to de­ci­pher. Also, many Linear A in­scrip­tions are in­ven­to­ries cat­a­loging the trade of dif­fer­ent com­modi­ties, so they don’t tell us much about the lan­guage.

Because Linear A and Linear B have 60 sym­bols in com­mon, and be­cause Linear B has been de­ci­phered, ex­perts could guess what the over­lap­ping Linear A sym­bols sounded like but did­n’t know what the sounds meant. And there were 13 ad­di­tional sym­bols in Linear A that did not ap­pear in Linear B. For those, no sound val­ues have been ac­cepted.

The key that un­locked Linear A

On May 22, Di Mino was an­a­lyz­ing a se­ries of Linear A prayer in­scrip­tions that ad­hered to a for­mula. (Don’t worry, you don’t have to un­der­stand the for­mula, but I’m in­clud­ing it for the nerds.)

IOZa2 (Iouktas): A-TA-I-*301-WA-JA · JA-DI-KI-TU · JA-SA-SA-RA-ME · U-NA-KA-NA-SI · I-PI-NA-MA · SI-RU-TE · TA-NA-RA-TE-U-TI-NU · I

(Also see Figure 1 be­low.)

In the for­mula all of the words in each line of the in­scrip­tion were known (based on their over­lap with Linear B syl­la­bles) ex­cept for the first word.

The first word was the same verb root, ap­pear­ing in dif­fer­ent re­gional forms across five sanc­tu­ary sites on the is­land.

The verb contained 5 known Linear B signs and “*301”, which appeared to be a Linear A-only sign, “na,” which Di Mino used to unlock the root “nawaya,” which means “to dwell.” Hebrew, Akkadian and other Semitic languages build words on three-consonant roots. N-W-Y is used for verbs and nouns meaning “to dwell or inhabit”.

Once de­ci­phered, Di Mino saw that the prayer was sim­i­lar to sub­se­quent Hebrew prayers but was ad­dressed to a Goddess.

While Cyrus Gordon had pre­vi­ously pro­posed links be­tween ded­i­ca­tion tablets in Linear A and sim­i­lar tablets in Akkadian and Phoenician that he had trans­lated, Di Mino claims to be the first per­son to iden­tify the links be­tween the Linear A in­scrip­tions and Hebrew prayers.

This in­sight not only un­locked the verb in the prayer in­scrip­tions, but it may also shed a broader light on the use of lo­gograms in Linear A.

Di Mino claims that his in­sights into lo­gograms in Linear A ad­di­tion­ally help to re­solve prob­lems with some trans­la­tions of Linear B, which val­i­dates his find­ings.

Di Mino used Claude Code to build a suite of Python scripts that query, cross-ref­er­ence, and or­ga­nize the dig­i­tized Linear A cor­pus (drawn from the GORILA and SigLA data­bases), en­abling sys­tem­atic hy­poth­e­sis test­ing at a scale that would have been im­prac­ti­cal to do man­u­ally.

Artifacts

Di Mino’s re­search has led to:

Proposed readings for 40 of the script’s signs, including 13 signs whose phonetic values were previously unknown. He also resolved the sound values for 5 Linear B signs that had remained unknown until now.

A lex­i­con of 408 Linear A terms trans­lated into English

A 9-page draft of a man­u­script ti­tled Ya Diktu: Grammar of the Minoan Peak Sanctuary Libation Formula, which may form the foun­da­tion for a sub­mis­sion to a peer-re­viewed sci­en­tific jour­nal

Figure 1. A sum­mary of the sym­bols in line 1 of the Minoan prayer in­scrip­tion. Credit: Tom Di Mino, Ya Diktu: Grammar of the Minoan Peak Sanctuary, June 2026.

Google workspace threatening to block firefox access

tales.fromprod.com

At the time of writing (2026-06-18), Google Workspace appears to be starting to warn Firefox users that they must use Chrome. This was for a Google Workspace Business Plus account and workspace, from an up-to-date browser and OS.

At this time, Firefox ac­cess still seems to work but I’ve no idea for how long.

| 📝 Update as of 15:31Z 2026-06-18 | Google support called and claimed this will only happen for admins trying to access https://admin.google.com and that it isn’t blocking, it’s just a recommendation. They said they will not be documenting this publicly. |
| --- | :--- |

Specific warn­ing

Icon indicating that the user may soon lose their access to their account.

Secure your device for safe app access

To help keep your data secure, make sure that your device meets your organisation’s security requirements

Next steps

Download Chrome Browser and sign in with your work ac­count

This was from a webpage with url https://access.workspace.google.com/remediate?urlparams=REDACTED

Response from Google sup­port

Absolutely noth­ing use­ful, re­peat­edly trans­ferred around and took ages.

Emailed up­date from their sup­port af­ter they called me

I’m publishing this in full. None of this actually addresses the issue or answers anything I asked on the call.

[redacted per­sonal in­for­ma­tion about my­self and the sup­port staff] I ap­pre­ci­ate you ac­cept­ing my call ear­lier.

To en­sure your users have the best, most se­cure, and fea­ture-rich ex­pe­ri­ence with Google Workspace ser­vices, it’s cru­cial to use up-to-date, com­pat­i­ble web browsers. Using sup­ported browsers pro­vides ac­cess to the lat­est fea­tures and of­fers im­proved se­cu­rity and per­for­mance.

Here are the browsers com­pat­i­ble with Google Workspace:

Google Chrome: We rec­om­mend and fully sup­port the lat­est ver­sion of Google Chrome. Chrome typ­i­cally up­dates au­to­mat­i­cally, en­sur­ing ac­cess to all Google Workspace fea­tures and func­tion­al­ity.

Mozilla Firefox: Google Workspace works well with Firefox. We sup­port the cur­rent and the pre­vi­ous ma­jor ver­sion. Please note that Firefox does not cur­rently sup­port: Offline ac­cess to Gmail, Google Calendar, Google Docs, Sheets, and Slides. Client-side en­cryp­tion in Google Meet.

Apple Safari: Google Workspace also works well with Safari. We sup­port the cur­rent and the pre­vi­ous ma­jor ver­sion. Safari does not cur­rently sup­port: Offline ac­cess to Gmail, Calendar, Docs, Sheets, and Slides. Desktop no­ti­fi­ca­tions in Gmail.

Microsoft Edge: Google Workspace works well with Microsoft Edge. We sup­port the cur­rent and the pre­vi­ous ma­jor ver­sion.

Key Recommendations:

Keep Browsers Updated: Always encourage users to run the latest versions of these supported browsers. For Firefox, Safari, and Edge, when a new browser version is released, we begin supporting that version and stop supporting the third most recent version.

Enable Cookies and JavaScript: To use Google Workspace effectively, ensure that both cookies and JavaScript are enabled in the browser settings.

Unsupported Browsers: While some functionality might work on older or unsupported browsers, we cannot guarantee full feature availability or performance. Users may encounter issues or find some applications do not open correctly.

Mobile Access: For the best experience on mobile devices (Android, iPhone, and iPad), please use the dedicated Google Workspace mobile applications, which are built specifically for these platforms.

By fol­low­ing these guide­lines, your or­ga­ni­za­tion can max­i­mize the ben­e­fits and se­cu­rity of­fered by Google Workspace.

For future reference, please check and review these articles:

Supported browsers for Google Workspace | Support & troubleshooting | Google Workspace Help

Service-specific Google Workspace requirements | Support & troubleshooting | Google Workspace Help

Should you have any fur­ther ques­tions, we’d be happy to pro­vide as­sis­tance. This case will be closed in the next 3 busi­ness days, you can al­ways re­ply to this mes­sage within the next 30 days and the case will re­open.

Thank you for choos­ing Google Workspace, and I hope you have a won­der­ful day!

Kind re­gards, [redacted]

Why do I care?

My team need to make sure that their software works in multiple browsers, and I personally prefer using Firefox and don’t want to be forced to use Chrome for no discernible benefit.

Okay, but did­n’t your ad­min con­fig­ure $enterprise_feature

Sadly not, I’m the ad­min and can con­firm the fol­low­ing

We haven’t con­fig­ured, and don’t use IAP (Identity Aware Proxy) - I’ve used this be­fore and yes that is Chrome only due to how it does de­vice ver­i­fi­ca­tion

This isn’t because of “Context Aware Access”; that is an enterprise-only feature, and we’re on Google Workspace Business Plus.

The AirPods Effect

www.theescapenewsletter.com

A LITTLE TIME away can be clar­i­fy­ing. When you’ve had a break from a place, you’re able to see it with fresh eyes. You no­tice things that rou­tine and fa­mil­iar­ity had ren­dered in­vis­i­ble.

During my last trip home to the U.S., one of the things that jumped out at me was the num­ber of peo­ple with AirPods in their ears.

Where I live, in south­west Germany, AirPods are far less com­mon. It was jar­ring to see so many lit­tle white glob­ules drip­ping out of the ears of those around me in cof­fee shops, in gro­cery stores, and pretty much every­where else I went dur­ing my trip to sub­ur­ban Detroit. Whether young or old, chic or grungy, ath­leisured or den­imed, every­one seemed to be sport­ing some type of ear­phone.

Americans are speak­ing less and less to one an­other. The num­ber of spo­ken words ut­tered by the av­er­age per­son fell by 28% be­tween 2005 and 2019.

The pop­u­lar­ity of AirPods is noth­ing new. But as the func­tion­al­ity of our tech-con­nected ear gear has im­proved — and as pod­casts have ex­ploded into one of the most con­sumed forms of me­dia in America — earphones have as­sumed a big­ger role in our daily lives.

By some mar­ket es­ti­mates, 44% of Americans use Bluetooth or wire­less ear­phones, and an ad­di­tional 24% use some­thing wired. I could­n’t find good data on the per­cent­age of peo­ple who reg­u­larly wear ear­phones as they go about their daily lives. But dur­ing my re­cent trips to Michigan and Florida, I felt like half the peo­ple around me in pub­lic had some kind of de­vice-con­nected ear­wear on their head.

There is dis­ap­point­ingly lit­tle peer-re­viewed re­search on the ef­fects ear­phones have on our daily lives and in­ter­ac­tions. But the ev­i­dence we do have sug­gests that while AirPods and sim­i­lar tech­nolo­gies do some won­der­ful things for us, they also sub­tly in­flu­ence our be­liefs, re­in­force our in­se­cu­ri­ties, and push us far­ther apart.

During the pre-smart­phone era of iPods and other portable mu­sic de­vices, a small study of col­lege stu­dents found that those who were heavy users of head­phones ex­pe­ri­enced higher lev­els of so­cial iso­la­tion and lone­li­ness.

More than 15 years later, in 2021, a sur­vey con­ducted by the au­dio tech­nol­ogy com­pany Jabra came to sim­i­lar con­clu­sions. Heavy head­phone use makes peo­ple feel lone­lier, the sur­vey found. It also makes peo­ple less likely to have a mean­ing­ful con­ver­sa­tion with some­one new. Many of those in­ter­viewed for the sur­vey said they wore head­phones in part to avoid hav­ing to talk to other peo­ple.

This habit of us­ing head­phones to dodge un­com­fort­able in­ter­ac­tions may be es­pe­cially com­mon among younger adults, for whom so­cial un­ease and feel­ings of iso­la­tion are well-doc­u­mented prob­lems that have be­come more com­mon in re­cent decades.

“I believe human interaction is fading, largely in part to the constant usage of AirPods or other forms of headphones,” wrote Eva Long, a student at Liberty University in Virginia, in a 2025 opinion piece for her school’s newspaper, The Liberty Champion.

“No one talks on the bus. No one greets the barista. Even in class, students are choosing to listen to music instead of their professors,” Long wrote. “When passing someone I know who has AirPods in their ears, it’s difficult to catch their attention unless we make direct eye contact. This lack of engagement is discouraging, and it makes spontaneous social connections less likely.”

“Headphones are a social crutch, granting us the ability to tune in or out of the world as we please,” wrote sophomore Katelyn Halverson in The Cornell Daily Sun. “Interpersonal interaction in public spaces has become more or less optional with the use of headphones — and it appears that the majority (myself included) have a sneaky tendency to opt out.”

Both of these col­lege-pa­per think pieces were writ­ten in 2025, but I found a half-dozen oth­ers — some dat­ing back to 2019. All of them be­moaned the fact that, thanks largely to head­phones, the col­le­giate ex­pe­ri­ence has be­come less so­cial, less im­mer­sive, and less in­ter­ac­tive. Basically, less col­le­gial.

‘All these little conversations add up to us feeling like people are generally good, I can talk to anybody, and I have a place in this world. That’s something we all need.’

While ear­phone-as­sisted com­fort bub­bles are noth­ing new on cam­pus — or for that mat­ter, in cof­fee shops or on pub­lic tran­sit — I see them bleed­ing into sit­u­a­tions where, just a few years ago, they would never have oc­curred.

People now wear their AirPods all day at the of­fice. They keep them in while or­der­ing and pay­ing for things in stores and su­per­mar­kets.

I played golf last summer at a public course in Michigan, and the guy I was paired with wore AirPods throughout our nine holes together. After shaking my hand and offering me a terse “play well,” the guy didn’t say five words to me for the rest of our round. I would have felt less isolated playing alone.

I know that a lot of peo­ple wear AirPods to fa­cil­i­tate com­mu­ni­ca­tion, not to de­ter it. AirPods can func­tion as hear­ing aids — block­ing out back­ground noise while help­fully am­pli­fy­ing the words of a con­ver­sa­tion part­ner.

The problem is that unless you already know the AirPod wearer and you’re confident they won’t be bothered if you start chatting with them, earphones are the equivalent of a “Do Not Disturb” sign. We see them and assume the person wearing them is either listening to something or trying to block out distraction. To strike up a conversation with someone wearing earbuds feels intrusive — like you’re bulling your way into their personal space without permission.

I’m sure some peo­ple read­ing this will say, Well, so what? Small talk is a drag any­way, es­pe­cially with strangers or loose ac­quain­tances. As long as a per­son has close con­nec­tions in their lives — peo­ple for whom they ei­ther take out their AirPods or use them to con­nect and com­mu­ni­cate — then what’s the harm?

I used to feel this way my­self, but I’ve learned some things that have changed my mind.

For a piece I wrote re­cently for Time mag­a­zine, I de­tailed the find­ings of a new study that found Americans are speak­ing to one an­other far less than they used to. According to that study, the num­ber of spo­ken words ut­tered by the av­er­age per­son fell by 28% be­tween 2005 and 2019. Each year dur­ing that time pe­riod, the num­ber of words peo­ple spoke in an av­er­age day de­clined.

One of the authors of that study, the University of Arizona social psychologist Matthias Mehl, told me it’s highly likely that spoken communication has fallen further since 2019. He pointed to the loss of idle chitchat and other public-space interactions as significant contributors to the trend. “We can shop for groceries now without talking with a checkout person, and in restaurants we can sometimes order and pay without ever talking with a server,” he said. “All these ways in which we have rendered our daily lives more efficient may have also resulted in rendering our social lives more rudimentary.”

When peo­ple lis­tened to pod­cast-style au­dio con­tent through head­phones, they per­ceived the pod­caster to be warmer and friend­lier, more per­sua­sive, and more em­pa­thetic than if they lis­tened to the same piece of con­tent on speak­ers.

For that Time piece, I also spoke with Gillian Sandstrom, a psy­chol­o­gist at the University of Sussex and au­thor of the new book Once Upon a Stranger.

Sandstrom told me that casual conversations with people we don’t know well can make us feel more connected to one another. These conversations also exercise and enhance our social skills. They may even bolster our faith in humanity. “When we have these interactions, they tend to go much better than we thought they would, and we come away from them with a sense that people are generally good,” she told me.

The more I’ve thought about what she told me, the more im­por­tant her mes­sage feels.

For those of us who wear ear­buds all the time, the peo­ple drift­ing by on the out­side of our ar­ti­fi­cially qui­eted, per­son­ally cu­rated sound si­los can be­gin to re­sem­ble other ve­hi­cles on a traf­fic-choked in­ter­state — that is, like lit­tle more than nui­sances crowd­ing our space and im­ped­ing our progress.

I think we need reg­u­lar doses of real hu­man con­tact — not just with close friends, but with ac­quain­tances, and even with strangers — to coun­ter­bal­ance all the neg­a­tiv­ity we en­counter in the news and on­line, and to re­mind us that, on the whole, peo­ple are kind and well-mean­ing.

Apart from throw­ing up road­blocks that pre­vent these sorts of ca­sual in­ter­ac­tions, ear­buds may change our re­la­tion­ship to the con­tent we con­sume.

For a study creepily (but aptly) titled “A Voice Inside My Head,” researchers at several University of California schools found that when people listened to podcast-style audio content through headphones, as opposed to via speakers, they tended to form a more positive impression of the person delivering the podcast. They perceived the podcaster to be warmer and friendlier, more persuasive, and more empathetic than if they listened to the same piece of content on speakers.

The explanation for this, according to the study’s authors, is that headphones may reduce the psychological distance between listener and speaker; headphones give listeners the sense that the speaker’s voice is coming from inside their head — almost as though the voice they’re hearing and their own internal thoughts are one and the same. “It is important to understand how the medium through which people listen can affect their perceptions, attitudes, and behaviors,” the study’s authors wrote. “We find consistent evidence that listening to a message via headphones (vs. speakers) leads listeners to feel closer to communicators, leading to different psychological and behavioral responses to messages.”

It’s possible that many of us are so taken with podcasts — and so amenable to the theories and opinions we encounter in them — in part because of these subtle perceptual and psychological effects. (As Marshall McLuhan famously put it, “the medium is the message.”)

While all these con­se­quences are con­cern­ing, I think the great­est prob­lem our ear­phones pose to us — and the one that led me, sev­eral years ago, to cut back my own use — is the way au­dio con­tent can crowd out time we should prop­erly spend with our own thoughts.

Back in 2019, I wrote a piece titled “Why Your Brain Needs Idle Time.” I detailed all the reasons we need to give our minds regular breaks from new information so that we have time to consider and make sense of our experiences.

“The deeper reflective states, where you make meaning of what’s going on and connect it to self and identity and integrate knowledge together into coherent narratives — these kinds of processes only happen when you’re not focused on some in-the-moment activity,” Mary Helen Immordino-Yang, a professor at the University of Southern California, told me for that piece.

These vi­tal pe­ri­ods of con­tem­pla­tion and mean­ing mak­ing re­quire us to step away from our var­i­ous con­tent streams and al­low our thoughts to wan­der freely. But thanks to ear­buds, such op­por­tu­ni­ties to rest and re­flect are in­creas­ingly op­tional — and ef­fort­ful.

During my last trip home to Detroit, I was filling a container at a grocery store salad bar when an older man, unprompted, pointed at the jalapeno slaw I was spooning up and said, “You’re going to eat that?”

He looked at me sideways, shaking his head and smiling. “Oh man, that looks too spicy for me. You’re going to have to tell me how it is. I don’t know about that!”

Living abroad, one of the many things I miss about the U.S. is the warmth and friend­li­ness of its peo­ple. (In my ex­pe­ri­ence, a German would never in­ter­act with a stranger the way this older man had in­ter­acted with me.) I told the man I’d be sure to let him know about the slaw, and he wished me a good day. The in­ter­ac­tion lasted 15 sec­onds, but it bright­ened my whole af­ter­noon.

The great­est ben­e­fit we get from chat­ting with other peo­ple — and the one we may ul­ti­mately miss the most if we spend less time talk­ing with one an­other — is also the hard­est to quan­tify, Sandstrom told me for that Time ar­ti­cle.

“All these little conversations add up to us feeling like people are generally good, I can talk to anybody, and I have a place in this world,” she said. “That’s very hard to measure, but that’s something we all need.”

The more time we all spend with AirPods in our ears, the more that need is likely to go un­met.

To study how chips really work, MIT researchers built their own operating system

news.mit.edu

A new ker­nel, or core pro­gram within an op­er­at­ing sys­tem, gives re­searchers a cleaner view of what’s hap­pen­ing in­side a proces­sor. Called Fractal and de­vel­oped at MIT, the ker­nel has al­ready sur­faced pre­vi­ously un­known be­hav­ior in Apple’s M1.

When se­cu­rity re­searchers want to un­der­stand what a mod­ern proces­sor is re­ally do­ing with the kind of de­tail that de­ter­mines whether at­tacks like Spectre and Meltdown are pos­si­ble, they usu­ally run their ex­per­i­ments on top of an op­er­at­ing sys­tem that was never built for the job. They open up ma­cOS or Linux, patch the ker­nel by hand, and hope the mod­i­fi­ca­tions hold. The ap­proach is un­sta­ble, hard to re­pro­duce, and on Apple’s plat­forms, slated for dep­re­ca­tion.

A team at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) decided to build something different. Fractal, an operating system kernel written from the ground up, treats the hardware itself as the object of study. Its first major use was a deep look at branch predictors (a CPU’s way of guessing what code to run next, before it knows for certain, so it doesn’t have to waste time waiting to find out) inside Apple’s M1 processor. That study has already turned up findings that prior work missed, including the first evidence that a class of speculative attack known as “Phantom” affects Apple Silicon.

“We’re using hardware in ways it wasn’t designed for,” says Joseph Ravichandran, the MIT PhD student in electrical engineering and computer science (EECS) who led the project. “It’s not even obvious that this is a possible thing you could do with the hardware. But we found a way to pull all these different primitives off. It’s like a microscope. If you’ve got a hand magnifying glass, you can see a little bit. But if you had an electron microscope, now we’re really talking. That’s what Fractal is. The electron microscope of operating systems.”

A clean room for chip re­search

The core prob­lem Fractal solves is one that re­searchers have worked around for years. Modern proces­sors keep state in many in­ter­nal struc­tures: branch pre­dic­tors, caches, trans­la­tion looka­side buffers, and more. To study how those struc­tures be­have across the bound­ary be­tween user code and ker­nel code, two do­mains the chip is sup­posed to keep iso­lated, re­searchers need to run nearly iden­ti­cal ex­per­i­ments on each side of that bound­ary. On a gen­eral-pur­pose op­er­at­ing sys­tem, that is very dif­fi­cult. The sys­tem it­self man­ages priv­i­lege lev­els, ad­dress spaces, and sched­ul­ing, and it in­jects its own ac­tiv­ity into every mea­sure­ment.

Fractal in­verts the model. It boots di­rectly on bare metal, with no other soft­ware run­ning, and ex­poses prim­i­tives that let a sin­gle ex­per­i­ment switch priv­i­lege lev­els at run­time while ex­e­cut­ing the same in­struc­tions in the same ad­dress space. The team calls the un­der­ly­ing tech­nique multi-priv­i­lege con­cur­rency, and it re­lies on a new con­struct they in­tro­duced: the outer ker­nel thread, which sits in­side a user process’s mem­ory but ex­e­cutes with ker­nel priv­i­leges.

The re­sult is an ex­per­i­men­tal setup with al­most no back­ground noise. Where mea­sure­ments taken un­der ma­cOS or Linux are blurred by in­ter­rupts, sched­uler ac­tiv­ity, and ad­dress-space man­age­ment, Fractal pro­duces flat base­lines and clean sig­nals.

What Fractal found on the M1

Apple’s M1 im­ple­ments an ARM spec­i­fi­ca­tion called CSV2, which is sup­posed to pre­vent code run­ning in one priv­i­lege level from steer­ing spec­u­la­tion in an­other. Using Fractal, the MIT team con­firmed that the pro­tec­tion works for the ex­e­cute stage of in­di­rect branch pre­dic­tion: a user-mode pro­gram can­not make the ker­nel spec­u­la­tively ex­e­cute a cho­sen tar­get through the in­di­rect branch pre­dic­tor.

But the team also found some­thing the chip’s de­sign­ers may not have in­tended. The CPU still fetches the tar­get into the in­struc­tion cache be­fore the pro­tec­tion kicks in. That fetch is ob­serv­able through a side chan­nel, which means user code can still in­flu­ence what the ker­nel pulls into its caches across the priv­i­lege bound­ary. The same pat­tern ap­peared be­tween processes as­signed dif­fer­ent ad­dress space iden­ti­fiers.

The team also pro­duced the first ev­i­dence that Apple Silicon ex­hibits Phantom spec­u­la­tion, a class of mis­pre­dic­tion pre­vi­ously demon­strated only on AMD and Intel proces­sors. In Phantom, or­di­nary in­struc­tions, in­clud­ing a no-op, can be mis­in­ter­preted by the CPU as branches, trig­ger­ing spec­u­la­tive be­hav­ior the pro­gram never asked for. On the M1, Fractal showed that Phantom fetches suc­ceed across both priv­i­lege lev­els and ad­dress spaces, though the ex­e­cute phase re­mains blocked.

A separate Fractal experiment overturned a finding from earlier work on the M1’s conditional branch predictor, which had reported that cross-privilege training worked on Apple’s performance cores, but not its efficiency cores. The Fractal team showed that the conditional branch predictor has no privilege isolation at all, on either core type, and that the earlier result was likely an artifact of macOS quietly migrating threads between cores during system calls.

“For us, it is a true independent variable,” Ravichandran says. “You change the privilege level, nothing else changes. The only thing that could explain whether the attack succeeds or not is the privilege level.”

A tool, not a one-off

Fractal sup­ports x86_64, ARM64, and RISC-V, and con­sists of more than 31,000 lines of code. The team de­signed it as in­fra­struc­ture rather than as a sin­gle ex­per­i­ment, with fa­mil­iar POSIX sys­tem calls, a C li­brary, and ports of stan­dard tools like vim, GCC, and the dash shell, so that re­searchers can move ex­ist­ing ex­per­i­ment code over with min­i­mal fric­tion.

The MIT team dis­closed its M1 find­ings to Apple’s prod­uct se­cu­rity team. In an un­usual re­ver­sal, Apple’s en­gi­neers also ex­am­ined Fractal.

The longer-term am­bi­tion is big­ger than any sin­gle re­sult. Ravichandran wants Fractal to be­come to mi­croar­chi­tec­ture re­search what tools like QEMU and FFmpeg are to their fields: shared in­fra­struc­ture that the whole com­mu­nity builds on.

“My hope is that our results as a community get significantly more reliable, significantly more accurate,” says Ravichandran. “With this reduced noise, this clarity, and this guarantee that you’re running on the right core, on the right system.”

“Fractal is a strong architecture contribution because it turns an often ad hoc microarchitectural reverse-engineering workflow into reusable research infrastructure,” says University of Southern California assistant professor Mengyuan Li, who wasn’t involved in the paper. “By reducing software noise and giving researchers tighter control across privilege boundaries, it makes difficult hardware experiments much easier to interpret.”

Ravichandran worked with Mengjia Yan, an MIT as­so­ci­ate pro­fes­sor of EECS and CSAIL prin­ci­pal in­ves­ti­ga­tor, on the pa­per. Their work was sup­ported, in part, by the National Science Foundation, the U.S. Air Force Office of Scientific Research, and ACE, which is part of a pro­gram spon­sored by the U.S. Defense Advanced Research Projects Agency. They pre­sented their work at the IEEE Symposium on Security and Privacy in San Francisco, California.

Press Mentions

IEEE Spectrum

Writing for IEEE Spectrum, reporter Matthew S. Smith highlights Fractal, a new operating system hand-coded by CSAIL researchers to provide a clear view of security vulnerabilities. “We paved the way with techniques such as custom kernel patches and kernel extensions,” says graduate student Joseph Ravichandran. “The dream was always to have a completely custom operating system which would make these hacks unnecessary.”

There Are No Instances in atproto — overreacted

overreacted.io

Every single time a post about atproto hits Hacker News, somebody asks in the comments: “But where are all the Bluesky instances?” The problem is, there are no instances in atproto! The question is a category error. Instances are a Mastodon-brained concept, and I wanted something I can link to that explains this clearly.

So this is that post.

RSS and Google Reader

I know RSS is still be­ing used some­where (podcasts?!) but its hey­day is ar­guably be­hind. Which is a shame. For a few years, which some of us might fondly re­mem­ber as the golden age of the web, it felt like blog­ging was a cool thing.

Now look at this pic­ture be­cause it’s go­ing to be im­por­tant:

[Diagram: alice’s blog, cat’s blog and bob’s blog, each feeding into Google Reader and Feedly]

As a re­minder, you pub­lish stuff on your own blog, which you can ei­ther self-host or host on a pop­u­lar blog­ging plat­form. But then every­one’s stuff gets ag­gre­gated into apps like Google Reader and Feedly, or col­lec­tive blogs like Monologue (RIP).

Note that hosting and aggregation are two separate things. Your posts don’t “live” in an app like Google Reader. Apps are mere projections of the Blogosphere.

Seriously, make sure this thought sears into your brain; it’s go­ing to be es­sen­tial.
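To make the hosting/aggregation split concrete, here is a minimal sketch. It assumes each blog is just a list of posts owned by its author and that a reader app is nothing more than a function over those lists. The `Post` type, the sample blogs and the `aggregate` function are all made up for illustration; no real RSS library or Google Reader API is involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    author: str
    published: int  # e.g. a Unix timestamp
    title: str


# Hosting: each blog owns its own posts (hypothetical sample data).
BLOGS = {
    "alice": [Post("alice", 3, "Hello"), Post("alice", 7, "Gardening")],
    "bob": [Post("bob", 5, "Notes")],
    "cat": [Post("cat", 1, "Meow")],
}


def aggregate(blogs, subscriptions):
    """Aggregation: derive a reader's timeline from the blogs they follow.

    The timeline has no permanent home of its own. It is computed from the
    blogs and can be rebuilt at any time by any reader app.
    """
    posts = [p for name in subscriptions for p in blogs.get(name, [])]
    return sorted(posts, key=lambda p: p.published, reverse=True)


# Two different "reader apps" projecting the same underlying blogs.
reader_a = aggregate(BLOGS, ["alice", "bob"])
reader_b = aggregate(BLOGS, ["alice", "cat"])
print([p.title for p in reader_a])  # ['Gardening', 'Notes', 'Hello']
print([p.title for p in reader_b])  # ['Gardening', 'Hello', 'Meow']
```

Deleting either reader changes nothing in `BLOGS`, which is the sense in which the apps are projections and not homes.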

Facebook and Such

Here’s what you could call an evo­lu­tion of this con­cept.

We put a box around the whole thing so that every­one is en­closed in the same space so we can show ads and stuff. Also, let’s leave only one app (we can let al­ter­na­tive apps live for a while, but not for long). That’s tra­di­tional so­cial me­dia.

[Diagram: alice’s posts, cat’s posts and bob’s posts enclosed in a single box labeled Facebook, feeding the Facebook newsfeed]

Oh no, now we have cen­tral­iza­tion!

Oh no, run­away net­work ef­fects!

Oh no, bla bla bla.

What do we do?

We need to de­cen­tral­ize this some­how.

Mastodon and Its Instances

I say “Mastodon” here because if I say “ActivityPub” instead, a crowd of people will show up and say that actually what I’m describing is how Mastodon chose to implement ActivityPub. Whereas ActivityPub by itself does not really specify how to actually use it in practice. I’m sure this is all very interesting—but I digress.

How do we de­cen­tral­ize a so­cial net­work?

Let’s build a version of what we saw earlier, but make it self-hostable. Then every community can have their own “little Facebook” or “little Twitter”. We’ll call them instances. They’re kind of like countries—because you live “inside” one of them:

[Diagram: Mastodon instances #1, #2 and #3, each a separate box containing its own users’ posts (alice, alex, ann, cat, crow, cali, bob, bree, boba) and its own newsfeed]

But wait, this opens a bunch of ques­tions.

How do you choose which in­stance to join? Maybe you’re a mem­ber of a few over­lap­ping com­mu­ni­ties. Well, I guess you’re just gonna have to pick which com­mu­ni­ty’s ad­mins you trust the most with han­dling your iden­tity and data.

Okay, now an­other prob­lem—what if my friend’s on a dif­fer­ent in­stance? How will they see my posts? Since each in­stance is ba­si­cally its own lit­tle Facebook, they have no shared source of truth. So they have to send mes­sages to each other:

[Diagram: the same three Mastodon instances (#1, #2, and #3), each with its own users’ posts and its own newsfeed, now with arrows between the instances.]

This net­work topol­ogy might re­mind you of war­ring fief­doms in Ancient China.

If Alice-from-instance-#1 follows Bree-from-instance-#2, the two instances make an agreement: Bree’s posts will be forwarded to instance #1 so that Alice can see them. That’s called “federation”. You post on your instance, and then it gets forwarded to other instances whose users wanted to hear from you.
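Here is a toy model of that forwarding, as a minimal sketch. The `Instance` class and its methods are hypothetical names made up for illustration; this is not the ActivityPub protocol or Mastodon’s actual code, just the shape of the idea.

```python
from collections import defaultdict


class Instance:
    """Hypothetical stand-in for one self-hosted instance (not a real API)."""

    def __init__(self, name):
        self.name = name
        # local user -> list of (author handle, text) visible in their feed
        self.timelines = defaultdict(list)
        # local author -> set of (follower's instance, follower name)
        self.remote_followers = defaultdict(set)

    def follow(self, follower, author, author_instance):
        # The "agreement": the author's instance remembers where to forward.
        author_instance.remote_followers[author].add((self, follower))

    def post(self, author, text):
        # The post is created here, then forwarded to every interested instance.
        handle = f"{author}@{self.name}"
        for instance, follower in self.remote_followers[author]:
            instance.timelines[follower].append((handle, text))


one = Instance("instance-1")
two = Instance("instance-2")
one.follow("alice", "bree", two)  # Alice-from-#1 follows Bree-from-#2
two.post("bree", "hello")         # forwarded from #2 to #1
```

Note that the author’s handle carries the instance name with it, and that nothing reaches instance #1 unless instance #2 chooses to forward it.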

This pic­ture has a few in­ter­est­ing im­pli­ca­tions:

You “belong” to your instance. You’re not Alice, you are Alice-from-instance-#1. That’s why your Mastodon login is literally [email protected]. Where you’re “from” is an immutable part of your identity. (Somehow, this manages to be even more restrictive than countries and nationalities.)

If your instance’s admins pick a fight with another instance’s admins, they may choose to stop “federating”, and no longer forward any posts between them. That could be a surprising reason why you’re no longer seeing posts from your friends.

If your instance goes down, your identity ceases to exist. People who followed you followed you-from-that-instance, not some abstract platonic “actual you”.

Oh, and the ar­rows be­tween in­stances scale as O(n²). This might not mat­ter much now, but it could mat­ter if this ap­proach to so­cial net­work­ing be­comes pop­u­lar.
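To put rough numbers on that O(n²), here is a small worked calculation. It assumes the worst case where every pair of instances federates; real networks are sparser, and `federation_links` is just an illustrative helper name.

```python
def federation_links(n):
    """Number of instance pairs if all n instances federate with each other."""
    return n * (n - 1) // 2


for n in (3, 10, 1000):
    print(n, "instances ->", federation_links(n), "pairs")
```

Three instances need 3 links, ten need 45, and a thousand need 499,500. Counting each direction as its own arrow doubles those figures.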

at­proto

Now for­get all of that—full re­set.

The mis­take was when we drew this box:

[Diagram: alice’s posts, cat’s posts, and bob’s posts enclosed in one box labeled “facebook”, which feeds the facebook newsfeed.]

Erase the box.

Go back to this:

[Diagram: alice’s blog, cat’s blog, and bob’s blog, aggregated by Google Reader and Feedly.]

We have hosting where things actually “live”, and apps aggregate from them. This worked for blogs just fine, so why wouldn’t it work for literally everything else?

[Diagram: alice’s stuff, cat’s stuff, and bob’s stuff, aggregated by app #1 and app #2.]

Like RSS, but for all kinds of stuff.

That’s at­proto.

So Where Are All the Bluesky Instances?

Now you know! There are no in­stances in at­proto.

Instances are these Mastodon-brained things:

[Diagram: nine users’ posts (alice, alex, ann, cat, crow, cali, bob, bree, boba) split across Mastodon instances #1, #2, and #3, each with its own newsfeed.]

They’re those iso­lated bun­dled host­ing+app fief­doms that send stuff to each other.

Compare this pic­ture to at­proto.

In at­proto, we cut host­ing apart from the ag­gre­ga­tion at the net­work level:

[Diagram: everyone’s stuff (alice, alex, crow, cali, boba, bree, ann, bob, cat) lives on atproto hosting #1, #2, and #3, while atproto apps #1, #2, and #3 aggregate from the hosting.]

There are no in­stances at all! There’s host­ing you can swap, and there are apps that ag­gre­gate from every­one’s host­ing. It’s very much like RSS and Google Reader.
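The split can be sketched in a few lines. Everything here is a hypothetical toy (the `Host` class, the `directory` lookup, the record shape); it is not the real atproto API, only an illustration of hosting you can swap and apps that project over everyone’s hosting.

```python
class Host:
    """Hypothetical hosting provider: it only stores users' records."""

    def __init__(self, name):
        self.name = name
        self.repos = {}  # user -> list of records


# Hypothetical lookup from a user to wherever their data currently lives.
directory = {}


def sign_up(user, host):
    host.repos[user] = []
    directory[user] = host


def publish(user, record):
    directory[user].repos[user].append(record)


def migrate(user, new_host):
    # Swapping hosting moves the data; the user's identity does not change.
    old_host = directory[user]
    new_host.repos[user] = old_host.repos.pop(user)
    directory[user] = new_host


def app_feed(kind):
    """An app is a projection: it aggregates one kind of record from all hosting."""
    return [
        (user, record["text"])
        for user, host in sorted(directory.items())
        for record in host.repos[user]
        if record["kind"] == kind
    ]


hosting_1, hosting_2 = Host("hosting-1"), Host("hosting-2")
sign_up("alice", hosting_1)
sign_up("bob", hosting_2)
publish("alice", {"kind": "post", "text": "hi from alice"})
publish("bob", {"kind": "bookmark", "text": "a link bob saved"})

before = app_feed("post")
migrate("alice", hosting_2)  # alice swaps hosting
after = app_feed("post")     # the app's view is unchanged
```

Two different apps are just two different `kind` filters over the same data, and neither one owns it.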

The decentralization of atproto is richer in structure than “many copies of one app”:

If you want to swap your host­ing, you can. I lit­er­ally did this to­day. Aside from three or four UX snags, it was all au­to­matic. My at­proto stuff is at Eurosky now. If I were more ad­ven­tur­ous, I could host all my data my­self too for free on Cloudflare.

If you want to try new apps or make new apps, you can do that too! Check out Tangled and Semble, which have nothing to do with Bluesky. I’ve made my own app recently (and it’s open source). I recommend trying your hand at it too.

You care about de­cen­tral­iza­tion? You have full agency here. Decentralize away.

Free Yourself from the Instance Brain

Now you see why every de­cen­tral­ized so­cial me­dia dis­cus­sion is de­railed by this.

Mastodon users measure decentralization by the number of instances because that’s the only thing you can do in Mastodon. If there’s only one type of “box”, and each box is “an app coupled with hosting”, the only thing you can do is to host more of these boxes and get them to talk to each other. They’re isolated by default.

In atproto, every app is a projection of the whole Atmosphere, just like Feedly and Google Reader are projections of the entire Blogosphere. You mostly “decentralize” by swapping your hosting, and/or by making and trying new apps. Running many full copies of the Bluesky database server is possible, but it’s not any more useful than running many copies of Google Reader. People do set them up (cue Blacksky), but they arise to meet someone’s specific needs (like a different moderation philosophy). There are other approaches too: this Bluesky client has no dedicated database at all, and it just hits a free community-run cache of everyone’s hosting. Shared network infrastructure like Relays has been cheap to run for a year now.

This is why “counting Bluesky instances” is so misleading. What matters is:

Are peo­ple mi­grat­ing to al­ter­na­tive host­ing?

Are peo­ple try­ing and mak­ing new apps?

Separating host­ing and apps fixes bro­ken in­cen­tives in closed and in fed­er­ated so­cial. Coupling host­ing and apps was the orig­i­nal sin, and the fix is sim­ple.

Keep our stuff out­side the apps; let the apps ag­gre­gate over it.

[Diagram: alice’s stuff, cat’s stuff, and bob’s stuff, aggregated by app #1 and app #2.]

Like RSS and Google Reader.

Ten years of ClickHouse in open source

clickhouse.com

ClickHouse was released in open source on Jun 15, 2016, ten years ago. Since then, it has become the most popular open source analytical database, with more than 2,000 contributors.

There are dif­fer­ent lev­els of open-source.

Level 0: The min­i­mum level is mak­ing the code open to the pub­lic for read­ing, but noth­ing more. This is the case of archival and mu­seum re­leases, such as Doom or MS-DOS.

Level 1: The next level is when the software is updated by commits in a public repository, but without necessarily accepting contributions. This still counts as open source. SQLite and Ladybird are examples.

Level 2: Accepting con­tri­bu­tions but with­out a trans­par­ent and open de­vel­op­ment process. Most ac­tive open-source pro­jects are on this level.

Level 3: Open con­tri­bu­tion guide­lines, task tracker, code re­view sys­tem, de­vel­op­ment roadmap, test­ing and CI sys­tem, re­lease cy­cle, user sup­port, and doc­u­men­ta­tion.

I al­ways aim for the max­i­mum. ClickHouse should be the best ex­am­ple of:

How to build a great data­base - if you want to build a new data­base, ClickHouse source code and de­vel­op­ment prac­tices will serve as the best ex­am­ple. I al­ways write the code so every­one can learn from it - by keep­ing it mod­u­lar, or­thog­o­nal, and well-doc­u­mented. When the code re­quires a com­plex con­cept, I ex­plain it in the com­ments from scratch, so the read­ers don’t have to re­fer to text­books, Wikipedia, or AI.

A place to learn C++ de­vel­op­ment. Many peo­ple are look­ing for repos­i­to­ries rep­re­sent­ing the fron­tier of soft­ware en­gi­neer­ing, and to­day ClickHouse is one of the most pop­u­lar open source repos­i­to­ries in C++, where every­one can learn both the ex­cit­ing stuff (C++23) and bor­ing stuff (build sys­tems, con­tin­u­ous in­te­gra­tion and test­ing, code re­view prac­tices, and AI).

A place for ex­per­i­ments on data struc­tures and per­for­mance op­ti­miza­tion. You can open a pull re­quest as an ex­per­i­ment, with­out aim­ing for it to be merged - it will be tested with the same level of scrutiny as pro­duc­tion re­leases. Found a new mem­ory al­lo­ca­tor, a new com­pres­sion li­brary, a new hash table, a data for­mat, or a sort­ing al­go­rithm? - bring it to ClickHouse, and it will ex­pose it in­side-out. The roadmap also in­cludes a sec­tion about ex­per­i­men­tal, weird, and even ridicu­lous things.

Where you can be proud of your work. ClickHouse cred­its every con­trib­u­tor in the changelog and even in­side the data­base in the sys­tem.con­trib­u­tors table! There are count­less cases when a con­trib­u­tor sends an ini­tial, in­com­plete im­ple­men­ta­tion of a fea­ture, and we help to fin­ish it to­gether. Even if the code has to be en­tirely rewrit­ten, we do it proac­tively and take the re­spon­si­bil­ity for that, and al­ways credit the ini­tial au­thor, be­cause we care about your use case and the ini­tial in­tent that made it hap­pen. To put it sim­ply, we love our con­trib­u­tors.

Prototypes and first com­mits

The first com­mit in ClickHouse was made on May 29, 2009, and it was a per­for­mance op­ti­miza­tion (a re­place­ment of libc func­tions lo­cal­time, mk­time, gm­time, which were ex­tremely slow and an­noyed me by show­ing up in the pro­filer). But it was be­fore ClickHouse ex­isted.

ClickHouse started as my experiment while I was working on data processing for a web analytics system. The system, similar to Google Analytics, received logs about pageviews sent from websites, and it was implemented with MySQL, data processing in C++, and custom data structures in C++ where MySQL didn’t suffice. The MySQL databases stored pre-aggregated reports for customers, and the custom data structures were used for calculating user sessions, user history, and similar stuff.

My ex­pe­ri­ence from that time was - the data vol­ume is grow­ing, noth­ing works, and the new data ap­pears in real time. If we can’t process a five-minute chunk of logs in five min­utes, there will be a de­lay. I will search for any cre­ative so­lu­tion while the de­lay ac­cu­mu­lates and de­ploy it on the same work day.

That’s how I was search­ing for any so­lu­tion that works - any type of data­bases, any li­braries, etc. Can we use TokuDB? Colleagues use LMDB, maybe it will save us? Let’s try Judy Arrays. Someone at lunch told us about Hadoop, should we use it? I heard briefly about LZO and QuickLZ in the cor­ri­dor - let’s try it. If we store HyperLogLogs in MySQL BLOBs, how will we sum them? On a week­end, I will read that data com­pres­sion book or the doc­u­men­ta­tion on event-loop servers…

While sta­bi­liz­ing the data pipeline, I was also think­ing about new fea­tures that I can bring to the prod­uct. If we record clicks on links, we can show a heat map on every page. And if we record the po­si­tion of every click in the DOM, we can make a click map. For Apr 1, I made a 3D click map in Flash with anaglyph col­ors. The more in­ter­est­ing fea­ture was to let our users con­struct any re­port in­stead of a set of pre-ag­gre­gated ones.

For this task, I explored column-oriented databases. I had read about them on random company mailing lists and on websites like dbms2.com, and heard about them from my colleagues in the ads department. The idea is to store non-aggregated but structured logs and aggregate them on the fly, while the customer waits for the page to load. I tested a few extensions to MySQL (Infobright and InfiniDB) and a few standalone analytical databases (Vertica, MonetDB, and LucidDB). For some reason, none of them could handle loading 100 billion records a day with 500 columns. Then I tried to implement a simple prototype of a custom data structure: every column (only integers, with hashes instead of strings) for every day and every website in a single binary file (a billion files needed XFS), with lightweight compression, updated once a day with a delay of a few hours, and queried through an API that allowed specifying columns to group by, aggregate functions, filters, and sorting (queries were specified in XML). The most difficult part was populating historical data from MySQL by “unaggregating” it so that aggregated data would show the same result. That was solved by my colleague, Evgenii Gatov.
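To make the idea concrete, here is a minimal in-memory sketch of such a layout. It is not the OLAPServer code: the class, the method names, and the use of CRC32 as the string hash are all assumptions made for illustration. Each column is its own array of integers, and a query touches only the columns it names.

```python
import zlib
from array import array
from collections import defaultdict


def string_id(text):
    """Replace a string with an integer hash (CRC32 is an arbitrary choice here)."""
    return zlib.crc32(text.encode("utf-8"))


class ColumnTable:
    """Hypothetical column store: one unsigned 64-bit integer array per column."""

    def __init__(self, column_names):
        self.columns = {name: array("Q") for name in column_names}

    def append(self, row):
        for name, column in self.columns.items():
            column.append(row[name])

    def sum_by(self, key, value, filter_column=None, filter_value=None):
        """GROUP BY key with SUM(value), optionally WHERE filter_column = filter_value."""
        keys = self.columns[key]
        values = self.columns[value]
        mask = self.columns[filter_column] if filter_column else None
        totals = defaultdict(int)
        for i in range(len(keys)):
            if mask is not None and mask[i] != filter_value:
                continue
            totals[keys[i]] += values[i]
        return dict(totals)


hits = ColumnTable(["website", "region", "clicks"])
hits.append({"website": 1, "region": string_id("EU"), "clicks": 2})
hits.append({"website": 1, "region": string_id("US"), "clicks": 5})
hits.append({"website": 2, "region": string_id("EU"), "clicks": 7})

clicks_by_region = hits.sum_by("region", "clicks")
clicks_site_1 = hits.sum_by("region", "clicks", "website", 1)
```

A report that groups by region never reads any column other than `region`, `clicks`, and the filter column, which is the property that makes on-the-fly aggregation over wide tables feasible.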

This simple prototype (named OLAPServer, implemented in Dec 2008, deployed in Jan 2009) worked. I also created an endpoint to let people analyze global Internet data instead of single websites, and it worked like a miracle. One example: there was a statistics department processing Internet logs using an internal version of MapReduce, but analysts in the company started to use my service instead, because it answered instantly.

The first pro­to­type of re­port gen­er­a­tor. The fron­tend and de­sign are also mine.

Then I decided to replace aggregated reports in MySQL (it had accumulated about 50 TB of data on 50 shards). Many custom data structures were stored as BLOBs, and to aggregate them, the programs had to read them from the database, apply custom code, and insert them back. Moreover, data in MySQL was uncompressed. On top of that, the data was slow to read, because the order of its arrival (by time) didn’t correspond to the order of queried ranges (by website ID). I was reading about LevelDB and TokuDB, so I decided to implement a custom data structure for incremental aggregation with background merges. Every record in this table was defined by a custom C++ struct representing a CRDT with add, update, merge, serializeText/Binary, and deserializeText/Binary methods. On read, the partially aggregated data is finally merged and returned to the API. This data structure can be used for any aggregated report, such as unique users and visits by region, or a click map for every page.
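A mergeable record of that kind might look roughly like this. It is a sketch in Python rather than the original C++ struct, and the class name, fields, and JSON text format are assumptions. One caveat: the visit counter merges by addition, so merging is commutative and associative but not idempotent, and each partial state must be merged exactly once.

```python
import json


class VisitStats:
    """Hypothetical partially aggregated record: visit count plus unique users."""

    def __init__(self, visits=0, users=()):
        self.visits = visits
        self.users = set(users)

    def add(self, user_id):
        # Fold one raw event into this partial state.
        self.visits += 1
        self.users.add(user_id)

    def merge(self, other):
        # Combine two partial states; the order of merging does not matter.
        self.visits += other.visits
        self.users |= other.users

    def serialize_text(self):
        return json.dumps({"visits": self.visits, "users": sorted(self.users)})

    @classmethod
    def deserialize_text(cls, text):
        data = json.loads(text)
        return cls(data["visits"], data["users"])


morning = VisitStats()
for user in (1, 2, 2):
    morning.add(user)

evening = VisitStats()
for user in (2, 3):
    evening.add(user)

# A background merge and a merge at read time do the same thing.
day = VisitStats.deserialize_text(morning.serialize_text())
day.merge(evening)
```

After the merge, `day` holds 5 visits from 3 unique users, regardless of which half was merged into which.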

This simple prototype (named Metrage) also worked. So we ended up with two custom data structures: one column-oriented for non-aggregated data, updated daily, with only integer types, and another row-oriented, updated in real time, with arbitrary CRDTs.

For a long time, these two cus­tom data struc­tures solved our prob­lems. To be hon­est, no one de­manded more. But I thought - what if I try to com­bine a col­umn-ori­ented ap­proach for ag­gre­ga­tion speed and a merge tree for re­al­time up­dates and data lo­cal­ity? And also gen­er­al­ize it to al­low a real query lan­guage and data types? This is how ClickHouse started.

ClickHouse is a rare example of a database system that is not based on any existing one: it was implemented entirely from scratch. Today, most database management systems are implemented on top of Postgres, DataFusion, or even ClickHouse. It might be interesting to look at how, and in what steps, a DBMS can be bootstrapped out of nothing.

The first commits in 2009 are optimizations for other data structures in the same mono-repository. They are visible because, during open-sourcing, I carefully split the repository while preserving all the history.

The first com­mit where I started im­ple­ment­ing a new DBMS (the name ClickHouse came later) is here - the im­ple­men­ta­tion of columns in mem­ory: you can see al­ready fa­mil­iar classes IColumn and Field. Compare it to to­day’s im­ple­men­ta­tion :) You might think that this is sim­i­lar to Apache Arrow (which fo­cuses on col­umn rep­re­sen­ta­tion in mem­ory), and why did­n’t we use it - but Apache Arrow did­n’t ex­ist then (other col­umn-ori­ented for­mats, such as RCFile, Trevni, ORC, and Parquet did­n’t ex­ist ei­ther).

Then ag­gre­gate func­tions were in­tro­duced in this com­mit. It is still one of the most im­por­tant parts of ClickHouse.

Then table engines were introduced. It is funny that table engines were named “primary key”, but only for a few days. This allowed reading and writing columns on disk. The first table engine was similar to TinyLog, which still exists today.

Then com­pres­sion was added. Initially, it was QuickLZ, but as soon as I read Yann Collet’s blog, I re­placed it with LZ4.

Then block streams - com­po­nents of the data pro­cess­ing pipeline that pro­duce, con­sume, or trans­form chunks of columns in a stream­ing form. Today, these are re­placed with Processors. This un­locked the way for for­mat­ting re­sults and im­ple­ment­ing queries on ta­bles. The same com­mit added StorageSystemNumbers - in­tro­duced for test­ing query pipelines, and it re­mains to­day as our beloved sys­tem.num­bers table. The first query pipeline in ClickHouse was print­ing num­bers in TSV.

Here you can see which table en­gines were in­tro­duced in what or­der.

The first re­la­tional op­er­a­tor in the ClickHouse code base was LIMIT.
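That earliest pipeline (a numbers source, a LIMIT, and TSV output) can be imitated with generators. This is only a sketch of the streaming idea, with made-up function names; it is not how ClickHouse block streams or Processors are actually implemented.

```python
def numbers_source(block_size):
    """Endless source: yields blocks, each a dict of column name -> list of values."""
    start = 0
    while True:
        yield {"number": list(range(start, start + block_size))}
        start += block_size


def limit(blocks, n):
    """Pass through at most n rows, cutting the last block if needed."""
    remaining = n
    if remaining <= 0:
        return
    for block in blocks:
        rows = len(next(iter(block.values())))
        if rows > remaining:
            block = {name: column[:remaining] for name, column in block.items()}
            rows = remaining
        yield block
        remaining -= rows
        if remaining == 0:
            return  # stop pulling from the (possibly endless) source


def to_tsv(blocks):
    """Format blocks row by row, with tab-separated columns."""
    lines = []
    for block in blocks:
        for row in zip(*block.values()):
            lines.append("\t".join(str(value) for value in row))
    return "\n".join(lines)


output = to_tsv(limit(numbers_source(block_size=4), 6))
```

Because each stage pulls blocks lazily from the previous one, LIMIT is what makes a query over an endless table terminate.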

Then I tried to add a SQL parser. The first at­tempt tried to use boost::spirit, which failed. After a while, I made a re­cur­sive de­scent parser.

It is interesting to point out some initial ideas that were rejected or reintroduced later. Initially, I tried to add a column with variable-length encoded numbers. It was removed due to slowness, and only much later did we introduce custom compression codecs, independent of columns. Initially, I added a column type Variant containing arbitrary field values. It was also slow, and I removed it; a better version of Variant was added in 2025. I also had a fixed-size array data type along with a variable-size array, but I removed it due to the lack of need. Only now are we considering adding it back. I believe that removing unnecessary code is more important than adding new code, and today, removing code is my favorite thing to do. You can find a lot of commits in ClickHouse titled “remove trash” and similar.

Here you can see the first real table struc­ture tested in ClickHouse - it is the hits table you can still see to­day in ClickBench.

Trying to read and write this table un­cov­ered that C++ iostreams are slow, so in this com­mit you can see the in­tro­duc­tion of WriteBuffer, ReadBuffer, which are still used to­day.

The first functions in SQL appeared here: arithmetic operators. This made it possible to implement the first SELECT query interpreter. At this time, the SELECT query interpreter was only accessible from a test program, but it allowed quickly implementing new aggregate and regular functions, relational operators, data formats, and other components.

ClickHouse server was introduced on Mar 9, 2012 and clickhouse-client on Mar 25. Together with the Log, TinyLog, Merge, Distributed, and Memory table engines, it was enough to deploy ClickHouse in production. The first deployment was to store incoming chunks of logs for further processing and for global queries on top of raw logs (this is what Merge and Distributed do). We can say that the first production usage of ClickHouse was a persistent log queue with SQL queries on top 😂

Then I added MergeTree. It allowed incremental sorting of data in the background, so that while the data arrives ordered by time, range queries for a single website work fast, and we could deploy it in production as a replacement for both early prototypes, OLAPServer and Metrage. The first version contained a few curiosities specific to our production, like a more aggressive merge of data parts at night.
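The core trick can be shown in miniature. This is a toy sketch under simplifying assumptions (rows are `(website_id, timestamp)` integer tuples held in memory, and the class and method names are invented); the real MergeTree engine is far more involved.

```python
import heapq
from bisect import bisect_left


class TinyMergeTree:
    """Hypothetical toy: sorted parts that get merged in the background."""

    def __init__(self):
        self.parts = []  # each part is a list of rows sorted by (website_id, timestamp)

    def insert(self, rows):
        # Every insert becomes its own small sorted part; writes stay cheap.
        self.parts.append(sorted(rows))

    def merge_once(self):
        """Merge the two smallest parts into one sorted part."""
        if len(self.parts) < 2:
            return False
        self.parts.sort(key=len)
        first, second = self.parts[0], self.parts[1]
        self.parts = self.parts[2:] + [list(heapq.merge(first, second))]
        return True

    def rows_for_website(self, website_id):
        """Range query: binary-search each part instead of scanning it."""
        result = []
        for part in self.parts:
            low = bisect_left(part, (website_id,))
            high = bisect_left(part, (website_id + 1,))
            result.extend(part[low:high])
        return sorted(result)


tree = TinyMergeTree()
tree.insert([(2, 10), (1, 10)])  # data arrives ordered by time, not by website
tree.insert([(1, 20), (2, 20)])

before_merge = tree.rows_for_website(1)
tree.merge_once()
after_merge = tree.rows_for_website(1)
```

The query returns the same rows before and after the merge; merging only reduces how many parts each query has to search.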

In 2012, I had a chance to hire employee №2 on my team, Michael Kolupaev, and I have the pleasure of working with him to this day.

Our production was deployed in multiple regional data centers, and the infra team was deliberately turning off a data center for an hour once a month (these were called “drills”), so that unprepared services experienced downtime. This was to teach everyone to implement highly available multi-DC services. So everything in production had to be replicated in multiple DCs. Initially, I used simple double-write for that, with backfill for a DC after its downtime. But we wanted 100% consistency with automatic repair, and for that, we needed distributed consensus. Some of my colleagues were Java engineers, so they hooked us on ZooKeeper as a coordination system (don’t worry, I forgive them), and Michael implemented ReplicatedMergeTree using ZooKeeper as a metadata layer. It allowed deploying ClickHouse in production for user-facing queries in 2014.

In 2014, ClickHouse was in production, storing hundreds of billions of records every day and answering realtime queries from customers. I also made it accessible to data scientists in the company, who used it to calculate trends on the Internet. I published simple documentation on ClickHouse usage. Other departments, such as ads, e-commerce, infra, and business analytics, tried ClickHouse and migrated some of their use cases from other systems, such as internal map-reduce (where they were literally writing jobs on text logs with Perl), MySQL, and Postgres. At the end of 2014, ClickHouse was widely used, but only in a single company (with one exception: CERN also deployed it in a cooperation for the LHCb experiment).

When I watched presentations at tech conferences and read blogs, I noticed that in other companies, engineers often built something similar to OLAPServer or Metrage, because none of the existing databases could reasonably handle their use cases. A story very familiar to me! And I thought: what if I present ClickHouse publicly? I published an article about ClickHouse in 2015 (translation), and it confirmed the interest even more. My thought was: if I make it accessible to everyone, it can fill this empty niche. If I don’t, someone else will eventually do it, and that is really scary.

I pre­pared a list of items to mo­ti­vate com­pany man­age­ment to ap­prove the open-source re­lease, with the list of po­ten­tial ad­van­tages and po­ten­tial risks. Somehow, I was con­vinc­ing enough, and it was ap­proved, so I cre­ated a plan for re­lease, de­sign­ers made the first logo, I cre­ated the first web­site, pre­pared the blog post, cre­ated a Debian repos­i­tory (with the in­fra team), and it was opened to every­one in the world on Jun 15, 2016.

I want this story to also mo­ti­vate every en­gi­neer to try open-sourc­ing their code. In the worst case, noth­ing will come out of it, but there is a chance it will in­flu­ence gen­er­a­tions, as ClickHouse does! Don’t worry about be­ing ashamed of your code - I just showed you my code from fif­teen years ago, and it looks kind of funny. Today, ClickHouse is the most pop­u­lar an­a­lyt­i­cal data­base used by the largest com­pa­nies across the world.
