10 interesting stories served every morning and every evening.

OpenAI unveils its first custom chip, built by Broadcom

techcrunch.com

On Wednesday, OpenAI un­veiled its first cus­tom-built in­fer­ence proces­sor, de­signed and man­u­fac­tured in col­lab­o­ra­tion with Broadcom. Named Jalapeño, the new proces­sor was de­signed specif­i­cally for the unique needs of OpenAI’s in­fer­ence sys­tems. OpenAI’s own AI mod­els as­sisted in the de­vel­op­ment of the chip, the com­pany said.

While the chip is still be­ing tested, OpenAI says early re­sults show sig­nif­i­cantly bet­ter per­for­mance-per-watt than cur­rent state-of-the-art al­ter­na­tives.

The partnership was officially announced in October, but OpenAI’s chip plans have long been rumored as a way to reduce the company’s dependence on Nvidia’s GPUs. Google and Amazon have both built custom chips to serve a similar purpose, often called “AI accelerators”: silicon designed specifically to speed up machine learning workloads.

OpenAI pres­i­dent Greg Brockman ex­plained the com­pa­ny’s ap­proach to chip de­vel­op­ment on its in-house pod­cast, shortly af­ter the Broadcom part­ner­ship was an­nounced.

“We have a deep understanding of the workload,” Brockman said in the episode. “We’ve really been looking for specific workloads that are underserved, [and asking] how can we build something that will be able to accelerate what’s possible?”

Jalapeño is specif­i­cally de­signed for in­fer­ence, the process of run­ning pre-built AI mod­els in re­sponse to user com­mands. In the an­nounce­ment, OpenAI em­pha­sized the chip’s low op­er­at­ing cost when run­ning real-time cod­ing mod­els. It’s likely that more per­for­mance-in­ten­sive tasks like pre-train­ing will still rely on Nvidia hard­ware, but even small re­duc­tions in in­fer­ence costs could do a lot to im­prove the com­pa­ny’s bot­tom line.

Optimizing that in­fer­ence sys­tem may prove to be a cru­cial fac­tor in the eco­nom­ics of AI go­ing for­ward — and it’s likely to take place at every level of the stack. OpenAI is al­ready build­ing agen­tic prod­ucts like Codex and the mod­els that power them, as well as data cen­ters to run those mod­els. Moving into pur­pose-built chips lets the com­pany go even fur­ther in that process, as the com­pany ex­plained in its an­nounce­ment.

“OpenAI is not only developing frontier models or building products on top of them; it is designing the infrastructure underneath them: chip architecture, kernels, memory systems, networking, scheduling, deployment systems, and product experience,” the company wrote. “Because OpenAI operates across the stack, each layer can be optimized around the same goal: making its models faster, more reliable, and more affordable for users.”


Russell Brandom has been covering the tech industry since 2012, with a focus on platform policy and emerging technologies. He previously worked at The Verge and Rest of World, and has written for Wired, The Awl and MIT’s Technology Review. He can be reached at russell.brandom@techcrunch.com or on Signal at 412-401-5489.



RubyLLM

rubyllm.com

A sin­gle, beau­ti­ful Ruby frame­work for all ma­jor AI providers. Easily build chat­bots, AI agents, RAG ap­pli­ca­tions, con­tent gen­er­a­tors, and every AI work­flow you can think of.


Build a work­ing Ruby AI chat in two min­utes


Why RubyLLM?

Every AI provider ships their own bloated client. Different APIs. Different re­sponse for­mats. Different con­ven­tions. It’s ex­haust­ing.

RubyLLM gives you one beau­ti­ful frame­work for all of them. Same in­ter­face whether you’re us­ing GPT, Claude, or your lo­cal Ollama. Just three de­pen­den­cies: Faraday, Zeitwerk, and Marcel. That’s it.

Show me the code

    # Just ask questions
    chat = RubyLLM.chat
    chat.ask "What's the best way to learn Ruby?"

    # Analyze any file type
    chat.ask "What's in this image?", with: "ruby_conf.jpg"
    chat.ask "What's happening in this video?", with: "video.mp4"
    chat.ask "Describe this meeting", with: "meeting.wav"
    chat.ask "Summarize this document", with: "contract.pdf"
    chat.ask "Explain this code", with: "app.rb"

    # Multiple files at once
    chat.ask "Analyze these files", with: ["diagram.png", "report.pdf", "notes.txt"]

    # Stream responses
    chat.ask "Tell me a story about Ruby" do |chunk|
      print chunk.content
    end

    # Generate images
    RubyLLM.paint "a sunset over mountains in watercolor style"

    # Create embeddings
    RubyLLM.embed "Ruby is elegant and expressive"

    # Transcribe audio to text
    RubyLLM.transcribe "meeting.wav"

    # Moderate content for safety
    RubyLLM.moderate "Check if this text is safe"

    # Let AI use your code
    class Weather < RubyLLM::Tool
      desc "Get current weather"

      def execute(latitude:, longitude:)
        url = "https://api.open-meteo.com/v1/forecast?latitude=#{latitude}&longitude=#{longitude}&current=temperature_2m,wind_speed_10m"
        JSON.parse(Faraday.get(url).body)
      end
    end

    chat.with_tool(Weather).ask "What's the weather in Berlin?"

    # Define an agent with instructions + tools
    class WeatherAssistant < RubyLLM::Agent
      model "gpt-5-nano"
      instructions "Be concise and always use tools for weather."
      tools Weather
    end

    WeatherAssistant.new.ask "What's the weather in Berlin?"

    # Get structured output
    class ProductSchema < RubyLLM::Schema
      string :name
      number :price
      array :features do
        string
      end
    end

    response = chat.with_schema(ProductSchema).ask "Analyze this product", with: "product.txt"

Features

Chat: Conversational AI with RubyLLM.chat

Vision: Analyze im­ages and videos

Audio: Transcribe and un­der­stand speech with RubyLLM.transcribe

Documents: Extract from PDFs, CSVs, JSON, any file type

Image gen­er­a­tion: Create im­ages with RubyLLM.paint

Embeddings: Generate em­bed­dings with RubyLLM.embed

Moderation: Content safety with RubyLLM.moderate

Tools: Let AI call your Ruby meth­ods

Agents: Reusable as­sis­tants with RubyLLM::Agent

Structured out­put: JSON schemas that just work

Streaming: Real-time re­sponses with blocks

Rails: ActiveRecord in­te­gra­tion with act­s_as_chat

Async: Fiber-based con­cur­rency

Model reg­istry: 800+ mod­els with ca­pa­bil­ity de­tec­tion and pric­ing

Extended think­ing: Control, view, and per­sist model de­lib­er­a­tion

Providers: OpenAI, xAI, Anthropic, Gemini, VertexAI, Bedrock, DeepSeek, Mistral, Ollama, OpenRouter, Perplexity, GPUStack, and any OpenAI-compatible API

Installation

Add to your Gemfile:

    gem 'ruby_llm'

Then bun­dle in­stall.

Configure your API keys:

    # config/initializers/ruby_llm.rb
    RubyLLM.configure do |config|
      config.openai_api_key = ENV['OPENAI_API_KEY']
    end

Rails

    # Install Rails Integration
    bin/rails generate ruby_llm:install
    bin/rails db:migrate
    bin/rails ruby_llm:load_models # v1.13+

    # Add Chat UI (optional)
    bin/rails generate ruby_llm:chat_ui

    class Chat < ApplicationRecord
      acts_as_chat
    end

    chat = Chat.create! model: "claude-sonnet-4"
    chat.ask "What's in this file?", with: "report.pdf"

Visit http://​lo­cal­host:3000/​chats for a ready-to-use chat in­ter­face!

web hl2

hl2.slqnt.dev


Hotter Than a Hot Tub: The 45°C Breakthrough to Cool AI’s Biggest Machines

blogs.nvidia.com

Hot tubs sit at about 38 to 40 degrees Celsius, warm enough that most people can only soak for about 15 minutes. NVIDIA’s newest AI servers can run their cooling liquid even hotter: up to 45 degrees Celsius, or 113 degrees Fahrenheit. That higher temperature limit is precisely what makes them more energy efficient.

The Rubin gen­er­a­tion of NVIDIA AI in­fra­struc­ture is the world’s first to achieve 100% liq­uid cool­ing — every chip, every net­work­ing com­po­nent, cooled en­tirely by liq­uid in a closed loop with no fans any­where in the sys­tem. This liq­uid cool­ing method­ol­ogy is out­lined in the NVIDIA DSX AI fac­tory ref­er­ence de­sign, a guide that out­lines best prac­tices to de­sign, build and op­er­ate the en­tire AI fac­tory in­fra­struc­ture stack.

Although each gen­er­a­tion of­fers sig­nif­i­cantly more com­put­ing power for each watt, full liq­uid-cooled AI com­pute in­fra­struc­ture en­ables data cen­ters to dra­mat­i­cally re­duce cool­ing en­ergy con­sump­tion — mak­ing a mean­ing­ful dif­fer­ence to over­all data cen­ter en­ergy use at hy­per­scale.

“The NVIDIA DSX reference design for AI factories has zero water consumption; we have eliminated massive amounts of power usage and pretty much all water usage,” said Ali Heydari, director of data center cooling and infrastructure at NVIDIA. “With dry-cooler-based designs, it’s a closed-loop system with no evaporative water cooling, outside of maybe 1% of the year when we might need chillers in some climates.”

Historically, cool­ing alone has ac­counted for up to 40% of a data cen­ter’s elec­tric­ity con­sump­tion, mak­ing it one of the most sig­nif­i­cant ar­eas where ef­fi­ciency im­prove­ments can drive down both op­er­a­tional ex­penses and en­ergy de­mands.

Industry es­ti­mates sug­gest that rais­ing chiller plant tem­per­a­tures by just one de­gree can cut cool­ing en­ergy costs by about 4%. At scale, those sav­ings add up quickly. A 50-megawatt hy­per­scale fa­cil­ity can save over $4 mil­lion an­nu­ally in cool­ing-re­lated en­ergy and wa­ter costs by mov­ing to liq­uid-cooled in­fra­struc­ture.

In favorable climates, NVIDIA’s 45-degree liquid-cooling architecture can enable chiller-less operation with dry coolers, reducing facility cooling water consumption from roughly 2.6 million gallons per megawatt per year for conventional cooling-tower-based systems to near zero, up to a 100% reduction in water use.

The reason: traditional air-cooled data centers depend on large volumes of cooled air to remove heat from IT equipment, often requiring energy-intensive cooling infrastructure during hot weather. With NVIDIA’s 45-degree liquid cooling, heat is captured directly at the chip and transported through liquid loops operating at much higher temperatures, allowing outdoor dry coolers to reject heat efficiently for much of the year while significantly reducing mechanical cooling requirements and facility water consumption.

The data cen­ter am­bi­ent tem­per­a­ture is flex­i­ble — warm sum­mer air is fine — be­cause noth­ing in the server de­pends on cool air. The liq­uid does all the work — and the same liq­uid can be re­cir­cu­lated in a closed loop so no new wa­ter is con­sumed to cool the chips.
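The figures quoted above can be checked with a few lines of arithmetic. The sketch below converts the 45-degree coolant limit to Fahrenheit and scales the quoted 2.6 million gallons per megawatt per year to a 50-megawatt facility. The helper names are hypothetical, and multiplying the per-megawatt figure by the 50-megawatt example is an illustrative assumption, not a number the article states.

```ruby
# Conventional cooling-tower water use quoted in the article (gallons per MW per year).
GALLONS_PER_MW_YEAR = 2_600_000

# Standard Celsius to Fahrenheit conversion.
def celsius_to_fahrenheit(celsius)
  celsius * 9.0 / 5.0 + 32
end

# Hypothetical helper: water a conventional cooling-tower design would use per year,
# assuming the per-megawatt figure scales linearly with facility size.
def annual_tower_water_gallons(facility_mw)
  facility_mw * GALLONS_PER_MW_YEAR
end

puts celsius_to_fahrenheit(45)       # 113.0
puts annual_tower_water_gallons(50)  # 130000000
```

Under that linear-scaling assumption, a 50-megawatt site on cooling towers would consume about 130 million gallons a year, which is the volume a closed-loop design aims to avoid.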


A New Standard for the Industry

Because the NVIDIA Rubin plat­form in­te­grates 100% liq­uid-cooled in­fra­struc­ture, every cloud provider and data cen­ter op­er­a­tor build­ing for it is mak­ing the tran­si­tion.

The ecosystem is keeping pace. Motivair, the advanced cooling division of Schneider Electric, has worked alongside NVIDIA’s product roadmap for nearly a decade, and Richard Whitmore, its president and CEO, says the relationship only intensified as power densities crossed the threshold where air cooling was no longer a viable option.

“Once the watts per chip crossed a certain level, liquid cooling became mandatory,” said Whitmore.

Too Hot to Cool? AI Infrastructure Is Hotter Than You’d Think

There’s a long-stand­ing mis­con­cep­tion in the in­dus­try that a cold data cen­ter is an ef­fi­cient one. Decades ago, if a data cen­ter did­n’t feel like a walk-in freezer, peo­ple would as­sume some­thing was wrong.

In re­al­ity, chips can sus­tain far warmer en­vi­ron­ments than that in­stinct sug­gests. Silicon proces­sors gen­er­ate enor­mous in­ter­nal heat — the coolant en­ter­ing a fully liq­uid-cooled chip at 45 de­grees Celsius ex­its at roughly 55 de­grees, hav­ing ab­sorbed that heat load across the chip sur­face. Yet per­for­mance does­n’t de­grade.

The proces­sors con­tinue to op­er­ate at full per­for­mance be­cause liq­uid-cooled cold plates keep de­vice tem­per­a­tures within val­i­dated op­er­at­ing lim­its, even with coolant en­ter­ing the rack at 45 de­grees Celsius.
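The 45-degree inlet and roughly 55-degree outlet imply a 10-degree temperature rise across the cold plate, which ties heat load to coolant flow through the standard relation Q = m * c * ΔT. The sketch below estimates the mass flow needed per kilowatt of chip heat. The specific heat value is an assumption (an approximate figure for a 75% water, 25% propylene glycol mix, not taken from the article), and the function name is hypothetical.

```ruby
# Assumed, not from the article: approximate specific heat of a
# 75% water / 25% propylene glycol mix, in joules per kilogram per kelvin.
SPECIFIC_HEAT_J_PER_KG_K = 3_900.0

# Mass flow (kg/s) needed to carry away heat_watts with the given
# inlet and outlet temperatures, from Q = m * c * delta_t.
def coolant_mass_flow_kg_s(heat_watts, inlet_c: 45.0, outlet_c: 55.0)
  delta_t = outlet_c - inlet_c
  raise ArgumentError, "outlet must be warmer than inlet" unless delta_t.positive?

  heat_watts / (SPECIFIC_HEAT_J_PER_KG_K * delta_t)
end

flow_per_kw = coolant_mass_flow_kg_s(1_000.0)
puts format("%.4f kg/s per kW", flow_per_kw)
```

With these assumed properties, each kilowatt of heat needs roughly 0.026 kg/s of coolant; a smaller temperature rise would require proportionally more flow.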

No Fans, No Cold Aisles — A Fundamentally Different Machine

Walk into a tra­di­tional data cen­ter and no­tice two things: the noise — cool­ing fans con­tribute to to­tal noise lev­els at or above 85 deci­bels, loud enough to re­quire ear pro­tec­tion — and the phys­i­cal chore­og­ra­phy of hot aisles and cold aisles, care­fully man­aged to push cooled air across com­po­nents.

The Rubin ar­chi­tec­ture changes the pic­ture.

Coolant — 75% wa­ter and 25% propy­lene gly­col — flows through cold plates that sit di­rectly on proces­sors, pulling heat out at the source. Running that coolant at up to 45 de­grees Celsius means that in many cli­mates, the fa­cil­ity loop can re­ject heat with­out turn­ing on me­chan­i­cal chillers and noisy fans.

That un­locks some­thing be­yond en­ergy sav­ings: the pos­si­bil­ity of elim­i­nat­ing wa­ter con­sump­tion en­tirely.

In the right ge­og­ra­phy — some­where with re­li­ably cool out­door air — a liq­uid-cooled data cen­ter can re­ject its heat through coolant dis­tri­b­u­tion units that cap­ture heat di­rectly at the source and trans­port it to out­door dry cool­ers, es­sen­tially large ra­di­a­tor coils po­si­tioned out­side the build­ing.

The loop is filled once and runs closed for the life of the fa­cil­ity. And it takes dra­mat­i­cally less space in the AI fac­tory com­pared to tra­di­tional air-cool­ing in­fra­struc­ture.

“In the right geographic location, with the right system design, you don’t need any refrigeration equipment,” Whitmore said. “You can just put big radiator coils outside and use the air temperature for all your cooling. It’s incredibly efficient.”

The ge­og­ra­phy caveat mat­ters. A data cen­ter in the Scottish Highlands and one in Phoenix, Arizona, face very dif­fer­ent re­al­i­ties. But even in warmer cli­mates, the shift to­ward 45-degrees-Celsius coolant moves op­er­a­tors sig­nif­i­cantly closer to that chiller-less ideal — where chillers may turn on just a few days a year when the out­side air tem­per­a­ture de­mands it.

Another key ben­e­fit of this new model for AI fac­to­ries is the po­ten­tial for waste heat re­cov­ery, where resid­ual heat from AI fac­tory op­er­a­tions can be re­pur­posed to heat com­mer­cial or res­i­den­tial build­ings nearby.

The Engineering Problem Nobody Had Solved

Previous liq­uid-cooled servers were hy­brid: GPUs and CPUs got cold plates, but the rest of the sys­tem stayed air-cooled, with finned heat sinks de­signed to shed heat into mov­ing air. In a fully liq­uid-cooled server, the cool­ing for these com­po­nents needed to be com­pletely re­designed to use liq­uid.

NVIDIA’s thermal engineering team reworked how those components handle heat, designing cooling loops that simplify how liquid is routed to multiple high-power chips on the board using a single inlet and outlet, resulting in a cleaner tray-level cooling architecture.

One vis­i­ble out­come: Rubin servers have clean, sealed front pan­els where air-cooled servers have per­fo­rated bezels. Another: fully liq­uid cooled servers en­able higher rack den­sity than air-cooled servers, so a sys­tem that pre­vi­ously oc­cu­pied six rack units now fits in two — more com­pute, less space, less noise.

AI work­loads are not get­ting lighter. The com­pute de­mand dri­ving data cen­ter con­struc­tion is grow­ing faster than al­most any other cat­e­gory of in­fra­struc­ture in­vest­ment.

Without ef­fi­ciency im­prove­ments in how that com­pute is cooled, the en­ergy cost of run­ning AI at scale would grow in lock­step with the hard­ware. Liquid cool­ing at up to 45 de­grees Celsius — hot­ter than a hot tub, cooler for the planet — is one of the most im­por­tant tools the in­dus­try has to close that gap.

Learn more about liquid cooling, the NVIDIA DSX platform for AI factories and NVIDIA’s approach to energy-efficient AI infrastructure.

Blogging Can Just Be Stating The Obvious

blog.jim-nielsen.com

John Gruber writes about those an­noy­ing pop­ups every web­site seems to have now and while he does a great job tear­ing into these ubiq­ui­tous, user-hos­tile pat­terns, one of the things that stood out to me about his piece was this meta com­men­tary on blog­ging. Here’s John:

If you visit a website you should … see the website. See its content. Be able to read the article whose page you are attempting to visit. Showing a “subscribe to our newsletter” or “accept our fucking cookies” dickover to someone trying to read an article on the web makes no more sense than sending out an email newsletter that only contains a link to read the newsletter on a webpage. A webpage should show the webpage. An email should show the email. I should not have to explain this.

It’s funny how of­ten blog­ging feels like be­ing the lit­tle child in the story of The Emperor’s New Clothes. You’re just stat­ing what seems ob­vi­ous to you.

I often look at my own posts and think, “There’s nothing novel, or important, or deep in here at all; is this even worth saying?”

A post’s point can seem so glar­ingly ob­vi­ous to me (and thus, I pre­sume, oth­ers) it feels like a waste of time to even say it. As John says:

A webpage should show the webpage. An email should show the email. I should not have to explain this.

But then real-world ex­am­ples of an­noy­ance pile up around you and no­body talks about it, so you fi­nally just have to say it in a post and bring re­ceipts.

You feel like someone gone mad: “Is anyone else seeing the same thing I’m seeing? And we’re just ok with this?”

Very of­ten, those are the best posts I read from oth­ers.

So it must be that a key ingredient to blogging is simple: have a willingness to state something that seems obvious to you but that nobody else is saying.

Or if someone else is saying it, just link to them and say, “Yes!!! This!!!”

GLM-5.2 is the step change for open agents

www.interconnects.ai

Housekeeping: Following my “State of the blog” post last week, noting a slight increase in paid features, it’s a good time to remind folks that I offer group subscriptions with larger discounts proportional to the number of seats. I also released a new paper today on open RL recipes for terminal agents; read more here.

A bit over a week ago, when the AI world was still reeling from the shocking export restriction, and effective banning, of Claude Fable 5, Z.ai released their latest model, GLM-5.2. The model was rolled out on a Saturday, June 13th, to GLM Coding Plan members. This is an unusual release practice: normally, when an AI model is released on a weekend, it’s for a weird reason (most famously, Llama 4).[1] In this case, it seemed like Z.ai was excited to capitalize on the zeitgeist of Anthropic being “anti open-science” with their silent safeguards on AI researchers. For the past year or two, the Chinese open-weight labs have taken every opportunity they have for easy marketing wins like this.


GLM-5.2, following a common naming convention across the industry, looked like a potentially incremental update to the popular GLM-5.1 model. At this point, Moonshot AI, makers of the Kimi models, and Z.ai, makers of the GLM models, have consolidated the top of the reputational market with the most beloved open-weight models among AI researchers. What unfolded is a common lesson in tracking AI models: a minor version bump can often carry a model across meaningful user-experience thresholds. A small change in benchmarks and training can open a wide range of new use-cases.

What has followed is a slow groundswell of hype for GLM-5.2. The official, MIT-licensed model weights and release blog dropped three days after the initial rollout, on June 16th. One could ramble through many technical details, such as the strong benchmark scores, the very popular RL framework that Z.ai uses (SLIME), the recommendation of always using the model on Max thinking effort, and so on, but the initial release blogs usually aren’t the thing to focus on. You can wait and read the ecosystem reaction to know if it’s the real deal. Benchmarks are half dead these days, anyways.

What fol­lowed on the 16th was a slew of com­mu­nity bench­marks show­ing bet­ter-than-ex­pected re­sults for GLM-5.2. Arena’s agent leader­board had it as the only open model mix­ing it up with OpenAI and Anthropic’s lat­est mod­els (notably match­ing Opus 4.8’s no-think­ing ef­fort to GLM-5.2’s max mode). This is one of many evals GLM-5.2 is crush­ing Gemini on, but that’s a topic for an­other time. A bench­mark that has mixed per­cep­tion in the com­mu­nity (particularly among ac­tual de­sign­ers), Design Arena even had GLM-5.2 best­ing Claude Fable it­self — the re­cently banned hype ma­chine!

Pretty much everyone I respect among the AI commentariat and researcher class has praised the model after using it personally. Such a focal point of discussion among the community has only been so clear with an open model release once before: DeepSeek R1. This is not a comparison I make lightly. I compared Kimi K2’s release to a “DeepSeek Moment,” and GLM-5.2 has well exceeded that. What made Kimi K2 impressive was that big steps in open model performance could seemingly come from anywhere in China. The step that GLM-5.2 has taken is more of a one way door for AI progress.

Anthropic’s record revenue growth rate on the back of Claude Code is heavily driven by being the best model, and the only model that can really do this. GLM-5.2 is the first of many (coming soon) open weight models to offer credible alternatives. The parallel to DeepSeek R1 is very clear: that release showed that open-weight labs, with far fewer resources, could also replicate the chain-of-thought reasoning models that OpenAI championed with o1. As AI systems get more complex and far more expensive to build, with tools, integrated harnesses, and scaled model weights, it was not a given that this GLM-5.2 moment would happen at all.

The key point is that GLM-5.2 is the open weight model that feels right in cod­ing har­nesses as a gen­eral agent. It’s the first one. I was per­son­ally over­due in try­ing some of the re­cent peer mod­els, such as Kimi K2.7 or GLM-5.1, but the hype was too much for me to ig­nore. I put it to work help­ing make con­tent for my post-train­ing course with Fireworks’ API in Claude Code (setting this up was very easy). There were some mi­nor knife cuts, such as the Claude Code har­ness / my repo doc­u­men­ta­tion try­ing to send im­ages to the model, which would brick Fireworks API for the ses­sion — forc­ing a man­ual con­text clear. Overall, the model ca­pa­bil­i­ties im­me­di­ately felt right, and I still have some tin­ker­ing to do in which har­ness and in­fer­ence provider to use.

For more hype, you can sample the Z.ai founder telling Elon that open-weight Fable capabilities will be here “sooner than Q1 2027,” the CEO of Vercel saying “Genuinely impressed, almost shocked, at how good GLM-5.2 by @zai_org is at coding. This changes things,” and much more from a mix of people whose opinions I deeply respect and others I’m new to.

So, this is a good model, where does this leave us?

There are many trends at play. To start, let’s ground things in the open-closed capabilities gap. I’ve written how I expect an “explosion in usage” if open models crossed the Opus 4.5 in Claude Code threshold from around the start of 2026. Here we are. With Claude Opus 4.5’s release on November 24th, 2025, the gap in time to GLM-5.2’s release on June 16th, 2026 is 204 days, or about 6.8 months. This puts us square in the 6-9 month time gap that many people claim as the performance lag between the U.S.’s closed labs and China’s open counterparts.
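The day count is easy to reproduce from the two release dates given in the post. This sketch uses Ruby's standard Date class; the 30-day month used for the conversion is an assumption chosen because it reproduces the 6.8 figure, not something the post specifies.

```ruby
require "date"

opus_release = Date.new(2025, 11, 24) # Claude Opus 4.5 release date, per the post
glm_release  = Date.new(2026, 6, 16)  # GLM-5.2 weights release date, per the post

# Subtracting two Date objects yields the gap in days (as a Rational).
gap_days = (glm_release - opus_release).to_i

# Assumption: a month is treated as 30 days for the rough conversion.
gap_months = (gap_days / 30.0).round(1)

puts "#{gap_days} days, about #{gap_months} months"
```

Running it prints `204 days, about 6.8 months`, matching the numbers above.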

Upon writ­ing this, I’m sur­prised. As the U.S. labs have so rapidly ramped com­pute in the last ~year, I’ve ex­pected the gap in per­for­mance to grow in time. A very mean­ing­ful step in this tra­jec­tory will also be Claude Fable 5’s re­lease — which was more re­liant on scale, and there­fore the most ad­vanced GPUs, rel­a­tive to the Claude Opus mod­els. Still, that’s not a sat­is­fac­tory an­swer. Continuing to un­pack the tra­jec­tory here in­volves more nu­ance than I can af­ford to fit in a sign­post­ing ar­ti­cle.

The most im­me­di­ate mean­ing of this is far more se­ri­ous pric­ing pres­sure within the or­ga­ni­za­tions to­ken­maxxing, send­ing Anthropic’s rev­enue to the moon. Some would pre­dict Anthropic does­n’t re­al­ize its fore­casted ARR num­bers, but I don’t think that prices in the true de­mand for these mod­els and the in­evitable growth. This model ex­ist­ing is a huge boon for the open model econ­omy. All the likes of Fireworks, Together, Thinky (via Tinker), Prime Intellect, and who­ever else sells open model in­fer­ence or fine­tun­ing just hit an­other in­flec­tion point.

It’ll take a long time for the ef­fects here to dif­fuse into the broader econ­omy (and use-cases). Workflows are be­com­ing more com­plex, with peo­ple us­ing dif­fer­ent mod­els for plan­ning, pri­mary cod­ing, and sub­agent dis­patch. I ex­pect the hype to con­tinue to grow, and heck, as I’m writ­ing this on a Sunday evening, I could see the me­dia and mar­ket re­ac­tion on the Monday be­ing a thing just like the DeepSeek R1 re­lease. This dif­fu­sion hap­pen­ing while Anthropic’s, and by ex­ten­sion the U.S.’s flag­ship model, is still banned is a se­vere eco­nomic dag­ger. GLM-5.2 is be­ing given time to carve out the eco­nomic un­der­belly of the fron­tier labs when they want to be push­ing for­ward into higher mar­gin, higher rev­enue do­mains en­abled only by the ab­solute fron­tier mod­els.

The eco­nomic con­cern mir­rors a story that has been told many times in AI, so it’s un­clear when it’ll stick.

The con­ver­sa­tion that feels more core to the tra­jec­tory of AI is that of reg­u­la­tion and con­trol of open mod­els. I think it is an eco­nomic good for cheap in­tel­li­gence to dif­fuse widely, and our de­fault po­si­tion should be to cheer for open mod­els, but this mod­el’s re­lease date will have it be per­ma­nently as­so­ci­ated with Claude Fable — and there­fore Claude Mythos — in the men­tal map of AI power struc­tures. We are at a point where Mythos-class model ca­pa­bil­i­ties are deemed not safe for re­lease by the U.S. Government and the Chinese model mak­ers are charg­ing for­ward in ca­pa­bil­i­ties avail­able to all.

These trend lines aren’t nec­es­sar­ily causally linked, as we don’t know the cy­ber per­for­mance of GLM-5.2 ver­sus its pre­de­ces­sors, but the ca­pa­bil­i­ties are def­i­nitely cor­re­lated. Without any­thing chang­ing, this points to a po­ten­tial­ity where the U.S. Government de­cides a cer­tain open-weights Chinese model is not safe for the pub­lic. There are many other po­ten­tial sce­nar­ios here too, but what is clear is that we have a lot of work to do in map­ping them out, prepar­ing our in­fra­struc­ture, and mes­sag­ing to so­ci­ety.

It’ll take a lot more people than just me to imagine and communicate a world to decision makers for how to manage ever more capable open models.[2] We have years more of AI progress to come, with Nvidia’s next generation chips already in production and a constant stream of algorithmic advancements. It feels like a narrow path for open model advocates to take, but we need to figure out how to make them viable so the massive leaps in performance don’t only go to closed models.

I to­tally see why it is scary to imag­ine an openly ac­ces­si­ble Mythos class model, but if open mod­els get banned now and only closed mod­els get 10 or 100X bet­ter in 2 years in the hands of one or two com­pa­nies, I think we will have big­ger prob­lems on our hands.

[1] Something that has always stood out to me is how fast the Chinese labs release their models. I’ve heard from multiple labs that the time to upload the weights publicly to HuggingFace after the model finishes training could be measured in hours rather than days. This has at least slowed a bit, now that they need to prepare to serve the model to a wider inference market.

[2] Something that will need to be discussed more is how even closed models, e.g. Mythos preview, are regularly in the hands of unauthorized users or jailbroken. So, the open vs. closed dichotomy on access isn’t totally black and white.

The Xteink X4 E-Ink Reader

blog.omgmog.net

I’ve had the Xteink X4 for a cou­ple of months now, a £40 e-ink reader small enough to stick to the back of a phone. I’d seen a few posts about it (Khairul Selamat, Neil Brown, joelchrono, and mod­ded­bear among them), so I got cu­ri­ous and or­dered one.

First im­pres­sions

Out of the box, the X4 is light­weight, prop­erly light, the kind where I’d for­get it’s in my pocket. The dis­play is crisp for its size, and the de­vice ships with a branded 16GB mi­croSD card (cute touch) plus a card reader and ad­he­sive metal ring for MagSafe mount­ing.

The mi­croSD slot is awk­ward (I needed a sty­lus to push the card out, felt like I was do­ing some­thing wrong). The stock OS de­faults to Chinese, but the UI is nav­i­ga­ble enough that I had it switched to English within a minute of blind fum­bling.

Marketing pushes the “stick it to your phone” angle hard. I’ve got a MagSafe-compatible case on my Pixel 7a, but that spot’s already taken by my card holder. Even if it wasn’t, the X4 mounts inverted for some reason (polarity issue with third-party MagSafe cases, presumably).

The real porta­bil­ity win is the size and weight, not the phone mount. This thing ac­tu­ally fits in a trouser pocket and dis­ap­pears.

Software

Stock firmware is us­able but min­i­mal (three fonts, ba­sic line-height and para­graph spac­ing con­trols, EPUB sup­port). Page turns are in­stant with no no­tice­able ghost­ing. The hard­ware’s good, the firmware just does­n’t do it jus­tice.

Rather than live with stock, there’s a small crop of cus­tom firmwares, all forked from CrossPoint. CrossPoint it­self has come a long way since I first started pok­ing at this (it’s now on v1.3.0 with OTA up­dates, 24 UI lan­guages, and SD-card font in­stal­la­tion with­out need­ing to re­flash). Flashing takes about two min­utes and one ter­mi­nal com­mand, no but­ton com­bos or de­bug modes needed. If the ter­mi­nal’s off-putting, CrossPoint has a web-based flasher that does it in the browser in­stead.

Units bought from AliExpress or else­where out­side China ship with USB flash­ing dis­abled. CrossPoint’s docs point to the SD card flash­ing method in­stead, which works fine for get­ting cus­tom firmware on, it just does­n’t un­lock USB flash­ing it­self.

I tried Papyrix first, which keeps things minimal and focused on the reading experience. The typography engine punches well above what I’d expect from a £40 device: a Knuth-Plass line-breaking algorithm for proper TeX-quality justified text, soft hyphen support, and language-aware hyphenation across six languages. It’ll even handle Vietnamese, Thai, Greek, and Arabic scripts, with right-to-left layout and proper contextual shaping for Arabic. Beyond EPUB it reads FictionBook, HTML, Markdown, and plain text. Custom themes and fonts load straight from the SD card, button remapping is supported, and there’s Calibre Wireless Device support for sending books over WiFi without touching a cable.
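For a feel of what paragraph-level line breaking involves, here is a minimal TypeScript sketch. It is not Papyrix's code and it is far simpler than real Knuth-Plass (no stretchable glue, no hyphenation penalties, monospace widths assumed). It only shows the core idea: pick breaks that minimise the squared leftover space over the whole paragraph rather than filling each line greedily. The function name `breakLines` is made up for illustration.

```typescript
// Simplified total-fit line breaking (assumption: every character has width 1).
// Cost of a line = (unused space)^2; the last line is free.
function breakLines(words: string[], width: number): string[] {
  const n = words.length;
  // best[i]: minimal cost of laying out words[i..n-1]
  // nextStart[i]: index of the first word on the following line
  const best = new Array<number>(n + 1).fill(Infinity);
  const nextStart = new Array<number>(n + 1).fill(n);
  best[n] = 0;

  for (let i = n - 1; i >= 0; i--) {
    let len = -1; // length of words[i..j] joined by single spaces
    for (let j = i; j < n; j++) {
      len += words[j].length + 1;
      // Stop once the line overflows, but let a single over-long word stand alone.
      if (len > width && j > i) break;
      const slack = Math.max(width - len, 0);
      const lineCost = j === n - 1 ? 0 : slack * slack;
      const cost = lineCost + best[j + 1];
      if (cost < best[i]) {
        best[i] = cost;
        nextStart[i] = j + 1;
      }
    }
  }

  const lines: string[] = [];
  for (let i = 0; i < n; i = nextStart[i]) {
    lines.push(words.slice(i, nextStart[i]).join(" "));
  }
  return lines;
}
```

With the words `aaa bb cc ddddd` and a width of 6, a greedy breaker would produce `aaa bb` / `cc` / `ddddd`, leaving a very short middle line. The sketch above prefers `aaa` / `bb cc` / `ddddd`, which spreads the slack more evenly.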

It’s good, but I ended up set­tling on Inx, which goes wider and feels the most pol­ished of the lot. A tab bar across the top (Recent, Library, Settings, Sync, Statistics) gives it a proper app feel, with per-book set­tings rather than global-only, Wi-Fi with Calibre wire­less sync, OPDS cat­a­logue brows­ing, and more for­mat sup­port be­yond EPUB. The read­ing sta­tis­tics page is the bit I keep go­ing back to (reading time, pages, chap­ters, and av­er­age time per page, all bro­ken down per book). It syncs with KOReader for any­one al­ready us­ing that for an­no­ta­tions, plus high­light­ing and foot­note nav­i­ga­tion baked into the in-book menu. It’s even got a lit­tle corgi as its mas­cot on the sleep and wake screens, and it’s still be­ing ac­tively worked on.

The firmware rab­bit hole goes deeper. MicroSlate turns the X4 into a ded­i­cated writ­ing de­vice when paired with a Bluetooth key­board (scrolling, type­writer, and pag­i­nated writ­ing modes), with auto-save and WiFi sync for get­ting notes off the de­vice. TernOS is a dif­fer­ent beast en­tirely, a PalmOS-inspired OS that runs na­tive Rust apps and clas­sic Palm apps. PlusPoint is a CrossPoint fork with ex­per­i­men­tal JavaScript app sup­port.

Then there’s the weird stuff (a Tamagotchi app that uses MQTT to switch be­tween moods and dis­play mes­sages, in­tended as a lit­tle com­pan­ion dis­play for an AI as­sis­tant, and a browser-based wall­pa­per maker that con­verts any im­age into the 480x800 BMP for­mat the de­vice needs for cus­tom sleep screens, all processed lo­cally).

I 3D printed this flip cover case for it, about an hour on my FlashForge AD5X. It uses one of the ad­he­sive MagSafe rings that ships with the X4, so the de­vice sits in the case se­curely rather than just rest­ing against a mag­net.

Compared to other full-size e-ink read­ers

I reviewed the Bigme B6 in January, a £125 colour e-ink tablet running Android 14. My conclusion there was that colour e-ink isn’t quite ready: muted colours, halved resolution, and ghosting when switching between colour and black-and-white. Android brought flexibility (any reading app I wanted) but also the baggage of a phone OS bolted onto a slow screen: stuttering animations, traces left behind by menus, and a load of bloat to strip out before it felt usable.

The X4 side­steps all of that by be­ing smaller and dumber in the best way. No sty­lus lag, no wait­ing on slow colour screens, no Android com­plex­ity to de­bloat. Devices like the Kobo Clara Colour and InkBook Solaris Color have since brought Kaleido 3 colour e-ink to main­stream read­ers with­out need­ing to tin­ker with Android, which is prob­a­bly the bet­ter way in for colour. But none of them get close to the X4 on porta­bil­ity. The Clara and Paperwhite need a big pocket, the X4 dis­ap­pears into any pocket. For pure read­ing, that size dif­fer­ence mat­ters more than colour ever will.

For £40, the X4 with CrossPoint or Inx is a better reading device than it has any right to be. Paging’s instant, the display’s crisp, and it’s small enough to actually carry everywhere. Stock firmware is bare minimum, but custom firmware makes it actually good.

GitHub - nubjs/nub: The fast all-in-one Node.js toolkit

github.com

A fast all-in-one toolkit that aug­ments Node.js in­stead of re­plac­ing it

A Bun-like DX on top of stock node, writ­ten in Rust.

nub index.ts             # TypeScript-first Node.js runtime
nub run dev              # 24× faster pnpm run
nubx prisma generate     # 19× faster npx
nub install              # 2.5× faster pnpm install
nub watch src/server.ts  # native watch mode
nub pm shim              # built-in Corepack-style shims
nub node install 26      # Node version manager
nub upgrade              # self update

One tool to run your files and scripts, in­stall de­pen­den­cies, and man­age Node it­self. No new run­time, no ven­dor-spe­cific API sur­face, no lock-in.

Install

# macOS / Linux
curl -fsSL https://nubjs.com/install.sh | bash

# Windows (PowerShell)
irm https://nubjs.com/install.ps1 | iex

# Homebrew (macOS / Linux)
brew install nubjs/tap/nub

# Or via npm (pnpm / yarn global add work too)
npm install -g --ignore-scripts=false @nubjs/nub

For GitHub Actions, use nubjs/​setup-nub in place of ac­tions/​setup-node. It’s one-to-one com­pat­i­ble.

- - uses: actions/setup-node@v4
+ - uses: nubjs/setup-nub@v0

File run­ner — nub <file>

Run a file. Supports .js, .ts, .mjs, .cjs, .mts, .cts, .jsx, and .tsx. Flag-for-flag and var-for-var drop-in com­pat­i­ble with node (mostly via passthrough).

nub index.ts        # TypeScript, JSX, no build step
nub --watch app.ts  # same path, restart-on-change

It aug­ments stock Node with some of Bun/Deno’s best fea­tures:

🦆 Full TypeScript sup­port, in­clud­ing enum, name­space

🧭 TypeScript-friendly res­o­lu­tion: ex­ten­sion­less im­ports, tscon­fig.json#paths

⚛️ JSX / TSX

🎂 Decorators and emit­Dec­o­ra­torMeta­data

🆕 Modern syn­tax like us­ing (downleveled in tran­spiler when needed)

🔐 Automatic .env* load­ing — Next.js/Vite par­ity

🗂️ Built-in load­ers for com­mon data for­mats — .yaml, .toml, .jsonc, .json5, .txt

🌐 Polyfills for Temporal, Worker, URLPattern (when needed)

🔥 Unflags ex­per­i­men­tal fea­tures like node:sqlite, vm.Mod­ule, lo­cal­Stor­age, WebSocket, EventSource

⚡ 2.9× faster startup than tsx

How it works — Nub takes advantage of Node extension surfaces that mostly didn’t exist when Deno and Bun were built:

--import/--require preloads

module.registerHooks() for transpilation and resolution

N-API native addons: Nub embeds oxc for pre-transpilation

Node pro­vi­sion­ing

When you run a file with nub, it in­fers the ver­sion of Node your pro­ject ex­pects and auto-in­stalls it if needed. It re­spects (in prece­dence or­der):

NODE_EXECUTABLE (override)

pack­age.json#de­vEngines

.node-version

.nvmrc

pack­age.json#en­gines

This re­solved ver­sion of Node is in­stalled and your file is ex­e­cuted with it (with Nub’s aug­men­ta­tions).

$ echo 26 > .node-version
$ nub hello.ts
Using Node.js 26.3.0 (resolved from .node-version)
Installed in 9.8s
Hello world!
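The precedence order listed above can be sketched as a first-match lookup. This is an illustration only, not Nub's implementation: the `VersionSources` shape, the field names, and the `resolveNode` function are hypothetical, and the sketch assumes the caller has already read the files and environment.

```typescript
// Hypothetical input: raw values already read from the environment and project files.
interface VersionSources {
  nodeExecutable?: string;  // NODE_EXECUTABLE (override)
  devEngines?: string;      // package.json#devEngines
  nodeVersionFile?: string; // contents of .node-version
  nvmrc?: string;           // contents of .nvmrc
  engines?: string;         // package.json#engines
}

interface ResolvedNode {
  source: string;
  value: string;
}

// Returns the first non-empty source in the documented precedence order.
function resolveNode(sources: VersionSources): ResolvedNode | undefined {
  const ordered: Array<[string, string | undefined]> = [
    ["NODE_EXECUTABLE", sources.nodeExecutable],
    ["package.json#devEngines", sources.devEngines],
    [".node-version", sources.nodeVersionFile],
    [".nvmrc", sources.nvmrc],
    ["package.json#engines", sources.engines],
  ];
  for (const [source, raw] of ordered) {
    const value = (raw ?? "").trim();
    if (value !== "") return { source, value };
  }
  return undefined;
}
```

For example, a project with both `.node-version` (26) and `.nvmrc` (22) resolves from `.node-version`, because it comes earlier in the list.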

Modern APIs

Modern APIs work out of the box under Nub. Experimental Node.js APIs are unflagged, some APIs are auto-polyfilled (e.g. Temporal on Node 25 and earlier), and newer syntax is downleveled in the transpiler (using).

Watch mode

Restart-on-change dri­ven by the re­solved de­pen­dency graph plus the off-graph files that still in­val­i­date a run — no glob list to main­tain:

nub watch src/server.ts
nub --watch src/server.ts  # same path

👀 Tracks the re­solved de­pen­dency graph au­to­mat­i­cally

🧷 Also watches the off-graph in­val­ida­tors — .env*, the tscon­fig.json ex­tends chain, pack­age.json

⚙️ Runs on Node’s own --watch engine, preserving output by default

View the full run­time docs 👉.

Script run­ner — nub run

A drop-in for npm run and pnpm run. The run­ner is a Rust bi­nary with no JavaScript startup of its own, so it dis­patches a warm script roughly 24× faster than pnpm run:

nub run build
nub run -r --filter "@org/*" test  # supports --filter

It’s fast com­pared to ex­ist­ing JavaScript-based script run­ners.

script dis­patch · warm · 50 runs · ma­cOS — view bench­mark

🚀 Feels in­stan­ta­neous — 14ms vs a de­tectable 300ms+ lag for npm/​pnpm

🔁 Full life­cy­cle sup­port — pre/​post hooks and the com­plete npm_* en­vi­ron­ment

🧰 Local node_modules/.bin on PATH, with args forwarded without the -- separator

🗃️ The full pnpm workspace surface — -r, --filter, --parallel, --workspace-concurrency, --resume-from, --stream

🎯 pnpm’s --filter grammar verbatim — graph (...@org/web) and changed-since ([main]) selectors
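To illustrate the pre/post lifecycle hooks mentioned above, here is a small sketch of the ordering npm applies when a script is run: `pre<name>`, then `<name>`, then `post<name>`, skipping any that are not defined. The `lifecycleOrder` helper is hypothetical, and whether Nub matches this behaviour in every edge case is an assumption.

```typescript
// Given the "scripts" object from package.json, return the script names that
// would run, in order, for a single `run <name>` invocation.
function lifecycleOrder(scripts: Record<string, string>, name: string): string[] {
  const has = (key: string): boolean =>
    Object.prototype.hasOwnProperty.call(scripts, key);
  if (!has(name)) {
    throw new Error(`Missing script: ${name}`);
  }
  return [`pre${name}`, name, `post${name}`].filter(has);
}
```

With scripts `prebuild`, `build`, and `test` defined, running `build` would execute `prebuild` and then `build`; there is no `postbuild`, so nothing runs afterwards.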

View the full script run­ner docs 👉.

Package run­ner — nubx / nub dlx

A drop-in for npx and pnpm dlx. Local-first with a download-and-execute registry fallback (same as npx). It eliminates the double-Node.js-spawn performance penalty paid by JavaScript-based tools like npx and pnpm.

nubx eslint . --fix
nubx -y cowsay@1.5.0 "hi"  # fetched from the registry (auto-approved via -y)

esbuild --version · macOS — view benchmark

⚡ Runs a lo­cal bin ~19× faster than npx, with no Node in the wrap­per

🔎 Resolves node_­mod­ules/.​bin re­gard­less of which pack­age man­ager in­stalled it

🌐 Registry fall­back for unin­stalled bins — fetched, run, then dis­carded

🧩 Full pnpm exec / pnpm dlx flag par­ity, shell mode in­cluded

🪜 Walks the res­o­lu­tion chain — mem­ber .bin, then work­space root, then an­ces­tors
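The resolution chain in the last bullet can be pictured as an ordered list of `node_modules/.bin` directories to probe. The sketch below is a guess at the shape of that walk, not Nub's code: it assumes POSIX-style absolute paths, and it assumes "ancestors" means the directories above the workspace root. `binCandidates` is a hypothetical name.

```typescript
// Returns candidate .bin directories in lookup order:
// workspace member first, then workspace root, then each ancestor of the root.
function binCandidates(memberDir: string, workspaceRoot: string): string[] {
  const out: string[] = [];
  const push = (dir: string): void => {
    const candidate = (dir === "/" ? "" : dir) + "/node_modules/.bin";
    if (out.indexOf(candidate) === -1) out.push(candidate);
  };

  push(memberDir);
  push(workspaceRoot);

  // Walk upwards from the workspace root to the filesystem root.
  let dir = workspaceRoot;
  while (dir !== "/" && dir !== "") {
    const cut = dir.lastIndexOf("/");
    dir = cut <= 0 ? "/" : dir.slice(0, cut);
    push(dir);
  }
  return out;
}
```

A real runner would then check each candidate for the requested bin and fall back to the registry only if none matches.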

View the full pack­age run­ner docs 👉.

Package man­ager — nub in­stall

Nub is a pack­age man­ager pow­ered by the Aube en­gine. The CLI is flag-for-flag com­pat­i­ble with pnpm for mus­cle mem­ory, but

nub install
nub ci
nub add -E -D --save-catalog react
nub remove lodash
nub update
nub dedupe

It’s fast — avoids the per-com­mand Node.js boot­strap lag in­curred by JS-based pack­age man­agers.

warm frozen in­stall · cre­ate-t3-app · 222 deps · ma­cOS — view bench­mark

Security

🛡️ Blocks postin­stall by de­fault

🦠 Checks osv.dev for known-ma­li­cious pack­age ver­sions dur­ing res­o­lu­tion by de­fault

🔻 Refuses prove­nance down­grades by de­fault

⏳ 24-hour min­i­mum­Re­leaseAge by de­fault
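As a rough illustration of the minimum-release-age idea, the sketch below filters out versions published less than 24 hours ago. It is not Nub's resolver: the `Release` shape and `matureReleases` function are hypothetical, and the publish timestamps are assumed to be ISO 8601 strings as found in registry metadata.

```typescript
interface Release {
  version: string;
  publishedAt: string; // ISO 8601 timestamp (assumed input format)
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Keep only releases that are at least `minimumAgeMs` old at time `now`.
function matureReleases(
  releases: Release[],
  now: Date,
  minimumAgeMs: number = DAY_MS,
): Release[] {
  return releases.filter(
    (r) => now.getTime() - Date.parse(r.publishedAt) >= minimumAgeMs,
  );
}
```

The point of such a delay is that a freshly published malicious version has time to be reported and pulled before it becomes installable.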

Compatibility

When you run nub in­stall in­side a pro­ject, it de­tects the in­cum­bent pack­age man­ager (based on your pack­age.json#pack­age­M­an­ager or any de­tected lock­files). It then runs in com­pat-mode, re­spect­ing the con­fig files and en­vi­ron­ment vari­ables for that pack­age man­ager.

Under each in­cum­bent, Nub reads that tool’s branded con­fig and no oth­er’s; the neu­tral .npmrc cas­cade and npm_­con­fig_* are read un­der every one.
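Incumbent detection of this kind can be sketched as follows. This is a simplified guess, not Nub's logic: `detectIncumbent` is a hypothetical helper, and giving package.json#packageManager priority over lockfiles (and ordering the lockfiles as shown) is an assumption. The lockfile names themselves are the standard ones for each tool.

```typescript
type PackageManager = "npm" | "pnpm" | "yarn";

// Standard lockfile names; the order here is an assumed tie-breaker.
const LOCKFILES: Array<[string, PackageManager]> = [
  ["pnpm-lock.yaml", "pnpm"],
  ["yarn.lock", "yarn"],
  ["package-lock.json", "npm"],
];

function detectIncumbent(
  packageManagerField: string | undefined,
  filesInProjectRoot: string[],
): PackageManager | undefined {
  // 1. An explicit pin such as "pnpm@9.12.0" wins.
  const name = (packageManagerField ?? "").split("@")[0];
  if (name === "npm" || name === "pnpm" || name === "yarn") return name;

  // 2. Otherwise infer from whichever lockfile is present.
  for (const [file, pm] of LOCKFILES) {
    if (filesInProjectRoot.indexOf(file) !== -1) return pm;
  }
  return undefined;
}
```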

View the full pack­age man­ager docs 👉.

Package meta-man­ager — nub pm

Corepack’s job in native Rust, provisioning and running the exact pnpm / npm / yarn version your project pins:

nub pm shim # reg­is­ters global shims (Corepack-style)

Like corepack en­able, this reg­is­ters global shims for npm, yarn, and pnpm. When you run a com­mand us­ing one of these shim aliases any­where on your file sys­tem, the shim will:

Detect the ver­sion used in your pro­ject

Install that ver­sion if needed

Run the com­mand us­ing the proper ver­sion
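The three steps above can be sketched as a small planner. It is illustrative only: `parsePackageManager` and `planShim` are hypothetical names, the sketch assumes the pin comes from the Corepack-style `packageManager` field (`name@version`, optionally followed by `+hash`), and it returns a description of the steps rather than performing them.

```typescript
interface Pin {
  name: string;
  version: string;
  integrity?: string;
}

// Parse a Corepack-style pin such as "pnpm@9.12.0+sha512.abc".
function parsePackageManager(field: string): Pin | undefined {
  const m = /^(npm|pnpm|yarn)@([^+\s]+)(?:\+(.+))?$/.exec(field.trim());
  if (!m) return undefined;
  return { name: m[1], version: m[2], integrity: m[3] };
}

// Describe what a shim would do: install the pinned version if it is missing,
// then run the command with it.
function planShim(pin: Pin, installed: string[], args: string[]): string[] {
  const id = `${pin.name}@${pin.version}`;
  const steps: string[] = [];
  if (installed.indexOf(id) === -1) steps.push(`install ${id}`);
  steps.push(["run", id, ...args].join(" "));
  return steps;
}
```

For a project pinned to `pnpm@9.12.0` with nothing cached, invoking the `pnpm install` shim would plan an install of that version followed by the run; on later invocations only the run step remains.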

Nub provides this functionality as a convenience for users who prefer to keep their current package manager. Corepack itself was unbundled from Node in v25.

View the full nub pm docs 👉.

Node ver­sion man­ager — nub node

Though Node.js ver­sions will gen­er­ally be auto-in­stalled and cached as needed, you can man­age ver­sions man­u­ally as well.

$ nub node -h
nub node — manage Node versions

Usage: nub node <command>
