10 interesting stories served every morning and every evening.

We’re making Bunny DNS free

bunny.net

At bunny.net, our mis­sion has al­ways been am­bi­tious but fo­cused: help make the in­ter­net hop faster.

To do that, we’ve built a mas­sive global net­work span­ning 119 lo­ca­tions and count­ing. Today, this net­work pow­ers over 1.5 mil­lion web­sites and con­sis­tently de­liv­ers some of the fastest con­tent de­liv­ery around the globe. But while de­ploy­ing thou­sands of servers glob­ally is an im­pres­sive feat on its own, the hard­ware it­self does not ex­plain how bunny.net is able to de­liver such an im­pres­sive level of per­for­mance.

The real se­cret hides un­der the hood, em­bed­ded in the rout­ing en­gine that di­rects every re­quest, every user, and sends traf­fic ex­actly where it needs to go. That en­gine is Bunny DNS.

From in­ter­nal en­gine to 200 bil­lion cus­tomer queries per month

Originally, Bunny DNS was created with one simple goal: to build the most advanced routing engine possible, capable of analyzing every DNS query and directing it to the optimal destination for serving your content. Even to this day, it’s what makes Bunny CDN achieve its exceptional performance.

Four years ago, we took every­thing we had learned from de­sign­ing and run­ning this sys­tem and turned it into a prod­uct our users could use them­selves. With Bunny DNS, we’ve up­graded DNS from be­ing a ba­sic record lookup table into a glob­ally dis­trib­uted, smart rout­ing en­gine. Instead of just re­turn­ing sta­tic records, it al­lows de­vel­op­ers to use la­tency data, health checks, and even JavaScript to dy­nam­i­cally de­ter­mine ex­actly where re­quests should go.
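To make the idea concrete, here is a minimal sketch of routing driven by health checks and latency data. It is not Bunny DNS's implementation or API: the `Origin` struct, the `pick_origin` helper, and all sample values are hypothetical, and a real engine would measure latency per client location rather than use one number per origin.

```ruby
# Hypothetical sketch: choose the answer for a DNS query from health and latency data.
# Origin, pick_origin, and the sample values are illustrative assumptions.
Origin = Struct.new(:name, :ip, :healthy, :latency_ms, keyword_init: true)

# Return the healthy origin with the lowest measured latency,
# or nil when every origin is failing its health check.
def pick_origin(origins)
  origins.select(&:healthy).min_by(&:latency_ms)
end

origins = [
  Origin.new(name: "frankfurt", ip: "192.0.2.10", healthy: true,  latency_ms: 18),
  Origin.new(name: "new-york",  ip: "192.0.2.20", healthy: true,  latency_ms: 95),
  Origin.new(name: "singapore", ip: "192.0.2.30", healthy: false, latency_ms: 7)
]

best = pick_origin(origins)
puts best ? "#{best.name} -> #{best.ip}" : "no healthy origin"
```

The unhealthy origin is skipped even though it reports the lowest latency, which is the difference between a static record and a record that reacts to health checks.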

We ap­plied our tra­di­tional mantra. We made it af­ford­able, scal­able, and added a gen­er­ous free tier. The re­sponse was in­cred­i­ble, and to­day, Bunny DNS pow­ers over 300,000 do­mains and han­dles nearly 200 bil­lion queries every sin­gle month.

But as we looked closer at our mis­sion, we re­al­ized some­thing needed to change.

Dealing with in­fra­struc­ture costs is al­ready no­to­ri­ously com­plex. You should­n’t have to stress about pric­ing tiers or whether a sud­den spike of a mil­lion queries is go­ing to re­sult in an un­pre­dictable bill.

If we truly be­lieve in our mis­sion to help make the in­ter­net hop faster, then the fun­da­men­tal sys­tem that sits be­tween your users and your ser­vices should­n’t be a pre­mium add-on. It should be ac­ces­si­ble to every­one.

So, we’ve elim­i­nated DNS query fees en­tirely.

Bunny DNS no longer charges for DNS queries and in­cludes free DNS host­ing for up to 500 do­mains per ac­count. There are no query lim­its, no per-re­quest billing, and no crit­i­cal fea­tures hid­den be­hind en­ter­prise plans. (Yes, that in­cludes smart records and health mon­i­tor­ing too.)

As with all bunny.net ser­vices, ac­counts us­ing the plat­form are sub­ject to our stan­dard $1/month min­i­mum spend, but DNS it­self no longer in­curs any us­age-based charges.

The en­try point for every­thing else

Making Bunny DNS free does­n’t mean we’re los­ing in­ter­est in it. Quite the op­po­site. More than ever, we view DNS as the core prod­uct that glues our en­tire plat­form to­gether. It’s the start­ing line for every­thing else your ap­pli­ca­tion does.

Getting that start­ing line set up is now eas­ier than ever. If you’re mi­grat­ing from some­where else, our new au­to­matic zone scan­ning checks your do­main’s most com­mon record names and types, re­con­struct­ing your zone so you only have to make a few tweaks in­stead of start­ing from scratch. (You can also just up­load a BIND file if you pre­fer.)

Once your records are in place, the real magic hap­pens. With 1-Click Acceleration, you can en­able the CDN di­rectly from your DNS records. We’ll spin up a Pull Zone be­hind the scenes and in­stantly start rout­ing re­quests through our edge net­work. Once traf­fic is flow­ing, 1-Click Security lets you en­able Bunny Shield in­stantly to fil­ter traf­fic at the edge, block­ing com­mon ex­ploits and ab­sorb­ing DDoS at­tacks be­fore they ever touch your ori­gin server.

Performance, se­cu­rity, and rout­ing are now uni­fied in one place, rather than stitched to­gether af­ter the fact. Our goal is to keep evolv­ing this with even more ad­vanced record types.

Beyond mak­ing it free, we’re also mak­ing it bet­ter

Saving on costs is great, but ul­ti­mately, what dri­ves us at bunny.net is build­ing in­cred­i­ble prod­ucts. While all of that was hap­pen­ing, we’ve also been mak­ing steady changes to DNS it­self. There haven’t been many ma­jor re­leases, but rather a lot of smaller im­prove­ments over time.

IPv6 is no longer op­tional

More and more net­works de­fault to IPv6 now, es­pe­cially on mo­bile. So we made sure every­thing on our side just works in a dual-stack world.

If you’re us­ing Bunny DNS, your name­server records al­ready re­solve over both IPv4 and IPv6. There’s noth­ing to con­fig­ure, noth­ing to mi­grate. It just works the way it prob­a­bly should have a while ago.

We’ve added DNSSEC, with­out the usual trade-offs

DNSSEC is one of those things peo­ple want in the­ory but hes­i­tate to adopt in prac­tice.

Part of that is the com­plex­ity, but part of it is also that tra­di­tional DNSSEC can ex­pose in­for­ma­tion about your zone that you might not want to share.

We im­ple­mented DNSSEC with NSEC Black Lies to get around that. You still get the val­i­da­tion and pro­tec­tion against tam­per­ing, but with­out mak­ing it easy for some­one to walk your en­tire do­main struc­ture.

It’s one of those de­tails most peo­ple won’t no­tice di­rectly, but it does change how com­fort­able you can be with turn­ing DNSSEC on.

We’ve mod­ern­ized record types

DNS has moved far be­yond sim­ple ad­dress records. Modern ap­pli­ca­tions in­creas­ingly rely on DNS not just to point a name at an IP ad­dress, but to de­scribe how clients should con­nect, how cer­tifi­cates should be val­i­dated, and how se­cu­rity set­tings should be man­aged over time.

That’s why we’ve been ex­pand­ing Bunny DNS with sup­port for more ad­vanced record types.

We’ve added sup­port for HTTPS and SVCB records, which let you hint how clients should con­nect to your ser­vices. TLSA records are there if you’re us­ing DANE and want tighter con­trol over cer­tifi­cate val­i­da­tion. And CDS and CDNSKEY help au­to­mate DNSSEC key man­age­ment so you’re not ro­tat­ing things by hand.

None of this is par­tic­u­larly flashy, but it’s the kind of stuff you end up need­ing once you move past a ba­sic setup.

Helping you build faster

We take our mission seriously, and the “help” part of it is perhaps the most important. By dropping usage charges for DNS and integrating it deeply with the rest of our stack, starting with CDN and Shield, we want to help you build faster, safer, and more resilient applications without worrying about arbitrary limits.

In a world where everyone simply wants to ship as many features as fast as possible, we’re focused on something else: making Bunny DNS incredible to use and seamlessly integrated into everything else we do, so you can build faster, sleep easier, and ultimately create better user experiences for everyone.

If you haven’t tried Bunny DNS in a while, now’s a good time to take an­other look. You can add your zones, point your do­mains, and leave it at that, or start lay­er­ing on CDN and Shield when you need them.

It’s free now, so you can log in or sign up and start us­ing it straight away.

OpenAI unveils its first custom chip, built by Broadcom

techcrunch.com

On Wednesday, OpenAI un­veiled its first cus­tom-built in­fer­ence proces­sor, de­signed and man­u­fac­tured in col­lab­o­ra­tion with Broadcom. Named Jalapeño, the new proces­sor was de­signed specif­i­cally for the unique needs of OpenAI’s in­fer­ence sys­tems. OpenAI’s own AI mod­els as­sisted in the de­vel­op­ment of the chip, the com­pany said.

While the chip is still be­ing tested, OpenAI says early re­sults show sig­nif­i­cantly bet­ter per­for­mance-per-watt than cur­rent state-of-the-art al­ter­na­tives.

The partnership was officially announced in October, but OpenAI’s chip plans have long been rumored as a way to reduce the company’s dependence on Nvidia’s GPUs. Google and Amazon have both built custom chips to serve a similar purpose, often called “AI accelerators”: silicon designed specifically to speed up machine learning workloads.

OpenAI pres­i­dent Greg Brockman ex­plained the com­pa­ny’s ap­proach to chip de­vel­op­ment on its in-house pod­cast, shortly af­ter the Broadcom part­ner­ship was an­nounced.

“We have a deep understanding of the workload,” Brockman said in the episode. “We’ve really been looking for specific workloads that are underserved, [and asking] how can we build something that will be able to accelerate what’s possible?”

Jalapeño is specif­i­cally de­signed for in­fer­ence, the process of run­ning pre-built AI mod­els in re­sponse to user com­mands. In the an­nounce­ment, OpenAI em­pha­sized the chip’s low op­er­at­ing cost when run­ning real-time cod­ing mod­els. It’s likely that more per­for­mance-in­ten­sive tasks like pre-train­ing will still rely on Nvidia hard­ware, but even small re­duc­tions in in­fer­ence costs could do a lot to im­prove the com­pa­ny’s bot­tom line.

Optimizing that in­fer­ence sys­tem may prove to be a cru­cial fac­tor in the eco­nom­ics of AI go­ing for­ward — and it’s likely to take place at every level of the stack. OpenAI is al­ready build­ing agen­tic prod­ucts like Codex and the mod­els that power them, as well as data cen­ters to run those mod­els. Moving into pur­pose-built chips lets the com­pany go even fur­ther in that process, as the com­pany ex­plained in its an­nounce­ment.

“OpenAI is not only developing frontier models or building products on top of them; it is designing the infrastructure underneath them: chip architecture, kernels, memory systems, networking, scheduling, deployment systems, and product experience,” the company wrote. “Because OpenAI operates across the stack, each layer can be optimized around the same goal: making its models faster, more reliable, and more affordable for users.”

Russell Brandom has been covering the tech industry since 2012, with a focus on platform policy and emerging technologies. He previously worked at The Verge and Rest of World, and has written for Wired, The Awl and MIT’s Technology Review. He can be reached at russell.brandom@techcrunch.com or on Signal at 412-401-5489.

Founding a Company in Germany: €9,600, 152 Days, and I Still Can't Send an Invoice

paolino.me

I started found­ing my sec­ond com­pany in Germany in late January. It is now late June.

In that time, the state, two courts, a no­tary, a law firm, a tax firm, and soft­ware ven­dors have all found a way to bill me. Every sin­gle one of them, on time.

I have spent more than 9,600 eu­ros to start a com­pany: a lit­tle over 7,600 in fees and bills, plus 2,000 in share cap­i­tal frozen in an ac­count I am not al­lowed to touch. And af­ter five months, here is what I have to show for it:

I have not been able to send a sin­gle in­voice of my own.

Not one.

The work is hap­pen­ing. The clients are real. The one thing the state ex­ists to let me do, bill them cleanly, is the one thing I still can’t.

The time­line

23 Jan: First call with a law firm to set up the company. The clock and the hourly billing start.

5 Feb: I sign the mandate and send my ID. Drafting begins.

18 Feb: The structure is set: PlentyLabs UG & Co. KG, technically two companies. The name is a saga of its own.

(about 1 month of drafting)

6 Mar: Incorporation documents ready.

17 Mar: Documents approved. The hunt for a notary begins.

(7 days for the appointment)

24 Mar: Notary in Berlin reads the deeds aloud and certifies that I am who I say I am. Notary fees: €1,575.24.

25 Mar: I pay in €2,000.00 of share capital. Money I cannot touch; it has to stay there. Locked, not a fee: €2,000.00.

26 Mar: The register court demands a fee advance. Court advance: €300.00.

(17 days after the notary)

10 Apr: First company entered in the commercial register.

(1 week more)

17 Apr: Second company entered. Register, 200 + 60: €260.00.

20 Apr: I ask the firm I already pay to handle the tax registration too.

(2.5 weeks just to start)

6 May: Before the tax work can begin, a fresh engagement is required: proposal, power of attorney, ID checks, per company. Tax registration quote: €630.00.

28 May: The incorporation legal bill lands. Legal fees: €4,462.50.

29 May: Tax questionnaires submitted. I request standard VAT and a VAT ID, urgently.

3 Jun: First bill from the accounting software. Accounting software: €426.97.

9 Jun: I am told the VAT ID will arrive by post. A letter.

24 Jun, today: Seven weeks since the tax firm, almost four weeks since the questionnaires. No VAT ID. No invoice sent.

Billed by everyone else: €7,654.71

Share capital I cannot touch: €2,000.00

Total gone: €9,654.71

Invoices I have managed to send: 0

Everyone in this story could in­voice me. I am the only one who can’t in­voice any­one.
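The totals in the timeline can be checked with a few lines of arithmetic. The sketch below adds up the itemised fees in euro cents (to avoid floating-point rounding) and counts the days from the first call to "today". The calendar year is an assumption, since the post does not state one; any non-leap year gives the same day count.

```ruby
require "date"

# Amounts in euro cents, copied from the timeline entries.
FEES_CENTS = {
  "Notary fees"            => 157_524,
  "Court advance"          => 30_000,
  "Register (200 + 60)"    => 26_000,
  "Tax registration quote" => 63_000,
  "Legal fees"             => 446_250,
  "Accounting software"    => 42_697
}.freeze
SHARE_CAPITAL_CENTS = 200_000

# Format an integer number of cents as a euro amount.
def euros(cents)
  format("€%d.%02d", cents / 100, cents % 100)
end

billed = FEES_CENTS.values.sum
total  = billed + SHARE_CAPITAL_CENTS

# Assumed year (not given in the post); 23 Jan to 24 Jun in a non-leap year.
days = (Date.new(2025, 6, 24) - Date.new(2025, 1, 23)).to_i

puts "Billed by everyone else: #{euros(billed)}"
puts "Total gone:              #{euros(total)}"
puts "Days elapsed:            #{days}"
```

The sums match the figures in the timeline (7,654.71 billed, 9,654.71 in total), and the day count matches the 152 days in the title.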

“But you can invoice your German clients”

The clients abroad need a VAT ID for re­verse charge, and that is ex­actly the one I am still wait­ing for. My German clients I could bill to­day. But a do­mes­tic in­voice now would have to be reis­sued the mo­ment the VAT ID ar­rives. Bill now, bill again later, for no rea­son. So those wait too.

This should have been a web form

Fill it in, pay a fee, get your com­pany and your VAT ID in a week. Estonia does it. The UK reg­is­ters a com­pany in a day, on­line, for the price of a din­ner. There is no law of na­ture that says in­cor­po­ra­tion has to take five months and ar­rive by post.

Germany has built a process that chains one de­pen­dency to the next, puts a fee on each, and lets a founder run up le­gal bills, no­tary bills, court fees, tax re­tain­ers, and soft­ware sub­scrip­tions on zero rev­enue, all be­fore grant­ing the one per­mis­sion a com­pany ex­ists for: the right to send an in­voice.

If you ask the gov­ern­ment, the rea­son is trust: the no­tary, the cap­i­tal, the reg­is­ters, the end­less checks, all there to keep bad ac­tors out. This is the same ma­chine that did not catch Wirecard, a two-bil­lion-euro scam. It does, some­how, gen­er­ate enough fric­tion to scare new founders out of the coun­try.

And no, I could not just leave in­stead. My first com­pany, Freshflow, is valu­able enough that walk­ing out of Germany would trig­ger a mas­sive six-fig­ure exit tax, on gains I have not even re­alised, purely for the priv­i­lege of leav­ing. But that is a story for an­other post.

This is a coun­try tax­ing am­bi­tion through the roof be­fore you’ve earned a cent, then won­der­ing why the am­bi­tious leave.

Bonus round: my company name was “too generic”

Have you heard of Apple? A piece of fruit, and one of the most valu­able brands ever built. That name would never have been ap­proved in Germany.

Naming a com­pany is hard. It is the word every­one who touches your work will re­mem­ber. After months of turn­ing it over, I found one I could stand be­hind, a name that says what I be­lieve soft­ware should be. (That be­lief will be its own post, soon.) Distinctive, I thought. The kind of name you do not for­get.

Plenty.

“No,” said the lawyer. German company names have to be distinctive, and “Plenty” is a plain English word. Berlin would reject it.

“Plenty Group?” Two plain words. “Plenty Labs?” “Labs” is a plain word too. “Plenty.is?” A generic word with a domain on the end is still a generic word, and there was case law to prove it.

The sug­ges­tions were worse: stick my sur­name on the front, Paolino Plenty Labs. Or a pre­fix, PG Plenty Germany. Or make up a fan­tasy word.

Is Plenty. Its Plenty. IsPlenty. ItsPlenty. Rejected, all of it.

Fine. They wanted a mean­ing­less word; I gave them one. Plenty Labs, mi­nus the space: PlentyLabs.

Approved.

A name that started out of spite. Weeks of cor­re­spon­dence, re­solved by re­mov­ing a space. A rule that does not re­ward clar­ity. It re­wards non­sense.

Postscript: why a UG and Co. KG, two com­pa­nies?

Why does a one-per­son busi­ness need two com­pa­nies? Because the sim­ple ver­sion is worse, and be­cause I am build­ing it into some­thing big­ger.

The sim­plest setup is a sole pro­pri­etor­ship. Thirty eu­ros, no cap­i­tal, done in an af­ter­noon. It also makes me per­son­ally li­able for every­thing. A client sues? They are not su­ing a com­pany. They are su­ing me. My sav­ings, my apart­ment, my name.

So I wanted real lim­ited li­a­bil­ity, which means a com­pany. And for one per­son, the clean­est com­pany turns out not to be one com­pany. It is a KG, a part­ner­ship that does the work, with a tiny UG stand­ing in as the part­ner that car­ries the li­a­bil­ity. Strange, but stan­dard. You prob­a­bly have seen GmbH & Co. KG on German com­pa­nies a hun­dred times with­out won­der­ing why. This is why.

It is taxed the sane way, too. The part­ner­ship’s profit is taxed once, as my in­come, since I am the one who ends up with it. A plain UG would tax the com­pany first, then tax me again when I paid my­self.

Why a UG and not the fa­mous GmbH? A GmbH wants 25,000 eu­ros sit­ting in a bank ac­count be­fore it is al­lowed to ex­ist. The UG lets you start with al­most noth­ing, on one con­di­tion: lock away a quar­ter of every year’s profit un­til the re­serve reaches 25,000, then con­vert to a GmbH. The 25,000 does not go away. Germany just takes it in in­stal­ments.
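As a rough illustration of the reserve rule as described here, the sketch below counts how many years it takes to reach the 25,000 euro reserve when a quarter of each year's profit is locked away. The constant annual profit and the helper name `years_until_target` are hypothetical simplifications; the sketch follows the post's description rather than the statute's wording.

```ruby
# Sketch of the reserve rule as the post describes it: a quarter of each
# year's profit is locked away until the reserve reaches 25,000 euros.
# A constant annual profit is a hypothetical simplification.
RESERVE_TARGET = 25_000
RESERVE_FRACTION_DIVISOR = 4 # one quarter of the yearly profit

def years_until_target(annual_profit)
  raise ArgumentError, "profit must be positive" unless annual_profit.positive?

  reserve = 0.0
  years = 0
  while reserve < RESERVE_TARGET
    reserve += annual_profit.fdiv(RESERVE_FRACTION_DIVISOR)
    years += 1
  end
  years
end

puts years_until_target(40_000)  # 10,000 locked away per year
puts years_until_target(100_000) # 25,000 locked away in the first year
```

At a hypothetical 40,000 euros of profit per year, 10,000 is set aside annually, so the target is crossed in the third year.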

RubyLLM

rubyllm.com

A sin­gle, beau­ti­ful Ruby frame­work for all ma­jor AI providers. Easily build chat­bots, AI agents, RAG ap­pli­ca­tions, con­tent gen­er­a­tors, and every AI work­flow you can think of.

Build a work­ing Ruby AI chat in two min­utes

Why RubyLLM?

Every AI provider ships their own bloated client. Different APIs. Different re­sponse for­mats. Different con­ven­tions. It’s ex­haust­ing.

RubyLLM gives you one beau­ti­ful frame­work for all of them. Same in­ter­face whether you’re us­ing GPT, Claude, or your lo­cal Ollama. Just three de­pen­den­cies: Faraday, Zeitwerk, and Marcel. That’s it.

Show me the code

# Just ask questions
chat = RubyLLM.chat
chat.ask "What's the best way to learn Ruby?"

# Analyze any file type
chat.ask "What's in this image?", with: "ruby_conf.jpg"
chat.ask "What's happening in this video?", with: "video.mp4"
chat.ask "Describe this meeting", with: "meeting.wav"
chat.ask "Summarize this document", with: "contract.pdf"
chat.ask "Explain this code", with: "app.rb"

# Multiple files at once
chat.ask "Analyze these files", with: ["diagram.png", "report.pdf", "notes.txt"]

# Stream responses
chat.ask "Tell me a story about Ruby" do |chunk|
  print chunk.content
end

# Generate images
RubyLLM.paint "a sunset over mountains in watercolor style"

# Create embeddings
RubyLLM.embed "Ruby is elegant and expressive"

# Transcribe audio to text
RubyLLM.transcribe "meeting.wav"

# Moderate content for safety
RubyLLM.moderate "Check if this text is safe"

# Let AI use your code
class Weather < RubyLLM::Tool
  desc "Get current weather"

  def execute(latitude:, longitude:)
    url = "https://api.open-meteo.com/v1/forecast?latitude=#{latitude}&longitude=#{longitude}&current=temperature_2m,wind_speed_10m"
    JSON.parse(Faraday.get(url).body)
  end
end

chat.with_tool(Weather).ask "What's the weather in Berlin?"

# Define an agent with instructions + tools
class WeatherAssistant < RubyLLM::Agent
  model "gpt-5-nano"
  instructions "Be concise and always use tools for weather."
  tools Weather
end

WeatherAssistant.new.ask "What's the weather in Berlin?"

# Get structured output
class ProductSchema < RubyLLM::Schema
  string :name
  number :price
  array :features do
    string
  end
end

response = chat.with_schema(ProductSchema).ask "Analyze this product", with: "product.txt"

Features

Chat: Conversational AI with RubyLLM.chat

Vision: Analyze im­ages and videos

Audio: Transcribe and un­der­stand speech with RubyLLM.transcribe

Documents: Extract from PDFs, CSVs, JSON, any file type

Image gen­er­a­tion: Create im­ages with RubyLLM.paint

Embeddings: Generate em­bed­dings with RubyLLM.embed

Moderation: Content safety with RubyLLM.moderate

Tools: Let AI call your Ruby meth­ods

Agents: Reusable as­sis­tants with RubyLLM::Agent

Structured out­put: JSON schemas that just work

Streaming: Real-time re­sponses with blocks

Rails: ActiveRecord integration with acts_as_chat

Async: Fiber-based con­cur­rency

Model reg­istry: 800+ mod­els with ca­pa­bil­ity de­tec­tion and pric­ing

Extended think­ing: Control, view, and per­sist model de­lib­er­a­tion

Providers: OpenAI, xAI, Anthropic, Gemini, VertexAI, Bedrock, DeepSeek, Mistral, Ollama, OpenRouter, Perplexity, GPUStack, and any OpenAI-compatible API

Installation

Add to your Gemfile:

gem 'ruby_llm'

Then run bundle install.

Configure your API keys:

# config/initializers/ruby_llm.rb
RubyLLM.configure do |config|
  config.openai_api_key = ENV['OPENAI_API_KEY']
end

Rails

# Install Rails Integration
bin/rails generate ruby_llm:install
bin/rails db:migrate
bin/rails ruby_llm:load_models # v1.13+

# Add Chat UI (optional)
bin/rails generate ruby_llm:chat_ui

class Chat < ApplicationRecord
  acts_as_chat
end

chat = Chat.create! model: "claude-sonnet-4"
chat.ask "What's in this file?", with: "report.pdf"

Visit http://localhost:3000/chats for a ready-to-use chat interface!

Krea 2 Technical Report

www.krea.ai

Introduction

Over the past few years, image generation has seen remarkable progress. Diffusion and flow-matching models can generate high-resolution images, produce sharp photorealism and stable structure, render dense text, encode broad world knowledge, and follow user prompts in precise detail. These improvements have been driven by several interacting factors, including scalable transformer architectures, improved captioning and text encoders, better latent representations, and pipelined post-training techniques. Yet as the field has optimized for reliability on these capabilities, many systems have converged toward a narrow set of default aesthetics. While these systems are effective production tools, this convergence makes them less effective as engines for creative exploration, where users often need to search across styles, moods, compositions and visual directions rather than receive a single polished default.

To ad­dress these lim­i­ta­tions, we pre­sent Krea 2, a se­ries of foun­da­tion mod­els fo­cused on cre­ative ex­plo­ration. Krea 2’s mod­els are built on the be­lief that im­age gen­er­a­tion should be an ex­ploratory medium: ex­pres­sive enough to span many aes­thet­ics, and con­trol­lable enough for cre­ators to nav­i­gate them.

We built a large-scale data in­fra­struc­ture and dis­trib­uted train­ing frame­work from scratch to cu­rate a com­pre­hen­sive pre­train­ing dataset with broad world knowl­edge and style cov­er­age.

Using this infrastructure, we train expressive models through a multi-stage pipeline spanning pretraining, midtraining, supervised finetuning (SFT), preference optimization, and reinforcement learning (RL), with each stage designed to progressively refine the model’s output distribution. We develop a simple yet performant diffusion transformer (DiT) architecture through thorough ablations. Our model incorporates several components that accelerate convergence, including iREPA, improved VAEs, and Qwen3-VL. We also integrate several architectural improvements, including grouped-query attention (GQA), sigmoid-gated attention, lightweight timestep modulation, and multilayer feature aggregation for text-encoder features, which together improve training stability and efficiency.

A strong base model is only use­ful if users can re­li­ably reach the parts of its dis­tri­b­u­tion they care about. In train­ing, the model learns from rich, care­fully con­structed cap­tions that de­scribe im­ages with dense vi­sual de­tail. In prac­tice, user in­puts are of­ten shorter, more am­bigu­ous, and shaped by many dif­fer­ent habits of ex­pres­sion. Some users de­scribe a scene in nat­ural lan­guage; oth­ers ges­ture to­ward a mood, a style, or a ref­er­ence im­age. This cre­ates a gap be­tween the mod­el’s learned con­di­tion­ing space and the way cre­ative in­tent is ex­pressed at in­fer­ence time.

To re­duce this gap, we build two sys­tems that make Krea 2 more ex­ploratory and steer­able from both text and im­age in­puts: a prompt ex­pander and a style-ref­er­ence sys­tem. The prompt ex­pander maps sim­ple or un­der­spec­i­fied user prompts into richer vi­sual di­rec­tions with­out over­writ­ing the user’s in­tent. It is trained through a two-stage SFT and RL pipeline on top of open-source LLMs, where the ob­jec­tive is not only to im­prove im­age qual­ity, but also to en­cour­age cre­ative vari­a­tion and con­trol­lable ex­plo­ration. Complementing this tex­tual in­ter­face, the style-ref­er­ence sys­tem lets users ex­press vi­sual in­tent through im­ages when words are in­suf­fi­cient. It al­lows users to in­ject the style or mood of one or more ref­er­ence im­ages with min­i­mal con­tent leak­age, while pro­vid­ing fine-grained con­trol over style strength and weighted style mix­ing.

Together, these components define Krea 2 as a foundation model for exploratory generation. Instead of optimizing only for a single polished default, Krea 2 is designed to expose a broad visual space and give users practical ways to move through it, using both text and image-based control. Krea 2 is among the top 10 models on the Artificial Analysis leaderboard for text-to-image, and ranks 2nd among models from independent labs. Krea 2 serves as a comprehensive baseline and enables a creative generative experience while maintaining competitive performance.

Data

Data Curation Principles

Before detailing our data pipeline, it is important to establish what constitutes a good data mix for our purpose. A good mix does not consist solely of “high quality” images. Diversity and broad domain coverage are essential given our objective of building an expressive, stylistically diverse model. We argue that conventional model-based filtering, which uses aesthetic-score and image-quality-assessment (IQA) models, introduces implicit biases. For example, such methods may classify a blurry image as low quality, even though motion blur or softness can be a deliberate artistic choice.

Furthermore, we ar­gue that as long as a cap­tion ac­cu­rately de­scribes its im­age, even an un­de­sir­able im­age may be help­ful in down­stream use cases: be­cause the model pre­cisely un­der­stands the un­de­sired be­hav­ior, such sam­ples can later be used to steer gen­er­a­tions away from that dis­tri­b­u­tion.

For these rea­sons, we build the pre­train­ing dataset by fil­ter­ing out only:

Duplicated sam­ples and over-rep­re­sented con­cepts.

Samples for which VLMs con­sis­tently fail to cap­ture im­por­tant as­pects of the im­age.

Samples that in­duce un­de­sired bi­ases and ar­ti­facts.

Samples with high vi­sual com­plex­ity that is too dif­fi­cult to model re­li­ably at low res­o­lu­tion.

AI-generated samples.

These con­di­tions shape a pre­train­ing dataset with broad cov­er­age while avoid­ing poor text-to-im­age align­ment and ar­ti­facts.

Importantly, we use no AI-generated images in our pretraining mix. Synthetic data and distillation can be an effective shortcut for acquiring model capabilities. However, we find that even a small proportion of AI-generated images introduces biases into the model’s output distribution, as synthetic images tend to be easier to learn, which effectively imposes an upper bound on model quality. We therefore designed in-house classifiers to filter such images out.

Captioning

We em­ploy a multi-stage ap­proach to pro­duce cap­tions. First, we run an OCR model on each tar­get im­age to ex­tract any vis­i­ble text. In the sec­ond stage, we pro­vide both the OCR re­sults and any avail­able meta­data (camera set­tings, known en­ti­ties, and so on) to the cap­tion­ing model, which pro­duces an en­riched cap­tion that in­cor­po­rates world knowl­edge along­side the ex­tracted text.

Figure: General captioning pipeline.

Once a con­text-rich, long-form nat­ural-lan­guage cap­tion is ob­tained, we use a cheaper LLM to re­for­mat it into a va­ri­ety of lengths and for­mats, ex­pos­ing the model to a range of prompt styles. Empirically, we find that train­ing on long prompts pro­vides dense su­per­vi­sion, yield­ing faster con­ver­gence and lower train­ing loss. For many down­stream and ap­plied use cases, how­ever, per­for­mance on short and medium-length prompts re­mains im­por­tant. We there­fore train pre­dom­i­nantly on long cap­tions while en­sur­ing the model is ex­posed to short and medium-length prompts through­out train­ing.
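One simple way to train predominantly on long captions while still exposing the model to shorter ones is to sample a caption length per training example from a fixed distribution. The sketch below shows that idea only; the 70/20/10 split and the names `CAPTION_WEIGHTS` and `pick_caption_length` are hypothetical, as the report does not state the actual ratios or mechanism.

```ruby
# Hypothetical mixing ratios; the report only says long captions predominate.
CAPTION_WEIGHTS = { long: 0.7, medium: 0.2, short: 0.1 }.freeze

# Draw one caption length according to CAPTION_WEIGHTS.
def pick_caption_length(rng)
  roll = rng.rand
  cumulative = 0.0
  CAPTION_WEIGHTS.each do |length, weight|
    cumulative += weight
    return length if roll < cumulative
  end
  CAPTION_WEIGHTS.keys.last # guards against floating-point rounding
end

rng = Random.new(42) # fixed seed so the run is repeatable
counts = Array.new(10_000) { pick_caption_length(rng) }.tally
puts counts
```

Sampling per example, rather than splitting the dataset into fixed subsets, keeps short and medium prompts present throughout training instead of concentrating them in one phase.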

Figure: Our overall training pipeline and data stages.

Pretraining Data

Pretraining data spans 256px, 512px, and 1024px res­o­lu­tion stages. Progressively scal­ing the res­o­lu­tion forms a cur­ricu­lum-learn­ing strat­egy: we ded­i­cate the ma­jor­ity of FLOPs to the low-res­o­lu­tion stages to build core model ca­pa­bil­i­ties ef­fi­ciently, then equip the model with high-fi­delity gen­er­a­tion ca­pa­bil­i­ties as the train­ing res­o­lu­tion in­creases.

Low-resolution pre­train­ing is the stage at which ba­sic text-im­age align­ment and struc­ture are learned. At this stage the dataset is on the or­der of bil­lions of im­ages, so we rely heav­ily on in­ex­pen­sive CPU-based fil­ters to re­move low-qual­ity im­ages. These range from sim­ple bro­ken-file, res­o­lu­tion, and as­pect-ra­tio fil­ters that re­move un­qual­i­fied im­ages, to Laplacian fil­ters that re­move im­ages with ex­treme tex­tures and noise pat­terns.
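A Laplacian filter of this kind can be sketched on the CPU with nothing but array arithmetic. The example below computes the variance of the 4-neighbour Laplacian response over a grayscale image and flags images whose variance is very high. It is an illustration under assumptions, not the report's filter: the input is taken to be an already-decoded 2D array of brightness values, and the helper names and the `max_variance` threshold are hypothetical.

```ruby
# Variance of the 4-neighbour Laplacian over a grayscale image, given as a
# 2D array of brightness values. Border pixels are skipped.
def laplacian_variance(gray)
  responses = []
  (1...gray.length - 1).each do |y|
    (1...gray[y].length - 1).each do |x|
      responses << gray[y - 1][x] + gray[y + 1][x] +
                   gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x]
    end
  end
  return 0.0 if responses.empty?

  mean = responses.sum.to_f / responses.size
  responses.sum { |r| (r - mean)**2 } / responses.size
end

# Hypothetical threshold: treat very high variance as extreme texture or noise.
def extreme_texture?(gray, max_variance: 10_000.0)
  laplacian_variance(gray) > max_variance
end

flat    = Array.new(4) { Array.new(4, 128) }
checker = Array.new(4) { |y| Array.new(4) { |x| (x + y).even? ? 0 : 255 } }

puts extreme_texture?(flat)    # uniform image, no high-frequency content
puts extreme_texture?(checker) # pixel-level checkerboard, maximal high-frequency content
```

The same statistic is often used in the opposite direction, with a lower bound to detect blur, which is a reminder that the threshold encodes a judgment about which images to keep.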

As an ex­am­ple, one is­sue we en­coun­tered while pre­train­ing K2 was a ten­dency for the model to gen­er­ate flat-color back­grounds and bor­der ar­ti­facts. To mit­i­gate this, we used RGB en­tropy, white/​black pixel ra­tios, cus­tom heuris­tics, and in-house clas­si­fiers to fil­ter out sam­ples that in­duced this be­hav­ior.
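As a rough sketch of how RGB entropy and white/black pixel ratios can flag flat-color samples, assume an image is given as a list of `(r, g, b)` tuples. The function names and every threshold below are illustrative assumptions, not values from the pipeline.

```python
import math
from collections import Counter


def channel_entropy(values):
    """Shannon entropy in bits of a sequence of 0-255 channel values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def rgb_entropy(pixels):
    """Mean per-channel entropy of a list of (r, g, b) tuples."""
    return sum(channel_entropy([p[c] for p in pixels]) for c in range(3)) / 3


def extreme_pixel_ratio(pixels, lo=5, hi=250):
    """Fraction of pixels that are near-black or near-white in all channels."""
    extreme = sum(1 for p in pixels if max(p) <= lo or min(p) >= hi)
    return extreme / len(pixels)


def looks_flat(pixels, min_entropy=2.0, max_extreme=0.6):
    """Hypothetical rule: low color entropy or mostly white/black pixels."""
    return rgb_entropy(pixels) < min_entropy or extreme_pixel_ratio(pixels) > max_extreme


white = [(255, 255, 255)] * 100
varied = [(i, (i * 7) % 256, (i * 13) % 256) for i in range(256)]
print(looks_flat(white), looks_flat(varied))
```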

To build an in-house classifier, one effective strategy was to use a large VLM with a task-specific system prompt crafted for the filtering task (for example, detecting a specific pattern or artifact) to produce a pseudo-labeled dataset, and then to train a small DINOv3- or SigLIP-2-based classifier to run the filter at scale. Any filtering model that requires GPU compute at the low-resolution stage is kept under 1B parameters for efficiency.

For dedu­pli­ca­tion at the low-res­o­lu­tion stages, we pri­mar­ily use in­ex­pen­sive hash-based meth­ods, com­bin­ing md5, phash, and col­orhash to re­move du­pli­cate im­ages with min­i­mal com­pute. We find that the de­fault 8x8 phash does not ac­count for color and has a high false-pos­i­tive rate; we there­fore com­bine a 12x12 phash with col­orhash for more ro­bust dedu­pli­ca­tion.
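The benefit of combining a structural hash with a color hash can be illustrated with a small sketch. It assumes each image already carries three precomputed hashes, with the perceptual and color hashes stored as integers and compared by Hamming distance. The record layout, the distance thresholds, and the toy hash values are all hypothetical.

```python
import hashlib


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def is_duplicate(a, b, phash_max=8, color_max=2):
    """a, b: dicts with 'md5' (hex string), 'phash' (int), 'colorhash' (int)."""
    if a["md5"] == b["md5"]:
        return True  # byte-identical files
    # Require structure AND color to agree, so a recolored image is not a duplicate.
    return (hamming(a["phash"], b["phash"]) <= phash_max
            and hamming(a["colorhash"], b["colorhash"]) <= color_max)


# Toy records: same layout (identical phash) but very different colors.
red_logo = {"md5": hashlib.md5(b"red").hexdigest(), "phash": 0xF0F0, "colorhash": 0b11110000}
blue_logo = {"md5": hashlib.md5(b"blue").hexdigest(), "phash": 0xF0F0, "colorhash": 0b00001111}
red_resaved = {"md5": hashlib.md5(b"red2").hexdigest(), "phash": 0xF0F1, "colorhash": 0b11110000}

print(is_duplicate(red_logo, blue_logo), is_duplicate(red_logo, red_resaved))
```

With a structural hash alone, the red and blue logos would collide; requiring the color hash to match as well is what removes that false positive.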

As we scale the train­ing res­o­lu­tion, we in­tro­duce im­age-qual­ity and aes­thetic fil­ters. Importantly, these qual­ity scores are used only to drop im­ages of ex­tremely poor qual­ity, not to over­sam­ple im­ages on the ba­sis of their scores. We ad­di­tion­ally use an im­age-com­plex­ity score and text den­sity (from OCR re­sults) to ex­clude im­ages whose text and con­tent can­not be mean­ing­fully rep­re­sented at low res­o­lu­tion. We ad­just the qual­ity, com­plex­ity, and text-den­sity thresh­olds as train­ing pro­gresses.

Beyond con­ven­tional qual­ity fil­ters, we also train a sparse au­toen­coder (SAE) on SigLIP-2 em­bed­dings com­puted over a sam­ple of our pre­train­ing cor­pus. After train­ing the SAE, we use a VLM to an­no­tate each SAE fea­ture based on its top-k ac­ti­vat­ing sam­ples. These an­no­tated fea­tures form an un­su­per­vised tag­ging sys­tem in which we ex­tract the pre­dom­i­nant SAE fea­tures from each im­age. This tag­ging sys­tem was use­ful for fil­ter­ing clear vi­sual ar­ti­facts with­out train­ing an ex­plicit clas­si­fier.

Midtraining Data

Unlike the pre­train­ing stages, mid­train­ing ex­plic­itly se­lects spe­cific im­age sources known to of­fer good styl­is­tic cov­er­age and high-qual­ity im­ages for par­tic­u­lar vi­sual do­mains. Whereas pre­train­ing is a bot­tom-up process that be­gins from a gen­eral pool, mid­train­ing data is cu­rated top-down: the do­mains and sources are cho­sen first. Midtraining is a cru­cial stage that smoothly bridges the gen­eral pre­train­ing dis­tri­b­u­tion and the high-qual­ity SFT dis­tri­b­u­tion. To im­prove the qual­ity of the dis­tri­b­u­tion, we in­tro­duce se­man­tic clus­ter­ing and use re­trieval-based strate­gies to en­sure world-knowl­edge cov­er­age.

Building on the ap­proach in Automatic Data Curation for Self-Supervised Learning, we use FAISS to per­form hi­er­ar­chi­cal k-means clus­ter­ing, which we then sam­ple so as to re­tain long-tail vi­sual con­cepts with­out wast­ing com­pute over-sam­pling head con­cepts. After com­put­ing the hi­er­ar­chi­cal clus­ters, we have a VLM ex­am­ine the im­ages near­est each clus­ter cen­troid in or­der to name and, where ap­pro­pri­ate, flag the clus­ter. Following hu­man re­view of the flagged clus­ters, we dropped sev­eral that were low qual­ity or prob­lem­atic. We re­move fur­ther re­dun­dant data through se­man­tic dedu­pli­ca­tion, com­put­ing the SigLIP sim­i­lar­ity be­tween im­ages within each re­main­ing leaf clus­ter.
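The effect of cluster-balanced sampling can be shown with a simplified sketch. It flattens the hierarchy to a single level of leaf clusters and draws round-robin across them, which is an assumption made for brevity; the referenced approach samples level by level through the hierarchy. The function name and inputs are hypothetical.

```python
import random


def balanced_sample(clusters, budget, seed=0):
    """clusters: dict cluster_id -> list of item ids. Draw round-robin until the budget is met."""
    rng = random.Random(seed)
    # Shuffle each cluster once so that pops are random draws without replacement.
    pools = {cid: rng.sample(items, len(items)) for cid, items in clusters.items()}
    picked = []
    while len(picked) < budget and any(pools.values()):
        for cid in sorted(pools):
            if pools[cid] and len(picked) < budget:
                picked.append(pools[cid].pop())
    return picked


clusters = {"head": list(range(1000)), "tail_a": [1000, 1001], "tail_b": [1002]}
picked = balanced_sample(clusters, budget=9)
print(sorted(p for p in picked if p >= 1000))  # every tail item survives
```

Uniform sampling over the same pool would draw about 9 of 1003 items almost entirely from the head cluster; the round-robin draw keeps all three tail items within the same budget.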

An im­por­tant ca­pa­bil­ity of im­age gen­er­a­tion mod­els is faith­fully rep­re­sent­ing known en­ti­ties that users may ref­er­ence sim­ply by name. Some en­ti­ties, such as sports play­ers or ac­tors, can fall into se­man­tic clus­ters con­tain­ing many other en­ti­ties, which risks their be­ing dropped un­der straight­for­ward hi­er­ar­chi­cal sam­pling. To ad­dress this, we ran PageRank over English Wikipedia us­ing Danker and re­tained the top 90% of ar­ti­cles by rank. We then fil­tered out all ar­ti­cles de­scrib­ing un­rep­re­sentable sub­jects based on their Wikidata meta­data, and for the re­main­ing ~5 mil­lion con­cepts we per­formed a full-text search across all cap­tions in our dataset to as­sess cov­er­age. When sam­pling, we pri­or­i­tized im­ages whose cap­tions ref­er­enced rare con­cepts. Finally, we re­peated this cov­er­age analy­sis on the re­sult­ing sam­ple to con­firm that no con­cepts pre­sent in the ini­tial dataset had been dropped en­tirely.

Supervised Finetuning Data

For the su­per­vised fine­tun­ing (SFT) stage, we use a small, hand-cu­rated dataset fo­cused on in­di­vid­ual vi­sual do­mains. We find that, once a suf­fi­cient vol­ume is reached, the qual­ity of the dataset mat­ters far more than its scale.

Architecture

For our ar­chi­tec­tural ab­la­tions, we found it use­ful to clas­sify each ab­la­tion’s ob­jec­tive into one of the fol­low­ing cat­e­gories:

Stability: Does it make train­ing more sta­ble? Does it re­duce loss and gra­di­ent spikes?

Performance: Does it make the model con­verge faster? If so, does the trend hold over an ex­tended hori­zon and at higher res­o­lu­tion?

Efficiency: Does it re­duce pa­ra­me­ter count, FLOPs, mem­ory, or com­mu­ni­ca­tion re­quire­ments with­out com­pro­mis­ing model qual­ity?

Simplicity: Can we make the model sim­pler with­out af­fect­ing the other cat­e­gories?

It is worth not­ing that many of our ar­chi­tec­tural de­ci­sions are guided by their adop­tion in the LLM space. Choosing an ar­chi­tec­ture that is well es­tab­lished in the LLM ecosys­tem al­lows us to take ad­van­tage of ex­ist­ing ker­nels and op­ti­miza­tions, even for dif­fu­sion mod­els.

With these ob­jec­tives in mind, we be­gin from the fol­low­ing base­line.

Transformer block

We be­gin by re­plac­ing the GeLU MLP with SwiGLU lay­ers at a 4x ex­pan­sion fac­tor, which have be­come a de facto mod­ule in LLM ar­chi­tec­tures. Introducing SwiGLU led to con­sis­tent per­for­mance gains, so we adopted it across all sub­se­quent ab­la­tions.

Having re­vised the MLP de­sign, we con­sid­ered GQA, MLA, and gated sig­moid at­ten­tion as al­ter­na­tives to the multi-head at­ten­tion base­line. We find that GQA in­tro­duces min­i­mal degra­da­tion while of­fer­ing im­proved com­pu­ta­tional ef­fi­ciency. We also ex­plored MLA and ob­served slight gains over GQA, but did not adopt it, as it in­tro­duced ad­di­tional com­pu­ta­tional over­head. We used MLA with up/​down pro­jec­tion for KV com­pres­sion and with­out de­cou­pled RoPE, since dif­fu­sion is purely pre­fill and does not use a KV cache at in­fer­ence.

On top of GQA, we add gated sig­moid at­ten­tion, fol­low­ing Gated Attention for Large Language Models. Gated sig­moid at­ten­tion adds very lit­tle com­pute and pa­ra­me­ter over­head. While it did not yield sig­nif­i­cant per­for­mance gains, it pro­duced more sta­ble train­ing dy­nam­ics, as re­flected in the loss and gra­di­ent-norm curves through­out train­ing.

We also ab­late the modal­ity-stream de­sign:

Single-stream de­sign: a stan­dard trans­former block in which the at­ten­tion and MLP weights are shared be­tween text and im­age to­kens.

Dual-stream de­sign: joint at­ten­tion with sep­a­rate at­ten­tion and MLP weights for text and im­age to­kens.

Hybrid-stream de­sign: a mix of the two, us­ing dual-stream blocks for the first third of the net­work and sin­gle-stream blocks for the re­main­ing two-thirds.

We did not observe significant performance differences among the three designs; the hybrid-stream design slightly outperformed the other two. For the sake of simplicity, however, we use single-stream blocks in our final architecture.

Timestep con­di­tion­ing

Many MMDiTs use a per-block MLP to produce scale, shift, and gate factors. These MLP blocks can account for 20-30% of the total parameter count, which we consider excessive for injecting a scalar condition. We therefore replace the per-block MLP with a per-block tunable bias term. This change allows us to allocate more parameters to the attention and MLP layers without sacrificing model performance.
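A back-of-the-envelope count shows where a figure in that range can come from. The sketch below assumes a DiT-style block in which a single linear layer maps a width-`d` conditioning vector to six modulation vectors, plain multi-head attention with four `d x d` projections, and a gated MLP at 4x expansion; biases and norms are ignored. These layer shapes are assumptions for the arithmetic, not the exact architecture.

```python
def adaln_fraction(d, mlp_expansion=4, gated_mlp=True, n_mod=6):
    """Return (share of block params in the modulation MLP, share if replaced by a bias)."""
    attn = 4 * d * d                                       # Q, K, V and output projections
    mlp = (3 if gated_mlp else 2) * mlp_expansion * d * d  # gated MLPs have three matrices
    adaln = n_mod * d * d                                  # linear layer: d -> n_mod * d
    bias_only = n_mod * d                                  # per-block learnable bias instead
    return adaln / (attn + mlp + adaln), bias_only / (attn + mlp + bias_only)


mlp_share, bias_share = adaln_fraction(d=2048)
print(f"modulation MLP: {mlp_share:.1%}, bias only: {bias_share:.3%}")
```

Under these assumptions the modulation layer holds 6 of every 22 parameter units in a block, while the bias replacement is negligible. With GQA the attention term shrinks, which would push the modulation share slightly higher.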

Beyond AdaLN modulation, we explored two alternatives: (1) removing timestep conditioning entirely, and (2) in-context timestep conditioning via timestep tokens. In our low-resolution pretraining runs, removing timestep information entirely consistently underperformed the AdaLN baseline. For in-context conditioning, we create time embeddings using sinusoidal embeddings, concatenate them into a unified text + image + time sequence, and remove the AdaLN layers entirely. At 256px pretraining, 4-16 timestep tokens were sufficient to replace AdaLN. At 512px and 1024px, however, in-context conditioning performed poorly relative to the AdaLN baseline. We attempted to mitigate this by increasing the number of timestep tokens, but observed diminishing returns and could not achieve competitive performance at higher resolutions.
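For readers unfamiliar with in-context conditioning, here is a minimal sketch of the idea. The sinusoidal embedding follows the common transformer formulation. How several distinct timestep tokens are derived from one scalar is not specified above, so the per-token scaling in `build_sequence` is purely a placeholder assumption, as are the function names.

```python
import math


def sinusoidal_embedding(t, dim, max_period=10000.0):
    """Standard sinusoidal embedding of a scalar t into `dim` values (cosines, then sines)."""
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.cos(t * f) for f in freqs] + [math.sin(t * f) for f in freqs]


def build_sequence(text_tokens, image_tokens, t, dim, n_time_tokens=4):
    """Concatenate text + image + time tokens; each token is a list of `dim` floats."""
    # Placeholder assumption: embed t at a different scale per token so tokens differ.
    time_tokens = [sinusoidal_embedding(t * (k + 1), dim) for k in range(n_time_tokens)]
    return text_tokens + image_tokens + time_tokens


dim = 8
text = [[0.0] * dim for _ in range(2)]
image = [[0.0] * dim for _ in range(3)]
sequence = build_sequence(text, image, t=0.25, dim=dim)
print(len(sequence))  # 2 text + 3 image + 4 time tokens
```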

Positional en­cod­ing

We im­ple­mented sev­eral RoPE schemes for our ab­la­tions. We use 3D ax­ial RoPE, with head di­men­sions ded­i­cated to frame, height, and width. For text to­kens, we set the RoPE in­dices to zero. At low res­o­lu­tion, we did not ob­serve sig­nif­i­cant gains from switch­ing to Golden Gate RoPE, MRoPE, nor­mal­ized RoPE, or par­tial RoPE. For par­tial RoPE, we ro­tate only the first half of the head di­men­sion and leave the re­main­der un­ro­tated. As ex­pected, par­tial RoPE pro­duced bet­ter zero-shot in­fer­ence re­sults when scal­ing the model from 256px to 512px and did not suf­fer from the com­mon du­pli­ca­tion ar­ti­facts. Despite this ini­tial res­o­lu­tion gen­er­al­iza­tion, par­tial RoPE ul­ti­mately per­formed worse than the base­line RoPE set­ting as high-res­o­lu­tion train­ing con­tin­ued.
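The partial-RoPE variant is easy to state in code. The sketch below is one-dimensional for brevity (the 3D axial scheme would split the rotated dimensions among frame, height, and width) and pairs adjacent dimensions, which is one of several common conventions. Names and defaults are assumptions.

```python
import math


def partial_rope(vec, pos, rotary_fraction=0.5, base=10000.0):
    """Rotate the first `rotary_fraction` of `vec` by position `pos`; pass the rest through."""
    rot_dim = int(len(vec) * rotary_fraction)
    rot_dim -= rot_dim % 2  # rotations act on pairs of dimensions
    out = list(vec)
    for i in range(0, rot_dim, 2):
        theta = pos * base ** (-i / rot_dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i], out[i + 1] = x * c - y * s, x * s + y * c
    return out


v = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
rotated = partial_rope(v, pos=3)
print(rotated[4:])           # second half is untouched
print(partial_rope(v, 0))    # index zero, as used for text tokens, is the identity
```

The unrotated half carries no positional signal, which is a plausible reason it extrapolates better to unseen resolutions at first, as described above.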

Autoencoder

Recent work sug­gests that the la­tent-space de­sign of the au­toen­coder can sig­nif­i­cantly ac­cel­er­ate the train­ing of im­age gen­er­a­tion mod­els. We start from the FLUX.1-dev au­toen­coder as a base­line and bench­mark it against the Qwen Image VAE, DC-AE, FLUX 2 VAE, and our in­ter­nal au­toen­coder. We ini­tially tested the DC-AE se­ries, as it of­fers up to 32x spa­tial com­pres­sion, which can sub­stan­tially ben­e­fit both train­ing and in­fer­ence ef­fi­ciency. However, we found that DC-AE im­poses a hard up­per limit on the dif­fu­sion mod­el’s abil­ity to re­solve fine de­tail, ow­ing to its re­con­struc­tion er­ror.

By con­trast, the Qwen Image VAE and FLUX 2 VAE of­fer a la­tent space with sig­nif­i­cantly faster con­ver­gence across our pre­train­ing ab­la­tions while main­tain­ing ex­cel­lent re­con­struc­tion qual­ity. We there­fore ini­tially used the Qwen Image au­toen­coder to scale our early mod­els and later adopted the FLUX 2 VAE for our larger mod­els. We also briefly ex­plored train­ing an in­ter­nal au­toen­coder us­ing DINOv3 for se­man­tic align­ment to­gether with a light dif­fu­sion loss, fol­low­ing an ap­proach sim­i­lar to REPA-E. We val­i­dated that it per­forms com­pet­i­tively with the Qwen Image au­toen­coder, but ow­ing to time con­straints we opted for the Qwen Image and FLUX 2 VAEs, which have been val­i­dated at scale.

Residual de­sign

We use stan­dard resid­ual con­nec­tions as our de­fault. We briefly ex­per­i­mented with Laurel, which im­proves the ex­pres­siv­ity of the resid­ual con­nec­tion by adding a low-rank bot­tle­neck branch, but ob­served no no­tice­able im­prove­ment. For fu­ture mod­els, we in­tend to ex­plore al­ter­na­tives such as NOBLE, delta at­ten­tion resid­u­als, and mHC to im­prove the resid­ual de­sign of dif­fu­sion trans­form­ers.

Normalization

RMSNorm has be­come a stan­dard com­po­nent of LLM ar­chi­tec­tures but has not been fully in­te­grated into re­cent dif­fu­sion trans­former ar­chi­tec­tures. Starting from a LayerNorm base­line, we re­placed all nor­mal­iza­tion lay­ers with RMSNorm and ob­served very lit­tle qual­ity degra­da­tion. We there­fore use RMSNorm as the de­fault nor­mal­iza­tion mod­ule (for ex­am­ple, for prenorm and QKNorm). We use the zero-cen­tered RMSNorm and ap­ply weight de­cay to its learn­able pa­ra­me­ters. We also ex­per­i­mented with more ef­fi­cient vari­ants such as Derf, but found non-neg­li­gi­ble qual­ity degra­da­tion.
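As a minimal sketch, we read "zero-centered" here as the parameterization in which the learnable scale is stored as an offset `gamma` from one, so that weight decay pulls the effective scale toward identity rather than toward zero. That reading is an assumption, as are the function name and `eps` value.

```python
import math


def zero_centered_rmsnorm(x, gamma, eps=1e-6):
    """y_i = x_i / rms(x) * (1 + gamma_i), with gamma initialised to zero."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * (1.0 + g) for v, g in zip(x, gamma)]


x = [3.0, 4.0]
at_init = zero_centered_rmsnorm(x, gamma=[0.0, 0.0])  # plain RMS normalisation
scaled = zero_centered_rmsnorm(x, gamma=[1.0, 0.0])   # first channel scaled by 2
print(at_init, scaled)
```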

Text en­coder

We used T5-XXL as our base­line text en­coder. From the out­set, we de­lib­er­ately chose to keep the ar­chi­tec­ture sim­ple and use a sin­gle text en­coder. Notably, we find that T5-XXL re­mains a very com­pet­i­tive text en­coder rel­a­tive to T5Gemma, umT5, Qwen 2.5 VL, and Qwen 3 VL. Ultimately, we use Qwen 3 VL as our fi­nal text en­coder, as a VLM of­fers a richer in­put space (text and im­age) and stronger mul­ti­lin­gual gen­er­al­iza­tion.

Furthermore, in­spired by Unifusion, rather than tak­ing the last layer of the VLM fea­tures, we in­tro­duce a shal­low at­ten­tion layer that ag­gre­gates hid­den fea­tures across lay­ers. This de­sign al­lows the model to dy­nam­i­cally se­lect coarse-to-fine text rep­re­sen­ta­tions. The last-layer fea­tures of an au­tore­gres­sive LLM are sub­op­ti­mal for our pur­pose, as they are op­ti­mized for next-to­ken pre­dic­tion rather than im­age gen­er­a­tion. Alongside this lay­er­wise fea­ture ag­gre­ga­tion, we add light­weight bidi­rec­tional trans­former lay­ers across the to­ken axis to re­duce the au­tore­gres­sive bias in the rep­re­sen­ta­tion space.

Optimization

We use AdamW as our pri­mary op­ti­mizer through­out the pipeline. We ini­tially saw mixed re­sults ap­ply­ing Muon to the MMDiT ar­chi­tec­ture. By de­fault, we use the Muon im­ple­men­ta­tion from Dion and the RMS-matched set­ting from Moonlight to trans­fer AdamW hy­per­pa­ra­me­ters.

In our ex­plo­ration, Muon con­verged faster than AdamW in the ini­tial steps but un­der­per­formed it over longer hori­zons. We also en­coun­tered a num­ber of sta­bil­ity is­sues with Muon, in­clud­ing fre­quent loss and gra­di­ent-norm spikes through­out train­ing. We found it cru­cial to ex­clude the first and last lin­ear lay­ers of the MMDiT from the Muon pa­ra­me­ters; this is con­sis­tent with the LLM lit­er­a­ture, where em­bed­ding and LM-head pa­ra­me­ters are ex­cluded from Muon. After ex­clud­ing these lay­ers and adding Nesterov mo­men­tum, Muon con­sis­tently out­per­formed the AdamW base­line at both low and high res­o­lu­tion. We did not adopt Muon for our most re­cent pre­train­ing run ow­ing to time con­straints, but given these strong re­sults we plan to adopt it in our next pre­train­ing cy­cle.

Training

Our train­ing pipeline fol­lows a multi-stage struc­ture in­spired by mod­ern LLM train­ing pipelines.

Pretraining

Pretraining es­tab­lishes the mod­el’s ba­sic ca­pa­bil­i­ties, in­clud­ing text-im­age align­ment, text ren­der­ing, styl­is­tic cov­er­age, and struc­tural con­sis­tency. We pro­gres­sively scale the res­o­lu­tion from 256px to 512px to 1024px. For our fi­nal model, we train with the stan­dard rec­ti­fied-flow loss un­der v-pa­ra­me­ter­i­za­tion. To ac­cel­er­ate the early stages, we use iREPA for the first epoch of the 256px stage and then re­move it, which en­cour­ages the MMDiT to learn its own rep­re­sen­ta­tions while sub­stan­tially speed­ing up ini­tial con­ver­gence. We also ex­plored al­ter­na­tive ac­cel­er­a­tion strate­gies such as TREAD, but saw lit­tle ben­e­fit.

During the 256px and 512px stages, we use 8-bit training and observe 15-20% gains in training speed over a bf16 baseline, with minimal degradation in training loss and evaluation metrics. At 256px we use 8-bit training with tensorwise scaling, and at 512px we use finer-grained rowwise scaling. From 1024px onward, and through the final RL stage, we use standard bf16 training.

Another important aspect of high-resolution pretraining is adapting the resolution-dependent timeshift schedule. We use a shifted logit-normal sampling schedule for both training and inference, and gradually increase the shift as resolution increases. Following the FLUX 2 VAE blog post, we sweep for the optimal training timeshift at each resolution. We sweep the shift only for training and keep the inference shift schedule constant, as certain autoencoders are less sensitive to the inference timeshift.
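A shifted logit-normal sampler can be sketched as follows. The shift formula is the one commonly used with rectified-flow models, and the convention that t close to 1 means mostly noise is an assumption here, as are the function names and default parameters.

```python
import math
import random


def shift_time(t, shift):
    """Timeshift t' = s * t / (1 + (s - 1) * t); s = 1 leaves t unchanged."""
    return shift * t / (1.0 + (shift - 1.0) * t)


def sample_timestep(rng, shift=1.0, mean=0.0, std=1.0):
    """Draw a logit-normal timestep in (0, 1), then apply the shift."""
    u = rng.gauss(mean, std)
    t = 1.0 / (1.0 + math.exp(-u))
    return shift_time(t, shift)


base = [sample_timestep(random.Random(i), shift=1.0) for i in range(1000)]
shifted = [sample_timestep(random.Random(i), shift=3.0) for i in range(1000)]
print(sum(base) / len(base), sum(shifted) / len(shifted))
```

A larger shift moves probability mass toward noisier timesteps, which is the direction usually wanted as resolution grows.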

During pre­train­ing, we use a warmup-sta­ble-de­cay learn­ing-rate sched­ule and ap­ply PMA fol­low­ing Model Merging in Pre-training of Large Language Models. We val­i­date that PMA achieves per­for­mance com­pa­ra­ble to EMA while avoid­ing its sig­nif­i­cant mem­ory over­head. We do not ob­serve sig­nif­i­cant dif­fer­ences be­tween merg­ing meth­ods, al­though tun­ing the num­ber of merged check­points and the merge in­ter­val can yield slight gains on down­stream met­rics.
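The simplest form of checkpoint merging is a uniform average over saved checkpoints, sketched below with checkpoints represented as plain dictionaries of float lists. The representation and function name are assumptions; the merging rule actually used may weight checkpoints differently.

```python
def merge_checkpoints(checkpoints):
    """Uniformly average a list of checkpoints, each a dict: name -> list of floats."""
    n = len(checkpoints)
    merged = {}
    for name in checkpoints[0]:
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        merged[name] = [sum(col) / n for col in columns]
    return merged


ckpts = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [1.0]},
]
print(merge_checkpoints(ckpts))
```

Because the average is computed offline from checkpoints that are saved anyway, no second copy of the weights has to live in accelerator memory during training, unlike EMA.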

Midtraining

Midtraining has be­come com­mon in the LLM lit­er­a­ture, and we in­cor­po­rate an anal­o­gous stage into our pipeline. Its fo­cus is to warm up the mod­el’s dis­tri­b­u­tion be­fore the su­per­vised fine­tun­ing (SFT) stage. We find that mid­train­ing is typ­i­cally the last point in the pipeline at which we can equip the model with down­stream ca­pa­bil­i­ties such as high-fi­delity, high-res­o­lu­tion gen­er­a­tion, strong do­main cov­er­age, and text ren­der­ing.

Supervised fine­tun­ing (SFT)

In the su­per­vised fine­tun­ing (SFT) stage, we cu­rate a small, ded­i­cated set of highly aes­thetic im­ages. The ob­jec­tive is to fur­ther bias the model to­ward aes­thet­i­cally de­sir­able di­rec­tions. We find this stage par­tic­u­larly help­ful for im­prov­ing over­all check­point qual­ity and for ad­dress­ing the high-sat­u­ra­tion and tex­ture is­sues that are preva­lent in ear­lier check­points.

After train­ing do­main-spe­cific SFT check­points, we use model merg­ing to pro­duce a gen­er­al­ist SFT check­point. Model merg­ing yields di­min­ish­ing re­turns to­ward the later stages of the pipeline, as the di­rec­tions of im­prove­ment be­gin to con­flict across check­points.

Preference op­ti­miza­tion (PO)

Preference op­ti­miza­tion (PO) is the first stage of our post-train­ing stack and con­sists of a two-stage pipeline. In the first stage, we run a large-scale syn­thetic pref­er­ence-pair gen­er­a­tion pipeline for ini­tial re­fine­ment, us­ing a strat­egy sim­i­lar to delta learn­ing; we en­sure that the ma­jor­ity of pairs in­clude at least one on-pol­icy sam­ple. The sec­ond stage is a cal­i­bra­tion stage that uses only hu­man an­no­ta­tions. These an­no­ta­tions are col­lected en­tirely in house, by peo­ple fa­mil­iar with the spe­cific strengths, weak­nesses, and quirks of the model.

A com­mon phe­nom­e­non dur­ing PO is pol­icy di­ver­gence. At a high level, pref­er­ence-op­ti­miza­tion meth­ods such as DPO en­cour­age the model to in­crease the mar­gin be­tween its like­li­hood of gen­er­at­ing a pre­ferred sam­ple and that of gen­er­at­ing a dis­pre­ferred one, rel­a­tive to the ref­er­ence model. In prac­tice, across dif­fer­ent pref­er­ence-dataset mix­tures, we ob­serve that the model achieves this ob­jec­tive by de­creas­ing the like­li­hood of gen­er­at­ing both sam­ples, but at dif­fer­ent rates. This would be de­sir­able if both the win­ning and los­ing sam­ples were of lower qual­ity than the cur­rent model dis­tri­b­u­tion, but that as­sump­tion does not al­ways hold, de­pend­ing on how the pref­er­ence set was cu­rated. Moreover, this di­ver­gence drifts the model away from the gen­eral pre­train­ing dis­tri­b­u­tion, which man­i­fests as high-fre­quency ar­ti­facts in the later stages of train­ing. To mit­i­gate this, we de­signed a vari­ant of DPO, which we call STPO, that adds an aux­il­iary loss and a mod­i­fi­ca­tion to the orig­i­nal DPO for­mu­la­tion in or­der to re­duce this di­ver­gence.
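The divergence described above follows directly from the shape of the DPO objective, which depends only on the margin between the two log-likelihood ratios. The sketch below uses the standard DPO loss on scalar log-likelihoods with made-up numbers; diffusion variants replace the log-likelihoods with denoising-error surrogates, and nothing here reflects the STPO modification.

```python
import math


def dpo_loss(logp_w, logp_l, ref_w, ref_l, beta=1.0):
    """Standard DPO loss: -log sigmoid(beta * margin of log-likelihood ratios)."""
    margin = (logp_w - ref_w) - (logp_l - ref_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))


# Policy equal to the reference: zero margin.
start = dpo_loss(logp_w=-10.0, logp_l=-10.0, ref_w=-10.0, ref_l=-10.0)
# Both likelihoods have dropped, the loser faster than the winner.
drift = dpo_loss(logp_w=-12.0, logp_l=-16.0, ref_w=-10.0, ref_l=-10.0)
print(start, drift)
```

The loss falls even though the preferred sample has become less likely than under the reference, which is exactly the failure mode the paragraph describes.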

Reinforcement learn­ing (RL)

Reinforcement learning (RL) is the final stage of the training pipeline. We use a multi-reward GRPO-style method with several reward models: (1) a general aesthetic model, (2) a prompt-following reward, (3) a text-rendering reward, and (4) an artifact and structure reward. The general aesthetic model is obtained by finetuning an open-source VLM on the preference data collected during the PO stage. We carefully design the reward structure and tune the data mixture to prevent artifacts introduced by reward hacking.
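In GRPO-style methods, each sample's advantage is its reward normalized against the other samples generated for the same prompt. A minimal sketch follows; combining the rewards by a weighted sum is an assumption made for illustration, as are the reward names and weights.

```python
import math


def combine_rewards(rewards, weights):
    """Hypothetical aggregation: weighted sum of per-model rewards."""
    return sum(weights[k] * rewards[k] for k in weights)


def group_advantages(scores, eps=1e-6):
    """Normalise scores within one prompt's group of samples: (s - mean) / std."""
    mean = sum(scores) / len(scores)
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    return [(s - mean) / (std + eps) for s in scores]


weights = {"aesthetic": 0.4, "prompt": 0.3, "text": 0.2, "artifact": 0.1}
group = [
    {"aesthetic": 0.9, "prompt": 0.8, "text": 1.0, "artifact": 1.0},
    {"aesthetic": 0.6, "prompt": 0.9, "text": 0.0, "artifact": 1.0},
    {"aesthetic": 0.7, "prompt": 0.2, "text": 0.5, "artifact": 0.0},
]
scores = [combine_rewards(r, weights) for r in group]
advantages = group_advantages(scores)
print(advantages)
```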

Unlike gen­eral aes­thetic re­wards, which are in­her­ently sub­jec­tive, prompt fol­low­ing and text ren­der­ing pro­vide more con­crete sig­nals be­cause they can be checked against the user’s stated in­tent. The chal­lenge is that this in­tent varies widely across prompts. To han­dle this, we use a prompt-spe­cific rubric re­ward in­spired by rubric-based eval­u­a­tion in LLM train­ing. Instead of ask­ing a judge model for a sin­gle holis­tic score, we de­com­pose each prompt into ver­i­fi­able re­quire­ments and eval­u­ate the gen­er­ated im­age against them. This gives the RL stage a more struc­tured sig­nal for align­ment with user in­tent, mak­ing the model bet­ter at sat­is­fy­ing fine-grained prompt con­straints with­out re­duc­ing prompt fol­low­ing to generic im­age qual­ity.

We also found that op­ti­miz­ing only for aes­thet­ics and prompt fol­low­ing can lead to re­ward hack­ing. The model may learn to pro­duce im­ages that ap­pear plau­si­ble at first glance while con­tain­ing struc­tural ar­ti­facts such as ex­tra fin­gers, mal­formed limbs, or dis­torted text. These fail­ures are vi­su­ally ob­vi­ous to hu­mans but are of­ten missed by gen­eral-pur­pose VLM judges. To ad­dress this, we train a ded­i­cated ar­ti­fact re­ward model that de­tects these struc­tural er­rors and dis­cour­ages the RL stage from im­prov­ing bench­mark-fac­ing sig­nals at the ex­pense of vi­sual cor­rect­ness.

During the RL stage, we find that suc­cess de­pends not only on the qual­ity of the re­ward mod­els, but also on how ef­fi­ciently train­ing com­pute is al­lo­cated across prompts. Reward mod­els de­fine the di­rec­tion of im­prove­ment, while the prompt pool de­ter­mines where the model re­ceives use­ful learn­ing sig­nal. We there­fore cu­rate a broad pool of prompts span­ning di­verse styles, con­cepts, set­tings, and sub­jects, then con­tin­u­ously an­a­lyze the re­ward sta­tis­tics of gen­er­ated groups to iden­tify which prompts are most in­for­ma­tive. Prompts that are al­ready too easy, con­sis­tently too hard, or pro­duce lit­tle vari­ance across sam­ples con­tribute lim­ited sig­nal and are de­pri­or­i­tized or re­moved. In prac­tice, ef­fec­tive RL re­quires treat­ing prompt se­lec­tion as a re­source-al­lo­ca­tion prob­lem, where the train­ing process should spend more com­pute on ex­am­ples where the model can still learn, and less on ex­am­ples that pro­vide sat­u­rated or noisy feed­back.
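The prompt-selection idea can be sketched as a filter over per-prompt reward statistics. The thresholds, the assumption that rewards lie in [0, 1], and the function name are all illustrative.

```python
from statistics import mean, pstdev


def select_prompts(stats, min_mean=0.1, max_mean=0.9, min_std=0.05):
    """stats: dict prompt -> list of rewards in [0, 1] from one rollout group."""
    keep = []
    for prompt, rewards in stats.items():
        m, s = mean(rewards), pstdev(rewards)
        # Drop saturated prompts (too easy or too hard) and prompts with no spread.
        if min_mean <= m <= max_mean and s >= min_std:
            keep.append(prompt)
    return keep


stats = {
    "too easy": [0.98, 0.99, 0.97, 0.99],
    "too hard": [0.01, 0.02, 0.0, 0.03],
    "no variance": [0.5, 0.5, 0.5, 0.5],
    "informative": [0.2, 0.8, 0.5, 0.9],
}
print(select_prompts(stats))
```

A prompt whose samples all score the same yields zero group-normalized advantage, so dropping it costs nothing in learning signal and frees rollout compute for informative prompts.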

Another prac­ti­cal con­sid­er­a­tion in dif­fu­sion RL is how to han­dle clas­si­fier-free guid­ance (CFG). Both roll­out gen­er­a­tion and train­ing can be per­formed with or with­out CFG, and dif­fer­ent choices cre­ate dif­fer­ent trade-offs be­tween align­ment, sta­bil­ity, and ef­fi­ciency. After ab­la­tions, we found it im­por­tant to keep the roll­out and train­ing dis­tri­b­u­tions aligned while avoid­ing un­nec­es­sary com­pu­ta­tional over­head. We there­fore train the whole RL stage with­out CFG. This set­ting quickly im­proves the con­di­tional model dis­tri­b­u­tion, bring­ing no-CFG sam­ples much closer to guided sam­ples early in train­ing. At in­fer­ence time, CFG can still be en­abled as an ad­di­tional con­trol knob, fur­ther im­prov­ing qual­ity when de­sired.

Timestep dis­til­la­tion

After the RL stage, we in­clude an op­tional timestep-dis­til­la­tion stage in which we ap­ply guid­ance dis­til­la­tion and timestep dis­til­la­tion si­mul­ta­ne­ously. We con­sid­ered sev­eral dis­til­la­tion tech­niques, in­clud­ing DMD, DMD2, Decoupled DMD, pi­Flow, and APT, but adopted Trajectory Distribution Matching (TDM) for the fol­low­ing rea­sons. We sought a tech­nique that was sim­ple to tune, with min­i­mal hy­per­pa­ra­me­ters, which ruled out GAN-based meth­ods and pi­Flow (the lat­ter re­quires adapt­ing the model into a multi-timestep pre­dic­tion model). We chose TDM be­cause it pro­vides a fast, data-free method with flex­i­ble mul­ti­step dis­til­la­tion.

DMD dis­tills the teacher by match­ing the dis­tri­b­u­tions of real and gen­er­ated sam­ples over the clean-im­age dis­tri­b­u­tion. Accordingly, stan­dard DMD uses a few-step stu­dent to pre­dict a clean im­age and then renoises the pre­dic­tion to train the stu­dent (see fig­ure above). Unlike DMD, which matches only the clean-im­age dis­tri­b­u­tion, TDM ap­plies DMD across timesteps, ef­fec­tively per­form­ing dis­tri­b­u­tion match­ing at the tra­jec­tory level rather than at the sam­ple level. Since our goal was a flex­i­ble mul­ti­step stu­dent, we found TDM to be the most suit­able method for our use case.

Prompt Expansion

Dense prompts re­li­ably pro­duce bet­ter im­age-gen­er­a­tion re­sults, but users rarely write prompts that re­sem­ble the rich cap­tions used dur­ing train­ing. We frame this as a dis­tri­b­u­tion-map­ping prob­lem: the im­age model is best con­di­tioned on de­tailed cap­tions that lie close to its train­ing dis­tri­b­u­tion, while real user prompts are of­ten short, con­ver­sa­tional, and un­der­spec­i­fied. We there­fore de­velop a prompt ex­pander that in­ter­prets user in­tent and maps an in­put prompt into a richer, model-friendly cap­tion.

A Practical Guide to SSH Tunnels: Local and Remote Port Forwarding | iximiuz Labs

labs.iximiuz.com

SSH is yet an­other ex­am­ple of an an­cient tech­nol­ogy that is still in wide use to­day. It may very well be that learn­ing a cou­ple of SSH tricks is more prof­itable in the long run than mas­ter­ing a dozen Cloud Native tools or AI agent frame­works des­tined to be­come dep­re­cated next quar­ter.

One of my fa­vorite parts of this tech­nol­ogy is SSH Tunnels. With noth­ing but stan­dard tools and of­ten us­ing just a sin­gle com­mand, you can achieve the fol­low­ing:

Access in­ter­nal VPC end­points through a pub­lic-fac­ing EC2 in­stance.

Open a lo­cal­host port of a re­mote de­vel­op­ment VM in the lo­cal browser.

Expose any lo­cal server from a home/​pri­vate net­work to the out­side world.

Tunnel your browser’s de­bug­ging port to a re­mote sand­boxed cod­ing agent.

And more 😍

But de­spite the fact that I use SSH Tunnels daily, it al­ways takes me a while to re­call the right com­mand. Should it be a Local or a Remote tun­nel? What are the flags? Is it a lo­cal_­port:re­mote_­port or the other way around? So, I de­cided to fi­nally wrap my head around it, and it re­sulted in a se­ries of labs and a vi­sual cheat sheet.

The labs in this tu­to­r­ial run on an at­tached play­ground with four hosts wired into three net­works:

in­ter­nal - a de­vice on the home net­work 192.168.0.0/24 (a home­lab box, a NAS, a printer). Not reach­able from the pub­lic net­work.

lo­cal - your work­sta­tion. Sits on both the home 192.168.0.0/24 and the pub­lic 203.0.113.0/24 net­works.

re­mote - a pub­lic-fac­ing bas­tion / gate­way on the pub­lic 203.0.113.0/24 net­work, also con­nected to a pri­vate vpc 172.16.0.0/24.

pri­vate - an in­ter­nal-only ser­vice (a data­base, an OpenSearch clus­ter) on the vpc 172.16.0.0/24. Not reach­able from the pub­lic net­work.

You can ssh from lo­cal to re­mote by host­name or IP ad­dress - the lo­cal host key is al­ready trusted on the re­mote ma­chine:

ssh remote
ssh 203.0.113.30

Local Port Forwarding

Starting from the one that I use the most. Oftentimes, there might be a ser­vice lis­ten­ing on lo­cal­host or a pri­vate in­ter­face of a re­mote ma­chine that I can only SSH to via its pub­lic IP. And I des­per­ately need to ac­cess this port from my lo­cal ma­chine. A few typ­i­cal ex­am­ples:

Accessing a pri­vate re­mote data­base (MySQL, Postgres, Redis, etc) from your lap­top us­ing your fa­vorite UI tool.

Using your browser to ac­cess a web ap­pli­ca­tion ex­posed only to a pri­vate net­work.

Accessing a con­tain­er’s port from your lap­top with­out pub­lish­ing it on the server’s pub­lic in­ter­face.

All of the above use cases can be solved with a sin­gle ssh com­mand:

ssh -L [local_addr:]local_port:remote_addr:remote_port [user@]sshd_addr

The -L flag in­di­cates we’re start­ing a lo­cal port for­ward­ing. What it ac­tu­ally means is:

On your lo­cal ma­chine, the SSH client will start lis­ten­ing on lo­cal_­port (likely, on lo­cal­host, but it de­pends - check the GatewayPorts set­ting).

Any traf­fic to this port will be for­warded to re­mote_addr:re­mote_­port, reached from the re­mote ma­chine you SSH-ed to.

Here is what it looks like on a di­a­gram:

Pro Tip: Use ssh -f -N -L to run the port-for­ward­ing ses­sion in the back­ground.
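Since the field order is the part that is easy to forget, here is a small sketch that splits a forwarding spec into "where the listener opens" and "where traffic ends up". It is an illustration only, not how ssh parses its arguments: it ignores bracketed IPv6 addresses, and the `localhost` default stands in for a bind address that really depends on the GatewayPorts setting.

```python
def parse_forward_spec(spec, default_bind="localhost"):
    """Split '[bind_addr:]port:dest_addr:dest_port' into listener and target."""
    parts = spec.split(":")
    if len(parts) == 3:
        parts = [default_bind] + parts  # bind address omitted
    if len(parts) != 4:
        raise ValueError(f"unexpected forwarding spec: {spec!r}")
    bind_addr, port, dest_addr, dest_port = parts
    return {"listen": (bind_addr, int(port)), "target": (dest_addr, int(dest_port))}


# With -L the listener is on the local machine and the target is resolved
# from the SSH server; with -R the listener is on the gateway instead.
print(parse_forward_spec("8080:localhost:80"))
print(parse_forward_spec("localhost:8081:172.16.0.40:80"))
```

In both flags the first address and port are the listening side, and the last address and port are the destination.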

This lab re­pro­duces the setup from the di­a­gram above. The re­mote host runs a web server bound to 127.0.0.1:80, and we want to reach it from the lo­cal work­sta­tion.

Because the ser­vice is bound to the loop­back in­ter­face, it can­not be reached over the net­work. From the lo­cal host, try to hit the re­mote host’s pub­lic ad­dress:

curl 203.0.113.30:80 # re­mote.pub­lic

curl: (7) Failed to con­nect to 203.0.113.30 port 80 af­ter 0 ms: Could not con­nect to server

But from the in­side of the re­mote host, the very same ser­vice works just fine:

curl localhost:80

Hello from the re­mote host (localhost-only ser­vice).

And here is the trick: back on the lo­cal host, bind the re­mote’s lo­cal­host:80 to the lo­cal’s lo­cal­host:8080 us­ing lo­cal port for­ward­ing:

ssh -f -N -L 8080:localhost:80 203.0.113.30

Now you can ac­cess the web ser­vice on a lo­cal port of your work­sta­tion:

curl localhost:8080

Hello from the re­mote host (localhost-only ser­vice).

A slightly more ver­bose (but more ex­plicit and flex­i­ble) way to achieve the same goal:

ssh -f -N -L localhost:8080:localhost:80 203.0.113.30 # local=localhost:8080, remote=localhost:80, via=203.0.113.30

Local Port Forwarding with a Bastion Host

It might not be ob­vi­ous at first, but the ssh -L com­mand al­lows for­ward­ing a lo­cal port to a re­mote port on any ma­chine, not only on the SSH server it­self. Notice how the re­mote_addr and sshd_addr may or may not have the same value:

ssh -L [local_addr:]local_port:remote_addr:remote_port [user@]sshd_addr

A re­mote SSH server used to ac­cess pri­vate des­ti­na­tions is usu­ally called a bas­tion or jump host. This is how I vi­su­al­ize this sce­nario in my head:

I of­ten use the above trick to call end­points that are ac­ces­si­ble from the bas­tion host but not from my lap­top (e.g., us­ing an EC2 in­stance with pri­vate and pub­lic in­ter­faces to con­nect to an OpenSearch clus­ter or any other ser­vice de­ployed fully within a VPC).

This lab re­pro­duces the setup from the di­a­gram above. The re­mote tar­get ser­vice runs on the pri­vate host in­side an im­pro­vised VPC net­work (172.16.0.40:80), and the for­mer re­mote host acts as our pub­lic-fac­ing bas­tion (jump host) that can reach it.

The lo­cal work­sta­tion has no route into the VPC, so it can­not talk to the pri­vate host di­rectly. From the lo­cal host:

curl --connect-timeout 3 172.16.0.40:80 # private.vpc

curl: (28) Connection timed out af­ter 3002 mil­lisec­onds

The re­mote bas­tion, on the other hand, is con­nected to the VPC and can reach the pri­vate host. So, we for­ward a lo­cal port through the bas­tion straight to the pri­vate ser­vice. From the lo­cal host:

ssh -f -N -L 8081:172.16.0.40:80 203.0.113.30

Checking that it works - still on the lo­cal host:

curl localhost:8081

Hello from the pri­vate VPC host (172.16.0.40).

Notice that the for­ward­ing tar­get (172.16.0.40) and the SSH server (203.0.113.30) are dif­fer­ent ma­chines. The bas­tion ac­cepts the con­nec­tion and opens the sec­ond hop to the pri­vate host on our be­half.

A slightly more ver­bose (but more ex­plicit and flex­i­ble) way to achieve the same goal:

ssh -f -N -L localhost:8081:172.16.0.40:80 203.0.113.30 # <local>:<remote> via <SSH server>
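
Tunnels started with -f -N keep running in the background, so it helps to have a way to check and stop them. One possible approach is an OpenSSH control socket. The sketch below is not from the original lab: the socket path and the helper names are hypothetical, and only the addresses and ports are taken from the example above. With DRY_RUN=1 the helpers print the ssh commands instead of running them.

```shell
#!/bin/sh
# Sketch: manage the bastion tunnel through an OpenSSH control socket.
# SOCK is an arbitrary, hypothetical path; BASTION and the forward spec
# are the values used in the lab above.
SOCK=/tmp/tunnel-8081.sock
BASTION=203.0.113.30

# Hypothetical helper: with DRY_RUN=1 it prints the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# -M makes this ssh process the control master, -S sets the control socket path.
tunnel_start() {
  run ssh -f -N -M -S "$SOCK" -L 8081:172.16.0.40:80 "$BASTION"
}

# -O check asks the master process whether it is still running.
tunnel_check() {
  run ssh -S "$SOCK" -O check "$BASTION"
}

# -O exit asks the master process to terminate, which closes the tunnel.
tunnel_stop() {
  run ssh -S "$SOCK" -O exit "$BASTION"
}
```

A typical session would be tunnel_start, then curl localhost:8081, then tunnel_stop once the tunnel is no longer needed.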

Remote Port Forwarding

Another pop­u­lar (but rather in­verse) sce­nario is when you want to mo­men­tar­ily ex­pose a lo­cal ser­vice to the out­side world. Of course, for that, you’ll need a pub­lic-fac­ing ingress gate­way server. And the good news is that any pub­lic-fac­ing server with an SSH dae­mon on it can be used as such a gate­way:

ssh -R [remote_addr:]remote_port:local_addr:local_port [user@]gateway_addr

The above com­mand looks no more com­pli­cated than its ssh -L coun­ter­part. But there is a pit­fall…

By de­fault, the above SSH tun­nel will al­low us­ing only the gate­way’s lo­cal­host as the re­mote ad­dress. In other words, your lo­cal port will be­come ac­ces­si­ble only from in­side the gate­way server it­self, which is most likely not what you ac­tu­ally need. For in­stance, I typ­i­cally want to use the gate­way’s pub­lic ad­dress as the re­mote ad­dress to ex­pose my lo­cal ser­vices to the pub­lic Internet. For that, the SSH server needs to be con­fig­ured with the GatewayPorts yes set­ting.
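
Before relying on a public bind address, it can be worth checking what the gateway is actually configured to allow. A minimal sketch, assuming shell access to the gateway: sshd -T prints the effective server configuration, and the hypothetical helper below (not part of OpenSSH) extracts the GatewayPorts value from that output. The possible values are no (the default, loopback only), yes (always bind to the wildcard address), and clientspecified (the client chooses the bind address).

```shell
#!/bin/sh
# Hypothetical helper: reads `sshd -T` output on stdin and prints the
# effective GatewayPorts value (no, yes, or clientspecified).
gateway_ports_value() {
  awk 'tolower($1) == "gatewayports" { print tolower($2) }'
}

# On the gateway (typically needs root):
#   sudo sshd -T | gateway_ports_value
```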

Here is what re­mote port for­ward­ing can be used for:

Exposing a dev ser­vice from your lap­top to the pub­lic Internet for a quick demo.

Exposing your home­lab to the pub­lic Internet (for ar­bi­trary pur­poses).

Tunneling your lo­cal browser’s de­bug­ging port to a re­mote and/​or sand­boxed cod­ing agent.

Here is how the re­mote port for­ward­ing can be vi­su­al­ized:

Pro Tip: Use ssh -f -N -R to run the port-for­ward­ing ses­sion in the back­ground.

This lab re­pro­duces the setup from the di­a­gram above. The lo­cal work­sta­tion runs a web server bound to 127.0.0.1:80, and we want to ex­pose it to the out­side through the pub­lic-fac­ing re­mote gate­way.

The ser­vice is bound to the loop­back in­ter­face, so right now no­body but the lo­cal ma­chine it­self can reach it. Try ac­cess­ing it from the re­mote ma­chine:

curl --connect-timeout 3 203.0.113.20:80 # local.public

curl: (7) Failed to con­nect to 203.0.113.20 port 80 af­ter 0 ms: Could not con­nect to server

We want to ex­pose it through the re­mote gate­way and con­sume it from the pri­vate host. The re­mote gate­way al­ready has GatewayPorts yes in its sshd_­con­fig, so we can ask it to lis­ten on all of its in­ter­faces (0.0.0.0) and for­ward the traf­fic back to us. However, the lo­cal ma­chine has to es­tab­lish the tun­nel first.

From the lo­cal host, start the re­mote port for­ward­ing:

ssh -f -N -R 0.0.0.0:8080:localhost:80 203.0.113.30 # <remote>:<local> via <SSH server>

Now the lo­cal web ser­vice is pub­lished on the gate­way’s in­ter­faces. Let’s con­firm it from a third ma­chine - the pri­vate host, which can reach the re­mote gate­way over the VPC:

curl 172.16.0.30:8080 # re­mote.vpc

Hello from your lo­cal work­sta­tion (localhost-only ser­vice).
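
One caveat with -f: if the gateway cannot bind the requested port (for example, because it is already taken), ssh by default only prints a warning and stays connected without a working forward. The sketch below is a hedged variant of the lab command that fails fast and sends keepalives. The helper names and the 30-second interval are assumptions, while ExitOnForwardFailure and ServerAliveInterval are standard ssh_config options. With DRY_RUN=1 the command is printed instead of executed.

```shell
#!/bin/sh
# Sketch: remote port forwarding that exits if the forward cannot be set up.
GATEWAY=203.0.113.30

# Hypothetical helper: with DRY_RUN=1 it prints the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

expose_local_service() {
  # ExitOnForwardFailure=yes: terminate if the remote port cannot be bound.
  # ServerAliveInterval=30: send a keepalive every 30 seconds (assumed value).
  run ssh -f -N \
    -o ExitOnForwardFailure=yes \
    -o ServerAliveInterval=30 \
    -R 0.0.0.0:8080:localhost:80 "$GATEWAY"
}
```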

Remote Port Forwarding to a Home or Private Network

Similar to lo­cal port for­ward­ing, re­mote port for­ward­ing has its own bas­tion or jump host mode. But this time, the ma­chine with the SSH client (e.g., your dev lap­top) plays the role of the jump host. In par­tic­u­lar, it al­lows ex­pos­ing ports of a home (or pri­vate) net­work reach­able from your lap­top to the out­side world through a re­mote SSH server act­ing as an ingress gate­way:

ssh -R [remote_addr:]remote_port:local_addr:local_port [user@]gateway_addr

Looks al­most iden­ti­cal to the sim­ple re­mote SSH tun­nel, but the lo­cal_addr:lo­cal_­port pair be­comes the ad­dress of a de­vice in the home net­work. Here is how it can be de­picted on a di­a­gram:

I typically use my laptop as a thin client, and the actual development happens on a remote server. Sometimes, such a remote server resides in my home network and has no (or restricted) Internet access for extra isolation. This is when I may want to rely on remote port forwarding to expose a service from a home server to the public Internet. My laptop, which can access both the internal dev server and the remote SSH server (ingress gateway), acts as the jump host.

This lab re­pro­duces the setup from the di­a­gram above. The ser­vice we want to ex­pose runs on the in­ter­nal host in­side an iso­lated home net­work (192.168.0.10:80). Our lo­cal work­sta­tion can reach the home net­work and also has SSH ac­cess to the pub­lic-fac­ing re­mote gate­way, so it plays the role of a jump host.

The lo­cal host can reach the in­ter­nal ser­vice over the home net­work. From the lo­cal host:

curl 192.168.0.10:80 # in­ter­nal.home

Hello from the in­ter­nal home-net­work host (192.168.0.10).

From the out­side, though, the in­ter­nal de­vice is in­vis­i­ble. Try ac­cess­ing it from the re­mote host:

curl --connect-timeout 3 192.168.0.10:80 # internal.home

curl: (28) Connection timed out af­ter 3001 mil­lisec­onds

The re­mote host has no route into the home net­work, so the re­quest sim­ply times out.

Now, from the lo­cal host, start the re­mote port for­ward­ing from the re­mote gate­way to the in­ter­nal de­vice. The for­ward­ing tar­get (192.168.0.10) is re­solved by the SSH client, i.e., from the lo­cal host’s point of view:

ssh -f -N -R 0.0.0.0:8081:192.168.0.10:80 203.0.113.30 # <remote>:<local> via <SSH server>

Finally, val­i­date that the home-net­work ser­vice be­came ac­ces­si­ble on the gate­way - from the pri­vate host, which reaches the gate­way over the VPC:

curl 172.16.0.30:8081 # re­mote.vpc

Hello from the in­ter­nal home-net­work host (192.168.0.10).

Dynamic Local Port Forwarding

This forwarding mode is less transparent for the clients, but it is also significantly more flexible than regular local port forwarding. Instead of wiring a local port to a single remote destination (like ssh -L does), dynamic (local) port forwarding turns the SSH client into a local SOCKS proxy. Any application that can speak SOCKS can then send traffic through it, choosing the actual destination host and port per connection. Each request is sent over to the SSH server, which resolves the destination and establishes the connection:

ssh -D [local_addr:]local_port [user@]sshd_addr
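
As a rough illustration of the syntax above, the sketch below starts a SOCKS proxy and sends a single curl request through it. The local port 1080 and the helper names are assumptions, and the addresses are reused from the earlier labs. curl's --socks5-hostname option passes the destination to the proxy unresolved, so name resolution happens on the SSH server side. With DRY_RUN=1 the commands are printed instead of executed.

```shell
#!/bin/sh
# Sketch: start a SOCKS proxy with ssh -D and send one request through it.
# SOCKS_PORT is an assumed, arbitrary local port; SSHD_ADDR is the lab's SSH server.
SOCKS_PORT=1080
SSHD_ADDR=203.0.113.30

# Hypothetical helper: with DRY_RUN=1 it prints the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

socks_start() {
  run ssh -f -N -D "localhost:$SOCKS_PORT" "$SSHD_ADDR"
}

# Every argument is passed on to curl, so any destination can be chosen per call.
socks_curl() {
  run curl --socks5-hostname "localhost:$SOCKS_PORT" "$@"
}
```

For example, socks_start followed by socks_curl 172.16.0.40:80 should reach the private host from the bastion lab, provided the SSH server can route to it.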

Hotter Than a Hot Tub: The 45°C Breakthrough to Cool AI’s Biggest Machines

blogs.nvidia.com

Hot tubs sit at about 38 to 40 degrees Celsius, warm enough that most people can only soak for about 15 minutes. NVIDIA's newest AI servers can run their cooling liquid even hotter: up to 45 degrees Celsius, or 113 degrees Fahrenheit. That higher temperature limit is precisely what makes them more energy efficient.

The Rubin generation of NVIDIA AI infrastructure is the world's first to achieve 100% liquid cooling: every chip and every networking component is cooled entirely by liquid in a closed loop, with no fans anywhere in the system. This liquid cooling methodology is described in the NVIDIA DSX AI factory reference design, a guide to best practices for designing, building and operating the entire AI factory infrastructure stack.

Although each generation offers significantly more computing power per watt, fully liquid-cooled AI compute infrastructure enables data centers to dramatically reduce cooling energy consumption, making a meaningful difference to overall data center energy use at hyperscale.

“The NVIDIA DSX reference design for AI factories has zero water consumption; we have eliminated massive amounts of power usage and pretty much all water usage,” said Ali Heydari, director of data center cooling and infrastructure at NVIDIA. “With dry-cooler-based designs, it’s a closed-loop system with no evaporative water cooling, outside of maybe 1% of the year when we might need chillers in some climates.”

Historically, cool­ing alone has ac­counted for up to 40% of a data cen­ter’s elec­tric­ity con­sump­tion, mak­ing it one of the most sig­nif­i­cant ar­eas where ef­fi­ciency im­prove­ments can drive down both op­er­a­tional ex­penses and en­ergy de­mands.

Industry es­ti­mates sug­gest that rais­ing chiller plant tem­per­a­tures by just one de­gree can cut cool­ing en­ergy costs by about 4%. At scale, those sav­ings add up quickly. A 50-megawatt hy­per­scale fa­cil­ity can save over $4 mil­lion an­nu­ally in cool­ing-re­lated en­ergy and wa­ter costs by mov­ing to liq­uid-cooled in­fra­struc­ture.
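
To get a feel for how the per-degree figure adds up, here is a back-of-the-envelope sketch. It assumes that the roughly 4% saving compounds, meaning each additional degree removes 4% of the remaining cooling cost. That compounding model and the helper name are assumptions made for illustration, not something the estimate itself states.

```shell
#!/bin/sh
# Hypothetical helper: estimated cooling-energy saving (in percent) after
# raising the temperature by N degrees, ASSUMING the ~4% per-degree figure
# compounds (each degree removes 4% of the remaining cost).
cooling_saving_pct() {
  awk -v n="$1" 'BEGIN { printf "%.1f\n", (1 - 0.96 ^ n) * 100 }'
}

cooling_saving_pct 1   # prints 4.0
cooling_saving_pct 5   # prints 18.5
```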

In favorable climates, NVIDIA's 45-degree liquid-cooling architecture can enable chiller-less operation with dry coolers, reducing facility cooling water consumption from roughly 2.6 million gallons per megawatt per year for conventional cooling-tower-based systems to near zero, a reduction of up to 100% in water use.

The reason: traditional air-cooled data centers depend on large volumes of cooled air to remove heat from IT equipment, often requiring energy-intensive cooling infrastructure during hot weather. With NVIDIA's 45-degree liquid cooling, heat is captured directly at the chip and transported through liquid loops operating at much higher temperatures, allowing outdoor dry coolers to reject heat efficiently for much of the year while significantly reducing mechanical cooling requirements and facility water consumption.

The data cen­ter am­bi­ent tem­per­a­ture is flex­i­ble — warm sum­mer air is fine — be­cause noth­ing in the server de­pends on cool air. The liq­uid does all the work — and the same liq­uid can be re­cir­cu­lated in a closed loop so no new wa­ter is con­sumed to cool the chips.

A New Standard for the Industry

Because the NVIDIA Rubin plat­form in­te­grates 100% liq­uid-cooled in­fra­struc­ture, every cloud provider and data cen­ter op­er­a­tor build­ing for it is mak­ing the tran­si­tion.

The ecosystem is keeping pace. Motivair, the advanced cooling division of Schneider Electric, has worked alongside NVIDIA's product roadmap for nearly a decade, and Richard Whitmore, its president and CEO, says the relationship only intensified as power densities crossed the threshold where air cooling was no longer a viable option.

“Once the watts per chip crossed a certain level, liquid cooling became mandatory,” said Whitmore.

“Too Hot” to Cool AI Infrastructure Is Hotter Than You’d Think

There’s a long-stand­ing mis­con­cep­tion in the in­dus­try that a cold data cen­ter is an ef­fi­cient one. Decades ago, if a data cen­ter did­n’t feel like a walk-in freezer, peo­ple would as­sume some­thing was wrong.

In re­al­ity, chips can sus­tain far warmer en­vi­ron­ments than that in­stinct sug­gests. Silicon proces­sors gen­er­ate enor­mous in­ter­nal heat — the coolant en­ter­ing a fully liq­uid-cooled chip at 45 de­grees Celsius ex­its at roughly 55 de­grees, hav­ing ab­sorbed that heat load across the chip sur­face. Yet per­for­mance does­n’t de­grade.

The proces­sors con­tinue to op­er­ate at full per­for­mance be­cause liq­uid-cooled cold plates keep de­vice tem­per­a­tures within val­i­dated op­er­at­ing lim­its, even with coolant en­ter­ing the rack at 45 de­grees Celsius.

No Fans, No Cold Aisles — A Fundamentally Different Machine

Walk into a tra­di­tional data cen­ter and no­tice two things: the noise — cool­ing fans con­tribute to to­tal noise lev­els at or above 85 deci­bels, loud enough to re­quire ear pro­tec­tion — and the phys­i­cal chore­og­ra­phy of hot aisles and cold aisles, care­fully man­aged to push cooled air across com­po­nents.

The Rubin ar­chi­tec­ture changes the pic­ture.

Coolant — 75% wa­ter and 25% propy­lene gly­col — flows through cold plates that sit di­rectly on proces­sors, pulling heat out at the source. Running that coolant at up to 45 de­grees Celsius means that in many cli­mates, the fa­cil­ity loop can re­ject heat with­out turn­ing on me­chan­i­cal chillers and noisy fans.

That un­locks some­thing be­yond en­ergy sav­ings: the pos­si­bil­ity of elim­i­nat­ing wa­ter con­sump­tion en­tirely.

In the right ge­og­ra­phy — some­where with re­li­ably cool out­door air — a liq­uid-cooled data cen­ter can re­ject its heat through coolant dis­tri­b­u­tion units that cap­ture heat di­rectly at the source and trans­port it to out­door dry cool­ers, es­sen­tially large ra­di­a­tor coils po­si­tioned out­side the build­ing.

The loop is filled once and runs closed for the life of the fa­cil­ity. And it takes dra­mat­i­cally less space in the AI fac­tory com­pared to tra­di­tional air-cool­ing in­fra­struc­ture.

“In the right geographic location, with the right system design, you don’t need any refrigeration equipment,” Whitmore said. “You can just put big radiator coils outside and use the air temperature for all your cooling. It’s incredibly efficient.”

The ge­og­ra­phy caveat mat­ters. A data cen­ter in the Scottish Highlands and one in Phoenix, Arizona, face very dif­fer­ent re­al­i­ties. But even in warmer cli­mates, the shift to­ward 45-degrees-Celsius coolant moves op­er­a­tors sig­nif­i­cantly closer to that chiller-less ideal — where chillers may turn on just a few days a year when the out­side air tem­per­a­ture de­mands it.

Another key ben­e­fit of this new model for AI fac­to­ries is the po­ten­tial for waste heat re­cov­ery, where resid­ual heat from AI fac­tory op­er­a­tions can be re­pur­posed to heat com­mer­cial or res­i­den­tial build­ings nearby.

The Engineering Problem Nobody Had Solved

Previous liq­uid-cooled servers were hy­brid: GPUs and CPUs got cold plates, but the rest of the sys­tem stayed air-cooled, with finned heat sinks de­signed to shed heat into mov­ing air. In a fully liq­uid-cooled server, the cool­ing for these com­po­nents needed to be com­pletely re­designed to use liq­uid.

NVIDIA's thermal engineering team reworked how those components handle heat, designing cooling loops that simplify how liquid is routed to multiple high-power chips on the board using a single inlet and outlet, resulting in a cleaner tray-level cooling architecture.

One visible outcome: Rubin servers have clean, sealed front panels where air-cooled servers have perforated bezels. Another: fully liquid-cooled servers enable higher rack density than air-cooled servers, so a system that previously occupied six rack units now fits in two, which means more compute, less space, less noise.

AI work­loads are not get­ting lighter. The com­pute de­mand dri­ving data cen­ter con­struc­tion is grow­ing faster than al­most any other cat­e­gory of in­fra­struc­ture in­vest­ment.

Without ef­fi­ciency im­prove­ments in how that com­pute is cooled, the en­ergy cost of run­ning AI at scale would grow in lock­step with the hard­ware. Liquid cool­ing at up to 45 de­grees Celsius — hot­ter than a hot tub, cooler for the planet — is one of the most im­por­tant tools the in­dus­try has to close that gap.

Learn more about liquid cooling, the NVIDIA DSX platform for AI factories and NVIDIA's approach to energy-efficient AI infrastructure.

Slate

www.slate.auto

THE MOST AFFORDABLE TRUCK

The Blank Slate has the es­sen­tials from Whatever else you add—wraps, ac­ces­sories, you name it—is en­tirely up to you.

GET IT HOW YOU WANT.

TRUCK. SUV. ALL YOU.

Keep it as a two-seat pickup. Or get it as a five-seat SUV or Fastback. Your call.

Start Designing

WRAPS FOR ANY VIBE.

200+ ACCESSORIES. AND COUNTING.

Over 80% of them are un­der $500. Making a ve­hi­cle that’s truly yours has never been eas­ier (or more fun).

START WITH A STARTER PACK.

Retrograde. Highlights: Body Style, Decals, Wheels

Nightwave. Highlights: Decals, Wheels

Cali Sunset. Highlights: Full Wrap, Lighting, Wheels

Sunnyside. Highlights: Body Style, Full Wrap, Interior

Mauvin’ On Up. Highlights: Audio & Tech, Body Style, Full Wrap, Interior

High Visibility. Highlights: Lighting, Partial Wrap, Wheels

Ice House. Highlights: Body Style, Full Wrap, Wheels

Hauler Back. Highlights: Full Wrap, Tires

Area 51. Highlights: Body Style, Partial Wrap, Wheels

Farm Stand. Highlights: Body Style, Decals, Full Wrap, Lighting

Black Cherry Noir. Highlights: Body Style, Full Wrap, Wheels

Moon Duster. Highlights: Body Style, Partial Wrap, Tires

The Red Line. Highlights: Partial Wrap, Wheels

Purple Reign. Highlights: Body Style, Full Wrap, Wheels

Field Jacket. Highlights: Decals, Full Wrap, Tires

The Professional. Highlights: Body Style, Full Wrap, Interior, Wheels

DIY YOUR RIDE.

Panels swap. Parts are ac­ces­si­ble. Manuals are free. We built it so you can truly own it.

Slate U

Got a fleet to run?

Easy to main­tain, easy to fix, and easy to spec to your ex­act busi­ness needs.

Learn about fleet

What’s hap­pen­ing

News, builds, events, and what we’ve been up to.

See what’s up

See ya out there

We don’t just sell a truck on­line. Come see it in per­son.

See events

IF YOU CAN CHARGE YOUR PHONE, YOU CAN CHARGE YOUR CAR.

Charge your Truck with a reg­u­lar wall out­let (120V), a dryer out­let (240V), or at any of 29,000 Tesla Superchargers across the coun­try — every­thing’s in­cluded.

How to Charge

STAY IN THE LOOP.

There’s a lot more com­ing. New fea­tures. Fresh drops. Big up­dates. Leave your email and we’ll keep you one step ahead. No spam, just good stuff.

nytimes.com

www.nytimes.com
