10 interesting stories served every morning and every evening.

DeepSeek V4 Preview Release | DeepSeek API Docs

api-docs.deepseek.com

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.

🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world’s top closed-source models.

🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.

Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!

📄 Tech Report: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf

🤗 Open Weights: https://huggingface.co/collections/deepseek-ai/deepseek-v4

DeepSeek-V4-Pro

🔹 Enhanced Agentic Capabilities: Open-source SOTA in Agentic Coding benchmarks.

🔹 Rich World Knowledge: Leads all current open models, trailing only Gemini-3.1-Pro.

🔹 World-Class Reasoning: Beats all current open models in Math/STEM/Coding, rivaling top closed-source models.

DeepSeek-V4-Flash

🔹 Reasoning capabilities closely approach V4-Pro.

🔹 Performs on par with V4-Pro on simple Agent tasks.

🔹 Smaller parameter count, faster response times, and highly cost-effective API pricing.

Structural Innovation & Ultra-High Context Efficiency

🔹 Novel Attention: Token-wise compression + DSA (DeepSeek Sparse Attention).

🔹 Peak Efficiency: World-leading long context with drastically reduced compute & memory costs.

🔹 1M Standard: 1M context is now the default across all official DeepSeek services.

Dedicated Optimizations for Agent Capabilities

🔹 DeepSeek-V4 is seamlessly integrated with leading AI agents like Claude Code, OpenClaw & OpenCode.

🔹 Already driving our in-house agentic coding at DeepSeek.

The figure below showcases a sample PDF generated by DeepSeek-V4-Pro.

API is Available Today!

🔹 Keep base_url, just update model to deepseek-v4-pro or deepseek-v4-flash (example below).

🔹 Supports OpenAI ChatCompletions & Anthropic APIs.

🔹 Both models support 1M context & dual modes (Thinking / Non-Thinking): https://api-docs.deepseek.com/guides/thinking_mode
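A minimal migration sketch using the OpenAI Python SDK (the base_url below is DeepSeek’s existing OpenAI-compatible endpoint; only the model name changes):

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # your existing key
    base_url="https://api.deepseek.com",  # unchanged, per the announcement
)

# was: model="deepseek-chat" or "deepseek-reasoner"
response = client.chat.completions.create(
    model="deepseek-v4-pro",  # or "deepseek-v4-flash"
    messages=[{"role": "user", "content": "Hello, V4!"}],
)
print(response.choices[0].message.content)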

⚠️ Note: deepseek-chat & deepseek-reasoner will be fully retired and inaccessible after Jul 24th, 2026, 15:59 (UTC). (Currently routing to deepseek-v4-flash non-thinking/thinking.)

🔹 Amid recent attention, a quick reminder: please rely only on our official accounts for DeepSeek news. Statements from other channels do not reflect our views.

🔹 Thank you for your continued trust. We remain committed to long-termism, advancing steadily toward our ultimate goal of AGI.

Why I Cancelled Claude: Token Issues, Declining Quality, and Poor Support

nickyreinert.de

First enthusiasm

A couple of weeks ago I subscribed to Claude Code, and during the first few weeks I had a really nice experience. It was fast, the token allowance was fair, and the quality was good.

I learned they had raised the token allowance for non-rush hours, and since they opposed some governmental rules, it felt good to support the right cause.

(づ  ̄ ³ ̄)づ

However… for about three weeks now my initial enthusiasm has been rapidly waning.

It began with an issue three weeks ago. I started working in the morning after about a ten-hour break; enough time for my tokens to refresh.

I sent two small questions to Claude Haiku. They were simple questions, not even related to the repository.

Suddenly, token usage spiked to 100%.

Have a nice break…

I contacted their “AI support bot”, which returned some default support nonsense and didn’t really understand the problem. So I asked for human support. A couple of days later a (what appeared to be) human support person sent a reply. It began like this:

“Our systems are detecting your inquiry is regarding usage limits on your Pro or Max plan.”

Yeah, well, it’s the Pro plan. Seems like your systems weren’t actually queried; it was just a default intro and probably a default answer, because:

This was followed by an extensive, seemingly copy-and-pasted answer from their docs explaining how daily and weekly limits work.

And it closed with that typically frustrating line no customer likes to read at the end of an e-mail, the classic middle finger of customer support: we don’t care whether your problem is solved, we declared it closed.

“Note that further replies to this ticket may not be monitored. If your request is not regarding usage limits on your Pro or Max plan, or you need additional support, please visit our help page at”

Great! Sending an automated e-mail that does not refer to the actual problem and then closing the channel. Thanks for nothing, I guess? Or was I wrong? I asked Claude Haiku:

@Haiku:

See the customer’s request here and the response from the AI and later W***** - did they answer the concern/question of the customer?

(╯°_°)╯︵ ┻━┻

Declining quality

In the following days and weeks, the quality was far from satisfying my needs or matching my initial experience. While I used to be able to work on up to three projects at once, now the token limit was exhausted after two hours on a single project.

And the quality was degrading. I am fully aware this is quite subjective and that the quality of the agent is always heavily influenced by the operator. The failure usually appears in front of the screen. But hey, I also develop using GitHub’s Copilot and OpenAI’s Codex, and I run my own inference with OMLX and Continue using Qwen3.5-9B. I’m not the expert, and I’m lazy sometimes, but I probably know a thing or two.

Let me give you this wonderful example: yesterday I asked Claude Opus to refactor a project.

While I was browsing the model’s thinking log (which I strongly suggest doing more than just occasionally) I found this:

“Rather than editing every slider in JSX, I’ll add a generic initializer in ui-events.js that auto-injects value displays for all range inputs that lack one.”

This is clearly bad practice. It’s a cheap workaround you wouldn’t expect even from a junior dev; it reads like someone who just doesn’t want to deliver a good result. My response:

“you can’t be serious — is this how you fix things? just WORKAROUNDS????”

At least Opus admitted:

“You’re right, that was lazy. Let me do it properly — add the labels directly in the JSX and wire them explicitly.”

Needless to say, this shortcut cost me around 50% of my five-hour token allowance.

(ง •̀_•́)ง

And even more…

Now this cache topic comes up - among others. At least they are talking about it openly. The problem was: when you get back to work after some time, your conversation cache is gone and the model starts reading your codebase again. Cost-wise this is smart. But experience-wise? It means you paid tokens for the initial load and, after a forced break because the five-hour token window hit its limit, you pay again for the same load.

Think that’s all? Wait, I also have this funny anecdote: all of a sudden the weekly window changed from today to Monday. OK, I was thankful because it came with a reset to zero. But still: what is going on, Anthropic? Not only that: while I was working on my project, watching token usage with Argus-eyed vigilance, this little warning popped up:

Wait, what? I’m neither part of an organization, nor do I see any hint why I suddenly have to worry about a “monthly usage limit”; the hourly and weekly limits were still not exceeded. What is happening right now?

Turns out, two hours later, it allowed me to continue working. The warning was gone.

At least this documentation does not mention a monthly usage limit. And the settings page only lists the limits for the current session and week.

So… what is this monthly limit all about, Anthropic?

Sorry to let you down, Anthropic

I am a huge fan of the product. Theoretically everything just works like a charm; it offers so many opportunities. I built my

Are you a robot?

www.bloomberg.com


On sabotaging projects by overthinking, scope creep, and structural diffing

kevinlynagh.com

Hi friends,

I’ll be attending Babashka Conf on May 8 and Dutch Clojure Days on May 9.

If you’re attending either (or just visiting Amsterdam), drop me a line!

When I have an idea for a project, it tends to go in one of these two directions:

I just do it. Maybe I make a few minor revisions, but often it turns out exactly how I’d imagined and I’m happy.

I think, “I should look for prior art”. There’s a lot of prior art, dealing with a much broader scope than I’d originally imagined. I start to wonder if I should incorporate that scope. Or perhaps try to build my thing on top of the existing sorta-nearby-solutions. Or maybe I should just use the popular thing. Although I could do a better job than that thing, if I put a bunch of time into it. But actually, I don’t want to maintain a big popular project, nor do I want to put that much time into this project. Uh oh, now I’ve spent a bunch of time, having neither addressed the original issue nor experienced the joy of creating something.

I prefer the first outcome, and I think the pivotal factor is how well I’ve internalized my own success criteria.

For example, last weekend I hosted my friend Marcin and we decided it’d be fun to do some woodworking, so we threw together this shelf and 3d-printed hangers for my kitchen:

Absolute banger of a project:

brainstormed the design over coffee

did a few 3d-print iterations for the Ikea bin hangers (OnShape CAD, if you want to print your own)

used material leftover from my workbench

rounded the corner by eye with a palm sander

sealed the raw plywood edge with some leftover paint from a friend

done in a weekend

The main success criterion was to jam on woodworking with a friend, and that helped me not overthink the object-level success criteria: just make a shelf for my exact kitchen!

In contrast, this past Friday I noticed difftastic did a poor job, so I decided to shop around for structural/semantic diff tools and related workflows (a topic I’ve never studied, and one I’m increasingly interested in as I review more and more LLM-generated code).

I spent 4 hours over the weekend researching existing tools (see my notes below), going through dark periods of both “semantic tree diffing is a PhD-level complex problem” and “why do all of these have MCP servers? I don’t want an MCP server”, before I came to my senses and remembered my original success criteria: I just want a nicer diffing workflow for myself in Emacs; I should just build it myself, which should take about 4 hours.

I’m cautiously optimistic that, having had this realization and committed myself to a minimal scope, I’ll be able to knock out a prototype before running out of motivation.

However, other long-running interests of mine:

interfaces for prototyping hardware (discussed September 2023)

a programming language that fuses what I like about Clojure and Rust (November 2023)

a programming language for CAD (constraints, bidirectional editing, other dubious ideas)

seem to be deep in the well of outcome #2.

That is, I’ve spent hundreds of hours on background research and little prototypes, but haven’t yet synthesized anything that addresses the original motivating issue.

It’s not quite that I regret that time (I do love learning by reading), but I have a nagging sense of unease that my inner critic (fear of failure?) is silencing my generative tendencies, keeping me from the much more enjoyable (and productive!) learning by doing.

I think in these cases the success criteria have been much fuzzier: Am I trying to replace my own usage of Rust/Clojure?

Only for some subset of problems?

Or is it that I actually just need a playground to learn about language design/implementation, and it’s fine if I don’t end up using it?

Ditto for CAD: Am I trying to replace my commercial CAD tool in favor of my own?

Only for some subset of simple or particularly parametric parts?

Do I care if it’s useful for others?

Does my tool need to be legibly different from existing open-source tools?

It’s worth considering these questions, sure.

But at the end of the day, I’d much rather have done a lot than have only considered a lot.

So I’m trying to embrace my inner clueless 20-year-old and just do things; even if some turn out to be “obviously bad” in hindsight, I’ll still be coming out ahead on net =D

Conservation of scope creep

Of course, there’s only so much time to “just do things”, and there’s a balance to be had. I’m not sure how many times I’ll re-learn YAGNI (“you ain’t gonna need it”) in my career, but I was reminded of it again after writing a bunch of code with an LLM agent, then eventually coming to my senses and throwing it all out.

I wanted a Finda-style filesystem-wide fuzzy path search for Emacs.

Since I’ve built (by hand, typing the code myself!) this exact functionality before (walk the filesystem to collect paths, index them by trigram, do fast fuzzy queries via bitmap intersections), I figured it’d only take a few hours to supervise an LLM writing all the code.
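A toy version of that trigram-index idea, sketched in Python (names and details are mine, not Finda’s actual code; a real implementation uses bitmaps rather than Python sets, and fuzzier scoring than the substring check here):

from collections import defaultdict

def trigrams(s):
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}

class TrigramIndex:
    def __init__(self):
        self.paths = []
        self.postings = defaultdict(set)  # trigram -> ids of paths containing it

    def add(self, path):
        pid = len(self.paths)
        self.paths.append(path)
        for t in trigrams(path):
            self.postings[t].add(pid)

    def search(self, query):
        qs = trigrams(query)
        if not qs:
            return list(self.paths)  # query too short to narrow anything down
        # candidate set: intersect posting lists (bitmap intersections in the real thing)
        ids = set.intersection(*(self.postings[t] for t in qs))
        # trigram overlap allows false positives, so verify each candidate
        return [self.paths[i] for i in sorted(ids) if query.lower() in self.paths[i].lower()]

idx = TrigramIndex()
for p in ["/home/kevin/projects/finda/src/main.rs", "/home/kevin/notes/todo.txt"]:
    idx.add(p)
print(idx.search("finda"))  # -> ['/home/kevin/projects/finda/src/main.rs']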

I started with a “plan mode” chat, and the LLM suggested a library, Nucleo, which has turned up in the years since I wrote Finda (10 years ago, eek!).

I read through it, found it quite well-designed and documented, and decided to use it so I’d get its smart case and Unicode normalization functionality.

(E.g., query foo matches Foo and foo, whereas query Foo won’t match foo; similarly for cafe and café.)
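That smart-case rule is easy to state in code; a minimal sketch (substring matching for brevity, where Nucleo actually does fuzzy matching plus Unicode normalization):

def smart_case_match(query, candidate):
    # lowercase query: case-insensitive; any capital in the query: case-sensitive
    if query == query.lower():
        return query in candidate.lower()
    return query in candidate

assert smart_case_match("foo", "Foobar")      # foo matches Foo...
assert smart_case_match("foo", "foobar")      # ...and foo
assert not smart_case_match("Foo", "foobar")  # but Foo won't match foo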

Finding a great library wasn’t the problem. The problem was that Nucleo also supported some extra functionality: anchors (^foo only matches at the beginning of a line).

This got me thinking about what that might mean in a corpus that consists entirely of file paths.

Anchoring to the beginning of a line isn’t useful (everything starts with /), so I decided to try to interpret the anchors with respect to the path segments.

E.g., ^foo would match /root/foobar/ but not /root/barfoo/.

But to do this efficiently, the index needs to keep track of segment boundaries so that the query can be checked against each segment quickly.

But then we also need to handle a slash occurring in an anchored query (e.g., ^foo/bar), since that wouldn’t get matched when only looking at segments individually (root, foo, bar, and baz of a matching path /root/foo/bar/baz/).
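A naive sketch of that segment-anchoring semantics (my reconstruction of the behavior described above, ignoring the efficiency question entirely):

def anchored_match(query, path):
    # Match ^foo against the start of any path segment. A '/' inside the
    # anchored query spans segment boundaries, so check the query against
    # each suffix of the path that begins at a segment boundary.
    assert query.startswith("^")
    needle = query[1:]
    segments = [s for s in path.split("/") if s]
    if "/" not in needle:
        return any(seg.startswith(needle) for seg in segments)
    for i in range(len(segments)):
        if "/".join(segments[i:]).startswith(needle):
            return True
    return False

assert anchored_match("^foo", "/root/foobar/")
assert not anchored_match("^foo", "/root/barfoo/")
assert anchored_match("^foo/bar", "/root/foo/bar/baz/")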

Working through this took several hours: first throwing around design ideas with an LLM, having it write code to wrap Nucleo’s types, then realizing its code was bloated and didn’t spark joy, so finally writing my own (smaller) wrapper.

Then, after a break, I realized:

I can’t think of a situation where I’d ever wished Finda had anchor functionality.

In a corpus of paths, I can anchor by just adding / to the start or end of a query (this works for everything except anchoring to the end of a filename).

So I tossed all of the anchoring code.

I’m pretty sure I still came out ahead compared to if I’d tried to write everything myself sans LLM or discussion with others, but I’m not certain.

Perhaps there’s some kind of conservation law here: any increase in programming speed will be offset by a corresponding increase in unnecessary features, rabbit holes, and diversions.

Structural diffing

Speaking of unnecessary diversions, let me tell you everything I’ve learned about structural diffing recently. If you have thoughts/feelings/references in this space, I’d love to hear about ’em!

When we’re talking about code, a “diff” usually means a summary of the line-by-line changes between two versions of a file.

This might be rendered as a “unified” view, where changed lines are prefixed with + or - to indicate whether they’re additions or deletions.

For example:
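Something like the following (a minimal reconstruction, matching the description below):

 milk
-coffee
+apple
 eggs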

We’ve removed coffee and added apple.

The same diff might also be rendered in a side-by-side view, which can be easier to read when there are more complex changes:

The problem with these line-by-line diffs is that they’re not aware of higher-level structure like functions, types, etc. If some braces match up somehow between versions, they might not be shown at all, even if the braces “belong” to different functions.

There’s a wonderful tool, difftastic, which tries to address this by calculating diffs using treesitter-provided concrete syntax trees.

It’s a huge improvement over line-based diffs, but unfortunately it doesn’t always do a great job matching entities between versions.

Here’s the diff that motivated this entire foray:

Note that it doesn’t match up struct PendingClick: it shows it deleted on the left and added on the right.

I haven’t dug into why difftastic fails to match here, but I do feel like it’s wrong. Even if the overall diff would be longer, I’d still rather see PendingClickRequest and PendingClick matched up between both sides.

Here’s a summary of tools / references in the space:

The most “baked” and thoughtful semantic diff tool I found is, perhaps unsurprisingly, semanticdiff.com, a small German company with a free VSCode plugin and web app that shows diffs for GitHub PRs. Unfortunately they don’t have any code libraries I can use as a foundation for the workflow I want.

this semanticdiff vs. difftastic blog post covers a lot of great details (including that difftastic doesn’t even show semantically meaningful indentation changes in Python !!!)

one of the authors has great HN comments with hard-won background knowledge. E.g., they moved away from treesitter because it’s unreliable for semantics:

“Context-sensitive keywords in particular were a constant source of annoyance. The grammar looks correct, but it will fail to parse because of the way the lexer works. You don’t want your tool to abort just because someone named their parameter async.”

diffsitter

built on treesitter, has an MCP server. README includes a list of similar projects.

lots of GitHub stars, but doesn’t seem particularly well-documented; I couldn’t find an explanation of how it works, but the difftastic wiki says it “runs longest-common-subsequence on the leaves of the tree” (see the sketch after this list)

gumtree

research / academic origin in 2014

requires Java, so a no-go for my use case of a quick tool I can use via Emacs

mergiraf: treesitter-based merge driver written in Rust

very nice architecture overview; the tool uses the GumTree algorithm

docs and adorable illustrations indicate this project was clearly written by a thoughtful human

semanticdiff.com author in HN comments:

> GumTree is good at returning a result quickly, but there are quite a few cases where it always returned bad matches for us, no matter how many follow-up papers with improvements we tried to implement. In the end we switched over to a dijkstra based approach that tries to minimize the cost of the mapping
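To make that “LCS on the leaves” description concrete, here is a toy sketch (my illustration of the general idea, not diffsitter’s actual code), using Python’s difflib:

import difflib

# toy "syntax trees": nested lists whose leaves are tokens
def leaves(tree):
    for node in tree:
        if isinstance(node, list):
            yield from leaves(node)
        else:
            yield node

old = ["fn", "f", ["(", "x", ")"], ["{", "return", "x", "}"]]
new = ["fn", "f", ["(", "x", ")"], ["{", "return", "x", "+", "1", "}"]]

a, b = list(leaves(old)), list(leaves(new))
for op, i1, i2, j1, j2 in difflib.SequenceMatcher(a=a, b=b).get_opcodes():
    if op != "equal":
        print(op, a[i1:i2], "->", b[j1:j2])
# prints: insert [] -> ['+', '1']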

Are you a robot?

www.bloomberg.com


how to be anti-social

nate.leaflet.pub

GitHub - matz/spinel

github.com

Spinel — Ruby AOT Compiler

Spinel compiles Ruby source code into standalone native executables. It performs whole-program type inference and generates optimized C code, achieving significant speedups over CRuby.

Spinel is self-hosting: the compiler backend is written in Ruby and compiles itself into a native binary.

How It Works

Ruby (.rb)
    |
    v
spinel_parse       Parse with Prism (libprism), serialize AST
    |              (C binary, or CRuby + Prism gem as fallback)
    v
AST text file
    |
    v
spinel_codegen     Type inference + C code generation
    |              (self-hosted native binary)
    v
C source (.c)
    |
    v
cc -O2 -Ilib -lm   Standard C compiler + runtime header
    |
    v
Native binary      Standalone, no runtime dependencies

Quick Start

# Fetch libprism sources (from the prism gem on rubygems.org):
make deps

# Build everything:
make

# Write a Ruby program:
cat > hello.rb <<'RUBY'
def fib(n)
  if n < 2
    n
  else
    fib(n - 1) + fib(n - 2)
  end
end

puts fib(34)
RUBY

# Compile and run:
./spinel hello.rb
./hello   # prints 5702887 (instantly)

Options

./spinel app.rb           # compiles to ./app
./spinel app.rb -o myapp  # compiles to ./myapp
./spinel app.rb -c        # generates app.c only
./spinel app.rb -S        # prints C to stdout

Self-Hosting

Spinel compiles its own backend. The bootstrap chain:

CRuby + spinel_parse.rb → AST
CRuby + spinel_codegen.rb → gen1.c → bin1
bin1 + AST → gen2.c → bin2
bin2 + AST → gen3.c
gen2.c == gen3.c (bootstrap loop closed)

Benchmarks

74 tests pass. 55 benchmarks pass.

Geometric mean: ~11.6x faster than miniruby (Ruby 4.1.0dev) across the 28 benchmarks below. Baseline is the latest CRuby miniruby build (without bundled gems), which is considerably faster than the system ruby (3.2.3); Spinel’s advantage is correspondingly smaller but still substantial on computation-heavy workloads.

(Benchmark tables: Computation, Data Structures & GC, Real-World Programs.)

Supported Ruby Features

Core: Classes, inheritance, super, include (mixin), attr_accessor, Struct.new, alias, module constants, open classes for built-in types.

Control Flow: if/elsif/else, unless, case/when, case/in (pattern matching), while, until, loop, for..in (range and array), break, next, return, catch/throw, &. (safe navigation).

Blocks: yield, block_given?, &block, proc {}, Proc.new, lambda -> x { }, method(:name). Block methods: each, each_with_index, map, select, reject, reduce, sort_by, any?, all?, none?, times, upto, downto.

Exceptions: begin/rescue/ensure/retry, raise, custom exception classes.

Types: Integer, Float, String (immutable + mutable), Array, Hash, Range, Time, StringIO, File, Regexp, Bigint (auto-promoted), Fiber. Polymorphic values via tagged unions. Nullable object types (T?) for self-referential data structures (linked lists, trees).

Global Variables: $name compiled to static C variables with type-mismatch detection at compile time.

Strings: << automatically promotes to mutable strings (sp_String) for O(n) in-place append. +, interpolation, tr, ljust/rjust/center, and all standard methods work on both. Character comparisons like "s[i] == c" are optimized to direct char array access (zero allocation). Chained concatenation (a + b + c + d) collapses to a single malloc via sp_str_concat4 / sp_str_concat_arr: N-1 fewer allocations. Loop-local str.split(sep) reuses the same sp_StrArray across iterations (csv_process: 4M allocations eliminated).

Regexp: Built-in NFA regexp engine (no external dependency). =~, $1-$9, match?, gsub(/re/, str), sub(/re/, str), scan(/re/), split(/re/).

Bigint: Arbitrary precision integers via mruby-bigint. Auto-promoted from loop multiplication patterns (e.g. q = q * k). Linked as a static library; only included when used.

Add DOS platform support (DJGPP) by AJenbo · Pull Request #15377 · libsdl-org/SDL

github.com

and others added 30 commits

Seeking breaks otherwise. We might be able to just fflush() before seeking instead?

Turns out DosBox-X was having trouble with the Sound Blaster or something; standard DosBox works correctly directly from the interrupt handler, and without doubling the buffer size.

This is MUCH faster than just leaving buffering disabled, and also works around getting bogus reads after an fseek. SDL_LoadWAV on test/sample.wav no longer takes several seconds to finish, and comes up with the correct data.

I wonder if we’re triggering this in LoadWAV because we’re malloc’ing data between seeks/reads, and it’s causing the djgpp transfer buffer to change. Or maybe the Fat DS trick is confusing it? I don’t know, I haven’t had time to debug it, it might just be a legit libc bug in djgpp too, for all I know.

This uses an old trick we used in SDL 1.2 for MacOS Classic, which did its audio callback in a hardware interrupt. If the audio is locked when the interrupt fires, make a note of it and return immediately. When the lock is released, if the interrupt has been fired, run the audio device iteration right then.

Since there isn’t a big device lock in SDL3 (available to the app, at least), this keeps a counter of when any SDL_AudioStream is locked, which is probably good enough.

This uses VESA interfaces to manage the display and works with the software renderer.

Events aren’t hooked up yet, so prepare to close DosBox on each run.  :)

…upport.

This gets most of the rendering examples, which use SDL_GetBasePath() to find textures to load, working.

Of course Quake 1 solved this better, haha. It’s smart: less memory, dirt simple, and you don’t even have to worry about synchronizing with the interrupt handler, because it’s safe for both sides no matter when an interrupt fires.

[sdl-ci-filter djgpp]

[sdl-ci-artifacts]

- SDL_runapp.c: Add SDL_PLATFORM_DOS to the exclusion list so the generic SDL_RunApp() is disabled when the DOS-specific one is compiled.
- SDL.c: Exclude SDL_Gtk_Quit() on DOS. DJGPP defines __unix__ which sets SDL_PLATFORM_UNIX, but DOS has no GTK/display server. The GTK source is not compiled (CMake UNIX is false for DOS) so this was a link error.
- sdlplatform.cmake: Add DOS case to SDL_DetectCMakePlatform so the platform is properly detected from CMAKE_SYSTEM_NAME=DOS.
- i586-pc-msdosdjgpp.cmake: Add i386-pc-msdosdjgpp-gcc as a fallback compiler name, since some DJGPP toolchain builds use the i386 prefix.

- Implement double-buffered page-flipping for VBE modes with >1 image page
- Save and restore full VBE state on video init/quit for clean mode switching
- Improve DOS keyboard handling: support extended scancodes and Pause key
- Lock ISR code/data to prevent page faults during interrupts
- Always vsync when blitting in single-buffered modes to reduce tearing

Move audio mixing out of the IRQ handler to the main loop for improved stability and to avoid reentrancy issues. Add SDL_DOS_PumpAudio function, update DMA buffer handling, and adjust sample rate to 22050 Hz.

Silence stale DMA buffer halves to prevent stutter during load. Detect SB version and select 8-bit mono or 16-bit stereo mode. Handle DMA and DSP setup for both SB16 and pre-SB16 hardware. Add FORCE_SB_8BIT option for testing in DOSBox.

- Poll Sound Blaster DSP status instead of a fixed delay after speaker-on
- Clarify DPMI conventional memory is always locked; update comments
- Document and justify DMA memory allocation strategy
- Free IRET wrapper after restoring interrupt vector to avoid leaks

- Throttle joystick axis polling to ~60 Hz to reduce BIOS timing loop cost
- Always poll joystick buttons directly for responsiveness

Implement banked framebuffer access for VBE 1.2+ modes without LFB. Detect and initialize banked modes, copy framebuffer data using bank switching, and blank the framebuffer on mode set. Page-flipping is disabled in banked mode.


How LLMs Work — A Visual Deep Dive

ynarwal.github.io

How LLMs Actually Work

A complete walkthrough of how large language models like ChatGPT are built, from raw internet text to a conversational assistant. Based on Andrej Karpathy’s technical deep dive.

Representative figures from frontier models circa 2024; exact numbers shift with every release. The scale is the point, not the precision.

Downloading the Internet

The first step is collecting an enormous amount of text. Organizations like Common Crawl have been crawling the web since 2007, indexing 2.7 billion pages by 2024. This raw data is then filtered into a high-quality dataset like FineWeb.

The goal: a large quantity of high-quality, diverse documents. After aggressive filtering, you end up with about 44 terabytes (roughly 10 consumer hard drives worth of text), representing ~15 trillion tokens.

Key Insight

The quality and diversity of this training data has more impact on the final model than almost anything else. Garbage in, garbage out, but at a trillion-token scale.


🌐 Common Crawl

2.7B web pages · Raw HTML · Since 2007

A non-profit organization that crawls the web and freely provides its data. Their bots follow links from seed pages, recursively indexing the internet. The raw archive is petabytes of gzip’d WARC files containing raw HTML.

🚫 URL Filtering

Blocklists · Malware · Spam · Adult content

Block-lists of known malware sites, spam networks, adult content, marketing pages, and low-quality domains are applied. Entire domains can be removed. This is the cheapest filter, so it runs first.

📄 Text Extraction

HTML → clean text · Remove navigation & CSS

Raw HTML contains <div> tags, CSS, JavaScript, navigation menus, and ads. Parsers extract just the meaningful text content. This is harder than it sounds; heuristics decide what’s “content” vs “chrome”.

🌍 Language Filtering

Keep pages ≥65% English · Language classifier

A language classifier estimates the language of each page. Pages with less than 65% target-language content are dropped. This is a design decision: filter aggressively for one language, or train multilingual.

♻️ Deduplication

Exact & fuzzy matching · Reduce repetition

Identical or near-identical pages appear millions of times on the internet (copied articles, boilerplate). Training on the same text repeatedly causes memorization. Dedup uses MinHash and exact-match techniques to remove duplicates (a toy MinHash sketch follows this stage list).

🔒 PII Removal

Names · Addresses · SSNs · Emails

Personally Identifiable Information is detected, and either it is redacted or the page is dropped. Regex patterns and ML classifiers find phone numbers, emails, Social Security numbers, physical addresses, and named individuals.

✅ FineWeb Dataset

44 TB · 15 trillion tokens · High quality

The final filtered dataset. Articles about tornadoes in 2012, medical facts, history, code, recipes, science papers: the full breadth of human knowledge expressed in text. This becomes the training corpus.
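The fuzzy side of that deduplication is often MinHash; a toy sketch of the idea (shingle size, hash choice, and banding details vary across real pipelines):

import hashlib

def minhash_signature(text, num_hashes=64, shingle=5):
    # character shingles stand in for the word shingles real pipelines use
    shingles = {text[i:i + shingle] for i in range(len(text) - shingle + 1)}
    return [
        min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16) for s in shingles)
        for seed in range(num_hashes)
    ]

def estimated_jaccard(a, b):
    sa, sb = minhash_signature(a), minhash_signature(b)
    return sum(x == y for x, y in zip(sa, sb)) / len(sa)

print(estimated_jaccard("the cat sat on the mat", "the cat sat on a mat"))  # high: near-dup
print(estimated_jaccard("the cat sat on the mat", "stochastic gradients"))  # low: unrelated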

Chapter 1 · Pre-Training · Stage 2

Tokenization

Neural networks can’t process raw text; they need numbers. The solution is tokenization: breaking text into “tokens” (sub-word chunks) and assigning each an ID.

GPT-4 uses a vocabulary of 100,277 tokens, built via the Byte Pair Encoding (BPE) algorithm. BPE starts with individual bytes (256 symbols), then iteratively merges the most frequent adjacent pairs, compressing the sequence length while expanding the vocabulary.

Why not just use words?

Words have infinite variants. “run”, “running”, “runner” would be 3 separate entries. Subword tokens share roots: “run” + “ning”, “run” + “ner”. This also handles new words, typos, and multiple languages efficiently.

BPE in Action

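A toy version of the merge loop, as a sketch (illustrative only; GPT-4’s actual tokenizer has 100K+ tokens and more machinery, such as byte-level pretokenization):

from collections import Counter

def bpe_train(text, num_merges):
    # start from raw bytes (256 base symbols), repeatedly merge the most
    # frequent adjacent pair into a new token id
    seq = list(text.encode("utf-8"))
    merges = {}
    next_id = 256
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges[(a, b)] = next_id
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                out.append(next_id)  # replace the pair with the merged token
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_id += 1
    return seq, merges

tokens, merges = bpe_train("the runner was running where the run went", 10)
print(len(tokens), "tokens after", len(merges), "merges")  # sequence shrinks as vocab grows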

Chapter 1 · Pre-Training · Stage 3

Training the Neural Network

The Transformer neural network is initialized with random parameters: billions of “knobs”. Training adjusts these knobs so the network gets better at predicting the next token in any sequence.

Every training step: sample a window of tokens → feed it to the network → compare the prediction to the actual next token → nudge all parameters slightly in the right direction. Repeat billions of times.

The loss, a single number measuring prediction error, falls steadily as the model learns the statistical patterns of human language.
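That loop, as a minimal PyTorch sketch (toy vocabulary and a random stand-in corpus; a real run is the same loop over a transformer with billions of parameters, sharded across many GPUs):

import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, context = 1000, 32
data = torch.randint(0, vocab_size, (10_000,))  # stand-in for the tokenized corpus

model = nn.Embedding(vocab_size, vocab_size)  # toy bigram "transformer"
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

for step in range(1000):
    # sample a batch of token windows
    ix = torch.randint(0, len(data) - context - 1, (64,)).tolist()
    x = torch.stack([data[j:j + context] for j in ix])          # inputs
    y = torch.stack([data[j + 1:j + context + 1] for j in ix])  # targets: the next token
    logits = model(x)                                           # distribution over the vocab
    loss = F.cross_entropy(logits.view(-1, vocab_size), y.view(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()  # nudge all parameters slightly in the right direction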

Scale

GPT-2 (2019): 1.6B params, 100B tokens, ~$40K to train. Today: the same quality for ~$100. Llama 3: 405B params, 15T tokens. Modern frontier models: hundreds of billions of parameters, trillions of tokens.

Transformer Architecture

What is an Embedding?

Each token ID maps to a learned vector of ~1,000-4,000 numbers called its embedding. Think of it as a coordinate in meaning-space: initialized randomly, then shaped by training. The same token (e.g. “bank”) always enters the network with the same embedding vector. Attention layers then mix in context from surrounding tokens, so by the time “bank” reaches deeper layers, “river bank” and “bank account” carry completely different representations. Polysemy is resolved by context, not by storing multiple meanings per token.
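The lookup itself is a one-liner; a small PyTorch sketch (vocabulary size from the text, the token id is invented for illustration):

import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=100_277, embedding_dim=4096)  # vocab x meaning-space dims
bank = torch.tensor([42])  # hypothetical token id for "bank"
print(torch.equal(emb(bank), emb(bank)))  # True: same token, same entry vector, any context
# disambiguation happens later, when attention layers mix in surrounding tokens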


(Interactive chart: a cross-entropy training-loss curve, reading 4.8 around training step 500, with the model’s sample output at that stage: “the model has learning but confustion still the wqp mxr model bns to predict…”)

What the model is learning

At step 1: pure noise. By step 500: local coherence appears. By step 32K: fluent English. The model is learning grammar, facts, and reasoning patterns, all implicitly from token prediction.

Chapter 1 · Pre-Training · Stage 4

Inference & Token Sampling

Once trained, the network generates text autoregressively: feed in a sequence of tokens → get a probability distribution over all 100K possible next tokens → sample one → append → repeat.

This process is stochastic: the same prompt generates different outputs every time, because we’re flipping a biased coin. Higher-probability tokens are more likely but not guaranteed to be chosen.

Temperature controls randomness. Low temperature (0.1) → the model almost always picks the top token. High temperature (2.0) → uniform chaos. 0.7-1.0 is the sweet spot for coherent-but-creative text.

Key Mental Model

The model doesn’t “think” about what to say. It computes a probability distribution over all possible next tokens and samples from it. Every word is a coin flip, just a very informed one.

Token Sampling Demo

Watch the model choose the next word. Each bar shows the probability of a candidate token.

(Interactive demo: next-token candidates for the prompt “The sky appears blue” at temperature 0.8.)
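A self-contained sketch of temperature sampling (the candidate tokens and logits here are invented for illustration):

import math
import random

def sample(logits, temperature=0.8):
    # scale logits by temperature, softmax into probabilities, then draw
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(probs)), weights=probs, k=1)[0], probs

candidates = ["because", "due", "and", "banana"]
logits = [2.1, 1.3, 0.4, -3.0]  # made-up scores for "The sky appears blue ..."
idx, probs = sample(logits, temperature=0.8)
print(candidates[idx], [round(p, 3) for p in probs])
# lower temperature sharpens probs toward "because"; higher flattens them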

Chapter 2 · The Base Model

The InternetSimulator

After pre-train­ing, you have a base model — a so­phis­ti­cated au­to­com­plete en­gine. It’s not an as­sis­tant. It does­n’t an­swer ques­tions. It con­tin­ues to­ken se­quences based on what it saw on the in­ter­net.

Give it a Wikipedia sen­tence and it’ll com­plete it from mem­ory. Ask it What is 2+2?” and it might give you a math text­book page, a quiz an­swer key, or go off on a tan­gent — what­ever was sta­tis­ti­cally com­mon in its train­ing data.

The base mod­el’s knowl­edge lives in its 405 bil­lion pa­ra­me­ters — a lossy com­pres­sion of the in­ter­net, like a zip file that ap­prox­i­mates rather than per­fectly stores in­for­ma­tion.

Base Model Behavior

Few-Shot Prompting

Hello: Bonjour | Cat: Chat | Dog: Chien | Teacher:

→ Professeur ✓ correct

Memorization

Zebras (/ˈzɛbrə, ˈziːbrə/) are African equines with distinctive…

…black-and-white striped coats. There are three living species: the Grévy’s zebra, plains zebra, and mountain zebra…

↑ Verbatim Wikipedia recall from weights

Hallucination

The Republican Party nominated Trump and [running mate] in the 2024 election against…

→ …Mike Pence, facing Hillary Clinton and Tim Kaine…

→ …Ron DeSantis, against Joe Biden and Kamala Harris…

↑ Knowledge cutoff → plausible confabulation

In-Context Learning

Base models can perform translation, classification, and Q&A via few-shot prompts, with no fine-tuning needed. The model infers the task from the pattern of examples in its context window.

Chapter 3 · Post-Training

Building the Assistant

The base model is a token simulator. To turn it into a helpful assistant, we need post-training: a much cheaper but equally critical stage. This is where the model learns conversations.

Supervised Fine-Tuning (SFT)

Human labelers create a dataset of ideal conversations, following detailed labeling instructions: be helpful, be truthful, be harmless. The model is then trained on these conversations, not from scratch, but by continuing to adjust the pre-trained weights on this new data.

Modern SFT datasets (like UltraChat) have millions of conversations, mostly synthetic (LLM-generated) with human review. The model learns by imitation: it adopts the persona of the ideal assistant reflected in the data.


Changelog | OpenAI API

developers.openai.com

April, 2026

Apr 24

Feature

gpt-5.5

gpt-5.5-pro

v1/responses

v1/chat/completions

v1/batch

Released GPT-5.5, a new frontier model for complex professional work, to the Chat Completions and Responses API, and released GPT-5.5 pro for Responses API requests for tougher problems that benefit from more compute.

GPT-5.5 supports a 1M token context window, image input, structured outputs, function calling, prompt caching, Batch, tool search, built-in computer use, hosted shell, apply patch, Skills, MCP, and web search. Key updates include:

Reasoning effort now defaults to medium.

When image_detail is unset or set to auto, the model now uses the original behavior.

Caching for GPT-5.5 only works with extended prompt caching. In-memory prompt caching is not supported.

Learn more here.

Apr 21

Feature

gpt-image-2

v1/images/generations

v1/images/edits

v1/batch

Released GPT Image 2, a state-of-the-art model for image generation and editing. GPT Image 2 supports flexible image sizes, high-fidelity image inputs, token-based image pricing, and Batch API support with a 50% discount.

Apr 15

Update

Updated the Agents SDK with new capabilities, including:

running agents in controlled sandboxes;

inspecting and customizing the open-source harness; and

controlling when memories are created and where they’re stored.

March, 2026

Mar 17

Feature

gpt-5.4-mini

gpt-5.4-nano

v1/responses

v1/chat/completions

Released GPT-5.4 mini and GPT-5.4 nano to the Chat Completions and Responses API. GPT-5.4 mini brings GPT-5.4-class capabilities to a faster, more efficient model for high-volume workloads, while GPT-5.4 nano is optimized for simple high-volume tasks where speed and cost matter most.

GPT-5.4 mini supports tool search, built-in computer use, and compaction. GPT-5.4 nano supports compaction, but does not support tool search or computer use.

Mar 16

Update

gpt-5.3-chat-latest

Updated the gpt-5.3-chat-latest slug to point to the latest model currently used in ChatGPT.

Mar 13

Fix

gpt-5.4

v1/responses

v1/chat/completions

Updated our image encoder to fix a small bug with input_image inputs in GPT-5.4. Some image understanding use cases may now see improved quality. No action is required.

Mar 12

Feature

sora-2

sora-2-pro

v1/videos

v1/videos/characters

v1/videos/extensions

v1/batch

Expanded the Sora API with reusable character references, longer generations up to 20 seconds, 1080p output for sora-2-pro, video extensions, and Batch API support for POST /v1/videos. 1080p generations on sora-2-pro are billed at $0.70 per second. Learn more here.

Mar 12

Update

sora-2

sora-2-pro

v1/videos/edits

v1/videos/{video_id}/remix

Added POST /v1/videos/edits for editing existing videos. This will replace POST /v1/videos/{video_id}/remix, which will be deprecated in 6 months. Learn more here.

Mar 5

Feature

gpt-5.4

gpt-5.4-pro

v1/responses

v1/chat/completions

Released GPT-5.4, our newest frontier model for professional work, to the Chat Completions and Responses API, and released GPT-5.4 pro to the Responses API for tougher problems that benefit from more compute.

Also released:

Tool search in the Responses API, which lets models defer large tool surfaces until runtime to reduce token usage, preserve cache performance, and improve latency.

Built-in Computer use support in GPT-5.4 through the Responses API computer tool for screenshot-based UI interaction.

A 1M token context window and native Compaction support for longer-running agent workflows.

Mar 3

Feature

gpt-5.3-chat-latest

v1/chat/completions

v1/responses

Released gpt-5.3-chat-latest to the Chat Completions and Responses API. This model points to the GPT-5.3 Instant snapshot currently used in ChatGPT. Read more here.

February, 2026

Feb 24

Feature

v1/responses

v1/chat/completions

Expanded input_file support to accept more document, presentation, spreadsheet, code, and text file types. Learn more here.

Feb 24

Feature

v1/responses

Released phase to the Responses API. It labels an assistant message as intermediate commentary (commentary) or the final answer (final_answer). Read more here.

Feb 24

Feature

gpt-5.3-codex

v1/responses

Released gpt-5.3-codex to the Responses API. Read more here.

Feb 23

Feature

v1/responses

Launched WebSocket mode for the Responses API. Learn more here.

