10 interesting stories served every morning and every evening.
🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.
🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world’s top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.
Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!
📄 Tech Report: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf
🤗 Open Weights: https://huggingface.co/collections/deepseek-ai/deepseek-v4
DeepSeek-V4-Pro
🔹 Enhanced Agentic Capabilities: Open-source SOTA in Agentic Coding benchmarks.
🔹 Rich World Knowledge: Leads all current open models, trailing only Gemini-3.1-Pro.
🔹 World-Class Reasoning: Beats all current open models in Math/STEM/Coding, rivaling top closed-source models.
DeepSeek-V4-Flash
🔹 Reasoning capabilities closely approach V4-Pro.
🔹 Performs on par with V4-Pro on simple Agent tasks.
🔹 Smaller parameter size, faster response times, and highly cost-effective API pricing.
Structural Innovation & Ultra-High Context Efficiency
🔹 Novel Attention: Token-wise compression + DSA (DeepSeek Sparse Attention).
🔹 Peak Efficiency: World-leading long context with drastically reduced compute & memory costs.
🔹 1M Standard: 1M context is now the default across all official DeepSeek services.
Dedicated Optimizations for Agent Capabilities
🔹 DeepSeek-V4 is seamlessly integrated with leading AI agents like Claude Code, OpenClaw & OpenCode.
🔹 Already driving our in-house agentic coding at DeepSeek.
The figure below showcases a sample PDF generated by DeepSeek-V4-Pro.
API is Available Today!
🔹 Keep base_url, just update model to deepseek-v4-pro or deepseek-v4-flash (see the sketch below).
🔹 Supports OpenAI ChatCompletions & Anthropic APIs.
🔹 Both models support 1M context & dual modes (Thinking / Non-Thinking): https://api-docs.deepseek.com/guides/thinking_mode
⚠️ Note: deepseek-chat & deepseek-reasoner will be fully retired and inaccessible after Jul 24th, 2026, 15:59 (UTC Time). (Currently routing to deepseek-v4-flash non-thinking/thinking).
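For reference, a minimal migration sketch, assuming the OpenAI Python SDK and an existing DeepSeek API key (the placeholder key and prompt are illustrative):

from openai import OpenAI

# base_url stays the same; only the model name changes.
client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",  # placeholder
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",  # or "deepseek-v4-pro"
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)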
🔹 Amid recent attention, a quick reminder: please rely only on our official accounts for DeepSeek news. Statements from other channels do not reflect our views.
🔹 Thank you for your continued trust. We remain committed to longtermism, advancing steadily toward our ultimate goal of AGI.
First enthusiasm
When I first subscribed to Claude Code, I had a really nice experience during the first few weeks. It was fast, the token allowance was fair, and the quality was good.
I learned they had raised the token allowance for non-rush hours, and since they opposed some governmental rules, it felt good to support the right cause.
(づ  ̄ ³ ̄)づ
However… for about three weeks now my initial enthusiasm has been rapidly waning.
It began with an issue three weeks ago. I started working in the morning after about a ten-hour break; enough time for my tokens to refresh.
I sent two small questions to Claude Haiku. They were simple questions, not even related to the repository.
Suddenly, token usage spiked to 100%.
Have a nice break…
I contacted their “AI support bot”, which returned some default support nonsense and didn’t really understand the problem. So I asked for human support. A couple of days later, what appeared to be a human support person sent a reply. It began like this:
“Our systems are detecting your inquiry is regarding usage limits on your Pro or Max plan.”
Yeah, well — it’s the Pro plan. Seems like your systems weren’t actually queried; it was just a default intro and probably a default answer, because:
What followed was an extensive, apparently copy-and-pasted answer from their docs explaining how daily and weekly limits work.
And it closed with that typically frustrating line no customer likes to read at the end of an e-mail, the classic middle finger of customer support: we don’t care whether your problem is solved, we’ve declared it closed.
“Note that further replies to this ticket may not be monitored. If your request is not regarding usage limits on your Pro or Max plan, or you need additional support, please visit our help page at”
Great! Sending an automated e-mail that does not address the actual problem and then closing the channel. Thanks for nothing, I guess? Or was I wrong? I asked Claude Haiku:
@Haiku:
See the customer’s request here and the response from the AI and later W***** - did they answer the concern/question of the customer?
(╯°_°)╯︵ ┻━┻
Declining quality
In the following days and weeks, the quality was far from satisfying my needs or matching my initial experience. While I used to be able to work on up to three projects at once, now the token limit was exhausted after two hours on a single project.
And the quality was degrading. I am fully aware this is quite subjective and that the quality of the agent is always heavily impacted by the operator; the failure usually sits in front of the screen. But hey, I also develop using GitHub’s Copilot and OpenAI’s Codex, and I run my own inference with OMLX and Continue using Qwen3.5-9B. I’m not the expert, and I’m lazy sometimes, but I probably know a thing or two.
Let me give you this wonderful example: yesterday I asked Claude Opus to refactor a project.
While I was browsing the model’s thinking log (which I strongly suggest doing regularly, not just occasionally), I found this:
Rather than editing every slider in JSX, I’ll add a generic initializer in ui-events.js that auto-injects value displays for all range inputs that lack one.
This is clearly bad practice. It’s a cheap workaround you wouldn’t expect even from a junior dev; it reads like someone who just doesn’t want to deliver a good result. My response:
“you can’t be serious — is this how you fix things? just WORKAROUNDS????”
At least Opus admitted:
“You’re right, that was lazy. Let me do it properly — add the labels directly in the JSX and wire them explicitly.”
Needless to say, this shortcut cost me around 50% of my five-hour token allowance.
(ง •̀_•́)ง
And even more…
Now this cache topic comes up - among others. At least they are talking about it openly. The problem was: when you get back to work after some time, your conversation cache is gone and the model starts reading your codebase again. Cost-wise this is smart. But experience-wise? It means you paid tokens for the initial load and, after a forced break because the five-hour token window hit its limit, you pay again for the same load.
Think that’s all? Wait, I also got this funny anecdote: all of a sudden the weekly window changed from today to Monday. OK, I was thankful because it came with a reset to zero. But still: what is going on, Anthropic? Not only that — while I was working on my project, watching token usage with Argus-eyed vigilance, this little warning popped up:
Wait, what? I’m neither part of an organization nor do I see any hint why I suddenly have to worry about a “monthly usage limit” — also the hourly and weekly limits were still not exceeded. What is happening right now?
Turns out, two hours later it allowed me to continue working. The warning was gone.
At least this documentation does not mention a monthly usage limit. And the settings page only lists the limits for the current session and week.
So… what is this monthly limit all about, Anthropic?
Sorry to let you down, Anthropic
I am a huge fan of the product. Theoretically everything just works like a charm; it offers so many opportunities. I built my
Hi friends,
I’ll be attending Babashka Conf on May 8 and Dutch Clojure Days on May 9.
If you’re attending either (or just visiting Amsterdam), drop me a line!
When I have an idea for a project, it tends to go in one of these two directions:
I just do it. Maybe I make a few minor revisions, but often it turns out exactly how I’d imagined and I’m happy.
I think, “I should look for prior art”. There’s a lot of prior art, dealing with a much broader scope than I’d originally imagined. I start to wonder if I should incorporate that scope. Or perhaps try to build my thing on top of the existing sorta-nearby-solutions. Or maybe I should just use the popular thing. Although I could do a better job than that thing, if I put a bunch of time into it. But actually, I don’t want to maintain a big popular project, nor do I want to put that much time into this project. Uh oh, now I’ve spent a bunch of time, having neither addressed the original issue nor experienced the joy of creating something.
I prefer the first outcome, and I think the pivotal factor is how well I’ve internalized my own success criteria.
For example, last weekend I hosted my friend Marcin and we decided it’d be fun to do some woodworking, so we threw together this shelf and 3d-printed hangers for my kitchen:
Absolute banger of a project:
brainstormed the design over coffee
did a few 3d-print iterations for the Ikea bin hangers (OnShape CAD, if you want to print your own)
used material leftover from my workbench
rounded the corner by eye with a palm sander
sealed the raw plywood edge with some leftover paint from a friend
done in a weekend
The main success criterion was to jam on woodworking with a friend, and that helped me not overthink the object-level success criteria: Just make a shelf for my exact kitchen!
In contrast, this past Friday I noticed difftastic did a poor job, so I decided to shop around for structural/semantic diff tools and related workflows (a topic I’ve never studied but am increasingly interested in as I review more and more LLM-generated code).
I spent 4 hours over the weekend researching existing tools (see my notes below), going through dark periods of both “semantic tree diffing is a PhD-level complex problem” and “why do all of these have MCP servers? I don’t want an MCP server”, before I came to my senses and remembered my original success criteria: I just want a nicer diffing workflow for myself in Emacs, I should just build it myself — should take about 4 hours.
I’m cautiously optimistic that, having had this realization and committing myself to a minimal scope, I’ll be able to knock out a prototype before running out of motivation.
However, other long-running interests of mine:
interfaces for prototyping hardware (discussed September 2023)
a programming language that fuses what I like about Clojure and Rust (November 2023)
a programming language for CAD (constraints, bidirectional editing, other dubious ideas)
seem to be deep in the well of outcome #2.
That is, I’ve spent hundreds of hours on background research and little prototypes, but haven’t yet synthesized anything that addresses the original motivating issue.
It’s not quite that I regret that time — I do love learning by reading — but I have a nagging sense of unease that my inner critic (fear of failure?) is silencing my generative tendencies, keeping me from the much more enjoyable (and productive!) learning by doing.
I think in these cases the success criteria have been much fuzzier: Am I trying to replace my own usage of Rust/Clojure?
Only for some subset of problems?
Or is it that I actually just need a playground to learn about language design/implementation, and it’s fine if I don’t end up using it?
Ditto for CAD: Am I trying to replace my commercial CAD tool in favor of my own?
Only for some subset of simple or particularly parametric parts?
Do I care if it’s useful for others?
Does my tool need to be legibly different from existing open-source tools?
It’s worth considering these questions, sure.
But at the end of the day, I’d much rather have done a lot than have only considered a lot.
So I’m trying to embrace my inner clueless 20-year-old and just do things — even if some turn out to be “obviously bad” in hindsight, I’ll still be coming out ahead on net =D
Conservation of scope creep
Of course, there’s only so much time to “just do things”, and there’s a balance to be had. I’m not sure how many times I’ll re-learn YAGNI (“you ain’t gonna need it”) in my career, but I was reminded of it again after writing a bunch of code with an LLM agent, then eventually coming to my senses and throwing it all out.
I wanted a Finda-style filesystem-wide fuzzy path search for Emacs.
Since I’ve built (by hand, typing the code myself!) this exact functionality before (walk filesystem to collect paths, index them by trigram, do fast fuzzy queries via bitmap intersections), I figured it’d only take a few hours to supervise an LLM to write all the code.
I started with a “plan mode” chat, and the LLM suggested a library, Nucleo, which turned up since I wrote Finda (10 years ago, eek!).
I read through it, found it quite well-designed and documented, and decided to use it so I’d get its smart case and Unicode normalization functionality.
(E.g., query foo matches Foo and foo, whereas query Foo won’t match foo; similarly for cafe and café.)
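Since that pipeline (walk, trigram-index, query via bitmap intersections) is the core of the tool, here is a toy Python sketch of the idea, using my own naming rather than Finda’s or Nucleo’s actual code, and with a simple substring check standing in for real fuzzy verification:

from collections import defaultdict

def trigrams(s):
    s = s.lower()  # real smart-case handling omitted for brevity
    return {s[i:i+3] for i in range(len(s) - 2)}

class TrigramIndex:
    def __init__(self):
        self.paths = []
        self.bitmaps = defaultdict(int)  # trigram -> integer bitmap of path ids

    def add(self, path):
        i = len(self.paths)
        self.paths.append(path)
        for t in trigrams(path):
            self.bitmaps[t] |= 1 << i  # set bit i: path i contains trigram t

    def search(self, query):
        q = query.lower()
        if len(q) < 3:  # no trigram to look up; fall back to a scan
            return [p for p in self.paths if q in p.lower()]
        bm = ~0
        for t in trigrams(q):  # AND the candidate bitmaps together
            bm &= self.bitmaps.get(t, 0)
        # Trigram hits are necessary but not sufficient; verify each candidate.
        return [p for i, p in enumerate(self.paths) if bm >> i & 1 and q in p.lower()]

index = TrigramIndex()
for p in ["/root/foobar/baz", "/root/barfoo/qux", "/home/cafe/menu"]:
    index.add(p)
print(index.search("foobar"))  # ['/root/foobar/baz']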
Finding a great library wasn’t the problem; the problem was that Nucleo also supported some extra functionality: anchors (^foo only matches at the beginning of a line).
This got me thinking about what that might mean in a corpus that consists entirely of file paths.
Anchoring to the beginning of a line isn’t useful (everything starts with /), so I decided to try and interpret the anchors with respect to the path segments.
E.g., ^foo would match /root/foobar/ but not /root/barfoo/.
But to do this efficiently, the index needs to keep track of segment boundaries so that the query can be checked against each segment quickly.
But then we also need to handle a slash occurring in an anchored query (e.g., ^foo/bar) since that wouldn’t get matched when only looking at segments individually (root, foo, bar, and baz of a matching path /root/foo/bar/baz/).
Working through this took several hours: first throwing around design ideas with an LLM, having it write code to wrap Nucleo’s types, then realizing its code was bloated and didn’t spark joy, so finally writing my own (smaller) wrapper.
Then, after a break, I realized:
I can’t think of a situation where I’d ever wished Finda had anchor functionality
In a corpus of paths, I can anchor by just adding / to the start or end of a query (this works for everything except anchoring to the end of a filename).
So I tossed all of the anchoring code.
I’m pretty sure I still came out ahead compared to if I’d tried to write everything myself sans LLM or discussion with others, but I’m not certain.
Perhaps there’s some kind of conservation law here: Any increases in programming speed will be offset by a corresponding increase in unnecessary features, rabbit holes, and diversions.
Structural diffing
Speaking of unnecessary diversions, let me tell you everything I’ve learned about structural diffing recently — if you have thoughts/feelings/references in this space, I’d love to hear about ’em!
When we’re talking about code, a “diff” usually means a summary of the line-by-line changes between two versions of a file.
This might be rendered as a “unified” view, where changed lines are prefixed with + or - to indicate whether they’re additions or deletions.
For example:
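-coffee
+apple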
We’ve removed coffee and added apple.
The same diff might also be rendered in a side-by-side view, which can be easier to read when there are more complex changes:
The problem with these line-by-line diffs is that they’re not aware of higher-level structure like functions, types, etc. — if some braces match up somehow between versions, they might not be shown at all, even if the braces “belong” to different functions.
There’s a wonderful tool, difftastic, which tries to address this by calculating diffs using treesitter-provided concrete syntax trees.
It’s a huge improvement over line-based diffs, but unfortunately it doesn’t always do a great job matching entities between versions.
Here’s the diff that motivated this entire foray:
Note that it doesn’t match up struct PendingClick; it shows it deleted on the left and added on the right.
I haven’t dug into why difftastic fails to match here, but I do feel like it’s wrong — even if the overall diff would be longer, I’d still rather see PendingClickRequest and PendingClick matched up between both sides.
Here’s a summary of tools / references in the space:
The most “baked” and thoughtful semantic diff tool I found is, perhaps unsurprisingly, semanticdiff.com, a small German company with a free VSCode plugin and web app that shows diffs for github PRs. Unfortunately they don’t have any code libraries I can use as a foundation for the workflow I want.
this semanticdiff vs. difftastic blog post covers a lot of great details (including that difftastic doesn’t even show semantically meaningful indentation changes in Python!!!)
one of the authors has great HN comments with hard-won background knowledge. E.g., they moved away from treesitter because it’s unreliable for semantics:
Context-sensitive keywords in particular were a constant source of annoyance. The grammar looks correct, but it will fail to parse because of the way the lexer works. You don’t want your tool to abort just because someone named their parameter “async”.
diffsitter
built on treesitter, has MCP server. README includes list of similar projects.
lots of github stars, but doesn’t seem particularly well-documented; I couldn’t find an explanation of how it works, but the difftastic wiki says it “runs longest-common-subsequence on the leaves of the tree”
gumtree
research / academic origin in 2014
requires Java, so no-go for my use case of a quick tool I can use via Emacs
mergiraf: treesitter-based merge-driver written in rust
very nice architecture overview; tool uses Gumtree algorithm
docs and adorable illustrations indicate this project was clearly written by a thoughtful human
semanticdiff.com author in HN comments:
> GumTree is good at returning a result quickly, but there are quite a few cases where it always returned bad matches for us, no matter how many follow-up papers with improvements we tried to implement. In the end we switched over to a dijkstra based approach that tries to minimize the cost of the mapping
Spinel — Ruby AOT Compiler
Spinel compiles Ruby source code into standalone native executables.
It performs whole-program type inference and generates optimized C code,
achieving significant speedups over CRuby.
Spinel is self-hosting: the compiler backend is written in Ruby and
compiles itself into a native binary.
How It Works
Ruby (.rb)
|
v
spinel_parse Parse with Prism (libprism), serialize AST
| (C binary, or CRuby + Prism gem as fallback)
v
AST text file
|
v
spinel_codegen Type inference + C code generation
| (self-hosted native binary)
v
C source (.c)
|
v
cc -O2 -Ilib -lm Standard C compiler + runtime header
|
v
Native binary Standalone, no runtime dependencies
Quick Start
# Fetch libprism sources (from the prism gem on rubygems.org):
make deps
# Build everything:
make
# Write a Ruby program:
cat > hello.rb <<'RUBY'
def fib(n)
  if n < 2
    n
  else
    fib(n - 1) + fib(n - 2)
  end
end
puts fib(34)
RUBY
# Compile and run:
./spinel hello.rb
./hello # prints 5702887 (instantly)
Options
./spinel app.rb # compiles to ./app
./spinel app.rb -o myapp # compiles to ./myapp
./spinel app.rb -c # generates app.c only
./spinel app.rb -S # prints C to stdout
Self-Hosting
Spinel compiles its own backend. The bootstrap chain:
CRuby + spinel_parse.rb → AST
CRuby + spinel_codegen.rb → gen1.c → bin1
bin1 + AST → gen2.c → bin2
bin2 + AST → gen3.c
gen2.c == gen3.c (bootstrap loop closed)
Benchmarks
74 tests pass. 55 benchmarks pass.
Geometric mean: ~11.6x faster than miniruby (Ruby 4.1.0dev) across
the 28 benchmarks below. Baseline is the latest CRuby miniruby build
(without bundled gems), which is considerably faster than the system
ruby (3.2.3); Spinel’s advantage is correspondingly smaller but still
substantial on computation-heavy workloads.
Benchmark groups: Computation, Data Structures & GC, Real-World Programs (tables not reproduced).
Supported Ruby Features
Core: Classes, inheritance, super, include (mixin), attr_accessor,
Struct.new, alias, module constants, open classes for built-in types.
Control Flow: if/elsif/else, unless, case/when,
case/in (pattern matching), while, until, loop, for..in
(range and array), break, next, return, catch/throw,
&. (safe navigation).
Blocks: yield, block_given?, &block, proc {}, Proc.new,
lambda -> x { }, method(:name). Block methods: each,
each_with_index, map, select, reject, reduce, sort_by,
any?, all?, none?, times, upto, downto.
Exceptions: begin/rescue/ensure/retry, raise,
custom exception classes.
Types: Integer, Float, String (immutable + mutable), Array, Hash,
Range, Time, StringIO, File, Regexp, Bigint (auto-promoted), Fiber.
Polymorphic values via tagged unions. Nullable object types (T?)
for self-referential data structures (linked lists, trees).
Global Variables: $name compiled to static C variables with
type-mismatch detection at compile time.
Strings: << automatically promotes to mutable strings (sp_String)
for O(n) in-place append. +, interpolation, tr, ljust/rjust/center,
and all standard methods work on both. Character comparisons like
s[i] == "c" are optimized to direct char array access (zero allocation).
Chained concatenation (a + b + c + d) collapses to a single malloc
via sp_str_concat4 / sp_str_concat_arr — N-1 fewer allocations.
Loop-local str.split(sep) reuses the same sp_StrArray across
iterations (csv_process: 4 M allocations eliminated).
Regexp: Built-in NFA regexp engine (no external dependency).
=~, $1-$9, match?, gsub(/re/, str), sub(/re/, str),
scan(/re/), split(/re/).
Bigint: Arbitrary precision integers via mruby-bigint. Auto-promoted
from loop multiplication patterns (e.g. q = q * k). Linked as static
library — only included when used.
and others added 30 commits
Seeking breaks otherwise. We might be able to just fflush() before seeking instead?
Turns out DosBox-X was having trouble with the Sound Blaster or something;
standard DosBox works correctly directly from the interrupt handler, and
without doubling the buffer size.
This is MUCH faster than just leaving buffering disabled, and also works
around getting bogus reads after an fseek. SDL_LoadWAV on test/sample.wav
no longer takes several seconds to finish, and comes up with the correct
data.
I wonder if we’re triggering this in LoadWAV because we’re malloc’ing data
between seeks/reads, and it’s causing the djgpp transfer buffer to change. Or
maybe the Fat DS trick is confusing it? I don’t know, I haven’t had time to
debug it, it might just be a legit libc bug in djgpp too, for all I know.
This uses an old trick we used in SDL 1.2 for MacOS Classic, which did its
audio callback in a hardware interrupt. If the audio is locked when the
interrupt fires, make a note of it and return immediately. When the lock is
released, if the interrupt has been fired, run the audio device iteration
right then.
Since there isn’t a big device lock in SDL3 (available to the app, at least),
this keeps a counter of when any SDL_AudioStream is locked, which is probably
good enough.
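The deferral pattern these two commits describe is easy to sketch; here it is transposed to Python threads purely for illustration (the real code is C running under a hardware interrupt, and every name below is mine, not SDL's):

import threading

pending = False           # set when the "interrupt" fires while locked
lock_count = 0            # how many stream locks are currently held
state = threading.Lock()  # guards the two variables above

def iterate_audio_device():
    print("mixing one audio buffer")

def on_interrupt():
    # Called by the timer/DMA "interrupt".
    global pending
    with state:
        if lock_count > 0:
            pending = True  # make a note of it and return immediately
            return
    iterate_audio_device()

def lock_stream():
    global lock_count
    with state:
        lock_count += 1

def unlock_stream():
    global pending, lock_count
    run_now = False
    with state:
        lock_count -= 1
        if lock_count == 0 and pending:
            pending, run_now = False, True
    if run_now:
        iterate_audio_device()  # run the deferred iteration right then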
This uses VESA interfaces to manage the display and works with the software
renderer.
Events aren’t hooked up yet, so prepare to close DosBox on each run. :)
…upport.
This gets most of the rendering examples, which use SDL_GetBasePath() to
find textures to load, working.
Of course Quake 1 solved this better, haha. It’s smart: less memory, dirt
simple, and you don’t even have to worry about synchronizing with the
interrupt handler, because it’s safe for both sides no matter when an
interrupt fires.
[sdl-ci-filter djgpp]
[sdl-ci-artifacts]
- SDL_runapp.c: Add SDL_PLATFORM_DOS to the exclusion list so the generic SDL_RunApp() is disabled when the DOS-specific one is compiled.
- SDL.c: Exclude SDL_Gtk_Quit() on DOS. DJGPP defines __unix__ which sets SDL_PLATFORM_UNIX, but DOS has no GTK/display server. The GTK source is not compiled (CMake UNIX is false for DOS) so this was a link error.
- sdlplatform.cmake: Add DOS case to SDL_DetectCMakePlatform so the platform is properly detected from CMAKE_SYSTEM_NAME=DOS.
- i586-pc-msdosdjgpp.cmake: Add i386-pc-msdosdjgpp-gcc as a fallback compiler name, since some DJGPP toolchain builds use the i386 prefix.
- Implement double-buffered page-flipping for VBE modes with >1 image page
- Save and restore full VBE state on video init/quit for clean mode switching
- Improve DOS keyboard handling: support extended scancodes and Pause key
- Lock ISR code/data to prevent page faults during interrupts
- Always vsync when blitting in single-buffered modes to reduce tearing
Move audio mixing out of IRQ handler to main loop for improved stability and to avoid reentrancy issues. Add SDL_DOS_PumpAudio function, update DMA buffer handling, and adjust sample rate to 22050 Hz.
Silence stale DMA buffer halves to prevent stutter during load.
Detect SB version and select 8-bit mono or 16-bit stereo mode.
Handle DMA and DSP setup for both SB16 and pre-SB16 hardware.
Add FORCE_SB_8BIT option for testing in DOSBox.
- Poll Sound Blaster DSP status instead of fixed delay after speaker-on
- Clarify DPMI conventional memory is always locked; update comments
- Document and justify DMA memory allocation strategy
- Free IRET wrapper after restoring interrupt vector to avoid leaks
- Throttle joystick axis polling to ~60 Hz to reduce BIOS timing loop cost
- Always poll joystick buttons directly for responsiveness
Implement banked framebuffer access for VBE 1.2+ modes without LFB.
Detect and initialize banked modes, copy framebuffer data using bank
switching, and blank the framebuffer on mode set. Page-flipping is
disabled in banked mode.
How LLMs Actually Work
A complete walkthrough of how large language models like ChatGPT are built — from raw internet text to a conversational assistant. Based on Andrej Karpathy’s technical deep dive.
Representative figures from frontier models circa 2024 — exact numbers shift with every release. The scale is the point, not the precision.
Human: What is behind this text box?
Chapter 1 · Pre-Training · Stage 1
Downloading the Internet
The first step is collecting an enormous amount of text. Organizations like Common Crawl have been crawling the web since 2007 — indexing 2.7 billion pages by 2024. This raw data is then filtered into a high-quality dataset like FineWeb.
The goal: large quantity of high quality, diverse documents. After aggressive filtering, you end up with about 44 terabytes — roughly 10 consumer hard drives worth of text — representing ~15 trillion tokens.
Key Insight
The quality and diversity of this training data has more impact on the final model than almost anything else. Garbage in, garbage out — but at a trillion-token scale.
🌐 Common Crawl
2.7B web pages · Raw HTML · Since 2007
A non-profit organization that crawls the web and freely provides its data. Their bots follow links from seed pages, recursively indexing the internet. The raw archive is petabytes of gzip’d WARC files containing raw HTML.
🚫 URL Filtering
Blocklists · Malware · Spam · Adult content
Block-lists of known malware sites, spam networks, adult content, marketing pages, and low-quality domains are applied. Entire domains can be removed. This is the cheapest filter so it runs first.
📄 Text Extraction
HTML → clean text · Remove navigation & CSS
Raw HTML contains <div> tags, CSS, JavaScript, navigation menus, and ads. Parsers extract just the meaningful text content. This is harder than it sounds — heuristics decide what’s “content” vs “chrome”.
🌍 Language Filtering
Keep pages ≥65% English · Language classifier
A language classifier estimates the language of each page. Pages with less than 65% target-language content are dropped. This is a design decision — filter aggressively for one language or train multilingual.
♻️ Deduplication
Exact & fuzzy matching · Reduce repetition
Identical or near-identical pages appear millions of times on the internet (copied articles, boilerplate). Training on the same text repeatedly causes memorization. Dedup uses MinHash and exact-match techniques to remove duplicates (see the sketch after this list).
🔒 PII Removal
Names · Addresses · SSNs · Emails
Personally Identifiable Information is detected and either redacted or the page is dropped. Regex patterns and ML classifiers find phone numbers, emails, Social Security numbers, physical addresses, and named individuals.
✅ FineWeb Dataset
44 TB · 15 Trillion tokens · High quality
The final filtered dataset. Articles about tornadoes in 2012, medical facts, history, code, recipes, science papers — the full breadth of human knowledge expressed in text. This becomes the training corpus.
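To illustrate the MinHash idea from the deduplication stage above: near-duplicate documents produce signatures that agree in most slots, approximating the Jaccard similarity of their shingle sets. All names and parameters below are illustrative:

import hashlib

def minhash(text, num_hashes=64):
    shingles = {text[i:i+5] for i in range(len(text) - 4)}  # 5-char shingles
    sig = []
    for seed in range(num_hashes):  # one min over a distinct hash per slot
        sig.append(min(
            int.from_bytes(hashlib.blake2b(f"{seed}:{s}".encode(),
                                           digest_size=8).digest(), "big")
            for s in shingles))
    return sig

def estimated_similarity(a, b):
    # Fraction of agreeing slots approximates Jaccard similarity.
    return sum(x == y for x, y in zip(a, b)) / len(a)

doc1 = "the quick brown fox jumps over the lazy dog"
doc2 = "the quick brown fox jumped over the lazy dog"
print(estimated_similarity(minhash(doc1), minhash(doc2)))  # close to 1.0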
Chapter 1 · Pre-Training · Stage 2
Tokenization
Neural networks can’t process raw text — they need numbers. The solution is tokenization: breaking text into “tokens” (sub-word chunks) and assigning each an ID.
GPT-4 uses a vocabulary of 100,277 tokens, built via the Byte Pair Encoding (BPE) algorithm. BPE starts with individual bytes (256 symbols), then iteratively merges the most frequent adjacent pairs — compressing the sequence length while expanding the vocabulary.
Why not just use words?
Words have infinite variants. “run”, “running”, “runner” would be 3 separate entries. Subword tokens share roots: “run” + “ning”, “run” + “ner”. This also handles new words, typos, and multiple languages efficiently.
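To make the merge loop concrete, here is a toy sketch of BPE training; it works on characters instead of raw bytes and skips every optimization a real tokenizer uses:

from collections import Counter

def bpe_merges(text, num_merges):
    seq = list(text)  # start from single characters (real BPE starts from bytes)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))  # count adjacent symbol pairs
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]  # most frequent adjacent pair
        merges.append(a + b)  # becomes a new vocabulary entry
        merged, i = [], 0
        while i < len(seq):  # replace every (a, b) occurrence with the merged symbol
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(seq[i])
                i += 1
        seq = merged
    return seq, merges

tokens, merges = bpe_merges("low lower lowest", 4)
print(merges)  # the first merges are 'lo' then 'low'; later ones depend on ties
print(tokens)  # the text re-segmented with the learned merges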
Chapter 1 · Pre-Training · Stage 3
Training the Neural Network
The Transformer neural network is initialized with random parameters — billions of “knobs”. Training adjusts these knobs so the network gets better at predicting the next token in any sequence.
Every training step: sample a window of tokens → feed to network → compare prediction to actual next token → nudge all parameters slightly in the right direction. Repeat billions of times.
The loss — a single number measuring prediction error — falls steadily as the model learns the statistical patterns of human language.
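Schematically, one such training step looks like the following toy PyTorch sketch, with a trivial stand-in model instead of a Transformer and made-up shapes:

import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, d_model, ctx_len, batch = 1000, 64, 8, 4

class TinyLM(nn.Module):  # stand-in for a real Transformer
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.head = nn.Linear(d_model, vocab_size)
    def forward(self, tokens):  # (batch, ctx) -> (batch, ctx, vocab)
        return self.head(self.embed(tokens))

model = TinyLM()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Sample a window of tokens; the target at each position is the next token.
data = torch.randint(0, vocab_size, (batch, ctx_len + 1))
inputs, targets = data[:, :-1], data[:, 1:]

logits = model(inputs)  # predictions for every position
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
opt.zero_grad()
loss.backward()  # nudge all parameters slightly in the right direction
opt.step()
print(loss.item())  # the single number that falls as training proceeds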
Scale
GPT-2 (2019): 1.6B params, 100B tokens, ~$40K to train. Today: same quality for ~$100. Llama 3: 405B params, 15T tokens. Modern frontier models: hundreds of billions of parameters, trillions of tokens.
Transformer Architecture
What is an Embedding?
Each token ID maps to a learned vector of ~1,000 – 4,000 numbers called its embedding. Think of it as a coordinate in meaning-space — initialized randomly, then shaped by training. The same token (e.g. “bank”) always enters the network with the same embedding vector. Attention layers then mix in context from surrounding tokens, so by the time “bank” reaches deeper layers, “river bank” and “bank account” carry completely different representations. Polysemy is resolved by context, not by storing multiple meanings per token.
Interactive demo: select a training stage to see the cross-entropy loss (here 4.8 at training step 500) and the model's output at that stage.
Model Output at This Stage
the model has learning but confustion still the wqp mxr model bns to predict…
What the model is learning
At step 1: pure noise. By step 500: local coherence appears. By step 32K: fluent English. The model is learning grammar, facts, reasoning patterns — all implicitly from token prediction.
Chapter 1 · Pre-Training · Stage 4
Inference & Token Sampling
Once trained, the network generates text autoregressively: feed a sequence of tokens → get a probability distribution over all 100K possible next tokens → sample one → append → repeat.
This process is stochastic — the same prompt generates different outputs every time because we’re flipping a biased coin. Higher-probability tokens are more likely but not guaranteed to be chosen.
Temperature controls randomness. Low temperature (0.1) → model always picks the top token. High temperature (2.0) → uniform chaos. 0.7 – 1.0 is the sweet spot for coherent-but-creative text.
Key Mental Model
The model doesn’t “think” about what to say. It computes a probability distribution over all possible next tokens and samples from it. Every word is a coin flip — just a very informed one.
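A minimal sketch of that sampling step, over a made-up four-token distribution (softmax of logits scaled by temperature, then one weighted draw):

import math, random

def sample_token(logits, temperature):
    # Scale logits by 1/T, softmax, then draw one token by weight.
    scaled = {tok: l / temperature for tok, l in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - m) for tok, s in scaled.items()}
    r = random.uniform(0, sum(weights.values()))
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # floating-point fallback

logits = {"blue": 4.0, "because": 2.5, "to": 1.0, "banana": -2.0}
print(sample_token(logits, 0.1))  # nearly always "blue"
print(sample_token(logits, 2.0))  # far more random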
Token Sampling Demo (interactive): for the prompt "The sky appears", each bar shows the probability of a candidate next token (e.g. "blue") at temperature 0.8.
Chapter 2 · The Base Model
The Internet Simulator
After pre-training, you have a base model — a sophisticated autocomplete engine. It’s not an assistant. It doesn’t answer questions. It continues token sequences based on what it saw on the internet.
Give it a Wikipedia sentence and it’ll complete it from memory. Ask it “What is 2+2?” and it might give you a math textbook page, a quiz answer key, or go off on a tangent — whatever was statistically common in its training data.
The base model’s knowledge lives in its 405 billion parameters — a lossy compression of the internet, like a zip file that approximates rather than perfectly stores information.
Base Model Behavior
Few-Shot Prompting
Hello: Bonjour | Cat: Chat | Dog: Chien | Teacher:
→ Professeur ✓ correct
Memorization
Zebras (/ˈzɛbrə, ˈziːbrə/) are African equines with distinctive…
…black-and-white striped coats. There are three living species: the Grévy’s zebra, plains zebra, and mountain zebra…
↑ Verbatim Wikipedia recall from weights
Hallucination
The Republican Party nominated Trump and [running mate] in the 2024 election against…
→ …Mike Pence, facing Hillary Clinton and Tim Kaine…
→ …Ron DeSantis, against Joe Biden and Kamala Harris…
↑ Knowledge cutoff → plausible confabulation
In-Context Learning
Base models can perform translation, classification, and Q&A via few-shot prompts — no fine-tuning needed. The model infers the task from the pattern of examples in its context window.
Chapter 3 · Post-Training
Building the Assistant
The base model is a token simulator. To turn it into a helpful assistant, we need post-training — a much cheaper but equally critical stage. This is where the model learns conversations.
Supervised Fine-Tuning (SFT)
Human labelers create a dataset of ideal conversations, following detailed labeling instructions: be helpful, be truthful, be harmless. The model is then trained on these conversations — not from scratch, but by continuing to adjust the pre-trained weights on this new data.
Modern SFT datasets (like UltraChat) have millions of conversations — mostly synthetic (LLM-generated), with human review. The model learns by imitation: it adopts the persona of the ideal assistant reflected in the data.
April, 2026
Apr 24
Feature
gpt-5.5
gpt-5.5-pro
v1/responses
v1/chat/completions
v1/batch
Released GPT-5.5, a new frontier model for complex professional work, to the Chat Completions and Responses API, and released GPT-5.5 pro for Responses API requests for tougher problems that benefit from more compute.
GPT-5.5 supports a 1M token context window, image input, structured outputs, function calling, prompt caching, Batch, tool search, built-in computer use, hosted shell, apply patch, Skills, MCP, and web search. Key updates include:
Reasoning effort now defaults to medium.
When image_detail is unset or set to auto, the model now uses the original behavior.
Caching for GPT-5.5 only works with extended prompt caching. In-memory prompt caching is not supported.
Learn more here.
Apr 21
Feature
gpt-image-2
v1/images/generations
v1/images/edits
v1/batch
Released GPT Image 2, a state-of-the-art image generation model for image generation and editing. GPT Image 2 supports flexible image sizes, high-fidelity image inputs, token-based image pricing, and Batch API support with a 50% discount.
Apr 15
Update
Updated the Agents SDK with new capabilities, including:
running agents in controlled sandboxes;
inspecting and customizing the open-source harness; and
controlling when memories are created and where they’re stored.
March, 2026
Mar 17
Feature
gpt-5.4-mini
gpt-5.4-nano
v1/responses
v1/chat/completions
Released GPT-5.4 mini and GPT-5.4 nano to the Chat Completions and Responses API. GPT-5.4 mini brings GPT-5.4-class capabilities to a faster, more efficient model for high-volume workloads, while GPT-5.4 nano is optimized for simple high-volume tasks where speed and cost matter most.
GPT-5.4 mini supports tool search, built-in computer use, and compaction. GPT-5.4 nano supports compaction, but does not support tool search or computer use.
Mar 16
Update
gpt-5.3-chat-latest
Updated the gpt-5.3-chat-latest slug to point to the latest model currently used in ChatGPT.
Mar 13
Fix
gpt-5.4
v1/responses
v1/chat/completions
Updated our image encoder to fix a small bug with input_image inputs in GPT-5.4. Some image understanding use cases may now see improved quality. No action is required.
Mar 12
Feature
sora-2
sora-2-pro
v1/videos
v1/videos/characters
v1/videos/extensions
v1/batch
Expanded the Sora API with reusable character references, longer generations up to 20 seconds, 1080p output for sora-2-pro, video extensions, and Batch API support for POST /v1/videos. 1080p generations on sora-2-pro are billed at $0.70 per second. Learn more here.
Mar 12
Update
sora-2
sora-2-pro
v1/videos/edits
v1/videos/{video_id}/remix
Added POST /v1/videos/edits for editing existing videos. This will replace POST /v1/videos/{video_id}/remix, which will be deprecated in 6 months. Learn more here.
Mar 5
Feature
gpt-5.4
gpt-5.4-pro
v1/responses
v1/chat/completions
Released GPT-5.4, our newest frontier model for professional work, to the Chat Completions and Responses API, and released GPT-5.4 pro to the Responses API for tougher problems that benefit from more compute.
Also released:
Tool search in the Responses API, which lets models defer large tool surfaces until runtime to reduce token usage, preserve cache performance, and improve latency.
Built-in Computer use support in GPT-5.4 through the Responses API computer tool for screenshot-based UI interaction.
A 1M token context window and native Compaction support for longer-running agent workflows.
Mar 3
Feature
gpt-5.3-chat-latest
v1/chat/completions
v1/responses
Released gpt-5.3-chat-latest to the Chat Completions and Responses API. This model points to the GPT-5.3 Instant snapshot currently used in ChatGPT. Read more here.
February, 2026
Feb 24
Feature
v1/responses
v1/chat/completions
Expanded input_file support to accept more document, presentation, spreadsheet, code, and text file types. Learn more here.
Feb 24
Feature
v1/responses
Released phase to the Responses API. It labels an assistant message as intermediate commentary (commentary) or the final answer (final_answer). Read more here.
Feb 24
Feature
gpt-5.3-codex
v1/responses
Released gpt-5.3-codex to the Responses API. Read more here.
Feb 23
Feature
v1/responses
Launched WebSocket mode for the Responses API. Learn more here.
Feb 23
Feature