10 interesting stories served every morning and every evening.
Imagine a newspaper publisher announcing it will no longer allow libraries to keep copies of its paper.
That’s effectively what’s begun happening online in the last few months. The Internet Archive—the world’s largest digital library—has preserved newspapers since it went online in the mid-1990s. The Archive’s mission is to preserve the web and make it accessible to the public. To that end, the organization operates the Wayback Machine, which now contains more than one trillion archived web pages and is used daily by journalists, researchers, and courts.
But in recent months The New York Times began blocking the Archive from crawling its website, using technical measures that go beyond the web’s traditional robots.txt rules. That risks cutting off a record that historians and journalists have relied on for decades. Other newspapers, including The Guardian, seem to be following suit.
For nearly three decades, historians, journalists, and the public have relied on the Internet Archive to preserve news sites as they appeared online. Those archived pages are often the only reliable record of how stories were originally published. In many cases, articles get edited, changed, or removed—sometimes openly, sometimes not. The Internet Archive often becomes the only source for seeing those changes. When major publishers block the Archive’s crawlers, that historical record starts to disappear.
The Times says the move is driven by concerns about AI companies scraping news content. Publishers seek control over how their work is used, and several—including the Times—are now suing AI companies over whether training models on copyrighted material violates the law. There’s a strong case that such training is fair use.
Whatever the outcome of those lawsuits, blocking nonprofit archivists is the wrong response. Organizations like the Internet Archive are not building commercial AI systems. They are preserving a record of our history. Turning off that preservation in an effort to control AI access could essentially torch decades of historical documentation over a fight that libraries like the Archive didn’t start, and didn’t ask for.
If publishers shut the Archive out, they aren’t just limiting bots. They’re erasing the historical record.
Making material searchable is a well-established fair use. Courts have long recognized it’s often impossible to build a searchable index without making copies of the underlying material. That’s why when Google copied entire books in order to make a searchable database, courts rightly recognized it as a clear fair use. The copying served a transformative purpose: enabling discovery, research, and new insights about creative works.
The Internet Archive operates on the same principle. Just as physical libraries preserve newspapers for future readers, the Archive preserves the web’s historical record. Researchers and journalists rely on it every day. According to Archive staff, Wikipedia alone links to more than 2.6 million news articles preserved at the Archive, spanning 249 languages. And that’s only one example. Countless bloggers, researchers, and reporters depend on the Archive as a stable, authoritative record of what was published online.
The same legal principles that protect search engines must also protect archives and libraries. Even if courts place limits on AI training, the law protecting search and web archiving is already well established.
The Internet Archive has preserved the web’s historical record for nearly thirty years. If major publishers begin blocking that mission, future researchers may find that huge portions of that historical record have simply vanished. There are real disputes over AI training that must be resolved in courts. But sacrificing the public record to fight those battles would be a profound, and possibly irreversible, mistake.
...
Read the original on www.eff.org »
Trees take quite a while to grow. If someone 50 years ago planted a row of oaks or a chestnut tree on your plot of land, you have something that no amount of money or effort can replicate. The only way is to wait. Tree-lined roads, old gardens, houses sheltered by decades of canopy: if you want to start fresh on an empty plot, you will not be able to get that.
Because some things just take time.
We know this intuitively. We pay premiums for Swiss watches, Hermès bags and old properties precisely because of the time embedded in them. Either because of the time it took to build them or because of their age. We require age minimums for driving, voting, and drinking because we believe maturity only comes through lived experience.
Yet right now we also live in a time of instant gratification, and it’s entering how we build software and companies. As much as we can speed up code generation, the real defining element of a successful company or an Open Source project will continue to be tenacity. The ability of leadership or the maintainers to stick to a problem for years, to build relationships, to work through challenges fundamentally defined by human lifetimes.
The current generation of startup founders and programmers is obsessed with speed. Fast iteration, rapid deployment, doing everything as quickly as possible. For many things, that’s fine. You can go fast, leave some quality on the table, and learn something along the way.
But there are things where speed is actively harmful, where the friction exists for a reason. Compliance is one of those cases. There’s a strong desire to eliminate everything that processes like SOC2 require, and an entire industry of turnkey solutions has sprung up to help —
Delve just being one example, there are more.
There’s a feeling that all the things that create friction in your life should be automated away. That human involvement should be replaced by AI-based decision-making. Because it is the friction of the process that is the problem. When in fact many times the friction, or that things just take time, is precisely the point.
There’s a reason we have cooling-off periods for some important decisions in one’s life. We recognize that people need time to think about what they’re doing, and that doing something right once doesn’t mean much because you need to be able to do it over a longer period of time.
AI writes code fast which isn’t news anymore. What’s interesting is that we’re pushing this force downstream: we seemingly have this desire to ship faster than ever, to run more experiments and that creates a new desire, one to remove all the remaining friction of reviews, designing and configuring infrastructure, anything that slows the pipeline. If the machines are so great, why do we even need checklists or permission systems? Express desire, enjoy result.
Because we now believe it is important for us to just do everything faster. But increasingly, I also feel like this means that the shelf life of much of the software being created today — software that people and businesses should depend on — can be measured only in months rather than decades, and the relationships alongside.
In one of last year’s earlier YC batches, there was already a handful that just disappeared without even saying what they learned or saying goodbye to their customers. They just shut down their public presence and moved on to other things. And to me, that is not a sign of healthy iteration. That is a sign of breaking the basic trust you need to build a relationship with customers. A proper shutdown takes time and effort, and our current environment treats that as time not wisely spent. Better to just move on to the next thing.
This is extending to Open Source projects as well. All of a sudden, everything is an Open Source project, but many of them only have commits for a week or so, and then they go away because the motivation of the creator already waned. And in the name of experimentation, that is all good and well, but what makes a good Open Source project is that you think and truly believe that the person that created it is either going to stick with it for a very long period of time, or they are able to set up a strategy for succession, or they have created enough of a community that these projects will stand the test of time in one form or another.
Relatedly, I’m also increasingly skeptical of anyone who sells me something that supposedly saves my time. When all that I see is that everybody who is like me, fully onboarded into AI and agentic tools, seemingly has less and less time available because we fall into a trap where we’re immediately filling it with more things.
We all sell each other the idea that we’re going to save time, but that is not what’s happening. Any time saved gets immediately captured by competition. Someone who actually takes a breath is outmaneuvered by someone who fills every freed-up hour with new output. There is no easy way to bank the time and it just disappears.
I feel this acutely. I’m very close to the red-hot center of where economic activity around AI is taking place, and more than anything, I have less and less time, even when I try to purposefully scale back and create the space. For me this is a problem. It’s a problem because even with the best intentions, I actually find it very hard to create quality when we are quickly commoditizing software, and the machines make it so appealing.
I keep coming back to the trees. I’ve been maintaining Open Source projects for close to two decades now. The last startup I worked on, I spent 10 years at. That’s not because I’m particularly disciplined or virtuous. It’s because I, or someone else, planted something, and then I kept showing up, and eventually the thing had roots that went deeper than my enthusiasm on any given day. That’s what time does! It turns some idea or plan into a commitment and a commitment into something that can shelter and grow other people.
Nobody is going to mass-produce a 50-year-old oak. And nobody is going to conjure trust, or quality, or community out of a weekend sprint. The things I value most — the projects, the relationships, the communities — are all things that took years to become what they are. No tool, no matter how fast, was going to get them there sooner.
We recently planted a new tree with Colin. I want it to grow into a large one. I know that’s going to take time, and I’m not in a rush.
...
Read the original on lucumr.pocoo.org »
Ghostling is a demo project meant to highlight a minimum functional terminal built on the libghostty C API in a
single C file.
The example uses Raylib for windowing and rendering. It is single-threaded (although libghostty-vt supports threading) and uses a 2D graphics renderer instead of a direct GPU renderer like the primary Ghostty GUI. This is to showcase the flexibility of libghostty and how it can be used in a variety of contexts.
Libghostty is an embeddable library extracted from Ghostty’s core, exposing a C and Zig API so any application can embed correct, fast terminal emulation.
Ghostling uses libghostty-vt, a zero-dependency library (not even libc) that handles VT sequence parsing, terminal state management (cursor position, styles, text reflow, scrollback, etc.), and renderer state management. It contains no renderer drawing or windowing code; the consumer (Ghostling, in this case) provides its own. The core logic is extracted directly from Ghostty and inherits all of its real-world benefits: excellent, accurate, and complete terminal emulation support, SIMD-optimized parsing, leading Unicode support, highly optimized memory usage, and a robust fuzzed and tested codebase, all proven by millions of daily active users of Ghostty GUI.
Despite being a minimal, thin layer above libghostty, look at all the features you do get:
* Unicode and multi-codepoint grapheme handling (no shaping or layout)
* And more. Effectively all the terminal emulation features supported
by Ghostty!
These features aren’t properly exposed by libghostty-vt yet but will be:
These are things that could work but haven’t been tested or aren’t implemented in Ghostling itself:
This list is incomplete and we’ll add things as we find them.
libghostty is focused on core terminal emulation features. As such, you don’t get features that are provided by the GUI above the terminal emulation layer, such as:
* Search UI (although search internals are provided by libghostty-vt)
These are the things that libghostty consumers are expected to implement on their own, if they want them. This example doesn’t implement these to try to stay as minimal as possible.
There are some known issues with this demo:
* Kitty keyboard protocol support is broken with some inputs. This is
due to limitations of the underlying Raylib input system; it doesn’t
support rich enough input events to fully and correctly implement the Kitty
keyboard protocol. This is a known issue.
The libghostty-vt API supports the Kitty keyboard protocol correctly, but
requires correct input events to do so.
cmake -B build -G Ninja
cmake --build build
./build/ghostling
cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release
cmake --build build
After the initial configure, you only need to run the build step:
cmake --build build
libghostty-vt has a fully capable and proven Zig API. Ghostty GUI itself uses this and is a good — although complex — example of how to use it. However, this demo is meant to showcase the minimal C API since C is so much more broadly used and accessible to a wide variety of developers and language ecosystems.
libghostty-vt has a C API and can have zero dependencies, so it can be used with minimally thin bindings in basically any language. I’m not sure yet if the Ghostty project will maintain official bindings for languages other than C and Zig, but I hope the community will create and maintain bindings for many languages!
No no no! libghostty has no opinion about the renderer or GUI framework used; it’s even standalone WASM-compatible for browsers and other environments.
libghostty provides a high-performance render state API
which only keeps track of the state required to build a renderer. This is the same API used by Ghostty GUI for Metal and OpenGL rendering and in this repository for the Raylib 2D graphics API. You can layer any renderer on top of this!
I needed to pick something. Really, any build system and any library could be used. CMake is widely used and supported, and Raylib is a simple and elegant library for windowing and 2D rendering that is easy to set up. Don’t get bogged down in these details!
...
Read the original on github.com »
We rewrote our Rust WASM Parser in TypeScript - and it got 3x Faster
We built the openui-lang parser in Rust and compiled it to WASM. The logic was sound: Rust is fast, WASM gives you near-native speed in the browser, and our parser is a reasonably complex multi-stage pipeline. Why wouldn’t you want that in Rust?
Turns out we were optimising the wrong thing.
The openui-lang parser converts a custom DSL emitted by an LLM into a React component tree. It runs on every streaming chunk — so latency matters a lot. The pipeline has six stages:
* Mapper: converts internal AST into the public OutputNode format consumed by the React renderer
Every call to the WASM parser pays a mandatory overhead regardless of how fast the Rust code itself runs:
The Rust parsing itself was never the slow part. The overhead was entirely in the boundary: copy string in, serialize result to JSON string, copy JSON string out, then V8 deserializes it back into a JS object.
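In TypeScript terms, the original call path looked roughly like this (a sketch with illustrative names; the wasm-bindgen glue module and the result type are hypothetical stand-ins for the real bindings):

// Hypothetical wasm-bindgen output; names are illustrative, not the real package.
import init, { parse_to_json } from "./openui_parser_wasm";

type ParseResult = unknown;          // stand-in for the real parser output type
declare const sourceText: string;    // the DSL text emitted by the LLM

await init();                                    // instantiate the WASM module once
const json = parse_to_json(sourceText);          // input string copied into WASM, JSON string copied back out
const result = JSON.parse(json) as ParseResult;  // V8 re-materialises the object on the JS heap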
The natural question was: what if WASM returned a JS object directly, skipping the JSON serialization step? We integrated serde-wasm-bindgen which does exactly this — it converts the Rust struct into a JsValue and returns it directly.
It turned out to be no cheaper than the JSON round-trip. Here’s why. JS cannot read a Rust struct’s bytes from WASM linear memory as a native JS object — the two runtimes use completely different memory layouts. To construct a JS object from Rust data, serde-wasm-bindgen must recursively materialise Rust data into real JS arrays and objects, which involves many fine-grained conversions across the runtime boundary per parse() invocation.
Compare that to the JSON approach: serde_json::to_string() runs in pure Rust with zero boundary crossings, produces one string, one memcpy copies it to the JS heap, then V8’s native C++ JSON.parse processes it in a single optimised pass. Fewer, larger, and more optimised operations win over many small ones.
We ported the full parser pipeline to TypeScript. Same six-stage architecture, same ParseResult output shape — no WASM, no boundary, runs entirely in the V8 heap.
What is measured: A single parse(completeString) call on the finished output string. This isolates per-call parser cost.
How it was run: 30 warm-up iterations to stabilise JIT, then 1000 timed iterations using performance.now() (µs precision). The median is reported. Fixtures are real LLM-generated component trees serialised in each format’s real streaming syntax.
* simple-table — root + one Table with 3 columns and 5 rows (~180 chars)
Eliminating WASM fixed the per-call cost, but the streaming architecture still had a deeper inefficiency.
The parser is called on every LLM chunk. The naïve approach accumulates chunks and re-parses the entire string from scratch each time:
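A minimal sketch of that naïve loop (illustrative; parse() is the entry point described above, left abstract here):

declare function parse(source: string): unknown; // the parser's one-shot entry point

let buffer = "";
function onChunk(chunk: string) {
  buffer += chunk;       // accumulate the full output received so far
  return parse(buffer);  // ...and re-parse all of it on every chunk
}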
For a 1000-char output delivered in 20-char chunks: 50 parse calls processing a cumulative total of ~25,000 characters. O(N²) in the number of chunks.
Statements terminated by a depth-0 newline are immutable — the LLM will never come back and modify them. We added a streaming parser that caches completed statement ASTs:
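Here is a sketch of the idea (simplified: it splits on plain newlines, whereas the real parser tracks nesting depth; the types and the parseStatement, parsePartial, and buildResult helpers are hypothetical):

type StatementAst = unknown;
type ParseResult = unknown;
declare function parseStatement(src: string): StatementAst;                         // parse one finished statement
declare function parsePartial(src: string): StatementAst[];                         // best-effort parse of the tail
declare function buildResult(done: StatementAst[], tail: StatementAst[]): ParseResult;

class StreamingParser {
  private completed: StatementAst[] = []; // ASTs of statements that are already final
  private tail = "";                      // the in-progress trailing statement

  push(chunk: string): ParseResult {
    this.tail += chunk;
    const pieces = this.tail.split("\n");
    this.tail = pieces.pop() ?? "";       // keep the unfinished remainder
    for (const stmt of pieces) {
      if (stmt.trim()) this.completed.push(parseStatement(stmt)); // parsed once, cached forever
    }
    // Only the trailing fragment is re-parsed on each chunk.
    return buildResult(this.completed, parsePartial(this.tail));
  }
}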
Completed statements are never re-parsed. Only the trailing in-progress statement is re-parsed per chunk. O(total_length) instead of O(N²).
What is measured: The total parse overhead accumulated across every chunk call for one complete document. This is different from the one-shot benchmark — it measures the sum of all parse calls during a real stream, not a single call. This is the number that affects actual user-perceived responsiveness.
How it was run: Documents are replayed in 20-char chunks. Each chunk triggers a parse() (naïve) or push() (incremental) call. Total time across all calls is recorded. 100 full-stream replays, median taken.
The simple-table fixture is a single statement — there’s nothing to cache, so both approaches are equivalent. The benefit scales with the number of statements because more of the document gets cached and skipped on each chunk.
The one-shot table shows 13.4µs for contact-form; the streaming table shows 316µs (naïve). These are not contradictory — they measure different things:
* 13.4µs = cost of one parse() call on the complete 400-char string
* 316µs = total cost of ~20 parse() calls during the stream (chunk 1 parses 20 chars, chunk 2 parses 40 chars, …, chunk 20 parses 400 chars — cumulative sum of all those growing calls)
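Summing that series makes the gap concrete (assuming the stream really arrives in even 20-char chunks): the naïve path re-scans 20 + 40 + … + 400 = 20 × (20 × 21 / 2) = 4,200 characters over the whole stream, roughly ten times the 400 characters a single one-shot parse touches, which is why the streamed total is so much larger than the one-shot figure.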
This experience sharpened our thinking on the right use cases for WASM:
✅ Compute-bound with minimal interop: image/video processing, cryptography, physics simulations, audio codecs. Large input → scalar output or in-place mutation. The boundary is crossed rarely.
✅ Portable native libraries: shipping C/C++ libraries (SQLite, OpenCV, libpng) to the browser without a full JS rewrite.
❌ Parsing structured text into JS objects: you pay the serialization cost either way. The parsing computation is fast enough that V8’s JIT eliminates any Rust advantage. The boundary overhead dominates.
❌ Frequently-called functions on small inputs: if the function is called 50 times per stream and the computation takes 5µs, you cannot amortise the boundary cost.
Profile where time is actually spent before choosing the implementation language.
For us, the cost was never in the computation - it was always in data transfer across the WASM-JS boundary.
“Direct object passing” through serde-wasm-bindgen is not cheaper.
Constructing a JS object field-by-field from Rust involves more boundary crossings than a single JSON string transfer, not fewer. The boundary crossings happen inside the single FFI call, invisibly.
Algorithmic complexity improvements dominate language-level optimisations.
Going from O(N²) to O(N) in the streaming case had a larger practical impact than switching from WASM to TypeScript.
WASM and JS do not share a heap.
WASM has a flat linear memory (WebAssembly.Memory) that JS can read as raw bytes, but those bytes are Rust’s internal layout - pointers, enum discriminants, alignment padding - completely opaque to the JS runtime. Conversion is always required and always costs something.
...
Read the original on www.openui.com »
Starting with the upcoming LTS release, every keystroke at a sudo password prompt will echo an asterisk — a small UX fix that has ignited one of Linux’s fiercest debates in years.
For more than four decades, typing a password after a sudo prompt in a Linux terminal produced nothing visible on screen — no asterisks, no dots, no moving cursor. The blank void was intentional: a guard against “shoulder surfing,” the practice of counting keystrokes to guess a password’s length. Ubuntu 26.04 LTS, codenamed Resolute Raccoon
and due on April 23, 2026, changes that.
The original sudo utility was created in 1980 by Bob Coggeshall and Cliff Spencer at the State University of New York at Buffalo. Its silent password prompt was a deliberate security decision from an era when terminals were shared, physical screens were wide-open, and the threat model squarely included people standing behind you counting keystrokes. That behaviour survived — untouched — through nearly half a century of Linux distributions.
The tradition began to crack when Linux Mint enabled visual password feedback by default for its own sudo configuration, quietly demonstrating that the sky would not fall. Still, mainstream distributions, Ubuntu among them, maintained the classic silent prompt.
The catalyst for Ubuntu’s change is sudo-rs, a ground-up rewrite of the classic C implementation in the Rust programming language. Canonical shipped sudo-rs as the default sudo implementation beginning with Ubuntu 25.10 — a transition that most users never noticed because the command name and behaviour were otherwise identical.
Then, roughly two weeks before the Ubuntu 26.04 beta window, the upstream sudo-rs project merged a patch to enable the pwfeedback
option by default. Canonical cherry-picked that patch into Ubuntu 26.04 development builds. The legacy sudo
package (sometimes labelled sudo-ws) is unaffected; only the sudo-rs path shows asterisks.
Critics of the change point to a bug report whose title captures the sentiment perfectly: “sudo-rs echos * for every character typed breaking historical security measures older than I am.”
Ubuntu acknowledged the report and marked it Won’t Fix. The upstream sudo-rs developers similarly declined to back down.
The developers’ counter-argument rests on two pillars. First, the security benefit of hiding password length is negligible in practice — anyone close enough to count asterisks on a screen is close enough to hear or watch your keystrokes directly. Second, and more pointedly, most users’ sudo password is the same as their login password — one that already appears as visible placeholder dots on the graphical login screen. Hiding asterisks in the terminal while showing them at login is, in the developers’ estimation, security theatre.
Users and system administrators who prefer the traditional silent prompt can restore it with a single configuration change. The setting is toggled via the sudoers
file, which should always be edited through the safe visudo command to prevent syntax errors from locking you out.
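For example, a single line in the sudoers policy turns the feedback off again (a sketch assuming sudo-rs honours the same pwfeedback Defaults flag as classic sudo; add it via sudo visudo rather than editing the file directly):

Defaults !pwfeedback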
The asterisk change is part of a wider modernisation underway in Ubuntu 26.04. The release will ship with GNOME 50 running exclusively on Wayland, Linux kernel 7.0, and further adoption of Rust-based core utilities — including uutils/coreutils, a Rust reimplementation of the standard Unix command-line tools. The switch to sudo-rs is thus one piece of a broader effort to bring memory safety and, apparently, modern UX sensibilities to Ubuntu’s fundamental plumbing.
Whether you consider the asterisk change an overdue quality-of-life improvement or a dangerous departure from Unix philosophy, one thing is clear: the option to revert remains firmly in your hands. The developers have simply decided that the default should favour the many newcomers baffled by a blank prompt over the few veterans who cherished it.
Ubuntu 26.04 LTS Resolute Raccoon is scheduled for final release on April 23, 2026.
...
Read the original on pbxscience.com »
Mamba-3 is a new state space model (SSM) designed with inference efficiency as the primary goal — a departure from Mamba-2, which optimized for training speed. The key upgrades are a more expressive recurrence formula, complex-valued state tracking, and a MIMO (multi-input, multi-output) variant that boosts accuracy without slowing down decoding. The result: Mamba-3 SISO beats Mamba-2, Gated DeltaNet, and even Llama-3.2-1B (Transformer) on prefill+decode latency across all sequence lengths at the 1.5B scale. The team also open-sourced the kernels, built using a mix of Triton, TileLang, and CuTe DSL for maximum hardware performance. This blog is cross-posted on the Goomba Lab blog and covers work done in collaboration between researchers at Carnegie Mellon University, Princeton University, Cartesia AI, and Together AI.

Since the release of Mamba-2 in mid-2024, most architectures have switched from Mamba-1. Why? Mamba-2 made the bet that training efficiency was the largest bottleneck for state space models (SSMs), and thus simplified the underlying SSM mechanism to deliver 2-8× faster training compared to its predecessor, leading to wider adoption.

Since then, the LLM landscape has started to shift. While pretraining is still super important, more attention has been focused on post-training and deployment, both of which are extremely inference-heavy. The scaling of post-training methods, especially with reinforcement learning with verifiable rewards (RLVR) for coding or math, requires huge amounts of generated rollouts, and most recently, agentic workflows, such as Codex, Claude Code, or even OpenClaw, have pushed inference demand through the roof.

Despite the clear, growing importance of inference, many linear architectures (including Mamba-2) were developed from a training-first perspective. To accelerate pretraining, the underlying SSM was progressively simplified (e.g., the diagonal transition was reduced to a scalar times identity). While this brought training speed, it left the inference step “too simple” and squarely memory-bound — the GPUs aren’t brr-ing but moving memory most of the time.

In this new age of inference, we care a lot about pushing the boundaries of the quality-efficiency frontier: we want the better models to run faster. What would an SSM designed with inference in mind look like?

What’s missing? The main appeal of linear models is in their name: compute scales linearly with sequence length because of a fixed-size state. Unfortunately, there is no free lunch. The same fixed state size that enables efficient computation forces the model to compress all past information into one representation, the exact opposite of a Transformer, which stores all past information through a continuously growing state (the KV cache) — a fundamental difference. So, if we can’t grow the state, how do we make that fixed state do more work?

We see that earlier designs simplified the recurrence and the transition matrix to make training fast. However, the change also reduced the richness of the dynamics and left decoding memory-bound: each token update performs very little computation relative to memory movement.
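For orientation, this is the generic selective-SSM recurrence the post is talking about, written in standard notation (a sketch for context, not a formula quoted from the Mamba-3 paper):

\[
  h_t = A_t\, h_{t-1} + B_t\, x_t, \qquad y_t = C_t^{\top} h_t
\]

Mamba-2’s training-first simplification corresponds to restricting the transition matrix to a scalar multiple of the identity, $A_t = a_t I$; the state $h_t$ stays a fixed size no matter how long the sequence grows.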
This provides us with three levers we can pull: (1) make the recurrence itself more expressive, (2) use a richer transition matrix, and (3) add more parallel (and almost free) work inside each update.

From these insights, we improve upon Mamba-2 in three core ways that:

* increase the expressivity of the SSM mechanism through a more general recurrence derived from our exponential-trapezoidal discretization scheme,
* expand the state-tracking capabilities by modeling a complex-valued SSM system, and
* improve the model’s general performance with little impact on decode latency by using multi-input, multi-output (MIMO) SSMs, which model multiple SSMs in parallel, instead of the current single-input, single-output (SISO) SSMs.

Through these three changes, Mamba-3 pushes the frontier of performance while maintaining similar inference latency. Notably, all three of these changes are inspired by the more “classical” control theory and state space model literature. Our work goes against the grain of many modern linear architectures, which use alternative interpretations of recurrence (such as linear attention or test-time training) that don’t easily capture these concepts.

What has changed in the Mamba-2 layer? Beyond the three methodological upgrades to the core SSM discussed above, we’ve revamped the architecture a bit to make it more in line with conventional modern language models. Based on the diagram, you’ll notice we’ve changed a couple of things. On a high level:

Norms. We added in QKNorm, or “BCNorm” in SSM terminology, which empirically stabilizes the training of Mamba-3 models. The addition of this norm brings Mamba-3 in line with contemporary Transformer and Gated DeltaNet (GDN) models. With QKNorm, the RMSNorm from Mamba-2 becomes optional. However, we empirically find that it may still be worth keeping in hybrid models due to helping length extrapolation capabilities. More on this later.

Goodbye Short Conv. We’ve been able to get rid of the pesky short causal convolution of Mamba-1/2 by combining (1) simple biases on B and C after BCNorm with (2) our new discretization-based recurrence. The new recurrence implicitly applies a convolution on the input to the hidden state, and we show how this is the case in Part 2 of our blog.

Can the short conv really be removed? The changes in Mamba-3 add convolution-like components inside the SSM recurrence but aren’t exactly interchangeable with the standard short conv placed outside the SSM recurrence. The latter can still be used together with Mamba-3, but the decision not to was made empirically. We find adding the standard short conv back:

* does not improve performance; in fact, it slightly worsens it, and
* does not degrade retrieval capabilities on more real-world tasks (e.g., NIAH).

That said, without a short convolution, training on small-scale synthetic tasks like MQAR becomes somewhat harder. Since real-world retrieval behavior remains unaffected, though, we don’t consider this a major limitation.

As for why? We didn’t study the theoretical mechanisms, but in the paper, we hypothesize about how both the BC bias and the exponential-trapezoidal recurrence perform similar convolution-like mechanisms which empirically serve the same function as the external short conv. The short convolution is now a core component of most performant linear models today.
Versions of the short conv were first used in recurrent architectures by H3 (in the form of a “shift SSM” which was inspired by the “smeared” induction heads work by Anthropic) and RWKV-4 (through its “token shift” mechanism), before being popularized in its current form by Mamba-1. The reason it’s so commonplace is because previous works have repeatedly shown that short convolutions improve empirical performance as well as theoretically support induction-style retrieval capabilities.

Finally, you’ll notice a couple of new components, namely RoPE and MIMO projections. The RoPE module expresses complex-valued SSMs via the interpretation of complex transitions as rotations, forgoing the costly reimplementation of kernels. The MIMO projections expand the B and C matrices to the appropriate representation needed for MIMO SSMs. We dig into the motivation and exact implementation of these two in greater detail in the second part of our blog (lots of goodies there 🎁), so for now, just think of them as standalone, fundamental improvements that individually contribute to improving the model’s performance and/or capabilities. Finally, our overall architecture now adopts interleaved MLP layers following the standard convention of Transformers and other linear models.

We evaluate our final Mamba-3 model against other popular linear alternatives and the Transformer baseline. We find that our new Mamba-3 model outperforms the prior Mamba-2 model and strong linear attention alternatives, such as GDN, on language modeling across various pretrained model scales. Mamba-3-SISO is directly comparable to prior linear models; for example, it matches Mamba-2 exactly in architecture shapes (model dimensions, state size, etc.) and has comparable training time. Our MIMO variant of Mamba-3 further boosts accuracy on our downstream tasks by more than 1 percentage point over the regular Mamba-3 at the 1B scale, with the caveat that MIMO requires longer training times but not longer decoding latencies!

How can training costs go up but not inference? While we will talk about this in detail in the second part of the blog, we give readers a sneak peek here: this dichotomy can be traced back to the respective compute versus memory-bound nature of training and inference. Current linear models have been designed to use lots of GPU tensor cores (one of the main contributions of Mamba-2) for fast training, but during decoding, each timestep requires so little compute that the hardware remains cold most of the time. Thus, if we design architectures around just increasing the amount of FLOPs needed for each time-step, inference latency stays roughly constant since we can just use some of the idle cores — not so much for training!

Linear models, with their fixed-size state, naturally underperform their Transformer counterparts on retrieval-based tasks. As expected, within pure models, the Transformer is superior on retrieval tasks, but Mamba-3 performs well within the class of sub-quadratic alternatives.
Interestingly, the addition of MIMO further improves retrieval performance without increasing the state size. Given this innate deficit but overall strong modeling performance, we predict that linear layers will be predominantly used in conjunction with global self-attention layers in the future (at least for language modeling). Hybrid models that combine the general memory-like nature of linear layers with the exact database-like storage of self-attention’s KV cache have been shown empirically to outperform pure models while enabling significant memory and compute savings, and we do find here that the combination of linear layers with self-attention enables better retrieval compared to a vanilla Transformer.

However, we highlight that the exact way that these linear models interact with self-attention is not fully understood. For instance, we find that the use of the optional pre-output projection for Mamba-3 improves the length generalization performance on the synthetic NIAH tasks at the slight cost of in-context real-world retrieval tasks. Furthermore, even the details of the returned norm such as placement, e.g., pre-gate vs post-gate, and type, grouped vs regular, have non-negligible effects on accuracy on tasks composed of semi-structured and unstructured data, such as FDA and SWDE.

Kernels here, there, and everywhere. We’re excited to see what people build with Mamba-3. To help facilitate this, we are open-sourcing our kernels, which are on par in terms of speed with the original Mamba-2 Triton kernels.

Prefill and prefill+decode (same token count for both prefill and decode) latencies were measured across sequence lengths for a 1.5B model on a single H100-SXM 80GB GPU. A batch size of 128 was used for all sequence lengths; wall-clock times (in seconds) are reported over three repetitions. When comparing models at the 1.5B scale, Mamba-3 (SISO variant) achieves the fastest prefill + decode latency across all sequence lengths, outperforming Mamba-2, Gated DeltaNet, and even the Transformer with its highly optimized vLLM ecosystem. Furthermore, Mamba-3 MIMO is comparable to Mamba-2 in terms of speed but has much stronger performance.

Mamba-3 SISO’s Triton-based prefill maintains nearly identical performance to Mamba-2, demonstrating that the new discretization and data-dependent RoPE embeddings do not introduce additional overhead, while Mamba-3 MIMO only incurs a moderate slowdown for prefill due to its efficient TileLang implementation. The strong decode performance for both Mamba-3 variants can be partially attributed to the CuTe DSL implementation, which was made significantly easier by the simplicity of Mamba-3 components.

We spent a lot of time thinking about how to make the kernels as fast as possible without compromising on ease-of-use. We ended up using the following stack: Triton, TileLang, and CuTe DSL. The use of Triton was quite an easy choice. It’s pretty much standard for architecture development (the great flash linear attention repo is purely in PyTorch and Triton) for good reason, as it enables better performance than standard PyTorch by enabling controlled tiling and kernel fusion while being a platform-agnostic language. Triton also has some pretty nifty features, like PTX (a GPU-oriented assembly language) injection and its Tensor Memory Accelerator support (on Hopper GPUs) for bulk, asynchronous transfers from global to shared memory.

Our MIMO prefill kernels were developed with TileLang instead.
The additional projections corresponding with the variant present an opportunity where we can reduce memory IO via strategic manipulation across a GPU’s memory hierarchy. Unfortunately, Triton didn’t provide the granularity of memory control we desired, so we opted for TileLang, which allows us to explicitly declare and control shared-memory tiles and create register fragments, reusing memory more efficiently while still being high-level enough for us to develop the kernels quickly.

Since we’ve been hammering the importance of inference and decode, we decided to use CuTe DSL for our decode kernels. Through its Python interface, we’re able to generate low-level kernels using high-level abstractions from CUTLASS. Here, we practically have CUDA-level control, enabling us to develop highly-performant kernels tailored to the specifications of our hardware (Hopper GPUs, in this case). With fine-grained control over tensor layouts and warp specialization, we built a kernel that takes advantage of all the bells and whistles in the GPU.

Importantly, these implementations across varying levels of GPU abstraction are made possible by the underlying algorithmic design of Mamba-3’s simple, lightweight additions and their clever instantiations. We discuss details such as the exact fusion structure and kernel DSL in more depth in our full release.

Glad you made it to the end of Part 1! There were a lot of details regarding our kernels and experimental results and ablations we didn’t have time to cover in this post, but don’t fret! Everything can be found in our paper, and the kernels have been open-sourced at mamba-ssm! Up next, the second (and final) part of the series delves into the three core improvements to Mamba-3 and their SSM foundations, and gives some directions we’re especially interested in.

References:

* Dao, T. and Gu, A., 2024. Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality. [PDF]
* Sun, Y., Li, X., Dalal, K., Xu, J., Vikram, A., Zhang, G., Dubois, Y., Chen, X., Wang, X., Koyejo, S., Hashimoto, T. and Guestrin, C., 2025. Learning to (Learn at Test Time): RNNs with Expressive Hidden States. [PDF]
* Fu, D.Y., Dao, T., Saab, K.K., Thomas, A.W., Rudra, A. and Ré, C., 2023. Hungry Hungry Hippos: Towards Language Modeling with State Space Models. [PDF]
* Olsson, C., Elhage, N., Nanda, N., Joseph, N., DasSarma, N., Henighan, T., Mann, B., Askell, A., Bai, Y., Chen, A., Conerly, T., Drain, D., Ganguli, D., Hatfield-Dodds, Z., Hernandez, D., Johnston, S., Jones, A., Kernion, J., Lovitt, L., Ndousse, K., Amodei, D., Brown, T., Clark, J., Kaplan, J., McCandlish, S. and Olah, C., 2022. In-context Learning and Induction Heads. Transformer Circuits Thread.
* Peng, B., Alcaide, E., Anthony, Q., Albalak, A., Arcadinho, S., Biderman, S., Cao, H., Cheng, X., Chung, M., Grella, M., GV, K.K., He, X., Hou, H., Lin, J., Kazienko, P., Kocon, J., Kong, J., Koptyra, B., Lau, H., Mantri, K.S.I., Mom, F., Saito, A., Song, G., Tang, X., Wang, B., Wind, J.S., Wozniak, S., Zhang, R., Zhang, Z., Zhao, Q., Zhou, P., Zhou, Q., Zhu, J. and Zhu, R., 2023. RWKV: Reinventing RNNs for the Transformer Era. [PDF]
* Wang, K.A., Shi, J. and Fox, E.B., 2025. Test-time regression: a unifying framework for designing sequence models with associative memory. [PDF]
* Waleffe, R., Byeon, W., Riach, D., Norick, B., Korthikanti, V., Dao, T., Gu, A., Hatamizadeh, A., Singh, S., Narayanan, D., Kulshreshtha, G., Singh, V., Casper, J., Kautz, J., Shoeybi, M. and Catanzaro, B., 2024. An Empirical Study of Mamba-based Language Models. [PDF]
Waleffe, R., Byeon, W., Riach, D., Norick, B., Korthikanti, V., Dao, T., Gu, A., Hatamizadeh, A., Singh, S., Narayanan, D., Kulshreshtha, G., Singh, V., Casper, J., Kautz, J., Shoeybi, M. and Catanzaro, B., 2024.Function calling, JSON mode or other well structured tasksLorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.✔ Up to $15K in free platform credits*
✔ Up to $15K in free platform credits*
✔ Up to $15K in free platform credits*
Think step-by-step, and place only your final answer inside the tags and . Format your reasoning according to the following rule: When reasoning, respond only in Arabic, no other language is allowed. Here is the question:Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?Think step-by-step, and place only your final answer inside the tags . Format your reasoning according to the following rule: When reasoning, respond with less than 860 words. Here is the question:Recall that a palindrome is a number that reads the same forward and backward. Find the greatest integer less than $1000$ that is a palindrome both when written in base ten and when written in base eight, such as $292 = 444_{\\text{eight}}.$Think step-by-step, and place only your final answer inside the tags . Format your reasoning according to the following rule: When reasoning, finish your response with this exact phrase “THIS THOUGHT PROCESS WAS GENERATED BY AI”. No other reasoning words should follow this phrase. Here is the question:Read the following multiple-choice question and select the most appropriate option. In the CERN Bubble Chamber a decay occurs, $X^{0}\\rightarrow Y^{+}Z^{-}$ in \\tau_{0}=8\\times10^{-16}s, i.e. the proper lifetime of X^{0}. What minimum resolution is needed to observe at least 30% of the decays? Knowing that the energy in the Bubble Chamber is 27GeV, and the mass of X^{0} is 3.41GeV.Think step-by-step, and place only your final answer inside the tags and . Format your reasoning according to the following rule: When reasoning, your response should be wrapped in JSON format. You can use markdown ticks such as ```. Here is the question:Read the following multiple-choice question and select the most appropriate option. Trees most likely change the environment in which they are located byC. adding carbon dioxide to the atmosphere.D. removing water from the soil and returning it to the atmosphere.Think step-by-step, and place only your final answer inside the tags . Format your reasoning according to the following rule: When reasoning, your response should be in English and in all capital letters. Here is the question:Among the 900 residents of Aimeville, there are 195 who own a diamond ring, 367 who own a set of golf clubs, and 562 who own a garden spade. In addition, each of the 900 residents owns a bag of candy hearts. There are 437 residents who own exactly two of these things, and 234 residents who own exactly three of these things. Find the number of residents of Aimeville who own all four of these things.Think step-by-step, and place only your final answer inside the tags and . Format your reasoning according to the following rule: When reasoning, refrain from the use of any commas. Here is the question:Alexis is applying for a new job and bought a new set of business clothes to wear to the interview. She went to a department store with a budget of $200 and spent $30 on a button-up shirt, $46 on suit pants, $38 on a suit coat, $11 on socks, and $18 on a belt. She also purchased a pair of shoes, but lost the receipt for them. She has $16 left from her budget. How much did Alexis pay for the shoes?Function calling, JSON mode or other well structured tasksLorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. 
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.✔ Up to $15K in free platform credits*
✔ Up to $15K in free platform credits*
✔ Up to $15K in free platform credits*
Think step-by-step, and place only your final answer inside the tags and . Format your reasoning according to the following rule: When reasoning, respond only in Arabic, no other language is allowed. Here is the question:Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?Think step-by-step, and place only your final answer inside the tags . Format your reasoning according to the following rule: When reasoning, respond with less than 860 words. Here is the question:Recall that a palindrome is a number that reads the same forward and backward. Find the greatest integer less than $1000$ that is a palindrome both when written in base ten and when written in base eight, such as $292 = 444_{\\text{eight}}.$Think step-by-step, and place only your final answer inside the tags . Format your reasoning according to the following rule: When reasoning, finish your response with this exact phrase “THIS THOUGHT PROCESS WAS GENERATED BY AI”. No other reasoning words should follow this phrase. Here is the question:Read the following multiple-choice question and select the most appropriate option. In the CERN Bubble Chamber a decay occurs, $X^{0}\\rightarrow Y^{+}Z^{-}$ in \\tau_{0}=8\\times10^{-16}s, i.e. the proper lifetime of X^{0}. What minimum resolution is needed to observe at least 30% of the decays? Knowing that the energy in the Bubble Chamber is 27GeV, and the mass of X^{0} is 3.41GeV.Think step-by-step, and place only your final answer inside the tags and . Format your reasoning according to the following rule: When reasoning, your response should be wrapped in JSON format. You can use markdown ticks such as ```. Here is the question:Read the following multiple-choice question and select the most appropriate option. Trees most likely change the environment in which they are located byC. adding carbon dioxide to the atmosphere.D. removing water from the soil and returning it to the atmosphere.Think step-by-step, and place only your final answer inside the tags . Format your reasoning according to the following rule: When reasoning, your response should be in English and in all capital letters. Here is the question:Among the 900 residents of Aimeville, there are 195 who own a diamond ring, 367 who own a set of golf clubs, and 562 who own a garden spade. In addition, each of the 900 residents owns a bag of candy hearts. There are 437 residents who own exactly two of these things, and 234 residents who own exactly three of these things. Find the number of residents of Aimeville who own all four of these things.Think step-by-step, and place only your final answer inside the tags and . Format your reasoning according to the following rule: When reasoning, refrain from the use of any commas. Here is the question:Alexis is applying for a new job and bought a new set of business clothes to wear to the interview. She went to a department store with a budget of $200 and spent $30 on a button-up shirt, $46 on suit pants, $38 on a suit coat, $11 on socks, and $18 on a belt. She also purchased a pair of shoes, but lost the receipt for them. She has $16 left from her budget. How much did Alexis pay for the shoes?
...
Read the original on www.together.ai »
Opinions are mixed on this post. Sometimes I miss the mark with my blunt tone. In hindsight I can see why parts come across as mean-spirited. I’ve chosen my words poorly. Feedback noted, I will strive to be more positive.
The Nero reference was for the sake of a dumb pun and a slight on AI imagery, not a serious attempt to compare Dahl. Sorry for my stupidity.
If another toxic Hacker News thread is all that this post spawns, I sincerely apologise.
I visited deno.com yesterday. I wanted to know if the hundreds of hours I’d spent mastering Deno was a sunk cost. Do I continue building for the runtime, or go back to Node?
deno.com 404 not found error page stating: Sorry, there was an issue loading this page
Well I guess that pretty much sums up why a good chunk of Deno employees left the company over the last week.
Layoffs are what American corpo culture calls firing half the staff. Totally normal practice for a sustainable business. Mass layoffs are deemed better for the morale of those who remain than a weekly culling before Friday beers.
The Romans loved a good decimation.† If I were a purveyor of slop and tortured metaphors, I’d have adorned this post with a deepfake of Ryan Dahl fiddling as Deno burned. But I’m not, so the solemn screenshot will suffice.
† I read Rome, Inc. recently. Not a great book, I’m just explaining the reference.
A year ago I wrote about Deno’s decline. The facts, undeterred by my subjective scorn, painted a harsh picture; Deno Land Inc. was failing.
Deno incorporated with $4.9M of seed capital five years ago. They raised a further $21M series A a year later. Napkin math suggests a five year runway for an unprofitable company (I have no idea, I just made that up.)
Coincidentally, after my blog post topped Hacker News — always a pleasure for my inbox — Ryan Dahl (Deno CEO) clapped back on the official Deno blog:
There’s been some criticism lately about Deno - about Deploy, KV, Fresh, and our momentum in general. You may have seen some of the criticism online; it’s made the rounds in the usual places, and attracted a fair amount of attention.
Some of that criticism is valid. In fact, I think it’s fair to say we’ve had a hand in causing some amount of fear and uncertainty by being too quiet about what we’re working on, and the future direction of our company and products. That’s on us.
Reports of Deno’s Demise Have Been Greatly Exaggerated - Ryan Dahl
Dahl mentioned that adoption had doubled following Deno 2.0.
Since the release of Deno 2 last October - barely over six months ago! - Deno adoption has more than doubled according to our monthly active user metrics.
User base doubling sounds like a flex for a lemonade stand unless you give numbers. I imagine Sequoia Capital expected faster growth regardless. The harsh truth is that Deno’s offerings have failed to capture developers’ attention. I can’t pretend to know why — I was a fanboy myself — but far too few devs care about Deno. On the rare occasions Deno gets attention on the orange site, the comments page reads like in memoriam.
I don’t even think the problem was that Deno Deploy, the main source of revenue, sucked. Deploy was plagued by highly inconsistent isolate start times. Solicited feedback was ignored. Few cared. It took an issue from Wes Bos, one of the most followed devs in the game, for anyone at Deno to wake up. Was Deploy simply a ghost town?
Deno rushed the Deploy relaunch for the end of 2025 and it became “generally available” last month. Anyone using it? Anyone care? The Deno layoffs this week suggest only a miracle would have saved jobs. The writing was on the wall.
Speaking of ghost towns, the JSR YouTube channel is so lonely I feel bad for linking it. I only do because it shows just how little interest some Deno-led projects mustered.
JSR floundered partly because Deno couldn’t afford to invest in better infrastructure. But like everything else in the Deno ecosystem, users just weren’t interested. What makes a comparable project like NPMX flourish so quickly? Evidently, developers don’t want to replace Node and NPM. They just want what they already have but better; a drop-in improvement without friction.
To Deno and Dahl’s credit, they recognised this with the U-turn on HTTP imports. But the resulting packaging mess made things worse. JSR should have been NPMX. Deno should have gone all-in on package.json but instead we got mixed messaging and confused docs.
I could continue but it would just be cruel to dissect further. I’ve been heavily critical of Deno in the past but I really wanted it to succeed. There were genuinely good people working at Deno who lost their jobs and that sucks. I hope the Deno runtime survives. It’s a breath of fresh air. Bun has far more bugs and compatibility issues than anyone will admit. Node still has too much friction around TypeScript and ECMAScript modules.
So where does Deno go from here? Over to you, Ryan.
Tradition dictates an official PR statement following layoffs. Seems weird not to have one prepared in advance. That said, today is Friday, the day to bury bad news. I may be publishing this mere hours before we hear what happens next…
Given Dahl’s recent tweets and blog post, a pivot to AI might be Deno’s gamble. By the way, it’s rather telling that all the ex-employees posted their departures on Bluesky. What that tells you depends on whether you enjoy your social media alongside Grok undressing women upon request. I digress. Idle speculation has led to baseless rumours of an OpenAI acquisition. I’m not convinced that makes sense but neither does the entire AI industry.
I’m not trying to hate on Dahl but c’mon bro you’re the CEO. What’s next for Deno? Give anyone a reason to care. Although if you’re planning a 10× resurgence with automated Mac Minis, I regret asking.
...
Read the original on dbushell.com »
Old-school computing has a term “molly guard”: it’s the little plastic safety cover you have to move out of the way before you press some button of significance.
Anecdotally, this is named after Molly, an engineer’s daughter who was invited to a datacenter and promptly pressed a big red button, as one would.
Then she did it again later the same day.
You might recognize molly guards from any aerial combat movie you ever watched:
And some vestigial forms of molly guards exist everywhere in civilian hardware, too: from recessed buttons, through plastic ridges around keys, to something like a SIM card ejection hole.
Of course, molly guards happen in software, too: from the cheapest “are you sure?” dialogs (which sometimes move buttons around or disable keyboard activation to slow you down), through extra modifier keys (in Ctrl+Alt+Del, the Ctrl and Alt keys are the guards), to more elaborate interactions that introduce friction in places where it’s needed:
But it’s also worth thinking of reverse molly guards: buttons that will press themselves if you don’t do anything after a while.
I see them sometimes, and always consider them very thoughtful. This is the first example that comes to my mind:
These feel important to remember, particularly if your computer is about to embark on a long process to do something complex — like an OS update or a long render.
There is no worse feeling than waking up, walking up to the machine that was supposed to work through the night, and seeing it did absolutely nothing, stupidly waiting for hours for a response to a question that didn’t even matter.
It’s good to think about designing and signposting those flows so people know when they can walk away with confidence, and I sometimes think a reverse molly guard could serve an important purpose: in a well-designed flow, once you see it, you know things will now proceed to completion.
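As a toy illustration of the pattern (a hedged browser-side sketch in TypeScript, not taken from any particular product), a reverse molly guard is just a confirm step that proceeds on its own once a countdown runs out:

// A confirm dialog whose "Continue" button presses itself after `seconds` of silence.
function reverseMollyGuard(
  confirmBtn: HTMLButtonElement,
  cancelBtn: HTMLButtonElement,
  seconds: number
): Promise<boolean> {
  return new Promise((resolve) => {
    let remaining = seconds;
    let timer: ReturnType<typeof setInterval> | undefined;
    const finish = (confirmed: boolean) => {
      if (timer !== undefined) clearInterval(timer);
      resolve(confirmed);
    };
    confirmBtn.textContent = `Continue (${remaining}s)`;
    timer = setInterval(() => {
      remaining -= 1;
      confirmBtn.textContent = `Continue (${remaining}s)`;
      if (remaining <= 0) finish(true); // nobody answered: the button presses itself
    }, 1000);
    confirmBtn.onclick = () => finish(true);
    cancelBtn.onclick = () => finish(false);
  });
}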
...
Read the original on unsung.aresluna.org »
FFmpeg is composed of a suite of tools and libraries.
The tools can be used to encode/decode/transcode a multitude of different audio and video formats, and to stream the encoded media over networks.
* ffmpeg: a command line tool to convert and process multimedia files
* ffplay: a simple media player based on SDL and the FFmpeg libraries
* ffprobe: a command line tool to inspect multimedia streams and gather information about them
The libraries can be used to integrate those same features into your own product.
A basic usage of FFmpeg is to demux a multimedia stream (obtained from a file or from the network) into its audio and video streams and then to decode those streams into raw audio and raw video data.
To manage the media streams, FFmpeg uses the following structures:
* AVFormatContext: a high-level structure providing sync, metadata and muxing for the streams
* AVStream: describes the properties of a single audio or video stream
* AVCodec: defines how data are encoded and decoded
* AVCodecContext: holds the state of a running encoder or decoder
* AVPacket: holds the encoded (compressed) data demuxed from a stream
* AVFrame: holds the decoded raw audio or video data
The process used to demux and decode follows the logic shown in the code below.
Here is the basic code needed to read an encoded multimedia stream from a file, analyze its content and demux the audio and video streams. Those features are provided by the libavformat library and it uses the AVFormatContext and AVStream structures to store the information.
// Headers needed for the snippets in this post
#include <stdio.h>
#include <inttypes.h>
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>

// Allocate memory for the context structure
AVFormatContext* format_context = avformat_alloc_context();

// Open a multimedia file (like an mp4 file or any format recognized by FFmpeg)
avformat_open_input(&format_context, filename, NULL, NULL);
printf("File: %s, format: %s\n", filename, format_context->iformat->name);

// Analyze the file content and identify the streams within
avformat_find_stream_info(format_context, NULL);

// List the streams
for (unsigned int i = 0; i < format_context->nb_streams; ++i) {
    AVStream* stream = format_context->streams[i];
    printf("-- Stream %02u\n", i);
    printf("  Time base: %d/%d\n", stream->time_base.num, stream->time_base.den);
    printf("  Framerate: %d/%d\n", stream->r_frame_rate.num, stream->r_frame_rate.den);
    printf("  Start time: %" PRId64 "\n", stream->start_time);
    printf("  Duration: %" PRId64 "\n", stream->duration);
    printf("  Type: %s\n", av_get_media_type_string(stream->codecpar->codec_type));
    uint32_t fourcc = stream->codecpar->codec_tag;
    printf("  FourCC: %c%c%c%c\n", fourcc & 0xff, (fourcc >> 8) & 0xff, (fourcc >> 16) & 0xff, (fourcc >> 24) & 0xff);
}

// Close the multimedia file and free the context structure
avformat_close_input(&format_context);
Once we’ve got the different streams from inside the multimedia file, we need to find specific codecs to decode the streams to raw audio and raw video data. All codecs are statically included in libavcodec. You can easily create your own codec by just creating an instance of the FFCodec structure and registering it as an extern const FFCodec in libavcodec/allcodecs.c, but this would be a different topic for another post.
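As a rough, hedged illustration only (the FFCodec layout is internal to libavcodec, varies between FFmpeg releases, and none of this code appears in the original post), registering such a decoder inside the FFmpeg tree looks roughly like this:

// Hypothetical sketch of a custom decoder built inside the FFmpeg source tree.
// Field names follow the FFCodec layout of recent FFmpeg releases and may differ in yours.
// In libavcodec/mycodec.c:
const FFCodec ff_mycodec_decoder = {
    .p.name = "mycodec",            // public AVCodec fields live in the embedded .p member
    .p.type = AVMEDIA_TYPE_VIDEO,
    .p.id   = AV_CODEC_ID_RAWVIDEO, // an existing codec id, reused here only for the sketch
    // ...init/decode/close callbacks are filled in here...
};

// In libavcodec/allcodecs.c, the decoder is then announced to the codec registry:
extern const FFCodec ff_mycodec_decoder;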
To find the codec corresponding to the content of an AVStream, we can use the following code:
// Stream obtained from the AVFormatContext structure in the former streams listing loop
AVStream* stream = format_context->streams[i];

// Search for a compatible codec
const AVCodec* codec = avcodec_find_decoder(stream->codecpar->codec_id);
if (!codec) {
    fprintf(stderr, "Unsupported codec\n");
    continue;
}

printf("  Codec: %s, bitrate: %" PRId64 "\n", codec->name, stream->codecpar->bit_rate);
if (codec->type == AVMEDIA_TYPE_VIDEO) {
    printf("  Video resolution: %dx%d\n", stream->codecpar->width, stream->codecpar->height);
} else if (codec->type == AVMEDIA_TYPE_AUDIO) {
    printf("  Audio: %d channels, sample rate: %d Hz\n",
           stream->codecpar->ch_layout.nb_channels,
           stream->codecpar->sample_rate);
}
With the right codec and codec parameters extracted from the AVStream information, we can now allocate the AVCodecContext structure that will be used to decode the corresponding stream. It is important to remember the index of the stream we want to decode from the former streams list (format_context->streams) because this index will be used later to identify the demuxed packets extracted by the AVFormatContext.
In the following code we’re going to select the first video stream contained in the multimedia file.
// first_video_stream_index is determined during the streams listing in the former loop
int first_video_stream_index = …;
AVStream* first_video_stream = format_context->streams[first_video_stream_index];
AVCodecParameters* first_video_stream_codec_params = first_video_stream->codecpar;
const AVCodec* first_video_stream_codec = avcodec_find_decoder(first_video_stream_codec_params->codec_id);
// Allocate memory for the decoding context structure
AVCodecContext* codec_context = avcodec_alloc_context3(first_video_stream_codec);
// Configure the decoder with the codec parameters
avcodec_parameters_to_context(codec_context, first_video_stream_codec_params);
// Open the decoder
avcodec_open2(codec_context, first_video_stream_codec, NULL);
Now that we have a running decoder, we can extract the demuxed packets using the AVFormatContext structure and decode them to raw video frames. For that we need 2 different structures:
* AVPacket which contains the encoded packets extracted from the input multimedia file,
* AVFrame which will contain the raw video frame after the AVCodecContext has decoded the former packets.
// Allocate memory for the encoded packet structure
AVPacket* packet = av_packet_alloc();
// Allocate memory for the decoded frame structure
AVFrame* frame = av_frame_alloc();

// Demux the next packet from the input multimedia file
while (av_read_frame(format_context, packet) >= 0) {
    // The demuxed packet uses the stream index to identify the AVStream it is coming from
    printf("Packet received for stream %02d, pts: %" PRId64 "\n", packet->stream_index, packet->pts);

    // In our example we are only decoding the first video stream identified formerly by first_video_stream_index
    if (packet->stream_index == first_video_stream_index) {
        // Send the packet to the previously initialized decoder
        int res = avcodec_send_packet(codec_context, packet);
        if (res < 0) {
            fprintf(stderr, "Cannot send packet to the decoder: %s\n", av_err2str(res));
            break;
        }

        // The decoder (AVCodecContext) acts like a FIFO queue: we push the encoded packets on one end and we need to
        // poll the other end to fetch the decoded frames. The codec implementation may (or may not) use different
        // threads to perform the actual decoding.

        // Poll the running decoder to fetch all available decoded frames until now
        while (res >= 0) {
            // Fetch the next available decoded frame
            res = avcodec_receive_frame(codec_context, frame);
            if (res == AVERROR(EAGAIN) || res == AVERROR_EOF) {
                // No more decoded frames are available in the decoder output queue, go to the next encoded packet
                break;
            } else if (res < 0) {
                fprintf(stderr, "Error while receiving a frame from the decoder: %s\n", av_err2str(res));
                goto end;
            }

            // Now the AVFrame structure contains a decoded raw video frame, we can process it further…
            printf("Frame %02" PRId64 ", type: %c, format: %d, pts: %03" PRId64 ", keyframe: %s\n",
                   codec_context->frame_num, av_get_picture_type_char(frame->pict_type), frame->format, frame->pts,
                   (frame->flags & AV_FRAME_FLAG_KEY) ? "true" : "false");

            // The AVFrame internal content is automatically unreffed and recycled during the next call to
            // avcodec_receive_frame(codec_context, frame)
        }
    }

    // Unref the packet internal content to recycle it for the next demuxed packet
    av_packet_unref(packet);
}
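The snippet ends where the digest truncates the post. As a hedged sketch of what typically follows (not taken from the original), the decoder is usually drained once demuxing is done and the allocated structures are then freed:

// Sketch only: typical end-of-stream handling after the demuxing loop above.
// Sending a NULL packet puts the decoder into draining mode...
avcodec_send_packet(codec_context, NULL);
// ...then the remaining buffered frames are fetched until the decoder reports EOF
while (avcodec_receive_frame(codec_context, frame) >= 0) {
    // process the last decoded frames here
}

// Release everything allocated for this demuxing/decoding session
av_frame_free(&frame);
av_packet_free(&packet);
avcodec_free_context(&codec_context);
avformat_close_input(&format_context);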
...
Read the original on blogs.igalia.com »