10 interesting stories served every morning and every evening.
Reddit researcher exposes Meta’s $2B campaign to force Apple and Google into building surveillance systems while exempting its own platforms
A Reddit researcher just exposed how Meta funneled over $2 billion through shadowy nonprofits to push age verification laws that would force Apple and Google to build surveillance infrastructure into every device—while conveniently exempting Meta’s own platforms from the same requirements.
The investigation by GitHub user “upper-up” traces funding through organizations like the Digital Childhood Alliance (DCA), which launched December 18, 2024, and testified for Utah’s SB-142 just days later. Bloomberg and Deseret News reported Meta’s backing of DCA, part of a $70 million fragmented super PAC strategy designed to evade FEC tracking. Traditional election spending disclosure requirements don’t apply to this fragmented approach.
The technical reality hits harder than policy abstractions. These bills mandate OS-level APIs that apps can query for age data—creating a permanent identity layer baked into your phone’s core functions. Meta’s Horizon OS for Quest VR already implements this infrastructure through Family Center controls. Now they want Apple and Google to build similar systems that every app can access, turning age verification into persistent device fingerprinting.
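To make the concern concrete, here is a hypothetical sketch of the kind of OS-level age-signal service such bills envision. Every name below is invented for illustration; no such API ships on iOS or Android today.

```python
# Hypothetical sketch only: these class and method names are invented to
# illustrate the kind of OS-level age API the bills describe. No such API
# currently exists on any platform.

class AgeSignalService:
    """A device-wide service that any installed app may query."""

    def __init__(self, birth_year: int, device_id: str):
        self._birth_year = birth_year
        self._device_id = device_id
        self.query_log = []  # every query links an app to this device's identity

    def age_bracket(self, requesting_app: str, current_year: int = 2025) -> str:
        # The answer is coarse, but the query itself ties the requesting app
        # to a persistent identity attribute on the device -- which is why
        # critics describe this as device fingerprinting infrastructure.
        self.query_log.append((requesting_app, self._device_id))
        age = current_year - self._birth_year
        if age < 13:
            return "under_13"
        if age < 18:
            return "13_to_17"
        return "18_plus"
```

Even in this toy version, the structural problem is visible: the age answer is cheap, but the query log, a running record of which apps asked about which device, is the identity layer.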
Here’s where the lobbying gets surgical. The proposed laws hammer Apple’s App Store and Google Play with compliance requirements but reportedly spare social media platforms—Meta’s core business. It’s like Spotify lobbying for streaming regulations that only apply to Apple Music. The “child safety” rhetoric masks a competitive strategy that shifts liability from platforms to operating system makers.
The European Union’s Digital Identity Wallet takes a radically different approach. Zero-knowledge proofs let you verify age without revealing personal data—like showing you’re over 18 without disclosing your birthdate or identity details. It’s open-source, self-hostable, and only applies to large platforms while exempting FOSS and small entities. Meanwhile, US lawmakers seem ready to let Meta bamboozle them into complete privacy annihilation.
Your device’s trustworthiness hangs in the balance. These laws could force every Linux distribution and privacy-focused Android fork to implement identity verification or face legal liability. The choice between surveillance-free computing and regulatory compliance is coming faster than you think.
...
Read the original on www.gadgetreview.com »
A groundbreaking hack for Microsoft’s ‘unhackable’ Xbox One was revealed at the recent RE//verse 2026 conference. This console has remained a fortress since its launch in 2013, but now Markus ‘Doom’ Gaasedelen has showcased the ‘Bliss’ double glitch. Just as the Xbox 360 famously fell to the Reset Glitch Hack (RGH), the Xbox One has now fallen to Voltage Glitch Hacking (VGH).
“In 2013 some kind of iron curtain came down on security, of the Xbox ecosystem, and the Xbox One never got hacked,” noted Gaasedelen in his introduction. The same is true of the Xbox One’s successors, and Microsoft was rightly proud. Seven years after its launch, Microsoft engineers would still assert that the Xbox One was “the most secure product Microsoft has ever produced.”
What made the Xbox One so secure, so special? Gaasedelen referenced prior work and presentations to convey this information. I’ve shared a summary slide about this, too, but let’s fast forward to the demo of the new Bliss hack, which takes place from about 46 minutes into the presentation.
Since reset glitching wasn’t possible, Gaasedelen thought some voltage glitching could do the trick. So, instead of tinkering with the system reset pin(s), the hacker targeted the momentary collapse of the CPU voltage rail. This was quite a feat, as Gaasedelen couldn’t ‘see’ into the Xbox One, so he had to develop new hardware introspection tools.
Eventually, the Bliss exploit was formulated, in which two precise voltage glitches land in succession. The first skips the loop where the ARM Cortex memory protection is set up. The second targets a memcpy operation during the header read, allowing a jump to attacker-controlled data.
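The actual boot ROM code is not public, but the shape of the two-glitch attack can be modeled as a toy sketch. Everything below is invented for illustration and is in no way the real boot flow.

```python
# Toy model of the two-glitch "Bliss" flow. All names and structures are
# invented for illustration; the real Xbox One boot ROM is not public.

def boot(glitch_skips_mpu_setup: bool, glitch_corrupts_header_copy: bool) -> str:
    # Normally the ROM configures the ARM Cortex memory protection first...
    mpu_configured = not glitch_skips_mpu_setup  # glitch 1 skips the setup loop

    # ...then copies the boot header and follows its entry pointer.
    entry = "signed_loader"
    if glitch_corrupts_header_copy and not mpu_configured:
        # Glitch 2 lands during the header copy; with protection absent,
        # execution can be redirected to attacker-controlled data.
        entry = "attacker_payload"
    return entry

# Both glitches must land, in order, for the exploit to succeed:
assert boot(False, False) == "signed_loader"
assert boot(True, False) == "signed_loader"
assert boot(False, True) == "signed_loader"
assert boot(True, True) == "attacker_payload"
```

The point of the model is the conjunction: either glitch alone leaves the console on the signed path, which is why landing two precisely timed faults in succession was the hard part.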
As a hardware attack against the boot ROM in silicon, Gaasedelen says the attack is unpatchable. Thus it is a complete compromise of the console, allowing unsigned code to be loaded at every level, including the hypervisor and OS. Moreover, Bliss allows access to the security processor, so games, firmware, and so on can be decrypted.
What happens next with this technique remains to be seen. Digital archivists should enjoy new levels of access to Xbox One firmware, OS, and games. There could be subsequent emulation breakthroughs thanks to this effort. We also now have a route to making a Bliss-a-like mod chip to automate the precise electrical glitching required.
It seems unlikely that PC users, our core readership, will be interested in actually emulating the Xbox One. The 2013 system’s game library largely overlaps with better-quality versions on the PC platform.
...
Read the original on www.tomshardware.com »
Every layer of review makes you 10x slower
We’ve all heard of those network effect laws: the value of a network goes up with the square of the number of members. Or the cost of communication goes up with the square of the number of members, or maybe it was n log n, or something like that, depending how you arrange the members. Anyway doubling a team doesn’t double its speed; there’s coordination overhead. Exactly how much overhead depends on how badly you botch the org design.
But there’s one rule of thumb that someone showed me decades ago, that has stuck with me ever since, because of how annoyingly true it is. The rule is annoying because it doesn’t seem like it should be true. There’s no theoretical basis for this claim that I’ve ever heard. And yet, every time I look for it, there it is.
Here we go:

Every layer of review makes you 10x slower.
I know what you’re thinking. Come on, 10x? That’s a lot. It’s unfathomable. Surely we’re exaggerating.
Just to be clear, we’re counting “wall clock time” here rather than effort. Almost all the extra time is spent sitting and waiting.
Do the change yourself
30 minutes

Get it code reviewed by the peer next to you
300 minutes → 5 hours → half a day

Get a design doc approved by your architects team first
50 hours → about a week

Get it on some other team’s calendar to do all that
(for example, if a customer requests a feature)
500 hours → 12 weeks → one fiscal quarter
I wish I could tell you that the next step up — 10 quarters or about 2.5 years — was too crazy to contemplate, but no. That’s the life of an executive sitting above a medium-sized team; I bump into it all the time even at a relatively small company like Tailscale if I want to change product direction. (And execs sitting above large teams can’t actually do work of their own at all. That’s another story.)
First of all, this isn’t a post about AI, because AI’s direct impact on this problem is minimal. Okay, so Claude can code it in 3 minutes instead of 30? That’s super, Claude, great work.
Now you either get to spend 27 minutes reviewing the code yourself in a back-and-forth loop with the AI (this is actually kinda fun); or you save 27 minutes and submit unverified code to the code reviewer, who will still take 5 hours like before, but who will now be mad that you’re making them read the slop that you were too lazy to read yourself. Little of value was gained.
Now now, you say, that’s not the value of agentic coding. You don’t use an agent on a 30-minute fix. You use it on a monstrosity week-long project that you and Claude can now do in a couple of hours! Now we’re talking. Except no, because the monstrosity is so big that your reviewer will be extra mad that you didn’t read it yourself, and it’s too big to review in one chunk so you have to slice it into new bite-sized chunks, each with a 5-hour review cycle. And there’s no design doc so there’s no intentional architecture, so eventually someone’s going to push back on that and here we go with the design doc review meeting, and now your monstrosity week-long project that you did in two hours is… oh. A week, again.
I guess I could have called this post Systems Design 4 (or 5, or whatever I’m up to now, who knows, I’m writing this on a plane with no wifi) because yeah, you guessed it. It’s Systems Design time again.
The only way to sustainably go faster is fewer reviews
It’s funny, everyone has been predicting the Singularity for decades now. The premise is we build systems that are so smart that they themselves can build the next system that is even smarter, that builds the next smarter one, and so on, and once we get that started, if they keep getting smarter faster enough, then the incremental time (t) to achieve a unit (u) of improvement goes to zero, so (u/t) goes to infinity and foom.
Anyway, I have never believed in this theory for the simple reason we outlined above: the majority of time needed to get anything done is not actually the time doing it. It’s wall clock time. Waiting. Latency.
And you can’t overcome latency with brute force.
I know you want to. I know many of you now work at companies where the business model kinda depends on doing exactly that.
But you can’t just not review things!
Ah, well, no, actually yeah. You really can’t.
There are now many people who have seen the symptom: the start of the pipeline (AI generated code) is so much faster, but all the subsequent stages (reviews) are too slow! And so they intuit the obvious solution: stop reviewing then!
The result might be slop, but if the slop is 100x cheaper, then it only needs to deliver 1% of the value per unit and it’s still a fair trade. And if your value per unit is even a mere 2% of what it used to be, you’ve doubled your returns! Amazing.
There are some pretty dumb assumptions underlying that theory; you can imagine them for yourself. Suffice it to say that this produces what I will call the AI Developer’s Descent Into Madness:
Whoa, I produced this prototype so fast! I have super powers!
This prototype is getting buggy. I’ll tell the AI to fix the bugs.
Hmm, every change now causes as many new bugs as it fixes.
Aha! But if I have an AI agent also review the code, it can find its own bugs!
Wait, why am I personally passing data back and forth between agents?
I can have my agent write an agent framework!
It’s actually alarming how many friends and respected peers I’ve lost to this cycle already. Claude Code only got good maybe a few months ago, so this only recently started happening, so I assume they will emerge from the spiral eventually. I mean, I hope they will. We have no way of knowing.
Anyway we know our symptom: the pipeline gets jammed up because of too much new code spewed into it at step 1. But what’s the root cause of the clog? Why doesn’t the pipeline go faster?
I said above that this isn’t an article about AI. Clearly I’m failing at that so far, but let’s bring it back to humans. It goes back to the annoyingly true observation I started with: every layer of review is 10x slower. As a society, we know this. Maybe you haven’t seen it before now. But trust me: people who do org design for a living know that layers are expensive… and they still do it.
As companies grow, they all end up with more and more layers of collaboration, review, and management. Why? Because otherwise mistakes get made, and mistakes are increasingly expensive at scale. The average value added by a new feature eventually becomes lower than the average value lost through the new bugs it causes. So, lacking a way to make features produce more value (wouldn’t that be nice!), we try to at least reduce the damage.
The more checks and controls we put in place, the slower we go, but the more monotonically the quality increases. And isn’t that the basis of continuous improvement?
Well, sort of. Monotonically increasing quality is on the right track. But “more checks and controls” went off the rails. That’s only one way to improve quality, and it’s a fraught one.
I wrote a few years ago about W. Edwards Deming and the “new” philosophy around quality that he popularized in Japanese auto manufacturing. (Eventually U.S. auto manufacturers more or less got the idea. So far the software industry hasn’t.)
One of the effects he highlighted was the problem of a “QA” pass in a factory: build widgets, have an inspection/QA phase, reject widgets that fail QA. Of course, your inspectors probably miss some of the failures, so when in doubt, add a second QA phase after the first to catch the remaining ones, and so on.
In a simplistic mathematical model this seems to make sense. (For example, if every QA pass catches 90% of defects, then after two QA passes you’ve reduced the number of defects by 100x. How awesome is that?)
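That simplistic model is easy to state precisely. Here is a quick sketch; the 90% catch rate is the text's example figure, and the independence of passes is exactly the assumption the next paragraphs demolish.

```python
# Naive QA model: each pass independently catches 90% of remaining defects.
def escaped_defects(initial: int, qa_passes: int, catch_rate: float = 0.9) -> float:
    return initial * (1.0 - catch_rate) ** qa_passes

# 1000 defects -> ~100 escape one pass, ~10 escape two: a 100x reduction
# after two passes, exactly as the simplistic arithmetic promises.
```

The real world fails this model because the passes are not independent: as the following paragraphs describe, each team's effort (and incentive to look hard) depends on the existence of the other passes.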
But in the reality of agentic humans, it’s not so simple. First of all, the incentives get weird. The second QA team basically serves to evaluate how well the first QA team is doing; if the first QA team keeps missing defects, fire them. Now, that second QA team has little incentive to produce that outcome for their friends. So maybe they don’t look too hard; after all, the first QA team missed the defect, it’s not unreasonable that we might miss it too.
Furthermore, the first QA team knows there is a second QA team to catch any defects; if I don’t work too hard today, surely the second team will pick up the slack. That’s why they’re there!
Also, the team making the widgets in the first place doesn’t check their work too carefully; that’s what the QA team is for! Why would I slow down the production of every widget by being careful, at a cost of say 20% more time, when there are only 10 defects in 100 and I can just eliminate them at the next step for only a 10% waste overhead? It only makes sense. Plus they’ll fire me if I go 20% slower.
To say nothing of a whole engineering redesign to improve quality, that would be super expensive and we could be designing all new widgets instead.
Sound like any engineering departments you know?
Well, this isn’t the right time to rehash Deming, but suffice it to say, he was on to something. And his techniques worked. You get things like the famous Toyota Production System where they eliminated the QA phase entirely, but gave everybody an “oh crap, stop the line, I found a defect!” button.
Famously, US auto manufacturers tried to adopt the same system by installing the same “stop the line” buttons. Of course, nobody pushed those buttons. They were afraid of getting fired.
The basis of the Japanese system that worked, and the missing part of the American system that didn’t, is trust. Trust among individuals that your boss Really Truly Actually wants to know about every defect, and wants you to stop the line when you find one. Trust among managers that executives were serious about quality. Trust among executives that individuals, given a system that can work and has the right incentives, will produce quality work and spot their own defects, and push the stop button when they need to push it.
But, one more thing: trust that the system actually does work. So first you need a system that will work.
AI coders are fallible; they write bad code, often. In this way, they are just like human programmers.
Deming’s approach to manufacturing didn’t have any magic bullets. Alas, you can’t just follow his ten-step process and immediately get higher quality engineering. The secret is, you have to get your engineers to engineer higher quality into the whole system, from top to bottom, repeatedly. Continuously.
Every time something goes wrong, you have to ask, “How did this happen?” and then do a whole post-mortem and the Five Whys (or however many Whys are in fashion nowadays) and fix the underlying Root Causes so that it doesn’t happen again. “The coder did it wrong” is never a root cause, only a symptom. Why was it possible for the coder to get it wrong?
The job of a code reviewer isn’t to review code. It’s to figure out how to obsolete their code review comment, that whole class of comment, in all future cases, until you don’t need their reviews at all anymore.
By the time your review catches a mistake, the mistake has already been made. The root cause happened already. You’re too late.
I wish I could tell you I had all the answers. Actually I don’t have much. If I did, I’d be first in line for the Singularity because it sounds kind of awesome.
I think we’re going to be stuck with these systems pipeline problems for a long time. Review pipelines — layers of QA — don’t work. Instead, they make you slower while hiding root causes. Hiding causes makes them harder to fix.
But, the call of AI coding is strong. That first, fast step in the pipeline is so fast! It really does feel like having super powers. I want more super powers. What are we going to do about it?
Maybe we finally have a compelling enough excuse to fix the 20 years of problems hidden by code review culture, and replace it with a real culture of quality.
I think the optimists have half of the right idea. Reducing review stages, even to an uncomfortable degree, is going to be needed. But you can’t just reduce review stages without something to replace them. That way lies the Ford Pinto or any recent Boeing aircraft.
The complete package, the table flip, was what Deming brought to manufacturing. You can’t half-adopt a “total quality” system. You need to eliminate the reviews and obsolete them, in one step.
How? You can fully adopt the new system, in small bites. What if some components of your system can be built the new way? Imagine an old-school U. S. auto manufacturer buying parts from Japanese suppliers; wow, these parts are so well made! Now I can start removing QA steps elsewhere because I can just assume the parts are going to work, and my job of “assemble a bigger widget from the parts” has a ton of its complexity removed.
I like this view. I’ve always liked small beautiful things, that’s my own bias. But, you can assemble big beautiful things from small beautiful things.
It’s a lot easier to build those individual beautiful things in small teams that trust each other, that know what quality looks like to them. They deliver their things to customer teams who can clearly explain what quality looks like to them. And on we go. Quality starts bottom-up, and spreads.
I think small startups are going to do really well in this new world, probably better than ever. Startups already have fewer layers of review just because they have fewer people. Some startups will figure out how to produce high quality components quickly; others won’t and will fail. Quality by natural selection?
Bigger companies are gonna have a harder time, because their slow review systems are baked in, and deleting them would cause complete chaos.
But, it’s not just about company size. I think engineering teams at any company can get smaller, and have better defined interfaces between them.
Maybe you could have multiple teams inside a company competing to deliver the same component. Each one is just a few people and a few coding bots. Try it 100 ways and see who comes up with the best one. Again, quality by evolution. Code is cheap but good ideas are not. But now you can try out new ideas faster than ever.
Maybe we’ll see a new optimal point on the monoliths-microservices continuum. Microservices got a bad name because they were too micro; in the original terminology, a “micro” service was exactly the right size for a “two pizza team” to build and operate on their own. With AI, maybe it’s one pizza and some tokens.
What’s fun is you can also use this new, faster coding to experiment with different module boundaries faster. Features are still hard for lots of reasons, but refactoring and automated integration testing are things the AIs excel at. Try splitting out a module you were afraid to split out before. Maybe it’ll add some lines of code. But suddenly lines of code are cheap, compared to the coordination overhead of a bigger team maintaining both parts.
Every team has some monoliths that are a little too big, and too many layers of reviews. Maybe we won’t get all the way to Singularity. But, we can engineer a much better world. Our problems are solvable.
...
Read the original on apenwarr.ca »
What is now known as the Slug Algorithm for rendering fonts directly from Bézier curves on the GPU was developed in the Fall of 2016, so this year marks a full decade since its inception. I published a paper in JCGT about the technique in the middle of 2017, and my company sold the first license for version 1.0 of the Slug Library not long afterward. Since then, Slug has been licensed widely in the video games industry as well as by an array of companies specializing in areas like scientific visualization, CAD, video editing, medical equipment, and even planetariums. Our clients include Activision, Blizzard, id Software, 2K Games, Ubisoft, Warner Brothers, Insomniac, Zenimax, and Adobe among many others. Slug turned out to be the most successful software product I’ve ever made.
I originally created Slug in pursuit of better text rendering for the C4 Engine, where fonts needed to look great not only in the GUI, but inside game levels where they could appear very large and be viewed at oblique angles. Most recently, I used Slug to build the Radical Pie equation editor, which of course, needs extremely high-quality font rendering as well as vector graphics for things like brackets, radicals, and purely graphical items like arrows and highlights attached to mathematical expressions. Slug is also used to render the entire user interface inside the main editing window and all dialog boxes.
This post talks about what has changed within the rendering method since 2017, when the paper was published and the Slug Library was first released. It then concludes with an exciting announcement for those who may want to implement the Slug algorithm for their own projects.
Slug renders text and vector graphics on the GPU directly from Bézier curve data without the use of texture maps containing precomputed or cached images of any kind. Doing this robustly, while also being fast and producing high quality results, is a difficult problem when we have to deal with floating-point round-off errors. Robustness requires that we never see artifacts like dropped pixels, sparkles, or streaks under any circumstances, provably so. Being fast means that the algorithm can render any reasonable amount of text on the game consoles of 2016 without impacting frame rates significantly. Producing high-quality results means that we get nicely antialiased text with smooth curves and sharp corners when viewed at any scale and from any perspective. The principles by which the Slug rendering algorithm achieves all of this are summarized in the following diagram. (Click for PDF version.)
The method that determines root eligibility and calculates the winding number, which is responsible for robustness, is pretty much exactly the same now as it was in 2017 when Slug was first released. Some other parts of the rendering code that were described in the paper have changed over the years, however. I’ll briefly describe the smaller changes here before talking about the big addition called “dynamic dilation” in its own section below.
The original paper included a description of a “band split optimization” that could be turned on when it was known that glyphs would be rendered at a large size. It did provide a speed increase for large glyphs, but it also introduced some divergence in the pixel shader that could hurt performance a little for text rendered at a small size. This optimization also required that the list of curves intersecting each band be stored twice, once sorted for rays pointing in one direction and again sorted for rays pointing in the opposite direction. The speed improvement was modest and didn’t apply universally, so I decided to remove it. This eliminated some complexity in the pixel shader, and more importantly, it allowed the band data to be cut in half. The texture containing the band data now uses two 16-bit components instead of four.
In the Extensions section at the end of the paper, there was some discussion about supersampling. Though not necessary for rendering text at ordinary sizes, adaptive supersampling was implemented in early versions of Slug to enhance text drawn at very small sizes. If small text was rendered far away in a 3D scene, then supersampling reduced the amount of aliasing significantly as the camera moved, and because it was adaptive, the number of samples taken for larger text was still just one. Supersampling was removed because (a) it made a difference only for text so small that it was barely readable anyway and (b) aliasing for tiny text was mitigated to a high degree by the dilation technique described below. Removing supersampling also simplified the pixel shader considerably. (Conditional compilation already eliminated the supersampling code when it was turned off, so its removal did not mean that the ordinary single-sample shader got any faster.)
The Extensions section also talked about adding a loop to the pixel shader in order to render multi-color emoji, which are essentially a stack of glyphs in which each layer has a different color. This proved to be suboptimal because many of the layers often covered only a small fraction of the total area of the composite glyph, but per-layer rendering calculations were still being performed over the full bounding polygon. It turned out to be better to render a bunch of independent glyphs on top of each other, even though it increased the amount of vertex data, so that each layer could have its own bounding polygon. This was faster, and it again simplified the pixel shader code.
There has been one major improvement to the rendering algorithm since the introduction of the Slug Library. It’s called dynamic dilation, and it solves the problem discussed in a previous post from 2019 when it was first added to the code. Before dynamic dilation, the user had to manually specify a constant distance by which every glyph’s bounding polygon would be expanded to ensure that all partially covered pixels get rasterized. This has two disadvantages: (a) if you choose a distance that’s too small, then glyphs rendered below a certain size start to have aliasing artifacts along their boundaries, and (b) any chosen distance will be too large for glyphs above a certain size, leaving empty space that eats up performance for no reason.
Dynamic dilation makes the optimal choice automatic, and it is recalculated in the vertex shader every time a glyph is rendered. The technique uses the current model-view-projection (MVP) matrix and viewport dimensions to determine how far a vertex needs to be moved outward along its normal direction in object space to effectively expand the bounding polygon by half a pixel in viewport space. This guarantees that the centers of any partially covered pixels are inside the bounding polygon so the rasterizer will pick them up. When text is viewed in perspective, the dilation distance can be different for each vertex. The code always produces the optimal value so that there’s never any unnecessary padding that wastes GPU resources.
The dynamic dilation calculation done in the vertex shader is shown in the diagram above, but I haven’t provided a derivation of it anywhere. So here we go. The goal is to find the distance d we must move an object-space vertex position \(\mathbf p = (p_x, p_y, 0, 1)\) along its normal vector \(\mathbf n = (n_x, n_y, 0, 0)\) for it to correspond to a half-pixel expansion of the bounding polygon in viewport space. The normal does not have unit length, but is instead scaled so that it would point to the new vertex location if both adjacent sides of the bounding polygon were to be pushed outward by one unit of distance, as shown in the diagram. We first calculate the distance d along the unit normal direction \(\hat{\mathbf n} = (\hat n_x, \hat n_y, 0)\) and then apply that to the original normal vector n to obtain the new vertex position \(\mathbf p + d\mathbf n\).
By applying the MVP matrix m (which is \(4 \times 4\)), the perspective divide, and the viewport scaling by its width w and height h to the object-space position p offset by the distance d in the unit normal direction \(\hat{\mathbf n}\), we can express the differences \(\Delta x\) and \(\Delta y\) in viewport space as

\[\Delta x = \frac{w}{2}\left(\frac{a + bd}{s + td} - \frac{a}{s}\right) = \frac{w}{2} \cdot \frac{(bs - at)\,d}{s(s + td)}, \qquad \Delta y = \frac{h}{2}\left(\frac{c + ed}{s + td} - \frac{c}{s}\right) = \frac{h}{2} \cdot \frac{(es - ct)\,d}{s(s + td)},\]

where it is convenient to make the assignments \(s = m_{30}p_x + m_{31}p_y + m_{33}\) and \(t = m_{30}\hat n_x + m_{31}\hat n_y\) for the projected w coordinate, and similarly \(a = m_{00}p_x + m_{01}p_y + m_{03}\), \(b = m_{00}\hat n_x + m_{01}\hat n_y\), \(c = m_{10}p_x + m_{11}p_y + m_{13}\), and \(e = m_{10}\hat n_x + m_{11}\hat n_y\) for the projected x and y coordinates. (Because the glyph lies in the z = 0 plane, the third column of m never contributes.)

If we set \((\Delta x)^2 + (\Delta y)^2 = \left(\tfrac{1}{2}\right)^2\), then the offset in viewport space is one-half pixel. We just need to solve this equation for d. When we multiply everything out and simplify as much as possible, we get

\[d^2\left[w^2(bs - at)^2 + h^2(es - ct)^2\right] = s^2(s + td)^2.\]

Making one final assignment \(F = \sqrt{w^2(bs - at)^2 + h^2(es - ct)^2}\) finally gives us the simplified quadratic equation

\[(F^2 - s^2t^2)\,d^2 - 2s^3t\,d - s^4 = 0,\]

which has the solutions

\[d = \frac{s^2(st \pm F)}{F^2 - s^2t^2} = \pm\frac{s^2}{F \mp st}.\]
Choosing the plus sign obtains the distance outward along the unit normal vector that the vertex needs to be moved for a half-pixel dilation. To make sure the glyph is still rendered at the original size, the em-space sampling coordinates also need to be offset. A \(2 \times 2\) inverse Jacobian matrix is stored with each vertex, and it gives us the information we need to transform an object-space displacement into an em-space offset vector. The Jacobian matrix, before inverting, is the upper-left \(2 \times 2\) portion of the transformation matrix that converts em-space coordinates to object-space coordinates, accounting for scale, stretch, skew, and possibly flips of the coordinate axes.
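As a numerical sanity check of the dynamic dilation result, here is a small sketch that computes d for the plus-sign solution and confirms the projected offset really is half a pixel. The matrix and point values are arbitrary test inputs chosen for illustration, not values from Slug itself, and the row-major layout is an assumption of the sketch.

```python
import math

def dilation_distance(m, p, nhat, w, h):
    # m: row-major 4x4 MVP matrix; p = (px, py) in the glyph's z = 0 plane;
    # nhat = unit outward normal; (w, h) = viewport size in pixels.
    px, py = p
    nx, ny = nhat
    a = m[0][0] * px + m[0][1] * py + m[0][3]  # clip-space x at d = 0
    c = m[1][0] * px + m[1][1] * py + m[1][3]  # clip-space y at d = 0
    s = m[3][0] * px + m[3][1] * py + m[3][3]  # clip-space w at d = 0
    b = m[0][0] * nx + m[0][1] * ny            # change in x, y, w per unit d
    e = m[1][0] * nx + m[1][1] * ny
    t = m[3][0] * nx + m[3][1] * ny
    F = math.hypot(w * (b * s - a * t), h * (e * s - c * t))
    return s * s / (F - s * t)  # the plus-sign solution of the quadratic

def viewport(m, px, py, w, h):
    # Project, perspective-divide, and scale to pixels. The viewport center
    # offset is omitted because only pixel differences matter here.
    x = m[0][0] * px + m[0][1] * py + m[0][3]
    y = m[1][0] * px + m[1][1] * py + m[1][3]
    wc = m[3][0] * px + m[3][1] * py + m[3][3]
    return (w / 2 * x / wc, h / 2 * y / wc)

# Arbitrary perspective-style matrix for testing (the z column is irrelevant
# because the glyph lies in the z = 0 plane).
m = [[2.0, 0.3, 0.0, 0.1],
     [0.2, 1.8, 0.0, -0.4],
     [0.0, 0.0, 1.0, 0.0],
     [0.05, 0.02, 0.0, 3.0]]
d = dilation_distance(m, (1.0, 2.0), (1.0, 0.0), 1920, 1080)
x0, y0 = viewport(m, 1.0, 2.0, 1920, 1080)
x1, y1 = viewport(m, 1.0 + d, 2.0, 1920, 1080)
assert abs(math.hypot(x1 - x0, y1 - y0) - 0.5) < 1e-9
```

Because the matrix has nonzero entries in its bottom row, the test exercises the perspective case, where the dilation distance genuinely differs per vertex.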
I was granted a patent for the Slug algorithm in 2019, and I legally have exclusive rights to it until the year 2038. But I think that’s too long. The patent has already served its purpose well, and I believe that holding on to it any longer benefits nobody. Therefore, effective today, I am permanently and irrevocably dedicating the Slug patent to the public domain. That means anybody can freely implement the Slug algorithm from this day forward without a license for whatever purpose they want, and they don’t need to worry about infringing upon any intellectual property rights. (For any legal experts reading this, my company has filed form SB/43 with the USPTO and paid the fee to disclaim the terminal part of the term for patent #10,373,352, effective March 17, 2026.)
To aid in implementations of the Slug algorithm, reference vertex and pixel shaders based on the actual code used in the Slug Library have been posted in a new GitHub repository and made available under the MIT license. The pixel shader is a significant upgrade compared to the code included with the JCGT paper, and the vertex shader includes dynamic dilation, which had not yet been implemented when the paper was published.
...
Read the original on terathon.com »
Spending your tokens to support Django by having an LLM work on tickets is not helpful. You and the community are better off donating that money to the Django Software Foundation instead.
We’re in a new era where people don’t have to type out all of their code. I used an LLM to build a good part of the new functionality in the djangonaut.space site. I know I wouldn’t have shipped that much in that amount of time without using an LLM.
But Django is different. The level of quality is much, much higher. This is because it has a much larger user base, it changes slowly, and the community expects it to be in use 20 years from now. It’s partly why it’s such an honor to have your name among the list of contributors.
This isn’t about whether you use an LLM, it’s about whether you still understand what’s being contributed. What I see now is people who are using LLMs to generate the code and write the PR description and handle the feedback from the PR review. It’s to the extent where I can’t tell if there’d be a difference if the reviewer had just used the LLM themselves. And that is a big problem.
If you do not understand the ticket, if you do not understand the solution, or if you do not understand the feedback on your PR, then your use of an LLM is hurting Django as a whole.
Django contributors want to help others, they want to cultivate community, and they want to help you become a regular contributor. Before LLMs, this was easier to sense because you were limited to communicating what you understood. With LLMs, it’s much easier to communicate a sense of understanding to the reviewer, but the reviewer doesn’t know if you actually understood it.
In this way, an LLM is a facade of yourself. It helps you project understanding, contemplation, and growth, but it removes the transparency and vulnerability of being a human.
For a reviewer, it’s demoralizing to communicate with a facade of a human.
This is because contributing to open source, especially Django, is a communal endeavor. Removing your humanity from that experience makes that endeavor more difficult. If you use an LLM to contribute to Django, it needs to be as a complementary tool, not as your vehicle.
Use an LLM to develop your comprehension. Then communicate the best you can in your own words, then use an LLM to tweak that language. If you’re struggling to convey your ideas with someone, use an LLM more aggressively and mention that you used it. This makes it easier for others to see where your understanding is and where there are disconnects.
There needs to be understanding when contributing to Django. There’s no way around it. Django has been around for 20 years and expects to be around for another 20. Any code being added to a project with that outlook on longevity must be well understood.
There is no shortcut to understanding. If you want to contribute to Django, you will have to spend time reading, experimenting, and learning. Contributing to Django will help you grow as a developer.
While it is nice to be listed as a contributor to Django, the growth you earn from it is incredibly more valuable.
So please, stop using an LLM to the extent it hides you and your understanding. We want to know you, and we want to collaborate with you.
...
Read the original on www.better-simple.com »
A new minor release, FFmpeg 8.1 “Hoare”, is now available for download. Here are some of the highlights:
This release features a lot of internal changes and bugfixes. The groundwork for the upcoming swscale rewrite is progressing. The Vulkan compute-based codecs, and a few filters, no longer depend on runtime GLSL compilation, which speeds up their initialization.
A companion post about the Vulkan Compute-based codec implementations has been published on the Khronos blog, featuring technical details on the implementations and future plans.
We recommend that users, distributors, and system integrators upgrade unless they use current git master.
A new major release, FFmpeg 8.0 “Huffman”, is now available for download. Thanks to several delays, and modernization of our entire infrastructure, this release ended up being one of our largest releases to date. In short, its new features are:
A new class of decoders and encoders based on a pure Vulkan compute implementation has been added. Vulkan is a cross-platform, open standard set of APIs that allows programs to use GPU hardware in various ways, from drawing on screen, to doing calculations, to decoding video via custom hardware accelerators. Rather than relying on a custom hardware accelerator, these codecs are built on compute shaders and work on any implementation of Vulkan 1.3.
Decoders use the same hwaccel API and commands, so users do not need to do anything special to enable them, as enabling Vulkan decoding is sufficient to use them.
Encoders, like our other hardware-accelerated encoders, require specifying a new encoder (ffv1_vulkan). Currently, the only supported codecs are FFv1 (encoding and decoding) and ProRes RAW (decode only). ProRes (encode+decode) and VC-2 (encode+decode) implementations are complete and currently in review, to be merged soon and available in the next minor release.
Only codecs specifically designed for parallelized decoding can be implemented in such a way, with more mainstream codecs not being planned for support.
Depending on the hardware, these new codecs can provide very significant speedups, and open up possibilities to work with them for situations like non-linear video editors and lossless screen recording/streaming, so we are excited to learn what our downstream users can make with them.
The project has recently started to modernize its infrastructure. Our mailing list servers have been fully upgraded, and we have recently started to accept contributions via a new forge, available on code.ffmpeg.org, running a Forgejo instance.
As usual, we recommend that users, distributors, and system integrators upgrade unless they use current git master.
FFmpeg 7.1 “Péter”, a new major release, is now available! A full list of changes can be found in the release changelog.
The more important highlights of the release are that the VVC decoder, merged as experimental in version 7.0, has had enough time to mature and be optimized enough to be declared as stable. The codec is starting to gain traction with broadcast standardization bodies.
Support has been added for a native AAC USAC (part of the xHE-AAC coding system) decoder, with the format starting to be adopted by streaming websites, due to its extensive volume normalization metadata.
MV-HEVC decoding is now supported. This is a stereoscopic coding tool that has begun shipping in content generated by recent phones and VR headsets.
LC-EVC decoding, an enhancement metadata layer intended to improve the quality of other codecs, is now supported via an external library.
Support for Vulkan encoding of H.264 and HEVC was merged. This finally allows fully Vulkan-based decode-filter-encode pipelines, by providing a sink for Vulkan frames other than downloading or displaying them. The encoders have feature parity with their VAAPI counterparts. Khronos has announced that support for AV1 encoding is also coming soon to Vulkan, and FFmpeg is aiming to have day-one support.
In addition to the above, a lot of important internal work went into this release. By far the standout internal change is the set of improvements made for full-range images. Previously, color range data took two paths, underwent no negotiation, and was unreliably forwarded to filters, encoders, and muxers. Work on cleaning up the system started more than 10 years ago, but it stalled because the system was fragile and breaking behaviour would have been unacceptable. The new system fixes this: color range is now forwarded correctly and consistently everywhere it is needed, and the work also laid the path for more advanced forms of negotiation.
Cropping metadata is now supported in the Matroska and MP4 formats. This metadata is important not only for archival, but also for AV1, whose hardware encoders require cropping to be signalled because the codec does not support it natively.
As usual, we recommend that users, distributors, and system integrators upgrade unless they use current git master.
The number of issues FFmpeg has in Coverity (a static analyzer) is now lower than it has been since 2016. Our defect density is less than one-thirtieth of the average for open-source projects with over a million lines of code. All this was possible thanks to a grant from the Sovereign Tech Fund.
FFmpeg now implements a native xHE-AAC decoder. Currently, streams without (e)SBR, USAC or MPEG-H Surround are supported, which means the majority of xHE-AAC streams in use should work. Support for USAC and (e)SBR is coming soon. Work is also ongoing to improve its stability and compatibility. During the process we found several specification issues, which were then submitted back to the authors for discussion and potential inclusion in a future errata.
The FFmpeg community is excited to announce that Germany’s Sovereign Tech Fund has become its first governmental sponsor. Their support will help sustain the maintenance of the FFmpeg project, a critical open-source multimedia component essential to bringing audio and video to billions around the world every day.
A new major release, FFmpeg 7.0 “Dijkstra”, is now available for download. The most noteworthy changes for most users are a native VVC decoder (currently experimental, until more fuzzing is done), IAMF support, or a multi-threaded ffmpeg CLI tool.
This release is not backwards compatible, removing APIs deprecated before 6.0. The biggest change for most library callers will be the removal of the old bitmask-based channel layout API, replaced by the AVChannelLayout API, which allows such features as custom channel ordering and Ambisonics. Certain deprecated ffmpeg CLI options were also removed, and a C11-compliant compiler is now required to build the code.
As usual, there are also a number of newly supported formats and codecs, new filters, new APIs, and countless smaller features and bugfixes. Compared to 6.1, the git repository contains ~2000 new commits by ~100 authors, touching more than 100,000 lines in ~2000 files; thanks to everyone who contributed. See the Changelog, APIchanges, and the git log for more comprehensive lists of changes.
The libavcodec library now contains a native VVC (Versatile Video Coding) decoder, supporting a large subset of the codec’s features. Further optimizations and support for more features are coming soon. The code was written by Nuo Mi, Xu Mu, Frank Plowman, Shaun Loo, and Wu Jianhua.
The libavformat library can now read and write IAMF (Immersive Audio) files. The ffmpeg CLI tool can configure the IAMF structure with the new -stream_group option. IAMF support was written by James Almer.
Thanks to a major refactoring of the ffmpeg command-line tool, all the major components of the transcoding pipeline (demuxers, decoders, filters, encoders, muxers) now run in parallel. This should improve throughput and CPU utilization, decrease latency, and open the way to other exciting new features.
Note that you should not expect significant performance improvements in cases where almost all computational time is spent in a single component (typically video encoding).
FFmpeg 6.1 “Heaviside”, a new major release, is now available! Some of the highlights:
* command support in the setpts and asetpts filters
* Bitstream filter for converting VVC from MP4 to Annex B
* support for the P_SKIP hinting to speed up libx264 encoding
* ffmpeg CLI ‘-top’ option deprecated in favor of the setfield filter
* ffprobe XML output schema changed to account for multiple variable-fields elements within the same parent element
* ffprobe -output_format option added as an alias of -of
This release had been overdue for at least half a year, but constant activity in the repository kept delaying it. We were finally able to branch off the release recently, before some of the large changes scheduled for 7.0 were merged.
Internally, we have had a number of changes too. The FFT, MDCT, DCT and DST implementation used for codecs and filters has been fully replaced with the faster libavutil/tx (full article about it coming soon).
This also led to a reduction in the size of the compiled binary, which can be noticeable in small builds.
There was a very large reduction in the total amount of allocations being done on each frame throughout video decoders, reducing overhead.
RISC-V optimizations for many parts of our DSP code have been merged, with mainly the large decoders being left.
There was an effort to improve the correctness of timestamps and frame durations of each packet, increasing the accuracy of variable frame rate video.
The next major release will be version 7.0, scheduled for February. We will attempt to stick more closely to the new release schedule we announced at the start of this year.
We strongly recommend that users, distributors, and system integrators upgrade unless they use current git master.
A few days ago, Vulkan-powered decoding hardware acceleration code was merged into the codebase. This is the first vendor-generic and platform-generic decode acceleration API, enabling the same code to be used on multiple platforms, with very minimal overhead. This is also the first multi-threaded hardware decoding API, and our code makes full use of this, saturating all available decode engines the hardware exposes.
Those wishing to test the code can read our documentation page. For those who would like to integrate FFmpeg’s Vulkan code to demux, parse, decode, and receive a VkImage to present or manipulate, documentation and examples are available in our source tree. Currently, using the latest available git checkout of our repository is required. The functionality will be included in stable branches with the release of version 6.1, due to be released soon.
As this is also the first practical implementation of the specifications, bugs may be present, particularly in drivers and, although it passes verification, in the implementation itself. New codecs and encoding support are also being worked on, both by the Khronos organization on the standardization side and by us, implementing the specifications and giving feedback on improving them.
A new major release, FFmpeg 6.0 “Von Neumann”, is now available for download. This release has many new encoders and decoders, filters, ffmpeg CLI tool improvements, and also, changes the way releases are done. All major releases will now bump the version of the ABI. We plan to have a new major release each year. Another release-specific change is that deprecated APIs will be removed after 3 releases, upon the next major bump. This means that releases will be done more often and will be more organized.
New decoders featured are Bonk, RKA, Radiance, SC-4, APAC, VQC, WavArc and a few ADPCM formats. QSV and NVenc now support AV1 encoding. The FFmpeg CLI (we usually refer to it as ffmpeg.c to avoid confusion) has speed-up improvements due to threading, as well as statistics options, and the ability to pass option values for filters from a file. There are quite a few new audio and video filters, such as adrc, showcwt, backgroundkey and ssim360, with a few hardware ones too. Finally, the release features many behind-the-scenes changes, including a new FFT and MDCT implementation used in codecs (expect a blog post about this soon), numerous bugfixes, better ICC profile handling and colorspace signalling improvements, the introduction of a number of RISC-V vector and scalar assembly optimized routines, and a few new and improved APIs, which can be viewed in the doc/APIchanges file in our tree. A few submitted features, such as the Vulkan improvements and more FFT optimizations, will be in the next minor release, 6.1, which we plan to release soon, in line with our new release schedule. Some highlights are:
* ffmpeg now requires threading to be built
* ffmpeg now runs every muxer in a separate thread
* Add new mode to cropdetect filter to detect crop-area based on motion vectors and edges
* VAAPI decoding and encoding for 10/12bit 422, 10/12bit 444 HEVC and VP9
* QSV decoding and encoding for 10/12bit 422, 10/12bit 444 HEVC and VP9
* filtergraph syntax in ffmpeg CLI now supports passing file contents as option values
We strongly recommend that users, distributors, and system integrators upgrade unless they use current git master.
FFmpeg 5.1 “Riemann”, a new major release, is now available! Some of the highlights:
We strongly recommend that users, distributors, and system integrators upgrade unless they use current git master.
FFmpeg 5.0 “Lorentz”, a new major release, is now available! For this long-overdue release, a major effort was undertaken to remove the old encode/decode APIs and replace them with an N:M-based API; the entire libavresample library was removed; libswscale gained a new, easier-to-use AVFrame-based API; the Vulkan code was much improved; many new filters were added, including libplacebo integration; and finally, DoVi support was added, including tonemapping and remuxing. The default AAC encoder settings were also changed to improve quality. Some of the changelog highlights:
We strongly recommend that users, distributors, and system integrators upgrade unless they use current git master.
We have a new IRC home at Libera Chat now! Feel free to join us at #ffmpeg and #ffmpeg-devel. More info at contact#IRCChannels
FFmpeg 4.4 “Rao”, a new major release, is now available! Some of the highlights:
We strongly recommend that users, distributors, and system integrators upgrade unless they use current git master.
FFmpeg 4.3 “4:3”, a new major release, is now available! Some of the highlights:
* switch from AvxSynth to AviSynth+ on Linux
* Support for muxing pcm and pgs in m2ts
We strongly recommend that users, distributors, and system integrators upgrade unless they use current git master.
Note that this filter is not FDA approved, nor are we medical professionals. Nor has this filter been tested with anyone who has photosensitive epilepsy. FFmpeg and its photosensitivity filter are not making any medical claims.
That said, this is a new video filter that may help photosensitive people watch TV, play video games, or even use a VR headset, by blocking out epileptic triggers such as filtered sunlight when they are outside. Or you could use it against those annoying white flashes on your TV screen. The filter fails on some input, such as the Incredibles 2 Screen Slaver scene. It is not perfect. If you have other clips that you want this filter to work better on, please report them to us on our trac.
See for yourself. Example was made with -vf photosensitivity=20:0.8
We are not professionals. Please use this in your medical studies to advance epilepsy research. If you decide to use this in a medical setting, or make a hardware hdmi input output realtime tv filter, or find another use for this, please let me know. This filter was a feature request of mine since 2013.
FFmpeg 4.2 “Ada”, a new major release, is now available! Some of the highlights:
* Support decoding of HEVC 4:4:4 content in nvdec and cuviddec
* mov muxer writes tracks with unspecified language instead of English by default
* added support for using clang to compile CUDA kernels
We strongly recommend that users, distributors, and system integrators upgrade unless they use current git master.
FFmpeg 4.1 “al-Khwarizmi”, a new major release, is now available! Some of the highlights:
* Support for AV1 in MP4 and Matroska/WebM
We strongly recommend that users, distributors, and system integrators upgrade unless they use current git master.
FFmpeg 4.0 “Wu”, a new major release, is now available! Some of the highlights:
* Bitstream filters for editing metadata in H.264, HEVC and MPEG-2 streams
* Dropped support for building for Windows XP. The minimum supported Windows version is Windows Vista.
* Removed the ffmenc and ffmdec muxer and demuxer
We strongly recommend that users, distributors, and system integrators upgrade unless they use current git master.
FFmpeg 3.4 “Cantor”, a new major release, is now available! Some of the highlights:
* support for decoding through D3D11VA in ffmpeg
We strongly recommend that users, distributors, and system integrators upgrade unless they use current git master.
FFmpeg 3.3 “Hilbert”, a new major release, is now available! Some of the highlights:
* configure now fails if autodetect-libraries are requested but not found
We strongly recommend that users, distributors, and system integrators upgrade unless they use current git master.
This has been a long time coming, but we wanted to give proper closure to our participation in this run of the program, and that takes time. Sometimes it’s just getting the final report for each project trimmed down; other times it’s finalizing whatever was still in progress when the program finished: final patches need to be merged, TODO lists stabilized, future plans agreed; you name it.
Without further ado, here’s the silver-lining for each one of the projects we sought to complete during this Summer of Code season:
Stanislav Dolganov designed and implemented experimental support for motion estimation and compensation in the lossless FFV1 codec. The design and implementation is based on the snow video codec, which uses OBMC. Stanislav’s work proved that significant compression gains can be achieved with inter frame compression. FFmpeg welcomes Stanislav to continue working beyond this proof of concept and bring its advances into the official FFV1 specification within the IETF.
Petru Rares Sincraian added several self-tests to FFmpeg and successfully went through the sometimes tedious process of fine-tuning test parameters to avoid known and hard-to-avoid problems, like checksum mismatches due to rounding errors on the myriad of platforms we support. His work has improved the code coverage of our self-tests considerably.
...
Read the original on ffmpeg.org »
It’s Tuesday morning. Your VP of Engineering is standing in front of a slide deck, vibrating with the kind of excitement usually reserved for people who just discovered cryptocurrency in 2017. They’ve just come back from a conference. Or maybe a vendor dinner. Three glasses of pinot noir and a demo, and now they have news.
The room does that thing where half the people nod along and the other half develop a sudden interest in their laptops. Your staff engineer is doing that face. You know the face - it’s the one where they’re calculating whether to say something or just update their LinkedIn later.
Nobody asks the question that matters, which is: velocity toward what, exactly?
Because here’s what just happened. Your VP looked at your entire software delivery organisation, identified the one thing that was already pretty fast, and decided to make it faster. They found a station on the assembly line that was not the bottleneck, and threw money at it.
If you know anything about how systems work, you know this doesn’t just fail to help. It makes everything actively worse.
In 1984, Eli Goldratt wrote The Goal, a novel about manufacturing that has no business being as relevant to software as it is. It’s also the most useful business book you’ll ever read that’s technically fiction, which is almost the exact opposite of most KPI frameworks.
The core idea is the Theory of Constraints, and it goes like this:
Every system has exactly one constraint. One bottleneck. The throughput of your entire system is determined by the throughput of that bottleneck. Nothing else matters until you fix the bottleneck.
That’s the part most people get. Here’s the part they don’t, and it’s the part that should scare you: improving anything other than the bottleneck doesn’t just fail to help the system, it actively makes it worse.
Think about it mechanically. If station A produces widgets faster but station B (the bottleneck) can still only process them at the same rate, all you’ve done is create a pile of unfinished widgets between A and B. Inventory goes up. Lead time goes up. The people at station B are now drowning. The pile creates confusion about what to work on next. Quality tanks because everyone’s triaging instead of thinking.
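A toy discrete-time simulation (illustrative, not from the book) makes the mechanics concrete: tripling the non-bottleneck station's rate leaves throughput untouched and only grows the pile.

```python
from collections import deque

def simulate(rate_a, rate_b, ticks=1000):
    """Toy two-station line: station A feeds a queue, station B (the
    bottleneck) drains it. Rates are items per tick; fractional rates
    accumulate as credit until a whole item can be emitted."""
    queue = deque()
    produced = finished = 0
    credit_a = credit_b = 0.0
    for _ in range(ticks):
        credit_a += rate_a
        while credit_a >= 1.0:            # A emits widgets
            queue.append(produced)
            produced += 1
            credit_a -= 1.0
        credit_b += rate_b
        while credit_b >= 1.0 and queue:  # B can only drain so fast
            queue.popleft()
            finished += 1
            credit_b -= 1.0
    return finished, len(queue)           # throughput, pile between A and B

before = simulate(rate_a=2.0, rate_b=1.0)  # A already outpaces B
after = simulate(rate_a=6.0, rate_b=1.0)   # "speed up" A by 3x
# Finished count is identical in both runs; only the pile grows.
```

Both runs finish exactly as many widgets; the only difference is a queue five times deeper, which in software terms is inventory, confusion, and context-switching.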
I bet some of you are already living this. I’ve lived it. It sucked.
Your developers are producing PRs faster than ever. Great. Wonderful. Gold star. Someone get the confetti cannon. Now those PRs hit the review queue, and your reviewers haven’t tripled. Nobody tripled the reviewers. Nobody even thought about the reviewers, because the reviewers weren’t in the vendor’s slide deck.
So PRs sit. A day. Two days. A week. The author has context-switched to their next AI-assisted feature and can barely remember what the first one did by the time review comments land. “Can you explain what this function does?” they ask, staring at code they wrote eight days ago, which in developer memory is roughly the Jurassic period.
Reviews start getting rubber-stamped because there are simply too damn many of them to review properly. Someone approves a PR they didn’t really read. We’ve all done it (don’t look at me like that). It merges. CI takes 45 minutes, fails on a flaky test, gets re-run, passes on the second attempt (the flaky test is fine, it’s always fine, until it isn’t and you’re debugging production at 2am on a Saturday in your underwear wondering where your life went wrong. Ask me how I know… actually, don’t). The deploy pipeline requires a manual approval from someone who’s in a meeting about meetings. The feature sits in staging for three days because nobody owns the “get it to production” step with any urgency.
Meanwhile, the developer has already shipped two more PRs. The queue grows. WIP goes through the roof. Everyone has six things in flight and nothing actually done. Cycle time (the thing that actually measures how fast you deliver value to users) gets worse.
You are producing more code and shipping less software. You have made your situation measurably, demonstrably worse, and you have a dashboard that says productivity is up 40%.
I have seen this exact movie play out at three different companies. The dashboard goes up. The shipping goes down. And nobody connects the two because the dashboard is the thing they’re reporting to the board, and the board doesn’t know what cycle time is, and nobody wants to be the person who explains it.
And here’s the bit that really keeps me up at night: a lot of this AI-generated code? Nobody fully understands it. The person who “wrote” it didn’t really write it. They prompted it, skimmed it, maybe ran it once. When it breaks in production at 2am, the person on-call didn’t write it and the person who prompted it can’t explain it. You’ve just increased the surface area for incidents while decreasing the number of humans who can reason about the system.
If it’s not writing code (and it almost never is), then where should you be looking? Walk the value stream. Follow a feature from “someone had an idea” to “a user got value from it.” I promise the bottleneck will jump out and wave at you - it might even flip you off because you’ve been ignoring it.
This is the one nobody wants to talk about because it’s embarrassing. Your PM hasn’t talked to a real user in two months. Your requirements arrive as a Jira ticket with three sentences and a Figma link to a design that was approved by someone who’s never used the product. Your engineers are making fifty micro-decisions a day about behaviour, edge cases, and error handling that nobody specified, because nobody thought about them.
I once watched a team spend six weeks building a feature based on a Slack message from a sales rep who paraphrased what a prospect maybe said on a call. Six weeks. The prospect didn’t even end up buying. The feature got used by eleven people, and nine of them were internal QA. That’s not a delivery problem. That’s an “oh fuck, what are we even doing” problem.
When you speed up code output in this environment, you are speeding up the rate at which you build the wrong thing. You have automated the guessing. You will build the wrong feature faster, ship it, watch it fail, and then do a retro where someone says “we need to talk to users more” and everyone nods solemnly and then absolutely nothing changes.
I put “done” in quotes because in most orgs, code being written is maybe 20% of the journey. The other 80% is your code sitting in various queues, slowly ageing, like a forgotten sandwich in the office fridge.
I’ve watched features where the code took an afternoon and it took two months to reach production. Two. Months. The code didn’t get slower. Everyone around the code got in its way.
PR review. CI. Staging. QA. Security review. Product sign-off. Deploy window. Canary rollout. The actual pipeline of getting code from a developer’s branch to a user’s screen is a long series of handoffs, waits, and queues. Most of the time, your code is sitting still. Waiting for a human to look at it. Waiting for a pipeline to run. Waiting for someone to give it permission to exist.
If you’ve ever watched a PR approval come through at 4:55pm on a Friday and thought “well, that’s shipping on Monday I guess,” you know exactly what I’m talking about.
If you want to ship faster, look at where things are waiting. Count the hours of actual work versus the hours of sitting in a queue. I guarantee the ratio will make you want to put your head through a wall.
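One way to sketch that count, assuming a hypothetical event log of work/wait transitions for a single feature (the log format and numbers here are invented for illustration):

```python
from datetime import datetime

def touch_vs_wait(events):
    """Sum hours spent working vs waiting, given an ordered list of
    (timestamp, state) transitions for one feature."""
    work = wait = 0.0
    for (start, state), (end, _) in zip(events, events[1:]):
        hours = (end - start).total_seconds() / 3600
        if state == "work":
            work += hours
        elif state == "wait":
            wait += hours
    return work, wait

ts = datetime.fromisoformat
events = [
    (ts("2024-03-04T09:00"), "work"),  # coding
    (ts("2024-03-04T14:00"), "wait"),  # PR sits in the review queue
    (ts("2024-03-06T14:00"), "work"),  # review comments and fixes
    (ts("2024-03-06T16:00"), "wait"),  # waiting for a deploy window
    (ts("2024-03-11T10:00"), "done"),  # finally in production
]
work, wait = touch_vs_wait(events)
# 7 hours of actual work against 162 hours of sitting in queues.
```

Even in this mild made-up example the ratio is worse than 20:1 in favour of waiting, and real value streams are usually longer.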
I can’t count the number of teams I’ve worked with that were scared to deploy. Tests are flaky, observability is a mess, nobody trusts the canary process, and the last time someone deployed on a Thursday it ruined everyone’s weekend. So what do they do? They batch changes into bigger releases. Which are riskier. Which makes deploys scarier. Which makes everyone batch more.
Now add faster code output to this environment. More code, same terrified deploy culture. The batches get bigger. The risk gets higher. The releases get less frequent. You have given a team that was already scared of shipping even more reasons to not ship. Incredible work.
This one pairs with “you don’t know what to build” because it’s the same disease on the other end of the pipeline. You built the thing. You shipped the thing. And then… nothing. No analytics worth looking at. No user interviews after launch. Nobody circling back to check whether the feature actually solved the problem it was supposed to solve.
So you guess on the next feature too. And the one after that. The entire product roadmap is a series of educated guesses with no feedback between them.
You arrive at “we have no idea if this worked” more often, learn nothing each time, and somehow call that velocity.
Sometimes the bottleneck isn’t technical at all. It’s the meeting you need to get a decision. The three teams who need to agree on an API contract but haven’t talked to each other in a month. The architect who’s a single point of approval on every significant design choice and has a two-week backlog because apparently we built a system where one person’s calendar is a load-bearing wall. Or my personal favourite: the planning process that takes six weeks, runs quarterly, and means you can’t start working on something urgent for another five weeks because “it wasn’t in the plan.”
Not a technical problem. Not a code problem. A calendar problem. We spent more time talking about the feature than building it. At one point someone suggested we have a meeting to discuss the meeting. I wish I was joking. Now I need a shower, and some whiskey.
Writing code faster does precisely nothing for any of this. Zero. Your bottleneck is the org chart, and no amount of Copilot is going to refactor that.
You knew this section was coming. The boring bit. I’m not going to pretend this is glamorous, because it isn’t. Nobody’s going to write a LinkedIn post about it. Nobody’s going to give a keynote about it at a vendor conference. There’s no swag.
Map your value stream. Literally follow a feature from idea to production. Write down every step. Write down how long each step takes. Write down how long things sit between steps. The gap between steps is where your cycle time lives. This will be depressing. Do it anyway. Bring snacks.
Measure cycle time, not output. If you’re measuring lines of code, PRs merged, or “story points delivered” and not measuring how long it takes from commit to production to users using it, you’re optimising for the wrong thing. You’re counting widgets at station A and ignoring the pile on the floor. Stop it. I mean it.
Find the wait states and kill them. If PRs wait two days for review, fix review. Pair programming, smaller PRs, dedicated review time, async review norms, whatever works for your team. If deploys wait for a manual approval, automate it or at least make it a Slack button instead of a calendar invite. If decisions wait for a meeting, make smaller decisions that don’t need meetings.
Stop starting and start finishing. WIP limits exist for a reason. It’s better to have three things done than ten things in progress. Every item in flight is context-switching tax, and context-switching is where good engineers go to slowly lose their minds and start writing manifestos on internal wikis that nobody reads.
Talk to the people doing the work. Your developers already know where the bottleneck is. They complain about it in standups. They’ve been making memes about it in Slack for months. They just assumed nobody was listening, and honestly? They were probably right.
Go back to that Tuesday morning. Your VP is up there with their slide about 40% more code output. What they should have said, what would have actually been useful, is this: “We did a value stream analysis and found that features spend an average of nine days waiting between steps. We’re going to cut that in half.”
That’s not sexy. It doesn’t fit on a vendor’s slide deck. You can’t sell it as a product. There’s probably no conference talk in it (actually, this is giving me ideas…). But it’s the thing that would actually make you ship faster.
The speed of writing code was never your problem. If you thought it was, the gap between that belief and reality is where all your actual problems live. The competitive advantage doesn’t go to the team that writes code fastest. It goes to the team that figured out what to build, built it, and got it into users’ hands while everyone else was still drowning in a review queue full of AI-generated PRs that nobody has the time or the energy to read.
...
Read the original on debuggingleadership.com »
Python 3.15’s JIT is now back on track
(JIT performance as of 17 March (PST). Lower is better versus interpreter. Image credits to https://doesjitgobrrr.com/).
Great news—we’ve hit our (very modest) performance goals for the CPython JIT over a year early for macOS AArch64, and a few months early for x86_64 Linux. The 3.15 alpha JIT is about 11-12% faster on macOS AArch64 than the tail calling interpreter, and 5-6% faster than the standard interpreter on x86_64 Linux. These numbers are geometric means and are preliminary. The actual range is something like a 20% slowdown to over 100% speedup (ignoring the unpack_sequence microbenchmark). We don’t have proper free-threading support yet, but we’re aiming for that in 3.15/3.16. The JIT is now back on track.
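For readers unfamiliar with how a "geometric mean" speedup is computed from a benchmark suite: each benchmark produces a time ratio (JIT time over interpreter time), and the headline number is the nth root of their product. A sketch with made-up ratios (not the actual benchmark results):

```python
import math

# Hypothetical per-benchmark time ratios (JIT time / interpreter time).
# Values below 1.0 mean the JIT is faster on that benchmark; these numbers
# are illustrative only, not the real suite.
ratios = [0.80, 0.95, 1.20, 0.85, 0.90]

# Geometric mean: nth root of the product of n ratios.
geomean = math.prod(ratios) ** (1 / len(ratios))
print(f"geometric mean ratio: {geomean:.3f}")
```

A geometric mean near 0.93 would be reported as "roughly 7% faster", even though the individual benchmarks range from a 20% speedup to a 20% slowdown — which is why the post quotes both the mean and the range.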
I cannot overstate how tough this was. There was a point where I was seriously wondering if the JIT project would ever produce meaningful speedups. To recap: the original CPython JIT had practically no speedups. 8 months ago I posted a JIT reflections article on how the original CPython JIT in 3.13 and 3.14 was often slower than the interpreter. That was also around the time when the Faster CPython team lost funding from its main sponsor. I'm a volunteer, so this didn't affect me, but more importantly it did affect my friends working there, and at one point the JIT's future seemed uncertain.
So what changed from 3.13 and 3.14? I'm not going to give some heroic tale of how we rescued the JIT from the jaws of failure through our acumen. I honestly attribute a lot of our current success to luck: right time, right place, right people, right bets. I seriously don't think this would've been possible if any single one of the core JIT contributors (Savannah Ostrowski, Mark Shannon, Diego Russo, Brandt Bucher, and me) had not been in the picture. So as not to exclude the other active JIT contributors, I will also name a few more people: Hai Zhu, Zheaoli, Tomas Roun, Reiden Ong, Donghee Na, and I am probably missing a few more.
I'm going to cover a lesser-talked-about part of a JIT: the people, and a bit of luck. If you want the technical details of how we did it, it's here.
The Faster CPython team lost its main sponsor in 2025. I immediately raised the idea of community stewardship. At the time, I was pretty uncertain this would work: JIT projects are not known to be good for new contributors, and historically they require a lot of prior expertise.
At the CPython core sprint in Cambridge, the JIT core team met and wrote a plan for a 5% faster JIT by 3.15 and a 10% faster JIT by 3.16, with free-threading support. A side note, less headline-grabbing but vital to the health of the project: we also wanted to decrease the bus factor, with 2 active maintainers in each of the 3 stages of the JIT: the frontend (region selector), the middle-end (optimizer), and the backend (code generator).
Previously, the JIT only had 2 active recurring contributors to the middle-end. Today, the JIT has 4 active recurring contributors to the middle-end, and I would consider the 2 non-core developers (Hai Zhu and Reiden) capable and valued members.
What worked in attracting people were the usual software engineering practices: breaking complex problems down into manageable parts. Brandt started this earlier in 3.14, where he opened multiple mega-issues that split optimizing the JIT into simple tasks. E.g. we would say "try optimizing a single instruction in the JIT". I took Brandt's idea and did this for 3.15. Luckily, I had an easier job, as my issue involved converting the interpreter instructions to an easily optimizable form. To encourage new contributors, I also laid out very detailed instructions that were immediately actionable, and clearly demarcated units of work. I suspect that did help, as we have 11 contributors (including me) working on that issue, converting nearly the whole of the interpreter to something more JIT-optimizer friendly. The core insight was that the JIT could be broken down from an opaque blob into something that a C programmer with no JIT experience could contribute to.
Other things that worked: encouraging people, celebrating achievements big or small. Every JIT PR had a clear outcome, which I suspect gave people a sense of direction.
The community optimization efforts paid off. The JIT went from 1% faster on x86_64 Linux to 3-4% faster (see the blue line below) over that time period:
Again, I attribute a lot of this to luck, but during the CPython core sprints in Cambridge, Brandt nerd-sniped me into rewriting the JIT frontend as a tracing one. I initially didn't like the idea, but as a friendly form of spite-driven development, I thought I'd rewrite it just to prove to him it didn't work.
The initial prototype worked in 3 days; however, it took a month to get it JITting properly without failing the test suite. The initial results were dismal: about 6% slower on x86_64 Linux. I was about to ditch the idea, until a lucky accident happened: I misinterpreted a suggestion given by Mark.
Mark had suggested earlier to thread the dispatch table through the interpreter, giving the interpreter two dispatch tables (one for normal execution, and one for tracing). Mark's suggestion was that the tracing table hold tracing versions of the normal instructions. However, I misunderstood and came up with an even more extreme version: only one instruction is responsible for tracing, and every entry in the second table points to it. Yes, I know this part is confusing; I'll try to explain it better one day. This turned out to be a really, really good choice. I found that the initial dual-table approach was so much slower because it doubled the size of the interpreter, causing huge compiled-code bloat and, naturally, a slowdown. By using a single tracing instruction and two tables, we grow the interpreter by the size of only one instruction, and also keep the base interpreter ultra fast. I affectionately call this mechanism dual dispatch.
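To make the dual-dispatch idea concrete, here is a toy sketch in Python (the real mechanism lives in CPython's C interpreter; all names below are illustrative, not CPython's). The tracing table has the same shape as the normal table, but every slot points at one shared handler that records the opcode and then defers to the normal handler:

```python
trace = []  # opcodes recorded while tracing

# Two normal, fast instruction handlers (stand-ins for real interpreter ops).
def load_const(opcode, frame):
    frame["stack"].append(42)

def binary_add(opcode, frame):
    b = frame["stack"].pop()
    a = frame["stack"].pop()
    frame["stack"].append(a + b)

# Normal table: one handler per opcode.
NORMAL_TABLE = {0: load_const, 1: binary_add}

# The single tracing instruction: record, then run the normal handler.
def trace_and_dispatch(opcode, frame):
    trace.append(opcode)
    NORMAL_TABLE[opcode](opcode, frame)

# Tracing table: every slot points at the SAME handler, so the interpreter
# grows by only one instruction and the fast path stays untouched.
TRACING_TABLE = {op: trace_and_dispatch for op in NORMAL_TABLE}

def run(code, frame, table):
    for opcode in code:
        table[opcode](opcode, frame)

frame = {"stack": []}
run([0, 0, 1], frame, TRACING_TABLE)  # push 42, push 42, add
print(frame["stack"], trace)          # [84] [0, 0, 1]
```

Swapping `TRACING_TABLE` for `NORMAL_TABLE` turns tracing on and off without duplicating any handlers, which is the compiled-code-bloat win described above.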
There's a lot more that went into the design of the trace-recording interpreter. I'm tooting my own horn here, but I truly think it's a mini work of art. It took me 1 week of iterating on the interpreter until it was overall faster: it went from 6% slower to roughly no speedup after using dual dispatch. After that, I stamped out a bunch of slow edge cases in the tracing interpreter to eventually make it 1.x% faster. By my own estimates, the tracing interpreter itself is only 3-5x slower than the specializing interpreter. Key to this is that it respects all normal behavior of the specializing interpreter and mostly doesn't interfere with it.
Just to give you an idea of how much trace recording mattered: it increased the JIT's code coverage by 50%. This means that without it, all future optimizations would likely have been around 50% less effective (assuming all code executes the same, which of course isn't true, just bear with me please :).
So I have to thank Brandt and Mark for leading me to stumble upon such a nice solution.
The other lucky bet we made early on was to try reference-count elimination. This was again work originally done by Matt Page in the CPython bytecode optimizer (more details in my previous blog post on optimization). I noticed that even with the bytecode optimizer work, the JITted code still contained a branch per reference-count decrement. I thought: "why not try eliminating the branch", and had no clue how much it would help. It turns out a single branch is actually quite expensive, and these add up over time, especially when it's at least one branch for every single Python instruction!
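A rough sketch of where that per-decrement branch comes from, again simulated in Python rather than CPython's actual C code (the names and the dict-based "object" are purely illustrative):

```python
branches_executed = 0  # count how many zero-checks (branches) we run

def deallocate(obj):
    obj["alive"] = False

def decref_with_branch(obj):
    """What every reference-count decrement normally costs: a zero-check."""
    global branches_executed
    obj["refcount"] -= 1
    branches_executed += 1        # this comparison is the branch in question
    if obj["refcount"] == 0:
        deallocate(obj)

def decref_branch_eliminated(obj):
    """When the optimizer can prove the count cannot reach zero here
    (e.g. another live reference to the object exists), the check is
    dropped entirely and only the decrement remains."""
    obj["refcount"] -= 1

obj = {"refcount": 3, "alive": True}
decref_with_branch(obj)           # pays the branch
decref_branch_eliminated(obj)     # no branch at all
print(obj["refcount"], branches_executed)  # 1 1
```

One branch looks cheap in isolation; the point of the post is that at one or more per Python instruction, removing the provably dead ones adds up to a measurable speedup.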
The other lucky part is how easy this was to parallelize, and what a great tool it was for teaching people about the interpreter and the JIT. This was the main optimization we directed people to work on in the Python 3.15 JIT. Although it was a mostly manual refactoring process, it taught people the key parts they needed to learn about the JIT without overwhelming them.
We have a great infrastructure team. I say this partly in jest, because it's one person. In reality, our "team" is currently 4 machines running in Savannah's closet. Nevertheless, Savannah has done the work of an entire infrastructure team for the JIT. The JIT could not have progressed so quickly if we had no way to track our performance numbers. Daily JIT runs have been a game changer in the feedback loop: they help us catch regressions in JIT performance and tell us whether our optimizations actually work.
Mark is technically excellent, and I think he knows the Internet gives him too much praise already so I’m not going to say anything more here :).
Diego is also great. He's responsible for the JIT on ARM hardware, and has also recently started work on making the JIT friendly to profilers. I cannot overstate how hard a problem that is.
Brandt laid the original foundation for our machine code backend, without which we’d have new contributors writing assembler, which probably would’ve put more people off.
I also want to encourage the idea of talking to people and sharing ideas.
A shoutout to CF Bolz-Tereick, who taught me a lot about PyPy. I spent a few months looking at PyPy’s source code, and I believe this made me a better JIT developer overall. CF was very helpful when I needed help.
I’m also part of a friendly compiler chat with Max Bernstein, without which I’d likely have lost motivation for this a long time ago. Max is a prolific writer, and a friendly compiler person.
Ideas don’t exist in a silo. I suspect I became better at writing JITs thanks to hanging out with a bunch of compiler people for some time. At the very least, looking at PyPy has broadened my view!
People are important, and with some luck, JIT go brrr.
...
Read the original on fidget-spinner.github.io »