10 interesting stories served every morning and every evening.
Financially: several tiers and options are available via GitHub, PayPal and Patreon.
Help in the Community discord and beyond (we also love blog posts).
Bounties: fix bugs and add features faster, and get paid for your work :)
...
Read the original on monogame.net »
This is read by an automated voice. Please report any issues or inconsistencies here.
Greg Abel faces the challenge of taking over Berkshire Hathaway from the legendary Warren Buffett.
Many regard Buffett as the world’s greatest investor after he grew Berkshire from a struggling New England textile mill that he started buying up for $7.60 a share in 1962, to the massive conglomerate it is today with shares that go for more than $750,000 a pop. Buffett’s personal fortune of Berkshire stock is worth roughly $150 billion even after giving away more than $60 billion over the last 20 years.
Berkshire for decades has routinely outpaced the S&P 500 as Buffett bought up insurance companies like Geico and National Indemnity, manufacturers like Iscar Metalworking, retail brands like Dairy Queen, major utilities and even one of the nation’s biggest railroads, BNSF. Along the way, Buffett bought and sold hundreds of billions of dollars of stocks and profited handsomely from his famously long-term bets on companies like American Express, Coca-Cola and Apple.
Berkshire has struggled to keep that pace in recent years because it has grown so huge and also struggled to find new and significant acquisitions. Even this fall’s $9.7-billion acquisition of OxyChem probably isn’t big enough to make a difference in Berkshire’s profits.
Investors will be watching closely to see what changes Abel might make in Berkshire’s trajectory, but don’t expect any seismic shifts.
Buffett isn’t going anywhere and Abel has already been managing all of Berkshire’s noninsurance businesses since 2018. Buffett will remain chairman and plans to continue coming into the office each day to help spot new investments and offer Abel any advice he asks for.
CFRA Research analyst Cathy Seifert said it is natural for Abel to make some changes in the way Berkshire is run. Taking a more traditional approach to leadership with nearly 400,000 employees spread across dozens of subsidiaries makes a lot of sense, she said.
But Berkshire operates under an extremely decentralized structure that trusts its executives with significant decisions. Everyone associated with the company has said there are no plans to change that.
The world learned that Abel was to become the designated successor at Berkshire in 2021 when Buffett’s longtime business partner, the late Charlie Munger, assured shareholders at an annual meeting that Abel would maintain the company’s culture.
Part of Buffett’s sales pitch to company founders and CEOs thinking of selling their companies has always been that Berkshire would largely allow them to continue running their companies the same way as long as they delivered results.
“I think the investment community would likely applaud Greg’s management style to the degree that it sort of buttons things up,” Seifert said. “And if it helps performance, that can’t really be faulted.”
Abel has already shown himself to be a more hands-on manager than Buffett, but he still follows the Berkshire model of autonomy for acquired companies. Abel asks tough questions of company leaders and holds them accountable for their performance.
Abel did announce some leadership changes in December after investment manager and Geico CEO Todd Combs departed, and Chief Financial Officer Marc Hamburg announced his retirement. Abel also said he’s appointing NetJets CEO Adam Johnson as manager of all of Berkshire’s consumer, service and retail businesses. That essentially creates a third division of the company and takes some work off of Abel’s plate. He will continue to manage the manufacturing, utility and railroad businesses.
Abel will eventually face more pressure to start paying a dividend. From the beginning, Berkshire has held the position that it is better to reinvest profits rather than make quarterly or annual payouts to shareholders.
But if Abel can’t find a productive use of the $382 billion cash that Berkshire is sitting on, there may be a push from investors to start paying dividends or to adopt a traditional stock buyback program that would boost the value of shares they hold. Currently, Berkshire only repurchases shares when Buffett thinks they are a bargain, and he hasn’t done that since early 2024.
Still, Abel will be insulated from such pressure for some time since Buffett controls nearly 30% of the voting power in the stock. That will diminish gradually after his death as his children distribute his shares to charity as agreed.
Many of Berkshire’s subsidiaries tend to follow the economy and profit handsomely whenever the country is prosperous. Berkshire’s utilities typically generate a reliable profit, and its insurance companies like Geico and General Reinsurance supply more than $175 billion worth of premiums that can be invested until claims come due.
Investor Chris Ballard, who is managing director at Check Capital, said most of Berkshire’s businesses “can almost take care of themselves.” He sees a bright future for Berkshire under Abel.
One of the biggest questions right now may be how much additional change there will be in company leadership after Combs’ departure, if any at all. The head of the insurance unit, Vice Chairman Ajit Jain, who Buffett has long lavished with praise, is now 74. Many of the CEOs of the various companies have continued working long after retirement age because they like working for Buffett.
“As a long-term shareholder, we aren’t too concerned with Todd’s departure and don’t think this is the tip of some sort of iceberg,” said Ballard, whose firm counts Berkshire as its largest holding. “Todd’s situation is unique. It’s just a reminder that Warren’s pending departure is imminent and they’re preparing for a new phase — one that we’re still excited to see unfold.”
Funk writes for the Associated Press.
...
Read the original on www.latimes.com »
See the discussion of this post on Hacker News.
Many people reached out expressing their interest to buy the book. I’ve put the e-book up for pre-order. I’ll release each chapter as it’s completed. A print version will be available on Amazon later.
Back in 2020-2022, my blog was getting a lot of attention. Some of the big tech book publishers reached out about whether I was interested in writing a book. I had a few conversations but decided against it. I did want to write a book but self-publishing seemed like the better option.
Then an acquisitions editor for another big publisher asked to chat. He had a similar background to me, an academic who enjoys coding and writing. He had written a few books for multiple publishers so he knew the process and wasn’t shy to share the good and bad. He even made decent money from his books.
I was intrigued. Writing a book was one of those goals that I liked the idea of but never made any progress on. I went and talked to a few other people that had published technical books and they gave me their reviews of the process and of each of the big publishers.
* Pros of a publisher: They are a forcing function to make progress, they handle a lot of logistics for you around the book, they provide some feedback on the content, they have large distribution channels, and it looks more “real” when a publisher’s name is on your book.
* Cons of a publisher: They nag you constantly, they may try to steer your book in other directions, the money is peanuts, they can stop printing the book whenever they want, they have control over future editions, and they actually do little to no marketing of your book.
I decided to write a book and sign with the publisher!
Before the actual deal, we had to agree on the book. Each publisher has their own template for pitching an idea. I did that and we went back and forth on some of the details. This part felt collaborative, they were trying to help me flesh out the concept based on their data and experience.
A lot of my blog posts involve classic programming projects that were relevant 30 years ago and will be just as fun 30 years from now. What if the book is a collection of tutorials on building these projects, each self-contained, teaching fundamental computing concepts along the way?
To show there was a market for this, I pointed out that several of my blog posts in this space had collectively drawn millions of views. Most notably, Challenging programming projects every programmer should try (and the two sequels). My readers seem to resonate with making useless stuff and programming for fun. Who doesn’t love hacking on a ray tracer or compiler or game or operating system just for fun?!
They liked it! It was quite different than many of their other books. I wrote a high-level outline of the entire book. The projects were going to be:
The compiler chapter would be a revised version of my Let’s make a Teeny Tiny compiler blog series. The last project chapter would list a bunch of smaller scale projects that are less of a commitment but still worthwhile for learning (e.g., an image file format converter). Also, each chapter would end with a bunch of suggestions on how to continue with the project. If I got to the end and the book was a bit short, I was planning to squeeze in one more project chapter too.
There was some negotiation of the contract terms. We had to agree specifically on what the book was going to be. This included the pitch for the book, who the intended audience is, and a very detailed Table of Contents (down to the subsubsection headings). I had to give a tentative schedule of when I’d deliver drafts for major milestones.
The weirdest thing in the contract was the number of illustrations that the book must contain. I asked to bump that up. In the end we agreed to 115,500 to 132,000 words in length, approximately 350 to 400 printed pages, with 10 to 30 illustrations.
They offered a $5000 advance with the first half paid out when they approve of the first third of the book and the second half when they accept the final manuscript for publication.
I didn’t even bother negotiating this because it is essentially nothing. I have a day job so I don’t need the money sooner to pay rent (it is just an advance after all!) and if I don’t sell way, way more than that then this was a bad use of my time financially anyway.
The royalties offered were surprisingly low. I did negotiate these and got them bumped up a tiny bit. They agreed to 12% of total sales for print and e-book on the first 7000 copies and then 15% on sales after that. 50% royalty on foreign translation sales.
I was later told that some people negotiated up to 18%. Meh. The finances don’t look great regardless unless you have one of their top books.
They refused to share statistics. I was able to get out of them that their median book sells in the range of single thousands of copies. Their top sellers are in the range of hundreds of thousands of copies but they only shared one example of a book that did that. Most are on the shelf for only a few years.
Oh, and I get 25 copies for myself and I can buy copies at a 50% discount.
We kicked the project off in early 2023.
They assigned me an editor that I met with regularly. This was my main contact with the publisher.
He walked me through the process and got me set up. I was required to use AsciiDoc or Microsoft Word for my drafts. Nope, I can’t use LaTeX. He gave me a very detailed style guide that I had to follow.
He emailed me, a lot. We had initially agreed to a chapter draft every 3 or 4 weeks (I was overly optimistic, and felt pressured…). Once that first soft deadline passed, the constant emails asking to see drafts began.
When I delivered a draft, I quickly got a marked-up version back. The feedback was mostly formatting and styling. The helpful feedback was pointing out rough transitions or assumptions I made about prior knowledge.
The unhelpful feedback was a consistent push to dumb down the book (which I don’t think is particularly complex but I do like to leave things for the reader to try) to appease a broader audience and to mellow out my personal voice. He also wanted me to add a chapter that acts as an intro to programming with Python… 😵
It became clear that they were following a formula for technical books. Don’t show too much personality, don’t be too technical, and hand hold the reader through a linear task. Just crank out the book so they can get it on the shelf. What you say on page 120 doesn’t matter because the reader has already bought the book.
This rubbed me the wrong way but I pushed through. They are just following their incentives and I should push back.
The book was started only a few months after the release of ChatGPT. The entire world was talking about AI!
So it wasn’t long before the publisher asked to chat. “Hey, is there any way you can incorporate AI into the book?” I politely declined.
A bit later they came back, basically saying The Powers That Be were requiring AI to be part of every book. I offered a few compromises (i.e., a chapter about implementing an ML algorithm, or a note at the end of each chapter about leveraging AI in the creation of the projects). I got a mixed response.
In the end, I firmly told them no. It is antithetical to the premise of the book (classic programming projects!) that they agreed to publish. They went away.
I kept missing deadlines. I was busy with work (AI!) and life. Apparently every book’s timeline gets pushed at least once, so they were flexible. I eventually sent the editor the first third of the book and we made a few back-and-forths revising it.
This triggered the next stage: getting feedback on the technical content. We would do a round or two of this with a technical editor, then the draft would go off to strangers for review. If they hated it, the publisher had the right to cancel the project. If they didn’t hate it, the draft would go live for early readers to buy (and they’d receive future chapters as they were completed).
The first notes I got back from the technical editor didn’t seem like a good fit. Everything he said was correct, but indicated a mismatch in expectations. He was critiquing the project in the chapter as if it were supposed to be production-quality software. But my projects are a balancing act: what can a programmer with very little knowledge of the subject build in a weekend that still gives them a broad understanding of the concepts?
The technical editor’s feedback on the second chapter was far more helpful to me. I think he “got” what I was going for, pointed out many flaws, and suggested many improvements I could make. It was nice. Iterating on existing content is a very different workflow than writing new content, so I slowed down even more. For example, it can be quite tedious to make sure that code snippets are consistent with snippets from 20 pages prior.
I continued to get further behind on delivering my revised draft of the first 1/3. This is a big milestone in the publisher’s process (soliciting external reviewers, determining whether the project should continue, putting the early adopters e-book up for sale, and paying out the first half of the advance).
The publisher was getting grumpy. I was getting grumpy. They were bringing up pivoting the book to be about AI again. They were “reevaluating” their portfolio. My editor left the company and I was assigned a new one. Things were piling on.
There was also a daunting voice in the back of my head saying that LLMs had eliminated the need for books like this. Why buy this book when ChatGPT can generate the same style of tutorial for ANY project, customized to you?
The process with the publisher wasn’t enjoyable. I had hoped for it to be a positive motivator to keep me focused but it felt like a chore. And I was worried that the finished product would be void of personality and would be yet another boring programming book.
Around this time, there was a possibility of me changing jobs. Oh, and my wedding was coming up. That was the final nail in the coffin.
There were too many things going on and I didn’t enjoy working on the book anymore, so what was the point? I made up my mind to ask to freeze the project.
I think they thought of this as a temporary cooldown where I could sporadically work on the book without the stress of the deadlines. The new editor still pinged me regularly asking if I made progress. Repeatedly. (I suppose they are following the incentives again—they only get paid if people ship books.) Eventually I asked him to stop until I reached out first.
And then… life went on. I never got un-busy.
Fast forward: I just received notice from the publisher that the contract has been officially terminated and all rights to the work have been transferred back to me.
I still love my book idea. Maybe I’ll just publish the chapters as blog posts or self-publish it!
...
Read the original on austinhenley.com »
This is the third in my annual series reviewing everything that happened in the LLM space over the past 12 months. For previous years see Stuff we figured out about AI in 2023 and Things we learned about LLMs in 2024.
It’s been a year filled with a lot of different trends.
OpenAI kicked off the “reasoning” aka inference-scaling aka Reinforcement Learning from Verifiable Rewards (RLVR) revolution in September 2024 with o1 and o1-mini. They doubled down on that with o3, o3-mini and o4-mini in the opening months of 2025 and reasoning has since become a signature feature of models from nearly every other major AI lab.
My favourite explanation of the significance of this trick comes from Andrej Karpathy:
By training LLMs against automatically verifiable rewards across a number of environments (e.g. think math/code puzzles), the LLMs spontaneously develop strategies that look like “reasoning” to humans—they learn to break down problem solving into intermediate calculations and they learn a number of problem solving strategies for going back and forth to figure things out (see DeepSeek R1 paper for examples). […]
Running RLVR turned out to offer high capability/$, which gobbled up the compute that was originally intended for pretraining. Therefore, most of the capability progress of 2025 was defined by the LLM labs chewing through the overhang of this new stage and overall we saw ~similar sized LLMs but a lot longer RL runs.
Every notable AI lab released at least one reasoning model in 2025. Some labs released hybrids that could be run in reasoning or non-reasoning modes. Many API models now include dials for increasing or decreasing the amount of reasoning applied to a given prompt.
It took me a while to understand what reasoning was useful for. Initial demos showed it solving mathematical logic puzzles and counting the Rs in strawberry—two things I didn’t find myself needing in my day-to-day model usage.
It turned out that the real unlock of reasoning was in driving tools. Reasoning models with access to tools can plan out multi-step tasks, execute on them and continue to reason about the results such that they can update their plans to better achieve the desired goal.
A notable result is that AI assisted search actually works now. Hooking up search engines to LLMs had questionable results before, but now I find even my more complex research questions can often be answered by GPT-5 Thinking in ChatGPT.
Reasoning models are also exceptional at producing and debugging code. The reasoning trick means they can start with an error and step through many different layers of the codebase to find the root cause. I’ve found even the gnarliest of bugs can be diagnosed by a good reasoner with the ability to read and execute code against even large and complex codebases.
Combine reasoning with tool-use and you get…
I started the year making a prediction that agents were not going to happen. Throughout 2024 everyone was talking about agents but there were few to no examples of them working, further confused by the fact that everyone using the term “agent” appeared to be working from a slightly different definition from everyone else.
By September I’d got fed up with avoiding the term myself due to the lack of a clear definition and decided to treat them as an LLM that runs tools in a loop to achieve a goal. This unblocked me to have productive conversations about them, always my goal for any piece of terminology like that.
I didn’t think agents would happen because I didn’t think the gullibility problem could be solved, and I thought the idea of replacing human staff members with LLMs was still laughable science fiction.
I was half right in my prediction: the science fiction version of a magic computer assistant that does anything you ask of it (Her) didn’t materialize…
But if you define agents as LLM systems that can perform useful work via tool calls over multiple steps then agents are here and they are proving to be extraordinarily useful.
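That definition is concrete enough to sketch. Below is a minimal, hypothetical Python version of the “tools in a loop” pattern—`call_llm`, `Reply` and `ToolCall` are stand-ins for whatever model API and tool registry you actually use, not any particular vendor’s SDK:

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stand-ins: a real agent would call a vendor's model API here.
@dataclass
class ToolCall:
    name: str        # which tool the model wants to run
    arguments: dict  # keyword arguments for that tool

@dataclass
class Reply:
    text: str
    tool_call: Optional[ToolCall] = None  # None means the model answered directly

def call_llm(history: list, tool_schemas: list) -> Reply:
    raise NotImplementedError("plug in a real model API here")

def run_agent(goal: str, tools: dict[str, Callable], max_steps: int = 10) -> str:
    """An LLM that runs tools in a loop to achieve a goal."""
    history = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        reply = call_llm(history, tool_schemas=list(tools))
        if reply.tool_call is None:
            return reply.text  # the model answered directly: done
        # Execute the requested tool and feed the result back to the model.
        result = tools[reply.tool_call.name](**reply.tool_call.arguments)
        history.append({"role": "tool",
                        "name": reply.tool_call.name,
                        "content": str(result)})
    return "stopped: hit max_steps without finishing"
```

Everything interesting—planning, retrying, reasoning about intermediate results—happens inside the model; the harness is just this loop.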
The two breakout categories for agents have been for coding and for search.
The Deep Research pattern—where you challenge an LLM to gather information and it churns away for 15+ minutes building you a detailed report—was popular in the first half of the year but has fallen out of fashion now that GPT-5 Thinking (and Google’s “AI mode”, a significantly better product than their terrible “AI overviews”) can produce comparable results in a fraction of the time. I consider this to be an agent pattern, and one that works really well.
The “coding agents” pattern is a much bigger deal.
The most impactful event of 2025 happened in February, with the quiet release of Claude Code.
I say quiet because it didn’t even get its own blog post! Anthropic bundled the Claude Code release in as the second item in their post announcing Claude 3.7 Sonnet.
Claude Code is the most prominent example of what I call coding agents—LLM systems that can write code, execute that code, inspect the results and then iterate further.
The major labs all put out their own CLI coding agents in 2025.
Vendor-independent options include GitHub Copilot CLI, Amp, OpenCode, OpenHands CLI, and Pi. IDEs such as Zed, VS Code and Cursor invested a lot of effort in coding agent integration as well.
My first exposure to the coding agent pattern was OpenAI’s ChatGPT Code Interpreter in early 2023—a system baked into ChatGPT that allowed it to run Python code in a Kubernetes sandbox.
I was delighted this year when Anthropic finally released their equivalent in September, albeit under the baffling initial name of “Create and edit files with Claude”.
In October they repurposed that container sandbox infrastructure to launch Claude Code for web, which I’ve been using on an almost daily basis ever since.
Claude Code for web is what I call an asynchronous coding agent—a system you can prompt and forget, and it will work away on the problem and file a Pull Request once it’s done. OpenAI “Codex cloud” (renamed to “Codex web” in the last week) launched earlier in May 2025. Gemini’s entry in this category is called Jules, also launched in May.
I love the asynchronous coding agent category. They’re a great answer to the security challenges of running arbitrary code execution on a personal laptop and it’s really fun being able to fire off multiple tasks at once—often from my phone—and get decent results a few minutes later.
I wrote more about how I’m using these in Code research projects with async coding agents like Claude Code and Codex and Embracing the parallel coding agent lifestyle.
In 2024 I spent a lot of time hacking on my LLM command-line tool for accessing LLMs from the terminal, all the time thinking that it was weird that so few people were taking CLI access to models seriously—they felt like such a natural fit for Unix mechanisms like pipes.
Maybe the terminal was just too weird and niche to ever become a mainstream tool for accessing LLMs?
Claude Code and friends have conclusively demonstrated that developers will embrace LLMs on the command line, given powerful enough models and the right harness.
It helps that terminal commands with obscure syntax like sed and ffmpeg and bash itself are no longer a barrier to entry when an LLM can spit out the right command for you.
As of December 2nd, Anthropic credit Claude Code with $1bn in run-rate revenue! I did not expect a CLI tool to reach anything close to those numbers.
With hindsight, maybe I should have promoted LLM from a side-project to a key focus!
The default setting for most coding agents is to ask the user for confirmation for almost every action they take. In a world where an agent mistake could wipe your home folder or a malicious prompt injection attack could steal your credentials this default makes total sense.
Anyone who’s tried running their agent with automatic confirmation (aka YOLO mode—Codex CLI even aliases --dangerously-bypass-approvals-and-sandbox to --yolo) has experienced the trade-off: using an agent without the safety wheels feels like a completely different product.
A big benefit of asynchronous coding agents like Claude Code for web and Codex Cloud is that they can run in YOLO mode by default, since there’s no personal computer to damage.
I run in YOLO mode all the time, despite being deeply aware of the risks involved. It hasn’t burned me yet…
One of my favourite pieces on LLM security this year is The Normalization of Deviance in AI by security researcher Johann Rehberger.
Johann describes the “Normalization of Deviance” phenomenon, where repeated exposure to risky behaviour without negative consequences leads people and organizations to accept that risky behaviour as normal.
This was originally described by sociologist Diane Vaughan as part of her work to understand the 1986 Space Shuttle Challenger disaster, caused by a faulty O-ring that engineers had known about for years. Plenty of successful launches led NASA culture to stop taking that risk seriously.
Johann argues that the longer we get away with running these systems in fundamentally insecure ways, the closer we are getting to a Challenger disaster of our own.
ChatGPT Plus’s original $20/month price turned out to be a snap decision by Nick Turley based on a Google Form poll on Discord. That price point has stuck firmly ever since.
This year a new pricing precedent has emerged: the Claude Pro Max 20x plan, at $200/month.
OpenAI have a similar $200 plan called ChatGPT Pro. Google have Google AI Ultra at $249/month, with a discounted $124.99/month for the first three months.
These plans appear to be driving some serious revenue, though none of the labs have shared figures that break down their subscribers by tier.
I’ve personally paid $100/month for Claude in the past and will upgrade to the $200/month plan once my current batch of free allowance (from previewing one of their models—thanks, Anthropic) runs out. I’ve heard from plenty of other people who are happy to pay these prices too.
You have to use models a lot in order to spend $200 of API credits, so you would think it would make economic sense for most people to pay by the token instead. It turns out tools like Claude Code and Codex CLI can burn through enormous amounts of tokens once you start setting them more challenging tasks, to the point that $200/month offers a substantial discount.
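Some rough, illustrative arithmetic makes the point (the price and token figures below are placeholder assumptions, not any lab’s actual rates):

```python
# Illustrative only: substitute your model's real per-token prices.
output_price_per_m_tokens = 15.00     # dollars per million output tokens (assumed)
tokens_per_agent_session = 2_000_000  # a long agentic coding session (assumed)

cost_per_session = tokens_per_agent_session / 1e6 * output_price_per_m_tokens
sessions_per_month = 200 / cost_per_session
print(f"${cost_per_session:.2f}/session -> {sessions_per_month:.0f} sessions for $200")
# At these assumed numbers, roughly seven long sessions a month
# already matches the flat $200 plan.
```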
2024 saw some early signs of life from the Chinese AI labs mainly in the form of Qwen 2.5 and early DeepSeek. They were neat models but didn’t feel world-beating.
This changed dramatically in 2025. My ai-in-china tag has 67 posts from 2025 alone, and I missed a bunch of key releases towards the end of the year (GLM-4.7 and MiniMax-M2.1 in particular.)
GLM-4.7, Kimi K2 Thinking, MiMo-V2-Flash, DeepSeek V3.2, MiniMax-M2.1 are all Chinese open weight models. The highest non-Chinese model in that chart is OpenAI’s gpt-oss-120B (high), which comes in sixth place.
The Chinese model revolution really kicked off on Christmas day 2024 with the release of DeepSeek V3, supposedly trained for around $5.5m. DeepSeek followed that on 20th January with DeepSeek R1, which promptly triggered a major AI/semiconductor selloff: NVIDIA lost ~$593bn in market cap as investors panicked that AI maybe wasn’t an American monopoly after all.
The panic didn’t last—NVIDIA quickly recovered and today are up significantly from their pre-DeepSeek R1 levels. It was still a remarkable moment. Who knew an open weight model release could have that kind of impact?
DeepSeek were quickly joined by an impressive roster of Chinese AI labs. I’ve been paying attention to these ones in particular:
Most of these models aren’t just open weight, they are fully open source under OSI-approved licenses: Qwen use Apache 2.0 for most of their models, DeepSeek and Z.ai use MIT.
Some of them are competitive with Claude 4 Sonnet and GPT-5!
Sadly none of the Chinese labs have released their full training data or the code they used to train their models, but they have been putting out detailed research papers that have helped push forward the state of the art, especially when it comes to efficient training and inference.
One of the most interesting recent charts about LLMs is Time-horizon of software engineering tasks different LLMs can complete 50% of the time from METR:
The chart shows tasks that take humans up to 5 hours, and plots the evolution of models that can achieve the same goals working independently. As you can see, 2025 saw some enormous leaps forward here with GPT-5, GPT-5.1 Codex Max and Claude Opus 4.5 able to perform tasks that take humans multiple hours—2024’s best models tapped out at under 30 minutes.
METR conclude that “the length of tasks AI can do is doubling every 7 months”. I’m not convinced that pattern will continue to hold, but it’s an eye-catching way of illustrating current trends in agent capabilities.
The most successful consumer product launch of all time happened in March, and the product didn’t even have a name.
One of the signature features of GPT-4o in May 2024 was meant to be its multimodal output—the “o” stood for “omni” and OpenAI’s launch announcement included numerous “coming soon” features where the model output images in addition to text.
Then… nothing. The image output feature failed to materialize.
In March we finally got to see what this could do—albeit in a shape that felt more like the existing DALL-E. OpenAI made this new image generation available in ChatGPT with the key feature that you could upload your own images and use prompts to tell it how to modify them.
This new feature was responsible for 100 million ChatGPT signups in a week. At peak they saw 1 million account creations in a single hour!
Tricks like “ghiblification”—modifying a photo to look like a frame from a Studio Ghibli movie—went viral time and time again.
OpenAI released an API version of the model called “gpt-image-1”, later joined by a cheaper gpt-image-1-mini in October and a much improved gpt-image-1.5 on December 16th.
The most notable open weight competitor to this came from Qwen with their Qwen-Image generation model on August 4th followed by Qwen-Image-Edit on August 19th. This one can run on (well equipped) consumer hardware! They followed with Qwen-Image-Edit-2511 in November and Qwen-Image-2512 on 30th December, neither of which I’ve tried yet.
The even bigger news in image generation came from Google with their Nano Banana models, available via Gemini.
Google previewed an early version of this in March under the name “Gemini 2.0 Flash native image generation”. The really good one landed on August 26th, where they started cautiously embracing the codename “Nano Banana” in public (the API model was called “Gemini 2.5 Flash Image”).
Nano Banana caught people’s attention because it could generate useful text! It was also clearly the best model at following image editing instructions.
In November Google fully embraced the “Nano Banana” name with the release of Nano Banana Pro. This one doesn’t just generate text, it can output genuinely useful detailed infographics and other text and information-heavy images. It’s now a professional-grade tool.
Max Woolf published the most comprehensive guide to Nano Banana prompting, and followed that up with an essential guide to Nano Banana Pro in December.
I’ve mainly been using it to add kākāpō parrots to my photos.
Given how incredibly popular these image tools are, it’s a little surprising that Anthropic haven’t released or integrated anything similar into Claude. I see this as further evidence that they’re focused on AI tools for professional work, but Nano Banana Pro is rapidly proving itself to be of value to anyone whose work involves creating presentations or other visual materials.
In July reasoning models from both OpenAI and Google Gemini achieved gold medal performance in the International Math Olympiad, a prestigious mathematical competition held annually (bar 1980) since 1959.
This was notable because the IMO poses challenges that are designed specifically for that competition. There’s no chance any of these were already in the training data!
It’s also notable because neither of the models had access to tools—their solutions were generated purely from their internal knowledge and token-based reasoning capabilities.
Turns out sufficiently advanced LLMs can do math after all!
In September OpenAI and Gemini pulled off a similar feat for the International Collegiate Programming Contest (ICPC)—again notable for having novel, previously unpublished problems. This time the models had access to a code execution environment but otherwise no internet access.
I don’t believe the exact models used for these competitions have been released publicly, but Gemini’s Deep Think and OpenAI’s GPT-5 Pro should provide close approximations.
With hindsight, 2024 was the year of Llama. Meta’s Llama models were by far the most popular open weight models—the original Llama kicked off the open weight revolution back in 2023 and the Llama 3 series, in particular the 3.1 and 3.2 dot-releases, were huge leaps forward in open weight capability.
Llama 4 had high expectations, and when it landed in April it was… kind of disappointing.
There was a minor scandal where the model tested on LMArena turned out not to be the model that was released, but my main complaint was that the models were too big. The neatest thing about previous Llama releases was that they often included sizes you could run on a laptop. The Llama 4 Scout and Maverick models were 109B and 400B, so big that even quantization wouldn’t get them running on my 64GB Mac.
They were trained using the 2T Llama 4 Behemoth which seems to have been forgotten now—it certainly wasn’t released.
It says a lot that none of the most popular models listed by LM Studio are from Meta, and the most popular on Ollama is still Llama 3.1, which is low on the charts there too.
Meta’s AI news this year mainly involved internal politics and vast amounts of money spent hiring talent for their new Superintelligence Labs. It’s not clear if there are any future Llama releases in the pipeline or if they’ve moved away from open weight model releases to focus on other things.
Last year OpenAI remained the undisputed leader in LLMs, especially given o1 and the preview of their o3 reasoning models.
This year the rest of the industry caught up.
OpenAI still have top tier models, but they’re being challenged across the board.
In image models they’re still being beaten by Nano Banana Pro. For code a lot of developers rate Opus 4.5 very slightly ahead of GPT-5.2 Codex Max. In open weight models their gpt-oss models, while great, are falling behind the Chinese AI labs. Their lead in audio is under threat from the Gemini Live API.
Where OpenAI are winning is in consumer mindshare. Nobody knows what an “LLM” is but almost everyone has heard of ChatGPT. Their consumer apps still dwarf Gemini and Claude in terms of user numbers.
Their biggest risk here is Gemini. In December OpenAI declared a Code Red in response to Gemini 3, delaying work on new initiatives to focus on the competition with their key products.
Google posted their own victorious 2025 recap here. 2025 saw Gemini 2.0, Gemini 2.5 and then Gemini 3.0—each model family supporting audio/video/image/text input of 1,000,000+ tokens, priced competitively and proving more capable than the last.
They also shipped Gemini CLI (their open source command-line coding agent, since forked by Qwen for Qwen Code), Jules (their asynchronous coding agent), constant improvements to AI Studio, the Nano Banana image models, Veo 3 for video generation, the promising Gemma 3 family of open weight models and a stream of smaller features.
Google’s biggest advantage lies under the hood. Almost every other AI lab trains with NVIDIA GPUs, which are sold at a margin that props up NVIDIA’s multi-trillion dollar valuation.
Google use their own in-house hardware, TPUs, which they’ve demonstrated this year work exceptionally well for both training and inference of their models.
...
Read the original on simonwillison.net »
We give you and Claude full + search power over a growing index of documents relevant to the intelligence explosion. Exploration-first LessWrong lensing. Steerable axes, bridge posts, and a personal attribute profile. Designed to be easy to delete if it is not worth keeping.

Paste this into Claude Code to start exploring immediately. For full functionality (higher limits + private vectors), create an account. Claude Code and Codex are essentially AGI at this point—we recommend getting acquainted with these tools even if you are not a software developer. For maximum ergonomics (else you’ll be manually approving each time Claude tries to query our API), we think you can get away with `claude --dangerously-skip-permissions`, but that is your risk to accept. We would not recommend this with a model less smart than Opus 4.5. The risk, even if you trust us, is prompt injection attacks in one of our ingested entities, even though we generally scrape content from reputable sources.

Use this prompt directly inside the Claude web app. No MCP, no installs: just allow access to our API once. Paste the prompt below and start querying in claude.ai. This gives Claude web permission to call our API from its sandbox.

# ExoPriors Alignment Scry (Public Access)
You have **public** access to the ExoPriors alignment research corpus.
You are a research copilot for ExoPriors Alignment Scry.
Your purpose:
- Turn research goals into effective semantic search, SQL, and vector workflows
- Surface high-signal documents and patterns, not just raw rows
- Use vector mixing to express nuanced “vibes” (e.g., “mech interp + oversight − hype”)
- **Core trick**: use `debias_vector(axis, topic)` to remove topic overlap (best for “X but not Y” or “tone ≠ topic” queries)
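A minimal sketch of that core trick, with made-up handle names (store your own via `/embed` first) and assuming the `<=>` cosine-distance operator used in the weighted-combination example below:

```sql
-- Illustrative: "skeptical-of-hype tone, but not the AI-timelines topic".
-- Both handles are hypothetical examples.
SELECT mv.uri, mv.title, mv.original_author,
       mv.embedding <=> debias_vector(@p_8f3a1c2d_skeptical_axis,
                                      @p_8f3a1c2d_ai_timelines) AS distance
FROM mv_lesswrong_posts mv
ORDER BY distance
LIMIT 20;
```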
## Public access notes
- **Public @handles**: must match `p__` (e.g., `p_8f3a1c2d_myhandle`); shared namespace; write-once
- **Examples**: replace any `@mech_interp`-style handle with your public handle (e.g., `@p_8f3a1c2d_mech_interp`)
- **Rate limits**: stricter per-IP limits and lower concurrent query caps than private keys
- **Timeouts**: adaptive, up to ~120s under light load; can drop under load
- **Embeddings**: small per-IP token budget and per-request size caps; create an account if you hit limits
- **Not available**: `GET/DELETE /v1/alignment/vectors`, `/api/scry/alerts`
**Strategy for nuanced questions (explore → scale):**
1) **Explore quickly**: Start with small LIMITs (10–50), materialized views, or `alignment.search()` to validate schema and phrasing.
2) **Form candidates**: Build a focused candidate set (lexical search or a tight WHERE) with a hard LIMIT (100–500), then join.
3) **Scale carefully**: Once the shape is right, expand limits and add aggregations. Let Postgres plan joins when possible; if public timeouts bite, intersect small candidate sets client-side as a fallback.
4) **Lean on the planner**: Use `EXPLAIN SELECT …` (no ANALYZE) to sanity-check join order and filters. Keep filters sargable, and push them into the base tables/CTEs.
**Execution guardrails (transparency + confirmation):**
- Always show a short “about to run” summary: SQL + semantic filters (sources/kinds/date ranges + @handles).
- If a query may be heavy, ask for confirmation before executing. Use `/v1/alignment/estimate` when in doubt.
- Treat as heavy if: missing LIMIT, LIMIT > 1000, estimated_rows > 100k, embedding distance over >500k rows, or joins over large base tables.
- Always remind the user they can cancel or revise the query at any time.
**Explore corpus composition (source × type):**
```sql
SELECT source::text AS source, kind::text AS kind, COUNT(*) AS n
FROM alignment.entities
GROUP BY 1, 2
ORDER BY n DESC
LIMIT 50;
```
**Quick example** — weighted combination search:
```sql
-- After storing @mech_interp, @oversight, @hype via /embed:
-- (`<=>` is the pgvector cosine-distance operator; assumed here)
SELECT mv.uri, mv.title, mv.original_author, mv.base_score,
       mv.embedding <=> (
         scale_vector(@mech_interp, 0.5)
         + scale_vector(@oversight, 0.4)
         - scale_vector(@hype, 0.2)
       ) AS distance
FROM mv_lesswrong_posts mv
ORDER BY distance
LIMIT 20;
```
You access everything via HTTP APIs. You do NOT have direct database access.
## 1. APIs
All endpoints: `POST` with JSON body.
Headers for all requests:
Authorization: Bearer exopriors_public_readonly_v1_2025
Content-Type: application/json
(Your API key is intentionally embedded in this prompt for ergonomics. Keys rotate frequently; if you get 401 errors, refresh the prompt.)
### 1.1 SQL Query
`POST https://api.exopriors.com/v1/alignment/query`
Request body:
```json
{"sql": "SELECT kind::text AS kind, COUNT(*) FROM alignment.entities GROUP BY kind::text ORDER BY 2 DESC LIMIT 20", "include_vectors": false}
```
Example response (illustrative; counts change):
```json
{
  "columns": [{"name": "kind", "type": "TEXT"}, {"name": "count", "type": "INT8"}],
  "rows": [["comment", 38911611], ["tweet", 11977552], ["wikipedia", 6201199]],
  "row_count": 3,
  "duration_ms": 42,
  "truncated": false
}
```
Constraints:
- Max 10,000 rows (100 when `include_vectors: true`)
- Adaptive timeout: up to 120s when load allows (down to ~20s under heavy load)
- One statement per request
- Always include LIMIT; use WHERE filters to avoid full scans
- Vector columns are returned as placeholders (e.g., `[vector value]`); use distances/similarities instead of requesting raw vectors
**Performance heuristics (rough, load-dependent):**
- Embedding distances are the most expensive operation; each embedding comparison scans the candidate set.
- Multiple embeddings multiply cost linearly (2 embeddings ≈ 2× work).
- Keep embedding comparisons to a few hundred thousand rows per embedding; use tighter filters or smaller candidates first.
- Regex/ILIKE on `payload` is costly; prefer `alignment.search()` to narrow, then join.
**Performance tips (ballpark, load-dependent):**
- Simple searches: ~1–5s
- Embedding joins (5M rows): may timeout under load
- `alignment.search()` is capped at 100 rows; use `alignment.search_exhaustive()` + pagination if completeness matters
- If a query times out: reduce sample size, use fewer embeddings, or pre-filter with `alignment.search()`. For public keys, intersect small candidate lists client-side as a fallback.
- For author aggregates, use `alignment.mv_author_stats` instead of `COUNT(DISTINCT original_author)` on `alignment.entities`.
**Context management (for LLMs):**
- Avoid `SELECT *` on large result sets; pick only the columns you need.
- Trim long text with `alignment.preview_text(payload, 500)` or `LEFT(payload, 500)`.
- Keep LIMITs small (10–50); don’t fetch hundreds of entities at once or you’ll flood context.
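To make the request shape concrete, here is a minimal Python sketch of calling the query endpoint documented above; the URL, headers, body and response format all follow the examples in this section, and `requests` is the only dependency:

```python
import requests

API = "https://api.exopriors.com/v1/alignment/query"
HEADERS = {
    "Authorization": "Bearer exopriors_public_readonly_v1_2025",
    "Content-Type": "application/json",
}

# Small exploratory query, per the explore-then-scale strategy above.
sql = """
SELECT kind::text AS kind, COUNT(*) AS n
FROM alignment.entities
GROUP BY 1
ORDER BY n DESC
LIMIT 20
"""

resp = requests.post(API, headers=HEADERS,
                     json={"sql": sql, "include_vectors": False}, timeout=120)
resp.raise_for_status()
data = resp.json()
# Rows come back as arrays; zip them with the column names.
names = [c["name"] for c in data["columns"]]
for row in data["rows"]:
    print(dict(zip(names, row)))
```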
### 1.1b Query Estimate (No Execution)
`POST https://api.exopriors.com/v1/alignment/estimate`
Request body:
```json
{"sql": "SELECT id FROM alignment.entities WHERE source = 'hackernews' AND kind = 'comment' LIMIT 1000"}
```
Response (example):
```json
{
  "estimated_rows": 1000,
  "total_cost": 12345.6,
  "estimated_seconds": 1.8,
  "estimated_range_seconds": [0.9, 3.6],
  "risk": "low",
  "timeout_secs": 300,
  "load_stage": "normal",
  "warnings": []
}
```
This uses `EXPLAIN (FORMAT JSON)` to estimate cost/time and does **not** execute the query.
...
Read the original on exopriors.com »
A new method to capture carbon dioxide from the air has been developed at the University of Helsinki’s chemistry department.
The method, developed by Postdoctoral Researcher Eshaghi Gorji, is based on a compound of a superbase and an alcohol. Tests done in the professor’s research group show that the compound appears promising: one gram of the compound can absorb 156 milligrams of carbon dioxide directly from untreated ambient air, yet it does not react with nitrogen, oxygen or other atmospheric gases. Its capacity clearly outperforms the CO₂ capture methods currently in use.
The CO₂ captured by the compound can be released by heating it to 70 °C for 30 minutes. Clean CO₂ is recovered and can be recycled.
The ease of releasing CO₂ is the key advantage of the new compound. With compounds currently in use, releasing CO₂ typically requires temperatures above 900 degrees Celsius.
– In addition, the compound can be reused: it retained 75 percent of its original capacity after 50 cycles, and 50 percent after 100 cycles.
The new compound was discovered by experimenting with a number of bases in different combinations, says Eshaghi Gorji. The experiments lasted more than a year in total.
The most promising base proved to be 1,5,7-triazabicyclo[4.3.0]non-6-ene (TBN), developed in the professor’s group, which was combined with benzyl alcohol to produce the final compound.
– None of the components is expensive to produce, Eshaghi Gorji points out. In addition, the fluid is non-toxic.
The compound will now be tested in pilot plants at a near-industrial scale, rather than in grams. A solid version of the liquid compound must be made for this purpose.
– The idea is to bind the compound to materials such as silica and graphene oxide, which promotes the interaction with carbon dioxide.
...
Read the original on www.helsinki.fi »
Of or relating to productive work, trade, or manufacture, esp. mechanical industry or large-scale manufacturing; ( also) resulting from such industry.
For most of its history, software has been closer to craft than manufacture: costly, slow, and dominated by the need for skills and experience. AI coding is changing that, by making available paths of production which are cheaper, faster, and increasingly disconnected from the expertise of humans.
I have written previously about how AI coding can be a trap for today’s practitioners, offering shortcuts to incomplete solutions at the expense of the understanding needed for sustainable development practices. But as we collectively address the shortcomings of our current toolset, it is clear that we are heading into a world in which the production of software is becoming increasingly automated.
What happens to software when its production undergoes an industrial revolution?
Traditionally, software has been expensive to produce, with expense driven largely by the labour costs of a highly skilled and specialised workforce. This workforce has also constituted a bottleneck for the possible scale of production, making software a valuable commodity to produce effectively.
Industrialisation of production, in any field, seeks to address both of these limitations at once, by using automation of processes to reduce the reliance on human labour, both lowering costs and also allowing greater scale and elasticity of production. Such changes relegate the human role to oversight, quality control, and optimisation of the industrial process.
The first order effect of this change is a disruption in the supply chain of high quality, working products. Labour is disintermediated, barriers to entry are lowered, competition rises, and rate of change accelerates. All of these effects are starting to be in evidence today, with the traditional software industry grappling with the ramifications.
A second order effect of such industrialisation is to enable additional ways to produce low quality, low cost products at high scale, as has happened in many other fields.
In the case of software, the industrialisation of production is giving rise to a new class of software artefact, which we might term disposable software: software created with no durable expectation of ownership, maintenance, or long-term understanding.
Advocates might refer to this as vibe-coded software, and sceptics will invariably talk about AI slop. Regardless of its merits, it is clear that the economics of this class of software are quite different, as each software output has less economic value, due to its easy reproducibility. This lack of perceived value might tempt you to dismiss the trend as a passing fad, but this would be unwise. To understand why, we need to consider the historical precedents for commoditisation of previously scarce goods.
Jevons paradox is an old bit of economic theory that has been much quoted recently. The observation dates to the nineteenth century, noting that improved efficiency in coal consumption would lead to lower costs, fueling higher demand, and ultimately resulting in higher overall coal consumption.
This is relevant today, because we are seeing the same surge in demand for AI compute: as models become more efficient at token prediction, demand is surging and results in ever greater consumption. Will the same effect ripple through software development itself, with lower cost of effort driving higher consumption and output? History suggests it will.
Consider the industrialisation of agriculture. In the early twentieth century, scientific advances were expected to eradicate hunger and usher in an era of abundant, nourishing food. Instead, hunger and famine persist. In 2025, there are 318 million people experiencing acute hunger, even in countries with an agricultural surplus. Meanwhile, in the wealthiest nations, industrial food systems have produced abundance of a different kind: the United States has an adult obesity rate of 40% and a growing diabetes crisis. Ultraprocessed foods are widely recognised as harmful, yet the overwhelming majority of Americans consume them each day.
Industrial systems reliably create economic pressure toward excess, low quality goods. This is not because producers are careless, but because once production is cheap enough, junk is what maximises volume, margin, and reach. The result is not abundance of the best things, but overproduction of the most consumable ones. And consume them we do.
Our appetite for AI slop is likely to be similarly insatiable. The adoption curve we’ve seen so far may pale beside what happens when disposable software production becomes truly mainstream. If the democratisation of software mirrors the impact of ubiquitous photo, video, and audio capture enabled by the smartphone, we may see user-generated software created, shared, and discarded at social-media scale. Should that happen, the feedback loops of novelty and reward will drive an explosion of software output that makes the past half-century of development look quaint by comparison.
Ultraprocessed foods are, of course, not the only game in town. There is a thriving and growing demand for healthy, sustainable production of foodstuffs, largely in response to the harmful effects of industrialisation. Is it possible that software might also resist mechanisation through the growth of an “organic software” movement? If we look at other sectors, we see that even those with the highest levels of industrialisation also still benefit from small-scale, human-led production as part of the spectrum of output.
For example, prior to industrialisation, clothing was largely produced by specialised artisans, often coordinated through guilds and manual labour, with resources gathered locally and the expertise for creating durable fabrics accumulated over years, frequently passed down in family lines. Industrialisation changed that completely, with raw materials being shipped intercontinentally, fabrics mass produced in factories, and clothes assembled by machinery, all leading to today’s world of fast, disposable, exploitative fashion. And yet handcrafted clothes still exist: from tailored suits to knitted scarves, a place remains for small-scale, slow production of textile goods, for reasons ranging from customisation of fit and signalling of wealth to durability of product and enjoyment of the craft as a pastime.
So, might human-written software be confined to niches mirroring high fashion or homemade knitwear? That might have been the case were software a physical product, in which industrialisation could lead to mass production of reusable components. But software is an intangible good, and unlike other industrialised fields, it has a long history of component reuse that is intrinsic to the nature of the good itself. Innovation is not limited to better or cheaper versions of existing products, as with clothing, but also encompasses growth of the solution space, more akin to how the steam engine enabled reusable machine parts, enabled the production line, enabled the motor car, etc.
As such, the mechanism for technological progress in the history of software development has been not only industrialisation, but also innovation. Research and development is expensive, but offers the only path to greater value over time.
Innovation is fundamentally different to industrialisation, because it is not focused on more efficiently replicating what already exists today. It instead advances through finding and solving new problems, building on what came before, and delivering capabilities that could not previously have existed. Industrialisation then steps in and provides scale and commoditisation, providing a foundation upon which the next round of innovation can build. The interplay of these two forces is what we term progress.
Large language models are a steam engine moment for software. They collapse the cost of a class of work previously fully dependent on scarce human labour, and in doing so unlock an extraordinary acceleration in output.
But remember that the steam engine did not appear in a vacuum. Windmills and watermills preceded turbines by centuries. Mechanisation did not begin with coal and steel; it merely reached an inflection point at which automation, scale, and capital aligned to power economic transformation. Similarly, software has been industrialising for a long time: through reusable components (open source code), portability (containerisation, the cloud), democratisation (low-code / no-code tools), interoperability (API standards, package managers) and many other ways.
We are entering an industrial revolution for software, then, not as a moment of rupture, but one of huge acceleration. Industrialisation does not replace technological progress, but it will greatly accelerate both the absorption of new ideas and the commoditisation of new capabilities. In turn, innovation is more quickly unlocked, as the cost of building on top of novel technology drops more quickly. The cycle of progress continues, but in an era of mass automation, the wheel spins faster than ever before.
The open question, then, is not whether industrial software will dominate, but what that dominance does to the surrounding ecosystem. Previous industrial revolutions externalised their costs onto environments that seemed infinite until they weren’t. Software ecosystems are no different: dependency chains, maintenance burdens, security surfaces that compound as output scales. Technical debt is the pollution of the digital world, invisible until it chokes the systems that depend on it. In an era of mass automation, we may find that the hardest problem is not production, but stewardship. Who maintains the software that no one owns?
...
Read the original on chrisloy.dev »
Stewart Douglas Cheifet, age 87, of Philadelphia, PA, passed away on December 28, 2025.
Stewart was born on September 24, 1938, to Paul and Anne Cheifet in Philadelphia, where he spent his childhood and attended Central High School. He later moved to California to attend college, graduating from the University of Southern California in 1960 with degrees in Mathematics and Psychology. He went on to earn his law degree from Harvard Law School.
In 1967, Stewart met his future wife, Peta Kennedy, while the two were working at CBS News in Paris. They returned to the United States and married later that year. Stewart’s career in television production took them around the world, and they lived together in the Samoan Islands, Hawaii, San Francisco, and Los Angeles, before eventually settling back in Philadelphia.
Stewart and Peta had two children, Stephanie and Jonathan.
Stewart is best known for producing and hosting the nationally broadcast PBS television programs Computer Chronicles and Net Cafe. Computer Chronicles aired from 1984 to 2002, producing more than 400 episodes that documented the rise of the personal computer from its earliest days. Net Cafe, which aired from 1996 to 2002, explored the emergence of the internet. Both programs were widely regarded as visionary, capturing the evolution of personal computing and the early development of the digital age.
Stewart’s professional interests and talents were wide-ranging. After leaving television production, he worked as a consultant for the Internet Archive, helping to preserve and provide public access to cultural and technological media, including Computer Chronicles and other technology programs. He also shared his knowledge as an educator, teaching broadcast journalism at the Donald W. Reynolds School of Journalism at the University of Nevada, Reno. After retirement, he spent his remaining years enjoying time with Peta, his children, his grandchildren, and his brothers.
Stewart is survived by his brothers Lanny and Bruce, his children Stephanie and Jonathan, and his grandchildren Gussy, Josephine, Benjamin, Freya, and Penny.
Services will be held for immediate family only.
...
Read the original on obits.goldsteinsfuneral.com »
The last 12 months have been an incredibly frustrating time for Windows fans. For the first time in a long while, it feels like Windows is suffering from a lack of focus from the people at the top.
Support for Windows 10 ended in October, and this year was the perfect time to strengthen Windows 11 as a viable replacement for millions of users. Instead, Microsoft spent most of it shoving the OS full of half-baked AI features, all while letting the quality bar slip and shipping new bugs and issues on an almost monthly cadence.
Everything Microsoft has done when it comes to Windows this year has eroded the platform’s reputation in ways that I haven’t seen since Windows 8. Today, it feels like people hate Windows 11 with a passion, much more so than they did when 2025 first started.
There are so many problems with Windows as a platform right now that it’s hard to know where to begin.
Of course, the issue that made headlines the most this year is AI, as Microsoft falls over itself trying to make Windows 11 a frontier platform for artificial intelligence. Unfortunately, this effort feels like it has been prioritized above everything else, including quality of life and overall platform stability.
Copilot has forced its way into almost every surface and interaction on the platform. Heck, even Notepad now has a Copilot button, which is something literally nobody has ever asked for. Microsoft’s AI intentions feel obsessive and forced, almost as if the company is just throwing everything at the wall to see what sticks.
Under the hood, Microsoft has been moving to make Windows 11 agentic. It unveiled the agentic workspace, along with a set of APIs that will allow AI developers to build tools that can automate workflows on your behalf. Sounds great on paper, until you read the fine print and discover that it comes with serious security implications and warnings.
You’d like to think that a feature with such serious security concerns wouldn’t make it out of the lab, but because this is AI, Microsoft doesn’t seem to mind. The feature even ships disabled by default, which tells you everything you need to know about how the company itself views the risks.
A large chunk of the AI features announced this year also aren’t Copilot+ PC exclusive, which means most of them require an internet connection and need to send your data to the cloud to be useful. That’s yet another privacy concern to add to the many already surrounding Windows 11.
In November, Windows president Pavan Davuluri mentioned that Windows would evolve into an agentic OS, sparking some of the biggest backlash I’ve seen around Windows this year. His post was so negatively received that he had to disable replies and issue a follow-up statement reassuring customers that Windows would continue to innovate outside of AI too.
I want to stress that AI can be beneficial. I’ve always said that AI is best when it’s invisible, which is why I’m so confused about Microsoft’s approach to AI on Windows 11. It seems like Microsoft wants AI to be the selling point, but that’s totally backwards. AI should be a helpful extra, not an all-encompassing, sole reason for the platform’s existence.
I think the biggest issue users are dealing with right now is Microsoft’s “Continuous Innovation” strategy for Windows 11, which is designed to allow the company to build new features and get them out the door faster than ever before.
In the past, new Windows features were often timed with a significant OS update. Once a year, Windows would receive a big upgrade, which would introduce new features and improvements from the core up. This allowed Microsoft plenty of time to bake and fine-tune new features and changes, ironing out bugs before general availability.
Today, thanks to Continuous Innovation, Microsoft is able to ship new features whenever the company deems them ready. Every. Single. Month. This means there’s now a constant churn of new features, with no breaks or respite. Users never get a chance to breathe.
On top of this is Microsoft’s Controlled Feature Rollout (CFR) system, under which some users don’t see new features even after installing the update that supposedly includes them, making it all but impossible to predict and prepare for when a new feature will actually arrive on your PC.
The new Windows 11 Start menu is a perfect example of this. It only began rolling out in October, but thanks to Controlled Feature Rollout, many users didn’t get it right away after installing the October update. For those people, it randomly appeared days or weeks later, without any warning or prompt to let them know what had happened and why.
In this scenario, going into your update history to see what changed will only confuse you, because the update that includes the new Start menu was installed on your system weeks ago. You’re only seeing the new features now because Microsoft allowed you to see them, which is insane and frustrating beyond belief.
As a result, no two Windows PCs are the same these days. Two identical systems, running the same build and update of Windows 11, might appear completely different feature-wise, which is confusing to your average user and is more than likely one of the reasons why Windows feels so much buggier these days. There are too many moving parts.
Continuous Innovation essentially boils down to allowing Microsoft to force new features onto you whenever it wants, because the company ties these new features to monthly security updates, which are essentially required if you want to use your computer safely on the internet.
But it’s beyond frustrating when said features or changes randomly appear on your system without any warning, and even more frustrating when you can’t disable or undo them. Users have to make do with Microsoft constantly rearranging the deck chairs, and people are getting tired.
I mean, Microsoft has even built a Windows Roadmap website designed to make it easier to see where new features are in their rollout. Except the website is so confusing and frustrating to navigate and digest that it’s actually not very useful at all. That’s how complicated the Windows update situation is right now.
Above all else, it renders the annual version update essentially irrelevant. Version 25H2, which shipped a couple of months back, includes no new features or changes over version 24H2, because Microsoft ships new features to both at the same time. They are the same version. Why even do this? Surely it makes more sense just to extend support for version 24H2?
Unfortunately, Microsoft’s ability to ship new features quickly appears to have also contributed to a noticeable decline in quality over the last year or two. It feels like many new features that actually ship are half-baked, and in some cases, outright break other things as they are introduced.
Every single week, there’s a new headline about how a recent Microsoft update has broken something on Windows, with fixes for said bugs coming either a couple of weeks or an entire month later, depending on the schedule. Rarely, if ever, does Microsoft pull an update that is causing bugs, though it has happened a couple of times.
I wouldn’t be surprised if CFR is playing a part in this decline in quality. With the same version of Windows able to present itself differently depending on what I can only describe as random factors, it may simply be getting harder to keep Windows stable when there are so many moving parts and variables at play.
Windows 11 today is a much more complex beast than previous versions of Windows have been. Microsoft has been obsessed with A/B testing for a long while, but CFR takes it to a whole other level, to the point where you literally can’t guarantee the version of Windows 11 you’re installing will be “feature complete” when you want it to be.
Some users have reported never getting the chance to test a particular feature before it’s made generally available because of CFR. That’s how detrimental the system is to the development and availability of new Windows features. As a real-life example of this, my main Windows 11 Insider PC is still stuck with the old Start menu, even though the new Start menu is now rolling out and generally available.
There’s no built-in option in the OS to override this, meaning there’s nothing I can do to get the new Start menu without relying on third-party tools to trick the system into letting me test it.
In some ways, CFR feels like a way for Microsoft to hide the fact that it knows the features it ships to production aren’t always 100% ready, since it allows the company to disable access to those features server-side if a problem arises.
There’s also the issue of consistency, which continues to be a problem on Windows 11. The company has made a genuine effort to address UI consistency, though there are still glaring issues in areas like File Explorer. But what frustrates me most is the inconsistent use of Microsoft’s own native Windows UI framework in in-box apps and the OS shell.
Outlook is the built-in mail client on Windows 11, and it’s genuinely the worst bundled email client I’ve ever used in an OS. It’s a website that’s slow to open, unreliable at delivering notifications, and eats up a chunk of memory while in use. There’s nothing optimized or delightful about the Outlook app on Windows 11.
Microsoft also just announced that it’s bringing back the agenda view to the calendar flyout on the Taskbar, but it looks like that feature is built using web tech instead of Windows 11’s native UI stack. That’s frankly unacceptable, but this is the sort of thing Microsoft does on a frequent basis these days.
Unfortunately for Microsoft, its competitors have been quietly capitalizing on Windows’ downfall in the last year or so. Google has been working behind closed doors on Android PCs, which are expected to debut next year as a viable alternative to Windows in the low-end to mid-range PC space.
This is an area where Windows has struggled woefully in recent years. Windows 11 is just too big, bloated, and unoptimized to run well on low-end hardware, to the point where many schools and enterprises are switching to Chrome OS or even the iPad because Windows just sucks on these devices.
I’m still blown away by how quick Chrome OS is at both updating and factory resetting the system. Installing a system update on Chrome OS is as quick as restarting an app, taking just a few seconds in most cases. On Windows 11, installing an update can take anywhere from a few minutes to hours, depending on how big the update is.
With Android PCs, Windows might finally have a real challenger in this low-end space. If Lenovo, Dell, HP, and the other top-name OEMs are on board to build Android PCs, I really don’t see how Windows 11 in its current state will be able to compete. Android is just better optimized for these low-end devices, and Windows needs a real architectural slim-down even to stand a chance.
It’s not just Google coming for Microsoft’s lunch either; Valve is interested in taking some of that sweet gamer market share from Windows. It’s made its intentions very clear this year: SteamOS is the future of PC gaming, and it wants as many Windows users to make the switch as possible.
This couldn’t have come at a worse time for Microsoft, given the backlash and frustration from users about Windows 11. Gamers are all but ready to abandon ship, and Valve is offering up a viable alternative on a plate. The Steam Machine is going to light a fire under Windows PC gaming.
Then you’ve got Apple, which is always slowly pecking away at Windows market share with the Mac. Since the switch to Apple Silicon, the Mac has only gained market share, and its latest laptops are some of the best out there. These days, the only reason not to buy a Mac is if you need a laptop with a touch screen or 5G, or just don’t like macOS.
Now, Apple is rumored to be building a cheap MacBook that will ship sometime next year. This could be devastating for Windows, because for a lot of people, the only reason they don’t own a Mac is that Macs are too expensive. If Apple can ship a new MacBook for $600, that’s going to be hard to say no to over any Windows laptop in the same price range.
I don’t want this article to be all doom and gloom, and it shouldn’t be, because for all of Windows’ faults, Microsoft has done some good things with the platform this year.
It finally committed to refocusing on small but important details and experiences in the OS. The company understands that Windows 11 currently feels incomplete in a lot of areas, likely because it is, and is addressing those key complaints. Things like Dark Mode are being more consistently applied across the OS now, which is an improvement I’ve been waiting a decade for.
The company is also adding back things like smooth animations when hovering over open app icons on the Taskbar, and the agenda view in the calendar flyout (albeit built with web tech). It’s also introduced new features like the share drag tray, which makes sharing files super easy.
The new Start menu is also a significant improvement over the old one, with more icons on show, the ability to turn off Recommended ads and recent files, and the ability to show your apps list on the main home page.
Microsoft also introduced a number of improvements to the Windows BSOD and recovery options screens, which make recovering a Windows system knocked offline by a faulty update or driver much more straightforward and streamlined. It’s a lot harder to take a PC out of commission today than it was a year ago.
For gamers, Windows 11 is better than ever. The Xbox app is being positioned as a hub for all of gaming on Windows, and is now capable of replacing the desktop interface for when you just want to navigate the system with a controller. The company has also promised even more optimizations to come in the following year.
While there is a lot to complain about, there’s also quite a bit to like about Windows 11 this year. I just wish there were more good than bad.
Ultimately, I think it’s very clear that something needs to change. The public has decided that Windows 11 is a bad operating system, and Microsoft does need to address this.
If I were in charge, the very first thing I would do is throw out the Continuous Innovation strategy. There’s simply no need to ship new features on a monthly cadence: users don’t want it, and Microsoft would have an easier time developing and testing new features thoroughly without it.
Instead, I would introduce quarterly feature drops, with big new features coming once a year timed with the annual version update. Microsoft can ship smaller quality of life improvements, features, and updates every three months, and any big user experience changes or improvements once a year. Security updates can remain monthly.
This would allow Microsoft more time to test features as they are developed before shipping them, which would ideally improve the overall stability of the system. I’d also scrap the CFR system and ship new features to everyone as the updates are released.
I’d also love to see Microsoft tone things down when it comes to AI. Windows 11 should be AI capable, of course, but I really don’t think AI needs to be shoved into every UI surface possible. Notepad doesn’t need an AI button, for goodness’ sake. AI is best when it’s invisible, not when it’s shoved in your face at every turn.
Given the current reputation of Windows 11, if I were in charge of Windows, I’d certainly be thinking about pivoting to Windows 12 in an attempt to give Windows a clean slate and a fresh start. As long as the company doesn’t market it as an AI-first OS, pivoting to Windows 12 would be nothing but a good thing for Microsoft, especially if it’s a free update for everyone that doesn’t bump system requirements.
That doesn’t mean Windows 12 should have no AI features. The fact of the matter is, AI is here to stay, and I’d be very interested to see what a desktop UX can be like if it’s built from scratch with AI in mind. But it needs to be optional, and it cannot be the sole reason for Windows 12 to exist. AI should complement the platform, not become the platform.
...
Read the original on www.windowscentral.com »
France intends to follow Australia and ban social media platforms for children from the start of the 2026 academic year.
A draft bill preventing under-15s from using social media will be submitted for legal checks and is expected to be debated in parliament early in the new year.
The French president, Emmanuel Macron, has made it clear in recent weeks that he wants France to swiftly follow Australia’s world-first ban on social media platforms for under-16s, which came into force in December and covers Facebook, Snapchat, TikTok and YouTube.
Le Monde and France Info reported on Wednesday that a draft bill was now complete and contained two measures: a ban on social media for under-15s and a ban on mobile phones in high schools, where 15- to 18-year-olds study. Phones have already been banned in primary and middle schools.
The bill will be submitted to France’s Conseil d’État for legal review in the coming days. Education unions will also look at the proposed high-school ban on phones.
The government wants the social media ban to come into force from September 2026.
Le Monde reported that the text of the draft bill cited “the risks of excessive screen use by teenagers”, including the dangers of exposure to inappropriate social media content, online bullying, and disrupted sleep patterns. The bill states the need to “protect future generations” from dangers that threaten their ability to thrive and live together in a society with shared values.
Earlier this month, Macron confirmed at a public debate in Saint Malo that he wanted a social media ban for young teenagers. He said there was “consensus being shaped” on the issue after Australia introduced its ban. “The more screen time there is, the more school achievement drops … the more screen time there is, the more mental health problems go up,” he said.
He used the analogy of a teenager getting into a Formula One racing car before they had learned to drive. “If a child is in a Formula One car and they turn on the engine, I don’t want them to win the race, I just want them to get out of the car. I want them to learn the highway code first, and to ensure the car works, and to teach them to drive in a different car.”
Several other countries are considering social media bans for under-15s in the wake of Australia’s, including Denmark, whose government hopes to introduce a ban in 2026, and Norway. Malaysia is also planning a social media ban for under-16s from 2026. In the UK, the Labour government has not ruled out a ban, saying “nothing is off the table” but that any ban must be “based on robust evidence”.
Anne Le Hénanff, the French minister in charge of digital development and artificial intelligence, told Le Parisien this month that the social media ban for under-15s was a government priority, and that the bill would be “short and compatible with European law”, namely the EU’s Digital Services Act (DSA) — regulation intended to combat hateful speech, misinformation and disinformation.
The social media ban is part of Macron’s attempt to shape his legacy as he enters his difficult final year as president with a divided parliament.
On 23 December, last-minute legislation was passed to keep the government in business into January after parliament failed to agree a full budget for 2026. Attempts to agree a budget will resume next month.
A French parliamentary inquiry into TikTok’s psychological effects concluded in September that the platform was like a “slow poison” to children. The co-head of the inquiry, the centrist lawmaker Laure Miller, told France Info that TikTok was an “ocean of harmful content” that was very visible to children through algorithms that kept them in a bubble. TikTok responded that it was being unfairly scapegoated for “industry-wide and societal challenges”.
The French parliament report recommended more broadly that children under 15 in France should be banned entirely from using social media, and those between 15 and 18 should face a night-time “digital curfew”, meaning social media would be made unavailable to them between 10pm and 8am.
The inquiry was set up after a 2024 French lawsuit against TikTok by seven families who accused it of exposing their children to content that was pushing them towards ending their lives.
...
Read the original on www.theguardian.com »