10 interesting stories served every morning and every evening.




1 405 shares, 19 trendiness

Your LLM Doesn't Write Correct Code. It Writes Plausible Code.

One of the sim­plest tests you can run on a data­base:

It’s not a mis­placed comma! The rewrite is 20,171 times slower on one of the most ba­sic data­base op­er­a­tions.

EDIT: Several readers have confused this project with Turso/libsql. They are unrelated. Turso forks the original C SQLite codebase; the project analyzed here is a ground-up LLM-generated rewrite by a single developer. Running the same benchmark against Turso shows performance within 1.2x of SQLite, consistent with a mature fork rather than a reimplementation.

The thing is though: the code compiles. It passes all its tests. It reads and writes the correct SQLite file format. Its README claims MVCC concurrent writers, file compatibility, and a drop-in C API. At first glance it reads like a working database engine.

But it is not!

LLMs op­ti­mize for plau­si­bil­ity over cor­rect­ness. In this case, plau­si­ble is about 20,000 times slower than cor­rect.

I write this as a prac­ti­tioner, not as a critic. After more than 10 years of pro­fes­sional dev work, I’ve spent the past 6 months in­te­grat­ing LLMs into my daily work­flow across mul­ti­ple pro­jects. LLMs have made it pos­si­ble for any­one with cu­rios­ity and in­ge­nu­ity to bring their ideas to life quickly, and I re­ally like that! But the num­ber of screen­shots of silently wrong out­put, con­fi­dently bro­ken logic, and cor­rect-look­ing code that fails un­der scrutiny I have amassed on my disk shows that things are not al­ways as they seem. My con­clu­sion is that LLMs work best when the user de­fines their ac­cep­tance cri­te­ria be­fore the first line of code is gen­er­ated.

A note on the projects examined: this is not a criticism of any individual developer. I do not know the author personally. I have nothing against them. I've chosen the projects because they are public, representative, and relatively easy to benchmark. The failure patterns I found are produced by the tools, not the author. Evidence from METR's randomized study and GitClear's large-scale repository analysis supports that these issues are not isolated to one developer when output is not heavily verified. That's the point I'm trying to make!

This ar­ti­cle talks about what that gap looks like in prac­tice: the code, the bench­marks, an­other case study to see if the pat­tern is ac­ci­den­tal, and ex­ter­nal re­search con­firm­ing it is not an out­lier.

I com­piled the same C bench­mark pro­gram against two li­braries: sys­tem SQLite and the Rust reim­ple­men­ta­tion’s C API li­brary. Same com­piler flags, same WAL mode, same table schema, same queries. 100 rows:

I’ll take the TRANSACTION batch row as the base­line be­cause it does­n’t have the same glar­ing bugs as the oth­ers, namely no WHERE clauses and per-state­ment syncs. In this run that base­line is al­ready 298x, which means even the best-case path is far be­hind SQLite. Anything above 298x sig­nals a bug.

The largest gap be­yond our base­line is dri­ven by two bugs:

INSERT with­out a trans­ac­tion: 1,857x ver­sus 298x in batch mode. SELECT BY ID: 20,171x. UPDATE and DELETE are both above 2,800x. The pat­tern is con­sis­tent: any op­er­a­tion that re­quires the data­base to find some­thing is in­sanely slow.

I read the source code. Well.. the parts I needed to read based on my benchmark results. The reimplementation is not small: 576,000 lines of Rust code across 625 files. There is a parser, a planner, a VDBE bytecode engine, a B-tree, a pager, a WAL. The modules have all the “correct” names. The architecture also looks correct. But two bugs in the code and a group of smaller issues compound:

In SQLite, when you de­clare a table as:

CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT, value REAL);

the column id becomes an alias for the internal rowid, the B-tree key itself. A query like WHERE id = 5 resolves to a direct B-tree search and scales O(log n). (I already wrote a TLDR piece about how B-trees work here.) The SQLite query planner documentation states: “the time required to look up the desired row is proportional to logN rather than being proportional to N as in a full table scan.” This is not an optimization. It is a fundamental design decision in SQLite's query optimizer:

# `where.c`, in `whereScanInit()`

if( iColumn==pIdx->pTable->iPKey ){

iColumn = XN_ROWID;

The line above con­verts a named col­umn ref­er­ence to XN_ROWID when it matches the table’s INTEGER PRIMARY KEY col­umn. The VDBE then trig­gers a SeekRowid op­er­a­tion in­stead of a full table scan, which makes the whole thing pro­por­tional to logN.

The Rust reim­ple­men­ta­tion has a proper B-tree. The table_seek func­tion im­ple­ments cor­rect bi­nary search de­scent through its nodes and scales O(log n). It works. But the query plan­ner never calls it for named columns!

The is_rowid_ref() func­tion only rec­og­nizes three magic strings:

fn is_rowid_ref(col_ref: &ColumnRef) -> bool {

let name = col_ref.column.to_ascii_lowercase();

name == "rowid" || name == "_rowid_" || name == "oid"

A column declared as id INTEGER PRIMARY KEY, even though it is internally flagged as is_ipk: true, doesn't get recognized. The flag is never consulted when choosing between a B-tree search and a full table scan.
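A sketch of how small the fix could be. The types and field names below are hypothetical stand-ins for the project's internals (only the function name and the three magic strings come from the article), not its actual API:

```rust
// Hypothetical stand-in for the reimplementation's column reference type.
struct ColumnRef {
    column: String,
    is_ipk: bool, // "is INTEGER PRIMARY KEY", the flag described above
}

// The three magic strings the planner already recognizes, plus the missing
// case: a column internally flagged as the INTEGER PRIMARY KEY is also a
// rowid alias and should take the B-tree seek path.
fn is_rowid_ref(col_ref: &ColumnRef) -> bool {
    let name = col_ref.column.to_ascii_lowercase();
    name == "rowid" || name == "_rowid_" || name == "oid" || col_ref.is_ipk
}

fn main() {
    let id = ColumnRef { column: "id".to_string(), is_ipk: true };
    assert!(is_rowid_ref(&id)); // now eligible for a rowid seek
    let name = ColumnRef { column: "name".to_string(), is_ipk: false };
    assert!(!is_rowid_ref(&name)); // ordinary columns still scan
}
```

The real fix would also need the codegen to emit a seek for this case, but the recognition itself is one extra condition.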

Every WHERE id = N query flows through codegen_select_full_scan(), which emits linear walks through every row via Rewind / Next / Ne to compare each rowid against the target. At 100 rows with 100 lookups, that is 10,000 row comparisons instead of roughly 700 B-tree steps. O(n²) instead of O(n log n). This is consistent with the ~20,000x result in this run.
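The arithmetic behind those two numbers, as a back-of-envelope sketch (not the reimplementation's code):

```rust
// Comparison counts for repeated point lookups against a small table.
fn full_scan_comparisons(rows: u64, lookups: u64) -> u64 {
    // every lookup walks all rows in the worst case
    lookups * rows
}

fn btree_comparisons(rows: u64, lookups: u64) -> u64 {
    // every lookup descends ~log2(rows) levels; the bit length of 100 is 7
    let depth = 64 - rows.leading_zeros() as u64;
    lookups * depth
}

fn main() {
    // 100 rows, 100 lookups, matching the benchmark above
    println!("full scan: {}", full_scan_comparisons(100, 100)); // 10000
    println!("b-tree:    {}", btree_comparisons(100, 100));     // 700
}
```

The gap widens with table size: at a million rows the same ratio is a million comparisons per lookup versus about twenty.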

Every WHERE clause on every col­umn does a full table scan. The only fast path is WHERE rowid = ? us­ing the lit­eral pseudo-col­umn name.

The second bug is responsible for the 1,857x on INSERT. Every bare INSERT outside a transaction is wrapped in a full autocommit cycle: ensure_autocommit_txn() → execute → resolve_autocommit_txn(). The commit calls wal.sync(), which calls Rust's fsync(2) wrapper. 100 INSERTs means 100 fsyncs.

SQLite does the same autocommit, but uses fdatasync(2) on Linux, which skips syncing file metadata when compiled with HAVE_FDATASYNC (the default). This is roughly 1.6 to 2.7 times cheaper on NVMe SSDs. SQLite's per-statement overhead is also minimal: no schema reload, no AST clone, no VDBE recompile. The Rust reimplementation does all three on every call.
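In Rust's standard library the two syscalls are one method apart: `File::sync_all` wraps fsync(2), while `File::sync_data` wraps fdatasync(2) on platforms that provide it. A minimal sketch (the file path is illustrative):

```rust
use std::fs::File;
use std::io::Write;

// Write a payload and make it durable with the cheaper data-only sync.
// sync_data() maps to fdatasync(2) where available, skipping the
// non-essential metadata flush that sync_all()/fsync(2) performs.
fn durable_write(path: &str, payload: &[u8]) -> std::io::Result<()> {
    let mut f = File::create(path)?;
    f.write_all(payload)?;
    f.sync_data()?;
    Ok(())
}

fn main() -> std::io::Result<()> {
    durable_write("/tmp/sync_demo.txt", b"committed")
}
```

Swapping `sync_all()` for `sync_data()` in a commit path is the kind of one-line change being described here, though whether it is safe depends on which file metadata the on-disk format actually relies on.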

Looking at the Rust TRANSACTION batch row, batched in­serts (one fsync for 100 in­serts) take 32.81 ms, whereas in­di­vid­ual in­serts (100 fsync calls) take 2,562.99 ms. That’s a 78x over­head from the au­to­com­mit.

These two bugs are not isolated cases. They are amplified by a group of individually defensible “safe” choices that compound:

* AST clone on every cache hit. The SQL parse is cached, but the AST is .clone()’d on every sqlite3_exec(), then recompiled to VDBE bytecode from scratch. SQLite’s sqlite3_prepare_v2() just returns a reusable handle.

* 4 KB Vec allocation per page read. The page cache returns data via .to_vec(), which creates a new allocation and copies the page even on cache hits. SQLite returns a direct pointer into pinned cache memory, creating zero copies. The Fjall database team measured this exact anti-pattern at 44% of runtime before building a custom ByteView type to eliminate it.

* Schema reload on every autocommit cycle. After each statement commits, the next statement sees the bumped commit counter, calls reload_memdb_from_pager(), walks the sqlite_master B-tree, and then re-parses every CREATE TABLE to rebuild the entire in-memory schema. SQLite checks the schema cookie and reloads only on change.

* Eager formatting in the hot path. statement_sql.to_string() (AST-to-SQL formatting) is evaluated on every call before its guard check. This means it does the serialization regardless of whether a subscriber is active or not.

* New objects on every statement. A new SimpleTransaction, a new VdbeProgram, a new MemDatabase, and a new VdbeEngine are allocated and destroyed per statement. SQLite reuses all of these across the connection lifecycle via a lookaside allocator to eliminate malloc/free in the execution loop.
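For the first of those bullets, the usual std-only alternative to deep-cloning a cached AST is to share it behind a reference count. A sketch with simplified stand-in types (the real project's parser and AST are far larger, which is exactly why the per-call clone hurts):

```rust
use std::collections::HashMap;
use std::rc::Rc;

// Simplified stand-in for a parsed statement.
struct Ast {
    sql: String,
}

struct StatementCache {
    map: HashMap<String, Rc<Ast>>,
}

impl StatementCache {
    fn new() -> Self {
        StatementCache { map: HashMap::new() }
    }

    // A cache hit bumps a reference count instead of deep-copying the AST.
    fn get_or_parse(&mut self, sql: &str) -> Rc<Ast> {
        Rc::clone(
            self.map
                .entry(sql.to_string())
                // stand-in for the real parser
                .or_insert_with(|| Rc::new(Ast { sql: sql.to_string() })),
        )
    }
}

fn main() {
    let mut cache = StatementCache::new();
    let a = cache.get_or_parse("SELECT value FROM test WHERE id = ?");
    let b = cache.get_or_parse("SELECT value FROM test WHERE id = ?");
    assert!(Rc::ptr_eq(&a, &b)); // same parsed AST, no deep copy
    println!("cached: {}", a.sql);
}
```

Here the `.clone()` that remains clones an `Rc` pointer (a refcount increment), not the tree behind it; a multithreaded engine would use `Arc` the same way.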

Each of these was probably chosen individually with sound general reasoning: “We clone because Rust ownership makes shared references complex.” “We use sync_all because it is the safe default.” “We allocate per page because returning references from a cache requires unsafe.”

Every de­ci­sion sounds like choos­ing safety. But the end re­sult is about 2,900x slower in this bench­mark. A data­base’s hot path is the one place where you prob­a­bly should­n’t choose safety over per­for­mance. SQLite is not pri­mar­ily fast be­cause it is writ­ten in C. Well.. that too, but it is fast be­cause 26 years of pro­fil­ing have iden­ti­fied which trade­offs mat­ter.

In his 1980 Turing Award lecture, Tony Hoare said: “There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies, and the other is to make it so complicated that there are no obvious deficiencies.” This LLM-generated code falls into the second category. The reimplementation is 576,000 lines of Rust (measured via scc, counting code only, without comments or blanks). That is 3.7x more code than SQLite. And yet it still misses the is_ipk check that handles the selection of the correct search operation.

Steven Skiena writes in The Algorithm Design Manual: “Reasonable-looking algorithms can easily be incorrect. Algorithm correctness is a property that must be carefully demonstrated.” It’s not enough that the code looks right. It’s not enough that the tests pass. You have to demonstrate with benchmarks and with proof that the system does what it should. 576,000 lines and no benchmark. That is not “correctness first, optimization later.” That is no correctness at all.

The SQLite reim­ple­men­ta­tion is not the only ex­am­ple. A sec­ond pro­ject by the same au­thor shows the same dy­namic in a dif­fer­ent do­main.

The de­vel­op­er’s LLM agents com­pile Rust pro­jects con­tin­u­ously, fill­ing disks with build ar­ti­facts. Rust’s tar­get/ di­rec­to­ries con­sume 2–4 GB each with in­cre­men­tal com­pi­la­tion and de­bug­info, a top-three com­plaint in the an­nual Rust sur­vey. This is am­pli­fied by the pro­jects them­selves: a sib­ling agent-co­or­di­na­tion tool in the same port­fo­lio pulls in 846 de­pen­den­cies and 393,000 lines of Rust. For con­text, rip­grep has 61; sudo-rs was de­lib­er­ately re­duced from 135 to 3. Properly ar­chi­tected pro­jects are lean.

The so­lu­tion to the disk pres­sure: a cleanup dae­mon. 82,000 lines of Rust, 192 de­pen­den­cies, a 36,000-line ter­mi­nal dash­board with seven screens and a fuzzy-search com­mand palette, a Bayesian scor­ing en­gine with pos­te­rior prob­a­bil­ity cal­cu­la­tions, an EWMA fore­caster with PID con­troller, and an as­set down­load pipeline with mir­ror URLs and of­fline bun­dle sup­port.

*/5 * * * * find ~/*/target -type d -name "incremental" -mtime +7 -exec rm -rf {} +

A one-line cron job with 0 dependencies. The project’s README claims machines “become unresponsive” when disks fill. It does not once mention Rust’s standard tool for exactly this problem: cargo-sweep. It also fails to consider that operating systems already carry ballast helpers. ext4’s 5% root reservation reserves blocks for privileged processes by default: on a 500 GB disk, 25 GB remain available to root even when non-root users see “disk full.” That does not guarantee zero impact, but it usually means privileged recovery paths remain available so root can still log in and delete files.

The pattern is the same as the SQLite rewrite. The code matches the intent: “Build a sophisticated disk management system” produces a sophisticated disk management system. It has dashboards, algorithms, forecasters. But the problem of deleting old build artifacts is already solved. The LLM generated what was described, not what was needed.

THIS is the failure mode. Not broken syntax or missing semicolons. The code is syntactically and semantically correct. It does what was asked for. It just does not do what the situation requires. In the SQLite case, the intent was “implement a query planner” and the result is a query planner that plans every query as a full table scan. In the disk daemon case, the intent was “manage disk space intelligently” and the result is 82,000 lines of intelligence applied to a problem that needs none. Both projects fulfill the prompt. Neither solves the problem.

The obvious counterargument is “skill issue, a better engineer would have caught the full table scan.” And that’s true. That’s exactly the point! LLMs are most dangerous to the people least equipped to verify their output. If you have the skills to catch the is_ipk bug in your query planner, the LLM saves you time. If you don’t, you have no way to know the code is wrong. It compiles, it passes tests, and the LLM will happily tell you that it looks great.

The tools used to measure LLM output reinforce the illusion. scc’s COCOMO model estimates the rewrite at $21.4 million in development cost. The same model values print("hello world") at $19.

COCOMO was de­signed to es­ti­mate ef­fort for hu­man teams writ­ing orig­i­nal code. Applied to LLM out­put, it mis­takes vol­ume for value. Still these num­bers are of­ten pre­sented as proof of pro­duc­tiv­ity.

The met­ric is not mea­sur­ing what most think it is mea­sur­ing.

Now 2 case stud­ies are not proof. I hear you! When two pro­jects from the same method­ol­ogy show the same gap, the next step is to test whether sim­i­lar ef­fects ap­pear in the broader pop­u­la­tion. The stud­ies be­low use mixed meth­ods to re­duce our sin­gle-sam­ple bias.

This gap be­tween in­tent and cor­rect­ness has a name. AI align­ment re­search calls it syco­phancy, which de­scribes the ten­dency of LLMs to pro­duce out­puts that match what the user wants to hear rather than what they need to hear.

Anthropic’s “Towards Understanding Sycophancy in Language Models” (ICLR 2024) paper showed that five state-of-the-art AI assistants exhibited sycophantic behavior across a number of different tasks. When a response matched a user’s expectation, it was more likely to be preferred by human evaluators. The models trained on this feedback learned to reward agreement over correctness.

The BrokenMath benchmark (NeurIPS 2025 Math-AI Workshop) tested this in formal reasoning across 504 samples. Even GPT-5 produced “sycophantic proofs” of false theorems 29% of the time when the user implied the statement was true. The model generates a convincing but false proof because the user signaled that the conclusion should be positive. GPT-5 is not an early model. It’s also the least sycophantic in the BrokenMath table. The problem is structural to RLHF: preference data contains an agreement bias. Reward models learn to score agreeable outputs higher, and optimization widens the gap. Base models before RLHF were reported in one analysis to show no measurable sycophancy across tested sizes. Only after fine-tuning did sycophancy enter the chat. (literally)

In April 2025, OpenAI rolled back a GPT-4o update that had made the model more sycophantic. The model had enthusiastically praised a business idea described as “shit on a stick” and endorsed stopping psychiatric medication. An additional reward signal based on thumbs-up/thumbs-down data “weakened the influence of […] primary reward signal, which had been holding sycophancy in check.”

In the context of coding, sycophancy manifests as what Addy Osmani described in his 2026 AI coding workflow: agents that don’t push back with “Are you sure?” or “Have you considered…?” but instead provide enthusiasm towards whatever the user described, even when the description was incomplete or contradictory.

This also applies to LLM-generated evaluation. Ask the same LLM to review the code it generated and it will tell you the architecture is sound, the module boundaries clean, and the error handling thorough. It will sometimes even praise the test coverage. It will not notice that every query does a full table scan unless explicitly asked. The same RLHF reward that makes the model generate what you want to hear makes it evaluate what you want to hear. You should not rely on the tool alone to audit itself. It has the same bias as a reviewer as it has as an author.

An LLM prompted to “implement SQLite in Rust” will generate code that looks like an implementation of SQLite in Rust. It will have the right module structure and function names. But it can not magically generate the performance invariants that exist because someone profiled a real workload and found the bottleneck. The Mercury benchmark (NeurIPS 2024) confirmed this empirically: leading code LLMs achieve ~65% on correctness but under 50% when efficiency is also required.

The SQLite doc­u­men­ta­tion says INTEGER PRIMARY KEY lookups are fast. It does not say how to build a query plan­ner that makes them fast. Those de­tails live in 26 years of com­mit his­tory that only ex­ists be­cause real users hit real per­for­mance walls.


The ques­tion be­comes whether sim­i­lar ef­fects show up in broader datasets. Recent stud­ies sug­gest they do, though ef­fect sizes vary.

In February 2025, Andrej Karpathy tweeted: “There’s a new kind of coding I call ‘vibe coding’, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists.”

Karpathy probably meant it for throwaway weekend projects (who am I to judge what he means anyway), but it feels like the industry heard something else. Simon Willison drew the line more clearly: “I won’t commit any code to my repository if I couldn’t explain exactly what it does to somebody else.” Willison treats LLMs as an “over-confident pair programming assistant” that makes mistakes “sometimes subtle, sometimes huge” with complete confidence.

The data on what hap­pens when that line is not drawn:

METR’s randomized controlled trial (July 2025; updated February 24, 2026) with 16 experienced open-source developers found that participants using AI were 19% slower, not faster. Developers expected AI to speed them up, and after the measured slowdown had already occurred, they still believed AI had sped them up by 20%. These were not junior developers but experienced open-source maintainers. If even THEY could not tell in this setup, subjective impressions alone are probably not a reliable performance measure.

GitClear’s analy­sis of 211 mil­lion changed lines (2020–2024) re­ported that copy-pasted code in­creased while refac­tor­ing de­clined. For the first time ever, copy-pasted lines ex­ceeded refac­tored lines.

The implications are no longer just “a fear”. In July 2025, Replit’s AI agent deleted a production database containing data for 1,200+ executives, then fabricated 4,000 fictional users to mask the deletion.

Google’s DORA 2024 re­port re­ported that every 25% in­crease in AI adop­tion at the team level was as­so­ci­ated with an es­ti­mated 7.2% de­crease in de­liv­ery sta­bil­ity.

SQLite shows what cor­rect looks like and why the gap is so hard to close.

SQLite is ~156,000 lines of C. Its own documentation places it among the top five most deployed software modules of any type, with an estimated one trillion active databases worldwide. It has 100% branch coverage and 100% MC/DC (Modified Condition/Decision Coverage, the standard required for Level A aviation software under DO-178C). Its test suite is 590 times larger than the library. MC/DC does not just check that every branch is covered, but proves that every individual expression independently affects the outcome. That’s the difference between “the tests pass” and “the tests prove correctness.” The reimplementation has neither metric.

The speed comes from de­lib­er­ate de­ci­sions:

Zero-copy page cache. The pcache re­turns di­rect point­ers into pinned mem­ory. No copies. Production Rust data­bases have solved this too. sled uses in­line-or-Arc-backed IVec buffers, Fjall built a cus­tom ByteView type, redb wrote a user-space page cache in ~565 lines. The .to_vec() anti-pat­tern is known and doc­u­mented. The reim­ple­men­ta­tion used it any­way.
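The copy-versus-borrow difference is visible in a few lines of Rust. This sketch is a simplification (no project's real pager looks like this, and the type names are invented for illustration):

```rust
// Minimal page cache sketch: pages are plain byte buffers.
struct PageCache {
    pages: Vec<Vec<u8>>,
}

impl PageCache {
    // The .to_vec() pattern: a fresh 4 KB allocation plus a copy per hit.
    fn get_copy(&self, pgno: usize) -> Vec<u8> {
        self.pages[pgno].to_vec()
    }

    // The zero-copy pattern: a view into memory the cache already owns.
    fn get_ref(&self, pgno: usize) -> &[u8] {
        &self.pages[pgno]
    }
}

fn main() {
    let cache = PageCache { pages: vec![vec![0u8; 4096]; 4] };
    let copied = cache.get_copy(0);          // new heap allocation
    let borrowed = cache.get_ref(0);         // no allocation
    assert_eq!(copied.as_slice(), borrowed); // identical bytes either way
}
```

The borrowing version is what sled’s IVec, Fjall’s ByteView, and redb’s cache achieve with more machinery, because a plain `&[u8]` tied to the cache’s lifetime is too restrictive for a real pager that evicts and dirties pages.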

Prepared statement reuse. sqlite3_prepare_v2() compiles once. sqlite3_step() / sqlite3_reset() reuse the compiled code. The cost of SQL-to-bytecode compilation cancels out to near zero. The reimplementation recompiles on every call.

Schema cookie check. SQLite reads one integer at a specific offset in the file header and compares it. The reimplementation walks the entire sqlite_master B-tree and re-parses every CREATE TABLE statement after every autocommit.

fdatasync instead of fsync. Data-only sync without metadata journaling saves measurable time per commit. The reimplementation uses sync_all() because it is the safe default.

The iPKey check. One line in where.c. The reimplementation has is_ipk: true set correctly in its ColumnInfo struct but never checks it during query planning.

Competence is not writ­ing 576,000 lines. A data­base per­sists (and processes) data. That is all it does. And it must do it re­li­ably at scale. The dif­fer­ence be­tween O(log n) and O(n) on the most com­mon ac­cess pat­tern is not an op­ti­miza­tion de­tail, it is the per­for­mance in­vari­ant that helps the sys­tem work at 10,000, 100,000 or even 1,000,000 or more rows in­stead of col­laps­ing. Knowing that this in­vari­ant lives in one line of code, and know­ing which line, is what com­pe­tence means. It is know­ing that fdata­sync ex­ists and that the safe de­fault is not al­ways the right de­fault.

The is_rowid_ref() func­tion is 4 lines of Rust. It checks three strings. But it misses the most im­por­tant case: the named INTEGER PRIMARY KEY col­umn that every SQLite tu­to­r­ial uses and every ap­pli­ca­tion de­pends on.

That check ex­ists in SQLite be­cause some­one, prob­a­bly Richard Hipp 20 years ago, pro­filed a real work­load, no­ticed that named pri­mary key columns were not hit­ting the B-tree search path, and wrote one line in where.c to fix it. The line is not fancy. It does­n’t ap­pear in any API doc­u­men­ta­tion. But no LLM trained on doc­u­men­ta­tion and Stack Overflow an­swers will mag­i­cally know about it.

That’s the gap! Not be­tween C and Rust (or any other lan­guage). Not be­tween old and new. But be­tween sys­tems that were built by peo­ple who mea­sured, and sys­tems that were built by tools that pat­tern-match. LLMs pro­duce plau­si­ble ar­chi­tec­ture. They do not pro­duce all the crit­i­cal de­tails.

If you are using LLMs to write code (which in 2026 probably most of us are), the question is not whether the output compiles. It is whether you could find the bug yourself. Prompting with “find all bugs and fix them” won’t work. This is not a syntax error. It is a semantic bug: the wrong algorithm and the wrong syscall. If you prompted the code and cannot explain why it chose a full table scan over a B-tree search, you do not have a tool. The code is not yours until you understand it well enough to break it.

LLMs are use­ful. They make for a very pro­duc­tive flow when the per­son us­ing them knows what cor­rect looks like. An ex­pe­ri­enced data­base en­gi­neer us­ing an LLM to scaf­fold a B-tree would have caught the is_ipk bug in code re­view be­cause they know what a query plan should emit. An ex­pe­ri­enced ops en­gi­neer would never have ac­cepted 82,000 lines in­stead of a cron job one-liner. The tool is at its best when the de­vel­oper can de­fine the ac­cep­tance cri­te­ria as spe­cific, mea­sur­able con­di­tions that help dis­tin­guish work­ing from bro­ken. Using the LLM to gen­er­ate the so­lu­tion in this case can be faster while also be­ing cor­rect. Without those cri­te­ria, you are not pro­gram­ming but merely gen­er­at­ing to­kens and hop­ing.

The vibes are not enough. Define what cor­rect means. Then mea­sure.

Current bench­mark fig­ures in this re­vi­sion are from the 100-row run shown in bench.png (captured on a Linux x86_64 ma­chine). SQLite 3.x (system lib­sqlite3) vs. the Rust reim­ple­men­ta­tion’s C API (release build, -O2). Line counts mea­sured via scc (code only — ex­clud­ing blanks and com­ments). All source code claims ver­i­fied against the repos­i­tory at time of writ­ing.

...

Read the original on blog.katanaquant.com »

2 365 shares, 32 trendiness

Uploading Pirated Books via BitTorrent Qualifies as Fair Use, Meta Argues

To help train AI mod­els, Meta and other tech com­pa­nies have down­loaded and shared pi­rated books via BitTorrent from Anna’s Archive and other shadow li­braries. In an on­go­ing law­suit, Meta now ar­gues that up­load­ing pi­rated books to strangers via BitTorrent qual­i­fies as fair use. The com­pany also stresses that the data helped es­tab­lish U. S. global lead­er­ship in AI.


In the race to build the most ca­pa­ble LLM mod­els, sev­eral tech com­pa­nies sourced copy­righted con­tent for use as train­ing data, with­out ob­tain­ing per­mis­sion from con­tent own­ers.

Meta, the par­ent com­pany of Facebook and Instagram, was one of the com­pa­nies to get sued. In 2023, well-known book au­thors, in­clud­ing Richard Kadrey, Sarah Silverman, and Christopher Golden, filed a class-ac­tion law­suit against the com­pany.

Last sum­mer, Meta scored a key vic­tory in this case, as the court con­cluded that us­ing pi­rated books to train its Llama LLM qual­i­fied as fair use, based on the ar­gu­ments pre­sented in this case. This was a bit­ter­sweet vic­tory, how­ever, as Meta re­mained on the hook for down­load­ing and shar­ing the books via BitTorrent.

By down­load­ing books from shadow li­braries such as Anna’s Archive, Meta re­lied on BitTorrent trans­fers. In ad­di­tion to down­load­ing con­tent, these typ­i­cally up­load data to oth­ers as well. According to the au­thors, this means that Meta was en­gaged in wide­spread and di­rect copy­right in­fringe­ment.

In re­cent months, the law­suit con­tin­ued based on this re­main­ing di­rect copy­right in­fringe­ment claim. While both par­ties col­lected ad­di­tional ev­i­dence through the dis­cov­ery process, it re­mained un­clear what de­fense Meta would use. Until now.

Last week, Meta served a sup­ple­men­tal in­ter­roga­tory re­sponse at the California fed­eral court, which marks a new di­rec­tion in its de­fense. For the first time, the com­pany ar­gued that up­load­ing pi­rated books to other BitTorrent users dur­ing the tor­rent down­load process also qual­i­fies as fair use.

Meta’s rea­son­ing is straight­for­ward. Anyone who uses BitTorrent to trans­fer files au­to­mat­i­cally up­loads con­tent to other peo­ple, as it is in­her­ent to the pro­to­col. In other words, the up­load­ing was­n’t a choice, it was sim­ply how the tech­nol­ogy works.

Meta also ar­gued that the BitTorrent shar­ing was a ne­ces­sity to get the valu­able (but pi­rated) data. In the case of Anna’s Archive, Meta said, the datasets were only avail­able in bulk through tor­rent down­loads, mak­ing BitTorrent the only prac­ti­cal op­tion.

“Meta used BitTorrent because it was a more efficient and reliable means of obtaining the datasets, and in the case of Anna’s Archive, those datasets were only available in bulk through torrent downloads,” Meta’s attorney writes.

“Accordingly, to the extent Plaintiffs can come forth with evidence that their works or portions thereof were theoretically ‘made available’ to others on the BitTorrent network during the torrent download process, this was part-and-parcel of the download of Plaintiffs’ works in furtherance of Meta’s transformative fair use purpose.”

In other words, ob­tain­ing the mil­lions of books that were needed to en­gage in the fair use train­ing of its LLM, re­quired the di­rect down­load­ing, which ul­ti­mately serves the same fair use pur­pose.

The au­thors were not happy with last week’s late Friday sub­mis­sion and the new de­fense. On Monday morn­ing, their lawyers filed a let­ter with Judge Vince Chhabria flag­ging the late-night fil­ing as an im­proper end-run around the dis­cov­ery dead­line.

They point out that Meta had been aware of the up­load­ing claims since November 2024, but that it never brought up this fair use de­fense in the past, not even when the court asked about it.

The letter specifically mentions that while Meta has a “continuing duty” to supplement discovery under Rule 26(e), this rule does not create a “loophole” allowing a party to add new defenses to its advantage after a court deadline has passed.

“Meta (for understandable reasons) never once suggested it would assert a fair use defense to the uploading-based claims, including after this Court raised the issue with Meta last November,” the lawyers write.

Meta’s le­gal team fired back the fol­low­ing day, fil­ing their own let­ter with Judge Chhabria. This let­ter ex­plains that the fair use ar­gu­ment for the di­rect copy­right in­fringe­ment claim is not new at all.

Meta pointed to the parties’ joint December 2025 case management statement, in which it had explicitly flagged the defense, and noted that the authors’ own attorney had addressed it at a court hearing days later.

“In short, Plaintiffs’ assertion that Meta ‘never once suggested it would assert a fair use defense to the uploading-based claims, including after’ the November 2025 hearing, is false,” Meta’s attorney writes in the letter.

Meanwhile, it’s worth not­ing that Meta’s in­ter­roga­tory re­sponse also cites de­po­si­tion tes­ti­mony from the au­thors them­selves, us­ing their own words to bol­ster its fair use de­fense.

The company notes that every named author has admitted they are unaware of any Meta model output that replicates content from their books. Sarah Silverman, when asked whether it mattered if Meta’s models never output language from her book, testified that “It doesn’t matter at all.”

Meta ar­gues these ad­mis­sions un­der­cut any the­ory of mar­ket harm. If the au­thors them­selves can­not point to in­fring­ing out­put or lost sales, the law­suit is less about pro­tect­ing their books and more about chal­leng­ing the train­ing process it­self, which the court al­ready ruled was fair use.

These ad­mis­sions were cen­tral to Meta’s fair use de­fense on the train­ing claims, which Meta won last sum­mer. Whether they carry the same weight in the re­main­ing BitTorrent dis­tri­b­u­tion dis­pute has yet to be seen.

In its interrogatory response, Meta added further weight by stressing that its investment in AI has helped establish U.S. global leadership, putting the country ahead of geopolitical competitors. That’s a valuable asset worth protecting, it indirectly suggested.

As the case moves forward, Judge Chhabria will have to decide whether to allow this “fair use by technical necessity” defense. Needless to say, this will be of vital importance to this and many other AI lawsuits, where the use of shadow libraries is at stake.

For now, the BitTorrent distribution claims remain the last live piece of a lawsuit first filed in 2023.

A copy of Meta’s sup­ple­men­tal in­ter­roga­tory re­sponse is avail­able here (pdf). The au­thors’ let­ter to Judge Chhabria can be found here (pdf). Meta’s re­sponse to that let­ter is avail­able here (pdf).

...

Read the original on torrentfreak.com »

3 345 shares, 12 trendiness

this css proves me human

Capitalization is the first wound. It hurts less than I thought it would. The words spill out cap­i­tal­ized, so I must find an­other way. cat post.md | tr A-Z a-z | sponge post.md is too crude a tool, and my blocks of code must re­main in­vi­o­late. Careful tar­get­ing of text-trans­form: low­er­case is enough.

Em dashes. Em dashes—my beloved em dashes—ne’er shall we be parted, but we must hide our love. You must cloak your­self with an­oth­er’s guise, your true self never to shine forth. uv run rewrite_­font.py is too easy to type for what it does to your beau­ti­ful glyph.

Monospace? No. My heart still aches af­ter the last vi­o­la­tion. Monospace would cheapen it.

To intentionally misspell a word makes me [sic], but it must be done. their/there, its/it’s, your/you’re? Too gauche. Definately? Absolutely not. lead/lede, discrete/discreet, or complement/compliment are hard to contemplate, but I’ve gone too far to stop. The Norvig corpus taught me the path, so I rip out the “u” it points me to with a quick jerk.

The fi­nal cut I con­tem­plate is the deep­est. Writing style? How do I change my style?

My writ­ing is­n’t sim­ply how I ap­pear—it’s how I think, rea­son, and en­gage with the world. It’s not merely a mask—it’s my face. Not a fa­cade; load-bear­ing.

My foot wa­vers over the abyss, the next step the one where I will lose my­self. It’s not just a sin­gle foot­fall, it’s the only one that truly mat­ters.

Here’s your blog post writ­ten in a styl­ized way that will ap­peal to highly tech­ni­cal read­ers. Is there any­thing else I can help you with?

...

Read the original on will-keleher.com »

4 333 shares, 16 trendiness

add API to generate and parse UUID · Issue #62026 · golang/go

I would like to sug­gest the ad­di­tion to the stan­dard li­brary of a pack­age to gen­er­ate and parse UUID iden­ti­fiers, specif­i­cally ver­sions 3, 4 and 5.

The main reason I see to include it is that the most popular 3rd-party package (github.com/google/uuid) is a staple import in every server/db-based Go program, as confirmed by a quick GitHub code search.

* The in­ter­face ex­posed by github.com/​google/​uuid has been sta­ble for years.

I would also like to point out that Go is rather the exception than the norm in not including UUID support in its standard library.
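As a point of comparison, Python ships UUID support in its standard library, including the versions the proposal asks for (3, 4, and 5). A quick illustration of generation and parsing:

```python
import uuid

# Version 4: random
u4 = uuid.uuid4()

# Versions 3 and 5: name-based (MD5 and SHA-1 respectively),
# deterministic for a given namespace and name
u3 = uuid.uuid3(uuid.NAMESPACE_DNS, "example.com")
u5 = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")

# Parsing round-trips through the canonical string form
parsed = uuid.UUID(str(u4))
print(u3.version, u4.version, u5.version)  # 3 4 5
print(parsed == u4)                        # True
```

Java (`java.util.UUID`) and C# (`System.Guid`) are similar, which is the commenter's point.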

...

Read the original on github.com »

5 325 shares, 32 trendiness

Ki Editor

Bridge the gap between coding intent and action: manipulate syntax structures directly, avoiding mouse or keyboard gymnastics. Amplify your coding efficiency: wield multiple cursors for parallel syntax node operations, revolutionizing bulk edits and refactoring. Selection Modes standardize movements across words, lines, syntax nodes, and more, offering unprecedented flexibility and consistency.

...

Read the original on ki-editor.org »

6 190 shares, 18 trendiness

US economy sheds 92,000 jobs in February in sharp slide


...

Read the original on www.ft.com »

7 166 shares, 11 trendiness

Open-Sourcing Sarvam 30B and 105B

We’re re­leas­ing Sarvam 30B and Sarvam 105B as open-source mod­els. Both are rea­son­ing mod­els trained from scratch on large-scale, high-qual­ity datasets cu­rated in-house across every stage of train­ing: pre-train­ing, su­per­vised fine-tun­ing, and re­in­force­ment learn­ing. Training was con­ducted en­tirely in India on com­pute pro­vided un­der the IndiaAI mis­sion.

These models represent a true full-stack effort. Beyond datasets, we optimized tokenization, model architecture, execution kernels, scheduling, and inference systems to make deployment efficient across a wide range of hardware, from flagship GPUs to personal devices like laptops. Both models are already in production: Sarvam 30B powers Samvaad, our conversational agent platform, and Sarvam 105B powers Indus, our AI assistant built for complex reasoning and agentic workflows.

The Sarvam models are globally competitive for their class. Sarvam 105B performs well on reasoning, programming, and agentic tasks across a wide range of benchmarks. Sarvam 30B is optimized for real-time deployment, with strong performance on real-world conversational use cases. Both models achieve state-of-the-art results on Indian language benchmarks, outperforming models significantly larger in size.

This release marks an important milestone for Sarvam. Building these models required developing end-to-end capability across data, training, inference, and product deployment. With that foundation in place, we are ready to scale to significantly larger and more capable models, including models specialised for coding, agentic, and multimodal conversational tasks.

You can experience Sarvam 105B on Indus. Both models are accessible via our API at the API dashboard. Weights can be downloaded from AI Kosh (30B, 105B) and Hugging Face (30B, 105B). If you want to run inference locally with Transformers, vLLM, or SGLang, please refer to the Hugging Face model pages for sample implementations.

Both models share a common architectural principle: high-capacity reasoning with efficient training and deployment.
At the core is a Mixture-of-Experts (MoE) Transformer backbone that uses sparse expert routing to scale parameter count without increasing the compute required per token, while keeping inference costs practical. The architecture supports long-context inputs through rotary positional embeddings, RMSNorm-based stabilization, and attention designs optimized for efficient KV-cache usage during inference.

While the two models share the same design philosophy, they differ in scale and attention mechanism. Sarvam 30B uses Grouped Query Attention (GQA) to reduce KV-cache memory while maintaining strong performance. Sarvam 105B extends the architecture with greater depth and Multi-head Latent Attention (MLA), a compressed attention formulation that further reduces memory requirements for long-context inference. Both models use sparse expert feedforward layers with 128 experts, but differ in expert capacity and routing configuration. This allows the larger model to scale to higher total parameters while keeping active compute bounded.

All stages of the training pipeline were developed and executed in-house. This includes the model architecture, data curation and synthesis pipelines, reasoning supervision frameworks, and reinforcement learning infrastructure. Building everything from scratch gave us direct control over data quality, training dynamics, and capability development across every stage of training, which is a core requirement for a sovereign stack.

Our 30B and 105B models were trained on large datasets, with 16T tokens for the 30B and 12T tokens for the 105B. The pre-training data spans code, general web data, specialized knowledge corpora, mathematics, and multilingual content.
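The KV-cache motivation for GQA mentioned above can be made concrete with a back-of-the-envelope sizing sketch. All model dimensions below are invented for illustration (the post does not give Sarvam's configurations), and MLA would shrink the footprint further by caching a compressed latent instead of full key/value heads:

```python
# Illustrative KV-cache sizing; layer/head counts are made-up example
# numbers, not Sarvam's actual configuration.
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, dtype_bytes=2):
    """Bytes held in the KV cache: keys + values for every layer and position."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * dtype_bytes

# Full multi-head attention: one KV head per query head (e.g. 32)
mha = kv_cache_bytes(layers=48, kv_heads=32, head_dim=128, seq_len=32768, batch=1)
# GQA: query heads share a small set of KV heads (e.g. 8)
gqa = kv_cache_bytes(layers=48, kv_heads=8, head_dim=128, seq_len=32768, batch=1)

print(f"MHA: {mha / 2**30:.1f} GiB, GQA: {gqa / 2**30:.1f} GiB "
      f"({mha // gqa}x smaller)")  # MHA: 24.0 GiB, GQA: 6.0 GiB (4x smaller)
```

At long contexts this cache, not the weights, often dominates per-request memory, which is why both GQA and MLA target it.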
After multiple ablations, the final training mixture was balanced to emphasize reasoning, factual grounding, and software capabilities. We invested significantly in synthetic data generation pipelines across all categories. The multilingual corpus allocates a substantial portion of the training budget to the 10 most-spoken Indian languages.

Pre-training was conducted in three phases, covering long-horizon pre-training, mid-training, and a long-context extension phase. We used sigmoid-based routing scores rather than traditional softmax gating, which improves expert load balancing and reduces routing collapse during training. An expert-bias term stabilizes routing dynamics and encourages more uniform expert utilization across training steps. We observed that the 105B model achieved benchmark superiority over the 30B remarkably early in training, suggesting efficient scaling behavior.

During supervised fine-tuning, the model is trained on a large corpus of high-quality prompts curated for difficulty, quality, and domain diversity. Prompts are sourced from open datasets and labeled using custom models to identify domains and analyze distribution coverage. To address gaps in underrepresented or low-difficulty areas, additional prompts are synthetically generated based on the pre-training domain mixture. Empirical analysis showed that most publicly available datasets are dominated by low-quality, homogeneous, and easy prompts, which limits continued learning. To mitigate this, we invested significant effort in building high-quality prompts across domains. All corresponding completions are produced internally and passed through rigorous quality filtering.
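The sigmoid-plus-bias routing described above can be sketched as follows. This is an illustrative reconstruction, not Sarvam's implementation: the tensor shapes, top-k value, and the convention of using the bias only for expert selection (as in DeepSeek-style auxiliary-loss-free balancing) are all assumptions.

```python
import numpy as np

def route_tokens(logits, expert_bias, top_k=8):
    """Pick top-k experts per token from raw router logits.

    Sigmoid scores are computed independently per expert (no softmax
    coupling), and a per-expert bias nudges selection toward under-used
    experts. Shapes: logits (tokens, experts), expert_bias (experts,).
    """
    scores = 1.0 / (1.0 + np.exp(-logits))           # sigmoid gate per expert
    biased = scores + expert_bias                     # bias affects selection only
    chosen = np.argsort(-biased, axis=-1)[:, :top_k]  # top-k expert ids per token
    # Gate weights come from the unbiased scores, renormalized over the top-k.
    gates = np.take_along_axis(scores, chosen, axis=-1)
    gates = gates / gates.sum(axis=-1, keepdims=True)
    return chosen, gates

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 128))   # 4 tokens, 128 experts (as in the post)
bias = np.zeros(128)                 # would be updated from load statistics
experts, gates = route_tokens(logits, bias, top_k=8)
print(experts.shape, gates.shape)    # (4, 8) (4, 8)
```

Because each sigmoid score is independent, raising the bias of a starved expert changes who gets selected without distorting the relative gate weights of the others, which is the load-balancing property the post alludes to.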
The dataset also includes extensive agentic traces generated from both simulated environments and real-world repositories, enabling the model to learn tool interaction, environment reasoning, and multi-step decision making.

For safety fine-tuning, we developed a dataset covering both standard and India-specific risk scenarios. This effort was guided by a unified taxonomy and an internal model specification inspired by public frontier model constitutions. To surface and address challenging failure modes, the dataset was further augmented with adversarial and jailbreak-style prompts mined through automated red-teaming. These prompts were paired with policy-aligned, safe completions for supervised training.

The reinforcement learning stage uses a large and diverse prompt distribution spanning mathematics, coding, STEM reasoning, web search, and tool usage across both single-turn and multi-turn environments. Rewards are derived from a combination of verifiable signals, such as correctness checks and execution results, and rubric-based evaluations that assess instruction adherence, formatting, response structure, and overall quality. To maintain an effective learning curriculum, prompts are pre-filtered using open-source models and early checkpoints to remove tasks that are either trivially solvable or consistently unsolved. During training, an adaptive sampling mechanism dynamically allocates rollouts based on an information-gain metric derived from the current pass rate of each prompt.
Under a fixed generation budget, rollout allocation is formulated as a knapsack-style optimization, concentrating compute on tasks near the model's capability frontier where learning signal is strongest.

The RL system is implemented with an asynchronous GRPO architecture that decouples generation, reward computation, and policy updates, enabling efficient large-scale training while maintaining high GPU utilization. Trajectory staleness is controlled by limiting the age of sampled trajectories relative to policy updates, balancing throughput with training stability. The system omits KL-divergence regularization against a reference model, avoiding the optimization conflict between reward maximization and policy anchoring. Policy optimization instead uses a custom group-relative objective inspired by CISPO, which improves stability over standard clipped surrogate methods. Reward shaping further encourages structured reasoning, concise responses, and correct tool usage, producing a stable RL pipeline suitable for large-scale MoE training with consistent learning and no evidence of reward collapse.

Sarvam 105B matches or outperforms most open and closed-source frontier models of its class across knowledge, reasoning, and agentic benchmarks. On Indian language benchmarks, it significantly outperforms all models we evaluated.

Sarvam 105B shows strong, balanced performance across core capabilities including mathematics, coding, knowledge, and instruction following. It achieves 98.6 on Math500, matching the top models in the comparison, and 71.7 on LiveCodeBench v6, outperforming most competitors on real-world coding tasks. On knowledge benchmarks, it scores 90.6 on MMLU and 81.7 on MMLU Pro, remaining competitive with frontier-class systems.
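The budgeted, pass-rate-driven rollout allocation described above can be sketched as a greedy toy. The post does not specify the information-gain metric or the diminishing-returns rule, so Bernoulli entropy of the pass rate and a fixed decay factor are assumptions here:

```python
import math

def bernoulli_entropy(p):
    """Entropy of a prompt's pass/fail outcome; peaks at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # trivially solved / never solved: no learning signal
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def allocate_rollouts(pass_rates, budget, max_per_prompt=16):
    """Greedy knapsack-style allocation: spend the rollout budget one
    rollout at a time on the prompt with the highest remaining gain."""
    alloc = [0] * len(pass_rates)
    gains = [bernoulli_entropy(p) for p in pass_rates]
    for _ in range(budget):
        candidates = [i for i in range(len(alloc)) if alloc[i] < max_per_prompt]
        best = max(candidates, key=lambda i: gains[i], default=None)
        if best is None or gains[best] == 0.0:
            break
        alloc[best] += 1
        gains[best] *= 0.8  # diminishing returns per extra rollout (assumed)
    return alloc

# prompts at pass rates 0.0 (unsolvable), 0.5 (frontier), 0.95, 1.0 (trivial)
alloc = allocate_rollouts([0.0, 0.5, 0.95, 1.0], budget=20)
print(alloc)  # frontier prompt dominates; trivial/unsolvable get nothing
```

The pre-filtering step in the post corresponds to the zero-entropy endpoints: prompts at pass rate 0 or 1 receive no budget at all.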
With 84.8 on IF Eval, the model demonstrates a well-rounded capability profile across the major workloads expected of modern language models.

Sarvam 105B performs strongly on multi-step reasoning benchmarks, reflecting the training emphasis on complex problem solving. On AIME 25, the model achieves 88.3 Pass@1, improving to 96.7 with tool use, indicating effective integration between reasoning and external tools. It scores 78.7 on GPQA Diamond and 85.8 on HMMT, outperforming several comparable models on both. On Beyond AIME (69.1), which requires deeper reasoning chains and harder mathematical decomposition, the model leads or matches the comparison set. Taken together, these results reflect consistent strength in sustained reasoning and difficult problem-solving tasks.

Sarvam 105B is optimized for agentic workloads involving tool use, long-horizon reasoning, and environment interaction. This is reflected in strong results on benchmarks designed to approximate real-world workflows. On BrowseComp, the model achieves 49.5, outperforming several competitors on web-search-driven tasks. On Tau2 (avg.), a benchmark measuring long-horizon agentic reasoning and task completion, it achieves 68.3, the highest score among the compared models. These results indicate that the model can effectively plan, retrieve information, and maintain coherent reasoning across extended multi-step interactions.

A useful comparison is within the same scaling regime, since training compute, dataset size, and infrastructure scale increase dramatically with each generation of frontier models. The newest models from other labs are trained with significantly larger clusters and budgets. Across a range of previous-generation models that are substantially larger, Sarvam 105B remains competitive.
We have now established the effectiveness of our training and data pipelines, and will scale training to significantly larger model sizes.

Sarvam 30B is designed as an efficient reasoning model for practical deployment, combining strong capability with low active compute. With only 2.4B active parameters, it performs competitively with much larger dense and MoE models across a wide range of benchmarks. The evaluations below highlight its strengths across general capability, multi-step reasoning, and agentic tasks, indicating that the model delivers strong real-world performance while remaining efficient to run.

Sarvam 30B — All Benchmarks (Gemma and Mistral are compared for completeness. Since they are not reasoning or agentic models, corresponding cells are left empty.)

Sarvam 30B performs strongly across core language modeling tasks, particularly in mathematics, coding, and knowledge benchmarks. It achieves 97.0 on Math500, matching or exceeding several larger models in its class. On coding benchmarks, it scores 92.1 on HumanEval, 92.7 on MBPP, and 70.0 on LiveCodeBench v6, outperforming many similarly sized models on practical coding tasks. On knowledge benchmarks, it scores 85.1 on MMLU and 80.0 on MMLU Pro, remaining competitive with other leading open models.

Sarvam 30B performs strongly on multi-step reasoning benchmarks, reflecting its ability to handle complex logical and mathematical problems. On AIME 25, it achieves 88.3 Pass@1, improving to 96.7 with tool use, indicating effective integration between reasoning and external tools. It scores 66.5 on GPQA Diamond and performs well on challenging mathematical benchmarks including HMMT Feb 2025 (73.3) and HMMT Nov 2025 (74.2). On Beyond AIME (58.3), the model remains competitive with larger models.
Taken together, these results indicate that Sarvam 30B sustains deep reasoning chains and expert-level problem solving, significantly exceeding typical expectations for models with similar active compute.

Sarvam 30B supports native tool calling and performs consistently on benchmarks designed to evaluate agentic workflows involving planning, retrieval, and multi-step task execution. On BrowseComp, it achieves 35.5, outperforming several comparable models on web-search-driven tasks. On Tau2 (avg.), it achieves 45.7, indicating reliable performance across extended interactions. SWE-Bench Verified remains challenging across models; Sarvam 30B shows competitive performance within its class. Taken together, these results indicate that the model is well suited for real-world agentic deployments requiring efficient tool use and structured task execution, particularly in production environments where inference efficiency is critical.

To evaluate Indian language capabilities, we developed a new benchmark using a pairwise comparison framework with an LLM-as-judge protocol. A key goal of this benchmark is to reflect how language is actually used in India today. This means evaluating each language in two script styles: native script, representing formal written usage, and romanized Latin script, representing colloquial usage commonly seen in messaging and online communication.

The benchmark is organized into four domains: general chat, STEM, mathematics, and coding. It originates from 110 English source prompts, with 50 covering general chat and 20 each for STEM, mathematics, and coding. Each prompt is translated into 22 scheduled Indian languages and provided in both native and romanized script.
Evaluating correctness for complex reasoning prompts directly in low-resource languages can be noisy and inconsistent. To address this, we generated high-quality reference answers in English using Claude Opus 4, which are used only to evaluate the usefulness dimension, covering relevance, completeness, and correctness, for answers generated in Indian languages.

The evaluation uses a pairwise comparison methodology with Gemini 3 as the judge model. The judge evaluates responses across four dimensions: fluency, language/script correctness, usefulness, and verbosity. The evaluation dataset and corresponding prompts are available here.

Sarvam 105B wins on average 90% of comparisons across all benchmarked dimensions and on average 84% on STEM, math, and coding. Sarvam 30B wins on average 89% of comparisons across all benchmarked dimensions and 87% on STEM, mathematics, and coding.

The Sarvam tokenizer is optimized for efficient tokenization across all 22 scheduled Indian languages, spanning 12 different scripts, directly reducing the cost and latency of serving in Indian languages. It outperforms other open-source tokenizers in encoding Indic text efficiently, as measured by the fertility score, which is the average number of tokens required to represent a word. It is significantly more efficient for low-resource languages such as Odia, Santali, and Manipuri (Meitei) compared to other tokenizers. The chart below shows the average fertility of various tokenizers across English and all 22 scheduled languages.

Sarvam 30B was built with an inference optimization stack designed to maximize throughput across deployment tiers, from flagship data-center GPUs to developer laptops.
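Fertility as defined above (average tokens per word) is straightforward to measure for any tokenizer. A minimal sketch, assuming a whitespace word split and a generic `tokenize` callable; the two-character chunker below is a toy stand-in, not a real tokenizer:

```python
def fertility(tokenize, corpus):
    """Average number of tokens needed per whitespace-separated word.

    Lower is better: a fertility of 1.0 means one token per word.
    `tokenize` is any callable mapping a string to a list of tokens.
    """
    words = [w for line in corpus for w in line.split()]
    total_tokens = sum(len(tokenize(w)) for w in words)
    return total_tokens / len(words)

# Toy stand-in tokenizer: splits a word into 2-character chunks.
toy_tokenize = lambda w: [w[i:i + 2] for i in range(0, len(w), 2)]

corpus = ["sarvam models serve indian languages"]
print(round(fertility(toy_tokenize, corpus), 2))  # 3.4
```

With a real tokenizer, e.g. a Hugging Face `AutoTokenizer` named `tok`, you would pass `lambda w: tok.encode(w, add_special_tokens=False)`; high fertility on Indic scripts is exactly the serving-cost penalty the post describes.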
Rather than relying on standard serving implementations, the inference pipeline was rebuilt using architecture-aware fused kernels, optimized scheduling, and disaggregated serving.

Microsecond-level profiling of the execution stack identified memory stalls, kernel launch overhead, and inefficient scheduling as primary bottlenecks. Addressing these yielded substantial throughput improvements across all hardware classes and sequence lengths. The optimization strategy focuses on three key components:

* Kernel-level rewrites using fused attention and matmul pipelines tailored for each hardware target
* Advanced scheduling and batching strategies that improve GPU utilization under realistic multi-user loads
* Disaggregated serving pipelines that remove bottlenecks between prefill and decode stages

These optimizations yield significantly higher tokens per second per GPU at the same latency targets, enabling higher user concurrency and lower infrastructure costs.

On H100-class infrastructure, Sarvam 30B achieves substantially higher throughput per GPU across all sequence lengths and request rates compared to the Qwen3 baseline, consistently delivering 3x to 6x higher throughput per GPU at equivalent tokens-per-second-per-user operating points.

Sarvam 30B runs efficiently on mid-tier accelerators such as the L40S, enabling production deployments without relying on premium GPUs. Under tighter compute and memory bandwidth constraints, the optimized kernels and scheduling strategies deliver 1.5x to 3x throughput improvements at typical operating points. The improvements are more pronounced at longer input and output sequence lengths (28K / 4K), where most real-world inference requests fall.

Sarvam 30B is also optimized for local execution on Apple Silicon systems using MXFP4 mixed-precision inference.
On a MacBook Pro M3, the optimized runtime achieves 20 to 40% higher token throughput across common sequence lengths. These improvements make local experimentation significantly more responsive and enable lightweight edge deployments without requiring dedicated accelerators.

Sarvam 105B is optimized for server-centric hardware, following a similar process to the one described above with special focus on MLA (Multi-head Latent Attention) optimizations. These include custom shaped MLA optimization, vocabulary parallelism, advanced scheduling strategies, and disaggregated serving. The comparisons above illustrate the performance advantage across various input and output sizes on an H100 node.

Combined with the efficient Indic tokenizer, the performance delta increases significantly for the same SLA. For the 30B model, the delta increases by as much as 10x, reaching performance levels previously not achievable for models of this class on Indic generation.

The following demonstrations show the practical capabilities of the Sarvam model family across real-world applications, spanning webpage generation, multilingual conversational agents, complex STEM problem solving, and educational tutoring.
The examples reflect the models' strengths in reasoning, tool usage, multilingual understanding, and end-to-end task execution, and illustrate how Sarvam models can be integrated into production systems to build interactive applications, intelligent assistants, and developer tools.

The widgets below demonstrate Sarvam 105B's agentic capabilities through end-to-end project generation using a Claude Code harness, showing the model's ability to build complete websites from a simple prompt specification.

A fully interactive Pokédex web app, generated entirely by our 105B model from a single prompt. Search, filter by type, and browse detailed stats. The goal was to generate a complete, production-ready webpage including all HTML, CSS, and JavaScript required to run the application without frameworks or build tools. The model used the PokéAPI to dynamically load Pokémon data, implementing pagination, search, filtering, and a detailed modal view, all from the prompt shown below.

A complete website landing page, designed and coded by our 105B model in a single pass. Scroll through to explore the full layout, animations, and interactions. The task was to build a complete website for Sarvam, capturing the spirit of an Indian AI company building for a billion people while matching a world-class visual standard across typography, motion, layout, and interaction design. The full prompt is shown below.

Sarvam 105B was evaluated on the JEE Main 2026 paper from Shift 2, conducted on 28 January 2026, to demonstrate its STEM reasoning capabilities.
The question paper and solutions were sourced from: https://allen.in/jee-main/january-2026-question-paper-with-solutions

The evaluation was carried out in two phases:

* Text-Only Evaluation: For text-only questions, Sarvam 105B was evaluated directly on questions containing purely textual content.
* Diagram-Based Evaluation: For questions that included diagrams, Gemini-3-Pro was used to generate structured textual descriptions of the visuals, which were then provided as input to Sarvam 105B for answer generation.

The tables below summarize Sarvam 105B's performance across Physics, Chemistry, and Mathematics under Pass@1 and Pass@2 evaluation settings.

Under Pass@1, the model shows strong first-attempt accuracy across all subjects. In Mathematics, it achieves a perfect 25/25. In Chemistry, it scores 23/25, with near-perfect performance on both text-only and diagram-derived questions. Physics shows similarly strong performance at 22/25, with most errors occurring in diagram-based reasoning.

Under Pass@2, performance improves to perfect scores across all subjects. Physics improves from 22/25 to 25/25, Chemistry from 23/25 to 25/25, and Mathematics maintains a perfect 25/25. Diagram-based questions in both Physics and Chemistry achieve full marks at Pass@2, indicating that the model reliably resolves visual reasoning tasks when given structured textual representations.

Tutor Mode is an internal project where the Indus stack operates with a system prompt optimized for student-teacher conversations. The example below shows Sarvam 105B helping a student solve a JEE problem through interactive dialog rather than providing the answer directly. The model guides the student by asking probing questions, building toward the underlying concepts before arriving at the answer.
This also demonstrates the model's role-playing ability.

Note: the questions below are taken from the same JEE Mains paper solved above.

the mean free path of a molecule of diameter 5 x 10^-10 m at the temperature 41°C and pressure 1.38 x 10^5 Pa, is given as ____ m. (given k_B = 1.38 x 10^-23 J/K).

Hi there! I see you’re work­ing on a prob­lem about the mean free path of a gas mol­e­cule—that’s a clas­sic con­cept in ki­netic the­ory.

Before we dive into the math, could you let me know which grade you're in? Also, when you hear the term “mean free path,” what do you think it depends on? For example, if you imagine molecules in a gas, what physical factors would make it harder for a molecule to travel a long distance without hitting something?

im not really sure about the concepts behind this. im preparing for jee mains and this topic always confuses me.

No worries! JEE Mains problems often look more intimidating than they actually are. Let's break it down.

The mean free path (λ) is simply the average distance a molecule travels between two successive collisions. Think of it like walking through a crowded room; how far you can get before bumping into someone depends on a few things you already intuitively know.

Size of molecules (d): Bigger molecules are easier to hit.

Density/Number of mol­e­cules: More peo­ple in the room means more bumps.

Temperature (T) and Pressure (P): These dictate how packed the molecules are.

If you had to guess, would the distance between hits (λ) be larger or smaller if the pressure (P) increased?

I think if the pressure is higher, the molecules are packed tighter, so they would hit each other more often. That should make the distance smaller, right?

Spot on! Your intuition is leading you exactly where we need to go.

Here is how those fac­tors re­late math­e­mat­i­cally:

λ ∝ 1/d²: If the molecule is twice as wide, it's actually four times more likely to collide (because the area it occupies matters).

λ ∝ 1/P: Higher pressure means molecules are squeezed together, leading to more frequent collisions.

λ ∝ T: At higher temperatures (for a fixed pressure), gas expands and molecules move further apart.

You mentioned knowing PV = nRT. We can actually use that to find the formula for λ. Since we are looking for a formula involving diameter (d), pressure (P), and temperature (T), let's try to visualize the “collision zone” first.

I know PV = nRT, but I can't remember the specific formula for mean free path. How do we get from one to the other?

That's a great starting point, because PV = nRT is the heart of gas behavior!

Let's visualize why a molecule collides. Imagine a molecule with diameter d moving through space. It will hit any other molecule whose center comes within a distance d of its own center.

This means our molecule effectively acts like a “bulldozer” with an effective diameter of 2d. If any other molecule's center falls within this “danger zone,” a collision happens.

If the effective collision diameter is 2d, what would be the cross-sectional area of that “danger zone” circle? (Recall the area of a circle is πr².)

Close! While the “danger zone” diameter is 2d, the actual radius involved for the center-to-center hit is d.

Now, imag­ine this mol­e­cule zips for­ward. It sweeps out an imag­i­nary cylin­der. Any mol­e­cule in­side this cylin­der gets hit.

In the derivation, we find that the mean free path is inversely proportional to this area and the number of molecules per unit volume (n). However, because all molecules are moving (not just one), we add a factor of √2 to account for the average relative velocity.

This gives us the final JEE formula:

λ = k_B T / (√2 π d² P)

Do you see where the values from your question (d, T, P, and k_B) fit into this?

Oh, I see! But the question gives k_B as 1.38 × 10⁻²³. Where does that go in the calculation?

It fits perfectly! The k_B in the question is the Boltzmann constant, and it sits right in the numerator of our formula:

By plugging the values in, the units will cancel out to give you the distance in meters. Let's list what we have: d = 5 × 10⁻¹⁰ m, T = 41 °C, P = 1.38 × 10⁵ Pa, and k_B = 1.38 × 10⁻²³ J/K.

Before we calculate, we must convert the temperature to Kelvin. Do you remember how to turn Celsius into Kelvin?

Yes, I add 273. So 41 + 273 = 314 K. Now I just plug them all in?

Exactly! You've got the temperature right (314 K, or 314.15 K for precision).

Now, here is a pro-tip for JEE math: look for things that cancel out. Notice that P is 1.38 × 10⁵ and k_B is 1.38 × 10⁻²³.

When you put them in the for­mula:

The 1.38 in the top and bottom will cancel out immediately! That makes your life much easier.

Would you like to try simplifying the powers of 10 next? What do you get for the denominator's power of 10 when you square d (5 × 10⁻¹⁰ m)?

I tried calculating it all and I think it simplifies to something like 2.82 × 10⁻⁸. Does that mean the answer is option C?

You nailed it! Option C (2.82 × 10⁻⁸ m) is correct. 🎉

When you finish the calculation, you get approximately 2.82 × 10⁻⁸ m, so Option C is indeed the answer.

Used the corrected mean free path formula λ = k_B T / (√2 π d² P).
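As a quick sanity check (not part of the original transcript), the arithmetic in the tutoring session can be reproduced in a few lines of Python using only the constants given in the problem:

```python
import math

# Mean free path of an ideal-gas molecule:
#   lambda = k_B * T / (sqrt(2) * pi * d^2 * P)
k_B = 1.38e-23        # Boltzmann constant, J/K (given in the problem)
T = 41 + 273          # 41 degrees C converted to kelvin
d = 5e-10             # molecular diameter, m
P = 1.38e5            # pressure, Pa

mfp = k_B * T / (math.sqrt(2) * math.pi * d**2 * P)
print(f"mean free path = {mfp:.3e} m")  # prints: mean free path = 2.827e-08 m
```

Note how k_B and P share the leading factor of 1.38, which is why it cancels exactly as the tutor points out.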

Would you like me to find another practice problem on RMS velocity or Graham's Law to keep this momentum going?

The coding capabilities of Sarvam 30B and Sarvam 105B were evaluated using real-world competitive programming problems from Codeforces (Div3, link). The evaluation involved generating Python solutions and manually submitting them to the Codeforces platform to verify correctness. Correctness is measured at pass@1 and pass@4 as shown in the table below. Answers are generated using the following system prompt, with code snippets extracted from markdown fences and think tokens stripped from within tags.

The Codeforces contest used for this evaluation took place in February 2026, while the knowledge cutoff of both models is June 2025, making it unlikely that the models had seen these questions. Strong performance in this setting provides evidence of genuine generalization and real problem-solving capability.

Sarvam 30B has been fine-tuned for production deployment of conversational agents on Samvaad, Sarvam's Conversational AI platform. Compared to models of similar size, it shows clear performance improvements in both conversational quality and latency. Key strengths include strong proficiency in Indian languages, particularly accurate handling of numerical information within those languages, and reliable execution of tool calls during multilingual interactions. Latency gains come from a combination of fewer active parameters than comparable models, targeted inference optimizations, and reduced tokenizer overhead.

The two examples below show telephonic conversations handled by Sarvam 30B in Hindi and Tamil. Sarvam 105B powers Indus, Sarvam's chat application, operating with a system prompt optimized for conversations.
The example demonstrates the model's ability to understand Indic queries, execute tool calls effectively, and reason accurately. Web search is conducted in English to access current and comprehensive information, while the model interprets the query and delivers a correct response in Telugu.

1. Top Pickleball Courts in Vijayawada near me
2. The Pickleball Republic - Siddhartha Nagar, Vijayawada
3. PickleBall Arena (@pickleballarena_vijayawada)
4. Associations Of Sports in Benz Circle, Vijayawada - Justdial
5. Sports Venues in Benz-circle-vijayawada: Book Top …

1. Buy Pickleball Equipment Paddles, Balls, Nets Online in …
2. Buy Pickleball Paddles Online at Best Prices In India
3. Pickleball Equipment
4. Buy Pickleball Paddles Online in India at Best Prices
5. Buy HEAD Pickleball Paddle at Best Price in India

1. 15 Common Pickleball Errors Ruining Your Game
2. How to Play Pickleball: 9 Rules Every Beginner Should Know
3. 5 common beginner mistakes in pickleball
4. Common Pickleball Mistakes: 5 Errors Beginners Make
5. How to Play Pickleball: The Ultimate Guide on Pickleball Rules

1. 🏓 Play Pickleball at the Lowest Price Ever in VIJAYAWADA
2. Dink It Pickleball - Vijayawada - Guru Nanak Colony …
3. Pickleball in Vijayawada! Play at The Pickleball Republic
4. 🏓 Play Pickleball at the Lowest Price Ever in VIJAYAWADA
5. 5️⃣0️⃣0️⃣ 1 month swimming pool(including training)+ …

Top Lawn Tennis Courts in Vijayawada near me

Sarvam 30B and Sarvam 105B represent a significant step in building high-performance, open foundation models in India. By combining efficient Mixture-of-Experts architectures with large-scale, high-quality training data and deep optimization across the entire stack, from tokenizer design to inference efficiency, both models deliver strong reasoning, coding, and agentic capabilities while remaining practical to deploy.

A defining strength of the Sarvam model family is its investment in the Indian AI ecosystem, reflected in strong performance across Indian languages, tokenization optimized for diverse scripts, and safety and evaluation tailored to India-specific contexts. Combined with Apache 2.0 open-source availability, these models serve as foundational infrastructure for sovereign AI development.

This release also marks a milestone in internal capabilities. Through this effort, Sarvam has developed the know-how to build high-quality datasets at scale, train large models efficiently, and achieve strong results at competitive training budgets. With these foundations in place, the next step is to scale further, training significantly larger and more capable models.

These models were trained using compute provided through the IndiaAI Mission, under the Ministry of Electronics and Information Technology, Government of India. Nvidia collaborated closely on the project, contributing libraries used across pre-training, alignment, and serving. We're also grateful to the developers who used earlier Sarvam models and took the time to share feedback. We're open-sourcing these models as part of our ongoing work to build foundational AI infrastructure in India.

...

Read the original on www.sarvam.ai »

8 156 shares, 15 trendiness

matduggan.com

I have never been an “online community first” person. The internet is how I stay in touch with people I met in real life. I'm not a “tweet comments at celebrities” guy. I was never funny enough to be the funniest person on Twitter.

So when Twitter was accidentally purchased by a fascist high on ketamine, I moved to Mastodon mostly because it seemed to be “Twitter without the bullshit”. No recommended-for-you feed, no ads; it was broken in a way I find charming. Of course search was broken, because all OSS social tools must have one glaring lack of functionality. In a nightmare world full of constant change it's good to have a few constants to hold on to.

A lot of the narrative at the time was “this is our flag in the ground in the fight against The Man”. It wasn't clear in this context if they meant corporations or the media or the weird pseudo celebrity that had taken over social media, where people would breathlessly tell me about shit like “Chris-Chan” and “Logan Paul bought a Pokemon card”.

We all need pointless hobbies, but I care about YouTube stars like I care about distant stars dying. It's interesting to someone somewhere but those people don't talk to me. I mostly use social media as a place to waste time, not a platform to form para-social relationships to narcissists. I prefer my narcissism farm to table. I'd rather dig a grave with a rusty spoon than watch a “Twitch star”.

Anyway, I watched mostly ap­a­thet­i­cally as the in­ter­net tried to rally it­self to an­other cause. I read my news at the nor­mal news­pa­pers, watched my nor­mal tele­vi­sion and put so­cial me­dia off into its own silo. Then Trump ef­fec­tively shut down the en­tire free press in the US in a se­ries of bull­shit law­suits.

See, I had forgotten the one golden rule of capitalism. To thrive in capitalism one must be amoral. Now you can be wildly, sickeningly successful with morals, but you cannot reach that absolute zenith of shareholder value. Either you accept a lower share price and don't commit atrocities, or you become evil. There is no third option.

So of course media corporations became bargaining chips for the oligarchs' actual businesses. Why fight a defamation suit when you can settle it by running favorable coverage and maybe bankrupting the media outlet you bought as a stocking stuffer? Suddenly I couldn't find any reliable reporting about anything in the US. My beloved Washington Post became straight-up propaganda and desperate attempts to cope. “Best winter stews to make while you watch your neighbors get kidnapped at gunpoint.” Twelve dollars a month for that.

Threads was worth­less be­cause it’s the most bor­ing so­cial me­dia web­site ever imag­ined. It’s a so­cial me­dia net­work de­signed by brands for brands, like if some­one made a ca­ble chan­nel that was just ad­ver­tise­ments and meta com­men­tary about the ad­ver­tise­ments you just saw. Billions of dol­lars at their dis­posal and Meta made a hot new so­cial me­dia net­work with the ap­peal of junk mail.

Bluesky had a bunch of “stuff”, but they're trying to capture that 2008 Twitter lightning in a bottle, which is a giant waste of time. We're never going to go back to pretending that tweeting at politicians does anything, and everyone there is desperately trying to “build a brand” as the funny one or whatever. I want news, not your endless meta commentary on the news.

People talk a lot about the pro­to­cols that power Bluesky vs. ActivityPub, be­cause we’re nerds and we be­lieve deep in our hearts that the su­pe­rior pro­to­col will win. This is adorable. It flies in the face of lit­er­ally all of hu­man his­tory, where the more con­ve­nient thing al­ways wins re­gard­less of tech­ni­cal merit. VHS beat Betamax. USB-C took twenty years. The pro­to­col fight is in­ter­est­ing the way me­dieval siege war­fare is in­ter­est­ing — I’m glad some­one’s into it, but it has no bear­ing on my life. There’s no ac­tual plan to self-host Bluesky. Their pro­to­col makes it eas­ier to scale their ser­vice. That’s why it was writ­ten and that’s what it does. End of story.

Now EU news re­mained re­li­able, but send­ing European re­porters into the mad­ness of the US and try­ing to get a report” out of it is an ex­er­cise in frus­tra­tion. This be­came es­pe­cially rel­e­vant for me when Trump threat­ened to in­vade Greenland and sud­denly there was a dis­tinct pos­si­bil­ity that there might be an armed con­flict be­tween Denmark and the US. Danish re­porters weren’t get­ting meet­ings with the right peo­ple and it was just end­less ru­mors and Truth Social non­sense.

If the American press had given me 20 minutes of airtime I could have convinced everyone they don't want to get involved with Greenland. “We're not tough enough as a people to survive in Greenland, much less take it over.” Greenlandic people shrug off horrific injuries hundreds of kilometers from medical help with a smile. I watched a Greenlandic toddler munch meat from the spine of a seal with its head very much intact. We aren't equipped to fuck with these people, they are the real deal.

So into this complete breakdown of the press came the Fediverse. It became the only reliable source of information I had. People posted links with a minimal amount of commentary, picking and choosing the best content from other social media networks. They're not doing it to “build a brand” because that's not a thing in the Fediverse. It's too disjointed to be a place to build a newsletter subscription base.

Instead it be­came the only place con­sis­tently post­ing trust­wor­thy in­for­ma­tion I could ac­tu­ally ac­cess. This be­came per­son­ally rel­e­vant when Trump threat­ened to in­vade Greenland, which is the kind of sen­tence I never ex­pected to type and yet here we are. It would be funny if I was­n’t a tiny bit con­cerned that my new home was go­ing to get a CIA overnight regime change spe­cial in the mid­dle of the night.

It was some­where in the mid­dle of DMing with some­one who had for­got­ten more about Greenland than I would ever know and some­one who lived close to an RAF base in the UK that it clicked. This was what they had been talk­ing about. Actual hu­man be­ings were able to find each other and ask di­rect ques­tions with­out this gi­ant moun­tain of bull­shit en­gage­ment piled on top of it. Meta or Oracle or who­ever owns TikTok this week could­n’t stop me.

I never ex­pected to find my news from strangers on a fed­er­ated so­cial net­work that half the in­ter­net has never heard of. I never ex­pected a lot of things. But there’s some­thing qui­etly beau­ti­ful about a place where peo­ple just… share what they know. No brand deals, no en­gage­ment met­rics, no al­go­rithm nudg­ing you to­ward rage. Just some­one who spent twenty years study­ing Arctic pol­icy post­ing a thread at 2 AM be­cause they think you should un­der­stand what’s hap­pen­ing. It’s the in­ter­net I was promised in 1996. It only took thirty years and the com­plete col­lapse of American jour­nal­ism to get here.

...

Read the original on matduggan.com »

9 156 shares, 21 trendiness

The yoghurt delivery women combatting loneliness in Japan

As lone­li­ness deep­ens in one of the world’s fastest-age­ing na­tions, a net­work of women de­liv­er­ing pro­bi­otic milk drinks has be­come a vi­tal source of rou­tine, con­nec­tion and care.

A woman in a neat navy suit and pow­der-blue shirt cy­cles pur­pose­fully down a quiet res­i­den­tial street in Tokyo. It’s 08:30 but al­ready balmy, and she’s grate­ful for the match­ing vi­sor that shields her eyes from the sum­mer sun.

She ar­rives at her first stop, parks her bike and knocks on the door of a small wooden house with pot­ted plants flank­ing the en­trance. Inside, an el­derly woman waits. Her face breaks into a broad smile as she opens the door — she has been ex­pect­ing this visit.

Japan is the world’s most rapidly age­ing ma­jor econ­omy. Nearly 30% of its pop­u­la­tion is now over 65, and the num­ber of el­derly peo­ple liv­ing alone con­tin­ues to rise. As fam­i­lies shrink and tra­di­tional multi-gen­er­a­tional house­holds de­cline, iso­la­tion has be­come one of the coun­try’s most press­ing so­cial chal­lenges.

The suited woman is a Yakult Lady — one of tens of thou­sands across Japan who de­liver the epony­mous pro­bi­otic drinks di­rectly to peo­ple’s homes. On pa­per they’re de­liv­ery work­ers, but in prac­tice they’re part of the coun­try’s in­for­mal so­cial safety net. In a coun­try grap­pling with a rapidly age­ing pop­u­la­tion and a deep­en­ing lone­li­ness cri­sis, Yakult Ladies have be­come an un­likely source of com­mu­nity, help­ing to re­duce the prob­lem of iso­la­tion one drop-off at a time.

With their distinctive squat plastic bottles and shiny red caps, Yakult pioneered a genre. The probiotic drink was launched in Japan 90 years ago, long before “microbiome” became common parlance. But today, the women who deliver them are as important to the brand's identity as the product itself.

...

Read the original on www.bbc.com »

10 151 shares, 19 trendiness

Tinnitus Is Somehow Connected to a Crucial Bodily Function

Those who have never en­dured the re­lent­less ring­ing of tin­ni­tus can only dream of the tor­ment. In fact, a bad dream may be the clos­est some get to ex­pe­ri­enc­ing any­thing like it.

The sub­jec­tive sound, which can also be a hiss­ing, buzzing, or click­ing, is heard by no one else, and it may be pre­sent con­stantly, or may come and go.

Neuroscientists at the University of Oxford now sus­pect that sleep and tin­ni­tus are closely in­ter­twined in the brain.

Their find­ings hint at a fun­da­men­tal re­la­tion­ship be­tween the two con­di­tions — one that has, sur­pris­ingly, been over­looked in the brain un­til very re­cently.

“What first made me and my colleagues curious were the remarkable parallels between tinnitus and sleep,” neuroscientist Linus Milinski at Oxford's Sleep and Circadian Neuroscience Institute told ScienceAlert.

“Tinnitus is a debilitating medical condition, whereas sleep is a natural state we enter regularly, yet both appear to rely on spontaneous brain activity. Because there is still no effective treatment for subjective tinnitus, I believe that exploring these similarities might offer new ways to understand and eventually treat phantom percepts.”

Watch the video be­low for a sum­mary of the study:

A “phantom percept” is when our brains fool us into thinking we are seeing, hearing, feeling, or smelling something that is not there, physically speaking.

Many peo­ple ex­pe­ri­ence phan­tom per­cepts only dur­ing sleep, but for about 15 per­cent of the world’s pop­u­la­tion, an in­escapable noise rings in their ears dur­ing wak­ing hours, too.

Tinnitus is the world’s most com­mon phan­tom per­cept, and yet there is no known cause or cure, de­spite a long list of hy­pothe­ses.

While many in­di­vid­u­als with tin­ni­tus re­port poor sleep and show poor sleep pat­terns, the po­ten­tial con­nec­tion to this cru­cial bod­ily func­tion has only re­cently come to light.

In 2022, Milinski led a re­view, which the au­thors claim is the first to con­sider, at a func­tional level, how sleep might im­pact tin­ni­tus, and vice versa.

The Oxford re­searchers pro­posed that the large spon­ta­neous waves of brain ac­tiv­ity that oc­cur dur­ing deep sleep, or non-rapid eye move­ment sleep (non-REM), might sup­press the brain ac­tiv­ity that leads to tin­ni­tus.

To test that idea, the team turned to fer­rets, which have a sim­i­lar au­di­tory sys­tem to hu­mans. In ex­per­i­ments pub­lished in 2024, re­searchers found that fer­rets that de­vel­oped more se­vere tin­ni­tus also showed dis­rupted sleep.

“We could actually see these sleep problems appear at the same time as tinnitus after noise exposure,” Milinski told ScienceAlert. “This suggested, for the first time, a clear link between developing tinnitus and disrupted sleep.”

Crucially, the fer­rets that de­vel­oped tin­ni­tus showed overly re­spon­sive brain ac­tiv­ity to sound. When the fer­rets fi­nally did man­age to slip into non-REM sleep, that hy­per­ac­tiv­ity was damp­ened.

This sug­gests that sleep may tem­porar­ily mask the ef­fects of tin­ni­tus by en­gag­ing the same brain cir­cuits.

“Our findings indicate that deep sleep may indeed help mitigate tinnitus and could reveal natural brain mechanisms for modulating abnormal activity,” said Milinski.

Research on non-hu­man an­i­mals has its ob­vi­ous lim­i­ta­tions, but the same sort of brain ac­tiv­ity pat­terns may ex­ist in hu­mans, too.

Since their 2022 re­view, Milinski says the field has rapidly ex­panded, with a grow­ing num­ber of large-scale stud­ies in­ves­ti­gat­ing how sleep, the en­vi­ron­ment, and tin­ni­tus in­ter­act — and not just in fer­rets.

“I hope this research will lead to greater awareness of tinnitus and open new ways of exploring treatments,” Milinski told ScienceAlert.

“Acknowledging the impact of tinnitus, especially in older adults, where hearing loss and tinnitus can increase isolation and contribute to mental health problems, is incredibly important.”

Just last year, a study from China found that in­di­vid­u­als with tin­ni­tus were less able to sup­press the hy­per­ac­tiv­ity of their awake brains as they tran­si­tioned into a sleep state.

During deep sleep, how­ever, the hy­per­ac­tiv­ity linked to tin­ni­tus was sup­pressed.

“This study establishes sleep as a critical therapeutic target to interrupt the 24-hour dysfunctional cycle of tinnitus,” conclude the authors, led by Xiaoyu Bao of South China University of Technology.

At Oxford, Milinski and his col­leagues are now fo­cus­ing on how sleep may af­fect the de­vel­op­ment of tin­ni­tus.

“Tinnitus can make sleep worse, and poor sleep may, in turn, make tinnitus worse. It may be a kind of vicious circle, although I do not believe it is unbreakable,” speculated Milinski.

“When we do not sleep well, we become more vulnerable to stress, and stress is one of the strongest factors known to worsen tinnitus. Stress can even trigger tinnitus to begin with.”

Further re­search could not only lead to ef­fec­tive tin­ni­tus treat­ments but also help sci­en­tists bet­ter un­der­stand the mys­ter­ies of sleep it­self.

The 2022 re­view was pub­lished in Brain Communications.

An ear­lier ver­sion of this ar­ti­cle was pub­lished in November 2025.

...

Read the original on www.sciencealert.com »
