10 interesting stories served every morning and every evening.
In September 2024, Amandla Thomas-Johnson was a Ph.D. candidate studying in the U.S. on a student visa when he briefly attended a pro-Palestinian protest. In April 2025, Immigration and Customs Enforcement (ICE) sent Google an administrative subpoena requesting his data. The next month, Google gave Thomas-Johnson’s information to ICE without giving him the chance to challenge the subpoena, breaking a nearly decade-long promise to notify users before handing their data to law enforcement.
Google names a handful of exceptions to this promise (such as if Google receives a gag order from a court) that do not apply to Thomas-Johnson’s case. While ICE “requested” that Google not notify Thomas-Johnson, the request was not enforceable or mandated by a court. Today, the Electronic Frontier Foundation sent complaints to the California and New York Attorneys General asking them to investigate Google for deceptive trade practices for breaking that promise. You can read about the complaints here. Below is Thomas-Johnson’s account of his ordeal.
I thought my ordeal with U.S. immigration authorities was over a year ago, when I left the country, crossing into Canada at Niagara Falls.
By that point, the Trump administration had effectively turned federal power against international students like me. After I attended a pro-Palestine protest at Cornell University—for all of five minutes—the administration’s rhetoric about cracking down on students protesting what we saw as genocide forced me into hiding for three months. Federal agents came to my home looking for me. A friend was detained at an airport in Tampa and interrogated about my whereabouts.
I’m currently a Ph.D. student. Before that, I was a reporter. I’m a dual British and Trinidad and Tobago citizen. I have not been accused of any crime.
I believed that once I left U.S. territory, I had also left the reach of its authorities. I was wrong.
Weeks later, in Geneva, Switzerland, I received what looked like a routine email from Google. It informed me that the company had already handed over my account data to the Department of Homeland Security.
At first, I wasn’t alarmed. I had seen something similar before. An associate of mine, Momodou Taal, had received advance notice from Google and Facebook that his data had been requested, and law enforcement eventually withdrew the subpoenas before the companies turned over his data.
I assumed I would be given the same opportunity. But the language in my email was different. It was final: “Google has received and responded to legal process from a law enforcement authority compelling the release of information related to your Google Account.”
Google had already disclosed my data without telling me. There was no opportunity to contest it.
To be clear, this should not have happened this way. Google promises that it will notify users before their data is handed over in response to legal processes, including administrative subpoenas. That notice is meant to provide a chance to challenge the request. In my case, that safeguard was bypassed. My data was handed over without warning—at the request of an administration targeting students engaged in protected political speech.
Months later, my lawyer at the Electronic Frontier Foundation obtained the subpoena itself. On paper, the request focused largely on subscriber information: IP addresses, physical address, other identifiers, and session times and durations.
But taken together, these fragments form something far more powerful—a detailed surveillance profile. IP logs can be used to approximate location. Physical addresses show where you sleep. Session times would show when you were communicating with friends or family. Even without message content, the picture that emerges is intimate and invasive.
What this experience has made clear is that anyone can be targeted by law enforcement. And with their massive stores of data, technology companies can facilitate those arbitrary investigations. Together, they can combine state power, corporate data, and algorithmic inference in ways that are difficult to see—and even harder to challenge.
The consequences of what happened to me are not abstract. I left the United States. But I do not feel that I have left its reach. Being investigated by the federal government is intimidating. Questions run through your head. Am I now a marked individual? Will I face heightened scrutiny if I continue my reporting? Can I travel safely to see family in the Caribbean?
Who, exactly, can I hold accountable?
Update: This post has been updated to include more information about Google’s exceptions to their notification policy, none of which applied to the subpoena targeting Thomas-Johnson.
...
Read the original on www.eff.org »
Our latest model, Claude Opus 4.7, is now generally available. Opus 4.7 is a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks. Users report being able to hand off their hardest coding work—the kind that previously needed close supervision—to Opus 4.7 with confidence. Opus 4.7 handles complex, long-running tasks with rigor and consistency, pays precise attention to instructions, and devises ways to verify its own outputs before reporting back.

The model also has substantially better vision: it can see images in greater resolution. It’s more tasteful and creative when completing professional tasks, producing higher-quality interfaces, slides, and docs. And—although it is less broadly capable than our most powerful model, Claude Mythos Preview—it shows better results than Opus 4.6 across a range of benchmarks:

Last week we announced Project Glasswing, highlighting the risks—and benefits—of AI models for cybersecurity. We stated that we would keep Claude Mythos Preview’s release limited and test new cyber safeguards on less capable models first. Opus 4.7 is the first such model: its cyber capabilities are not as advanced as those of Mythos Preview (indeed, during its training we experimented with efforts to differentially reduce these capabilities). We are releasing Opus 4.7 with safeguards that automatically detect and block requests that indicate prohibited or high-risk cybersecurity uses. What we learn from the real-world deployment of these safeguards will help us work towards our eventual goal of a broad release of Mythos-class models.

Security professionals who wish to use Opus 4.7 for legitimate cybersecurity purposes (such as vulnerability research, penetration testing, and red-teaming) are invited to join our new Cyber Verification Program.

Opus 4.7 is available today across all Claude products and our API, Amazon Bedrock, Google Cloud’s Vertex AI, and Microsoft Foundry. Pricing remains the same as Opus 4.6: $5 per million input tokens and $25 per million output tokens. Developers can use claude-opus-4-7 via the Claude API.

Claude Opus 4.7 has garnered strong feedback from our early-access testers:

In early testing, we’re seeing the potential for a significant leap for our developers with Claude Opus 4.7. It catches its own logical faults during the planning phase and accelerates execution, far beyond previous Claude models. As a financial technology platform serving millions of consumers and businesses at significant scale, this combination of speed and precision could be game-changing: accelerating development velocity for faster delivery of the trusted financial solutions our customers rely on every day.

Anthropic has already set the standard for coding models, and Claude Opus 4.7 pushes that further in a meaningful way as the state-of-the-art model on the market. In our internal evals, it stands out not just for raw capability, but for how well it handles real-world async workflows—automations, CI/CD, and long-running tasks. It also thinks more deeply about problems and brings a more opinionated perspective, rather than simply agreeing with the user.

Claude Opus 4.7 is the strongest model Hex has evaluated. It correctly reports when data is missing instead of providing plausible-but-incorrect fallbacks, and it resists dissonant-data traps that even Opus 4.6 falls for.
It’s a more intelligent, more efficient Opus 4.6: low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6.

On our 93-task coding benchmark, Claude Opus 4.7 lifted resolution by 13% over Opus 4.6, including four tasks neither Opus 4.6 nor Sonnet 4.6 could solve. Combined with faster median latency and strict instruction following, it’s particularly meaningful for complex, long-running coding workflows. It cuts the friction from those multi-step tasks so developers can stay in the flow and focus on building.

Based on our internal research-agent benchmark, Claude Opus 4.7 has the strongest efficiency baseline we’ve seen for multi-step work. It tied for the top overall score across our six modules at 0.715 and delivered the most consistent long-context performance of any model we tested. On General Finance—our largest module—it improved meaningfully on Opus 4.6, scoring 0.813 versus 0.767, while also showing the best disclosure and data discipline in the group. And on deductive logic, an area where Opus 4.6 struggled, Opus 4.7 is solid.

Claude Opus 4.7 extends the limit of what models can do to investigate and get tasks done. Anthropic has clearly optimized for sustained reasoning over long runs, and it shows with market-leading performance. As engineers shift from working 1:1 with agents to managing them in parallel, this is exactly the kind of frontier capability that unlocks new workflows.

We’re seeing major improvements in Claude Opus 4.7’s multimodal understanding, from reading chemical structures to interpreting complex technical diagrams. The higher resolution support is helping Solve Intelligence build best-in-class tools for life sciences patent workflows, from drafting and prosecution to infringement detection and invalidity charting.

Claude Opus 4.7 takes long-horizon autonomy to a new level in Devin. It works coherently for hours, pushes through hard problems rather than giving up, and unlocks a class of deep investigation work we couldn’t reliably run before.

For Replit, Claude Opus 4.7 was an easy upgrade decision. For the work our users do every day, we observed it achieving the same quality at lower cost—more efficient and precise at tasks like analyzing logs and traces, finding bugs, and proposing fixes. Personally, I love how it pushes back during technical discussions to help me make better decisions. It really feels like a better coworker.

Claude Opus 4.7 demonstrates strong substantive accuracy on BigLaw Bench for Harvey, scoring 90.9% at high effort with better reasoning calibration on review tables and noticeably smarter handling of ambiguous document editing tasks. It correctly distinguishes assignment provisions from change-of-control provisions, a task that has historically challenged frontier models. Substance was consistently rated as a strength across our evaluations: correct, thorough, and well-cited.

Claude Opus 4.7 is a very impressive coding model, particularly for its autonomy and more creative reasoning. On CursorBench, Opus 4.7 is a meaningful jump in capabilities, clearing 70% versus Opus 4.6 at 58%.

For complex multi-step workflows, Claude Opus 4.7 is a clear step up: plus 14% over Opus 4.6 at fewer tokens and a third of the tool errors. It’s the first model to pass our implicit-need tests, and it keeps executing through tool failures that used to stop Opus cold. This is the reliability jump that makes Notion Agent feel like a true teammate.

In our evals, we saw a double-digit jump in accuracy of tool calls and planning in our core orchestrator agents.
As users leverage Hebbia to plan and execute on use cases like retrieval, slide creation, or document generation, Claude Opus 4.7 shows the potential to improve agent decision-making in these workflows.

On Rakuten-SWE-Bench, Claude Opus 4.7 resolves 3x more production tasks than Opus 4.6, with double-digit gains in Code Quality and Test Quality. This is a meaningful lift and a clear upgrade for the engineering work our teams are shipping every day.

For CodeRabbit’s code review workloads, Claude Opus 4.7 is the sharpest model we’ve tested. Recall improved by over 10%, surfacing some of the most difficult-to-detect bugs in our most complex PRs, while precision remained stable despite the increased coverage. It’s a bit faster than GPT-5.4 xhigh on our harness, and we’re lining it up for our heaviest review work at launch.

For Genspark’s Super Agent, Claude Opus 4.7 nails the three production differentiators that matter most: loop resistance, consistency, and graceful error recovery. Loop resistance is the most critical. A model that loops indefinitely on 1 in 18 queries wastes compute and blocks users. Lower variance means fewer surprises in prod. And Opus 4.7 achieves the highest quality-per-tool-call ratio we’ve measured.

Claude Opus 4.7 is a meaningful step up for Warp. Opus 4.6 is one of the best models out there for developers, and this model is measurably more thorough on top of that. It passed Terminal Bench tasks that prior Claude models had failed, and worked through a tricky concurrency bug Opus 4.6 couldn’t crack. For us, that’s the signal.

Claude Opus 4.7 is the best model in the world for building dashboards and data-rich interfaces. The design taste is genuinely surprising—it makes choices I’d actually ship. It’s my default daily driver now.

Claude Opus 4.7 is the most capable model we’ve tested at Quantium. Evaluated against leading AI models through our proprietary benchmarking solution, the biggest gains showed up where they matter most: reasoning depth, structured problem-framing, and complex technical work. Fewer corrections, faster iterations, and stronger outputs to solve the hardest problems our clients bring us.

Claude Opus 4.7 feels like a real step up in intelligence. Code quality is noticeably improved, it’s cutting out the meaningless wrapper functions and fallback scaffolding that used to pile up, and fixes its own code as it goes. It’s the cleanest jump we’ve seen since the move from Sonnet 3.7 to the Claude 4 series.

For the computer-use work that sits at the heart of XBOW’s autonomous penetration testing, the new Claude Opus 4.7 is a step change: 98.5% on our visual-acuity benchmark versus 54.5% for Opus 4.6. Our single biggest Opus pain point effectively disappeared, and that unlocks its use for a whole class of work where we couldn’t use it before.

Claude Opus 4.7 is a solid upgrade with no regressions for Vercel. It’s phenomenal on one-shot coding tasks, more correct and complete than Opus 4.6, and noticeably more honest about its own limits. It even does proofs on systems code before starting work, which is new behavior we haven’t seen from earlier Claude models.

Claude Opus 4.7 is very strong and outperforms Opus 4.6 with a 10% to 15% lift in task success for Factory Droids, with fewer tool errors and more reliable follow-through on validation steps.
It carries work all the way through instead of stopping halfway, which is exactly what enterprise engineering teams need.

Claude Opus 4.7 autonomously built a complete Rust text-to-speech engine from scratch—neural model, SIMD kernels, browser demo—then fed its own output through a speech recognizer to verify it matched the Python reference. Months of senior engineering, delivered autonomously. The step up from Opus 4.6 is clear, and the codebase is public.

Claude Opus 4.7 passed three TBench tasks that prior Claude models couldn’t, and it’s landing fixes our previous best model missed, including a race condition. It demonstrates strong precision in identifying real issues, and surfaces important findings that other models either gave up on or didn’t resolve. In Qodo’s real-world code review benchmark, we observed top-tier precision.

On Databricks’ OfficeQA Pro, Claude Opus 4.7 shows meaningfully stronger document reasoning, with 21% fewer errors than Opus 4.6 when working with source information. Across our agentic reasoning over data benchmarks, it is the best-performing Claude model for enterprise document analysis.

For Ramp, Claude Opus 4.7 stands out in agent-team workflows. We’re seeing stronger role fidelity, instruction-following, coordination, and complex reasoning, especially on engineering tasks that span tools, codebases, and debugging context. Compared with Opus 4.6, it needs much less step-by-step guidance, helping us scale the internal agent workflows our engineering teams run.

Claude Opus 4.7 is measurably better than Opus 4.6 for Bolt’s longer-running app-building work, up to 10% better in the best cases, without the regressions we’ve come to expect from very agentic models. It pushes the ceiling on what our users can ship in a single session.

Below are some highlights and notes from our early testing of Opus 4.7:

Instruction following. Opus 4.7 is substantially better at following instructions. Interestingly, this means that prompts written for earlier models can sometimes now produce unexpected results: where previous models interpreted instructions loosely or skipped parts entirely, Opus 4.7 takes the instructions literally. Users should re-tune their prompts and harnesses accordingly.

Improved multimodal support. Opus 4.7 has better vision for high-resolution images: it can accept images up to 2,576 pixels on the long edge (~3.75 megapixels), more than three times as many as prior Claude models. This opens up a wealth of multimodal uses that depend on fine visual detail: computer-use agents reading dense screenshots, data extractions from complex diagrams, and work that needs pixel-perfect references.1

Real-world work. As well as its state-of-the-art score on the Finance Agent evaluation (see table above), our internal testing showed Opus 4.7 to be a more effective finance analyst than Opus 4.6, producing rigorous analyses and models, more professional presentations, and tighter integration across tasks. Opus 4.7 is also state-of-the-art on GDPval-AA, a third-party evaluation of economically valuable knowledge work across finance, legal, and other domains.

Memory. Opus 4.7 is better at using file system-based memory.
It remembers important notes across long, multi-session work, and uses them to move on to new tasks that, as a result, need less up-front context.

The charts below display more evaluation results from our pre-release testing, across a range of different domains:

Overall, Opus 4.7 shows a similar safety profile to Opus 4.6: our evaluations show low rates of concerning behavior such as deception, sycophancy, and cooperation with misuse. On some measures, such as honesty and resistance to malicious “prompt injection” attacks, Opus 4.7 is an improvement on Opus 4.6; in others (such as its tendency to give overly detailed harm-reduction advice on controlled substances), Opus 4.7 is modestly weaker. Our alignment assessment concluded that the model is “largely well-aligned and trustworthy, though not fully ideal in its behavior”. Note that Mythos Preview remains the best-aligned model we’ve trained according to our evaluations. Our safety evaluations are discussed in full in the Claude Opus 4.7 System Card.

Overall misaligned behavior score from our automated behavioral audit. On this evaluation, Opus 4.7 is a modest improvement on Opus 4.6 and Sonnet 4.6, but Mythos Preview still shows the lowest rates of misaligned behavior.

In addition to Claude Opus 4.7 itself, we’re launching the following updates:

More effort control: Opus 4.7 introduces a new xhigh (“extra high”) effort level between high and max, giving users finer control over the tradeoff between reasoning and latency on hard problems. In Claude Code, we’ve raised the default effort level to xhigh for all plans. When testing Opus 4.7 for coding and agentic use cases, we recommend starting with high or xhigh effort.

On the Claude Platform (API): as well as support for higher-resolution images, we’re also launching task budgets in public beta, giving developers a way to guide Claude’s token spend so it can prioritize work across longer runs.

In Claude Code: The new /ultrareview slash command produces a dedicated review session that reads through changes and flags bugs and design issues that a careful reviewer would catch. We’re giving Pro and Max Claude Code users three free ultrareviews to try it out. In addition, we’ve extended auto mode to Max users. Auto mode is a new permissions option where Claude makes decisions on your behalf, meaning that you can run longer tasks with fewer interruptions—and with less risk than if you had chosen to skip all permissions.

Opus 4.7 is a direct upgrade to Opus 4.6, but two changes are worth planning for because they affect token usage. First, Opus 4.7 uses an updated tokenizer that improves how the model processes text. The tradeoff is that the same input can map to more tokens—roughly 1.0–1.35× depending on the content type. Second, Opus 4.7 thinks more at higher effort levels, particularly on later turns in agentic settings. This improves its reliability on hard problems, but it does mean it produces more output tokens. Users can control token usage in various ways: by using the effort parameter, adjusting their task budgets, or prompting the model to be more concise. In our own testing, the net effect is favorable—token usage across all effort levels is improved on an internal coding evaluation, as shown below—but we recommend measuring the difference on real traffic. We’ve written a migration guide that provides further advice on upgrading from Opus 4.6 to Opus 4.7.

Score on an internal agentic coding evaluation as a function of token usage at each effort level.
In this evaluation, the model works autonomously from a single user prompt, and results may not be representative of token usage in interactive coding. See the migration guide for more on tuning effort levels.
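For developers planning the upgrade, here is a minimal sketch of what selecting the new effort level might look like from the Python SDK. The model id and pricing are from the announcement above; the placement of the effort field (passed via extra_body here) is an assumption, so confirm the exact request shape against the migration guide.

```python
# Hypothetical sketch: requesting claude-opus-4-7 at the new xhigh effort level.
# The effort field is passed via extra_body because its exact placement in the
# request is an assumption here; check the migration guide for the real shape.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Review this module for race conditions: ..."}],
    extra_body={"effort": "xhigh"},  # assumed name for the new effort control
)
print(response.content[0].text)
```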
...
Read the original on www.anthropic.com »
Google collects statistics about IPv6 adoption in the Internet on an ongoing basis. We hope that publishing this information will help Internet providers, website owners, and policy makers as the industry rolls out IPv6.
We are continuously measuring the availability of IPv6 connectivity among Google users. The graph shows the percentage of users that access Google over IPv6.
The chart above shows the availability of IPv6 connectivity around the world.
Regions where IPv6 is more widely deployed (the darker the green, the greater the deployment) and users experience infrequent issues connecting to IPv6-enabled websites.
Regions where IPv6 is more widely deployed but users still experience significant reliability or latency issues connecting to IPv6-enabled websites.
Regions where IPv6 is not widely deployed and users experience significant reliability or latency issues connecting to IPv6-enabled websites.
...
Read the original on www.google.com »
*Menu prices may differ at special location restaurants, selected restaurants and for delivery.
English menu is available for your convenience
McDonald’s menu and allergen/nutrition information is available in English for the convenience of our customers, except for the information listed below, which is currently available only in Japanese in McDonald’s Japan website.
Information and notes on products and availability
*McDonald’s Japan’s allergen information only covers 8 ingredients which must be indicated on the label and 20 which are recommended by Japanese Food Labeling Standard (Food Labeling Act) as of September 2024. You can also place an order in English on our official app. Several restaurants also have English menus on hand, so please ask our crew if you are looking for an English menu.
※Click the image or product name to learn more about allergen/nutrition information, and other details.
※All displayed prices are tax included and a single, tax-inclusive price applies for both eat-in and takeout (inc. drive-thru) orders (tax-exclusive price may differ).
※Menu prices may differ at special location restaurants and selected restaurants.
※Some products are not available at all restaurants.
※“Bai Burger” menu is available for all regular burgers except for “Roasted Soy Sauce Double Thick Beef” and “Roasted Soy Sauce Egg Bacon Thick Beef”.
※Breakfast is available until 10:30am, Regular Menu is available from 10:30am and Yoru Mac menu is available from 5:00pm
※Asa Mac orders are accepted until 10:20am for Mobile Order & Pay and McDelivery
※HiruMac is available between 10:30am and 2:00pm on weekdays
※McShake®, McFloat®, Soft Twist, McFlurry® are available between 10:30am and 1:00 am the next day
※McShake® may be mixed with other flavors due to the nature of the machine. For this reason, the allergy information may differ from the usual information during limited-time product sales. Please check the latest information each time you order.
※For customized products, exact information may vary. Please be aware that customization is not a service that completely eliminates allergens.
※Oreo and the design of the Oreo cookie are trademarks licensed by the Mondelez International Group.
※Coke is a registered trademark of The Coca-Cola Company.
※McCafé® menu at McCafé by Barista stores availability is subject to McCafé by Barista counter business hours.
※McCafé® menu is not available for purchase at the drive-thru at some McCafé by Barista stores.
※Images are for illustrative purposes only.
※Coupons for shareholders are not redeemable for Shaka Shaka Potato® Buttered Potato Flavor.
...
Read the original on www.mcdonalds.co.jp »
Ollama is the most popular way to run local LLMs. It shouldn’t be. It gained that position by being first, the first tool that made llama.cpp accessible to people who didn’t want to compile C++ or write their own server configs. That was a real contribution, briefly. But the project has since spent years systematically obscuring where its actual technology comes from, misleading users about what they’re running, and drifting from the local-first mission that earned it trust in the first place. All while taking venture capital money.
This isn’t a “both sides” piece. I’ve used Ollama. I’ve moved on. Here’s why you should too.
Ollama’s entire inference capability comes from llama.cpp, the C++ inference engine created by Georgi Gerganov in March 2023. Gerganov’s project is what made it possible to run LLaMA models on consumer laptops at all; he hacked together the first version in an evening, and it kicked off the entire local LLM movement. Today llama.cpp has over 100,000 stars on GitHub, 450+ contributors, and is the foundation that nearly every GGUF-based tool depends on.
Ollama was founded in 2021 by Jeffrey Morgan and Michael Chiang, both previously behind Kitematic, a Docker GUI that was acquired by Docker Inc. They went through Y Combinator’s Winter 2021 batch, raised pre-seed funding, and launched publicly in 2023. From day one, the pitch was “Docker for LLMs”, a convenient wrapper that downloads and runs models with a single command. Under the hood, it was llama.cpp doing all the work.
For over a year, Ollama made no mention of llama.cpp. Not in the README, not on the website, not in their marketing materials. The project’s binary distributions didn’t include the required MIT license notice for the llama.cpp code they were shipping. This isn’t merely a matter of open-source etiquette; the MIT license has exactly one major requirement: include the copyright notice. Ollama didn’t.
The community noticed. GitHub issue #3185 was opened in early 2024 requesting license compliance. It went over 400 days without a response from maintainers. When issue #3697 was opened in April 2024 specifically requesting llama.cpp acknowledgment, community PR #3700 followed within hours. Ollama’s co-founder Michael Chiang eventually added a single line to the bottom of the README: “llama.cpp project founded by Georgi Gerganov.”
The response to the PR was revealing. Ollama’s team wrote: “We spend a large chunk of time fixing and patching it up to ensure a smooth experience for Ollama users… Overtime, we will be transitioning to more systematically built engines.” Translation: we’re not going to give llama.cpp prominent credit, and we plan to distance ourselves from it anyway.
As one Hacker News commenter put it: “I’m continually puzzled by their approach, it’s such self-inflicted negative PR. Building on llama is perfectly valid and they’re adding value on ease of use here. Just give the llama team proper credit.” Another: “The fact that Ollama has been downplaying their reliance on llama.cpp has been known in the local LLM community for a long time.”
In mid-2025, Ollama followed through on that distancing. They moved away from using llama.cpp as their inference backend and built a custom implementation directly on top of ggml, the lower-level tensor library that llama.cpp itself uses. Their stated reason was stability: llama.cpp moves fast and breaks things, and Ollama’s enterprise partners need reliability.
The result was the opposite. Ollama’s custom backend reintroduced bugs that llama.cpp had solved years ago. Community members flagged broken structured output support, vision model failures, and GGML assertion crashes across multiple versions. Models that worked fine in upstream llama.cpp failed in Ollama, including new releases like GPT-OSS 20B, where Ollama’s implementation lacked support for tensor types that the model required. Georgi Gerganov himself identified that Ollama had forked and made bad changes to GGML.
The irony is thick. They downplayed their dependence on llama.cpp for years, then when they finally tried to go it alone, they produced an inferior version of the thing they refused to credit.
Benchmarks tell the story. Multiple community tests show llama.cpp running 1.8x faster than Ollama on the same hardware with the same model, 161 tokens per second versus 89. On CPU, the gap is 30-50%. A recent comparison on Qwen-3 Coder 32B showed ~70% higher throughput with llama.cpp. The performance overhead comes from Ollama’s daemon layer, poor GPU offloading heuristics, and a vendored backend that trails upstream.
When DeepSeek released its R1 model family in January 2025, Ollama listed the smaller distilled versions (models like DeepSeek-R1-Distill-Qwen-32B, which are fine-tuned Qwen and Llama models, not the actual 671-billion-parameter R1) simply as “DeepSeek-R1” in their library and CLI. Running ollama run deepseek-r1 pulls an 8B Qwen-derived distillate that behaves nothing like the real model.
This wasn’t an oversight. DeepSeek themselves named these models with the “R1-Distill” prefix. Hugging Face listed them correctly. Ollama stripped the distinction. The result was a flood of social media posts from people claiming they were running “DeepSeek-R1” on consumer hardware, followed by confusion about why it performed poorly, doing reputational damage to DeepSeek in the process.
GitHub issues #8557 and #8698 requested separation of the models. Both were closed as duplicates with no fix. As of today, ollama run deepseek-r1 still launches a tiny distilled model. Ollama knew the difference and chose to obscure it, presumably because “DeepSeek-R1” drives more downloads than “DeepSeek-R1-Distill-Qwen-32B” does.
In July 2025, Ollama released a GUI desktop app for macOS and Windows. The app was developed in a private repository (github.com/ollama/app), shipped without a license, and the source code wasn’t publicly available. For a project that had built its reputation on being open-source, this was a jarring move.
Community members immediately raised concerns. The license issue received 40 upvotes. Developers found potential AGPL-3.0 dependencies in the binary. The website placed the download button next to a GitHub link, giving the impression users were downloading the MIT-licensed open-source tool when they were actually getting an unlicensed closed-source application. Maintainers were silent for months. The code was eventually merged into the main repo in November 2025, but the initial rollout revealed where the project’s instincts lie.
As XDA put it: “If your project trades on being open source, you do not get to be vague about what is and is not open at launch.”
GGUF, the model format created by Georgi Gerganov, was designed with one core principle: single-file deployment. Bullet point #1 in the GGUF spec reads: “Full information: all information needed to load a model is contained in the model file, and no additional information needs to be provided by the user.” Chat templates, stop tokens, model metadata, it’s all embedded in the file. You point llama.cpp at a GGUF and it works.
Ollama added the Modelfile on top of this. It’s a separate configuration file, inspired by Dockerfiles, naturally, that specifies the base model, chat template, system prompt, sampling parameters, and stop tokens. Most of this information already exists inside the GGUF file. As one Hacker News commenter put it: “We literally just got rid of that multi-file chaos only for Ollama to add it back.”
The problems with this approach compound quickly. Ollama only auto-detects chat templates it already knows about from a hardcoded list. If a GGUF file has a valid Jinja chat template embedded in its metadata but it doesn’t match one of Ollama’s known templates, Ollama falls back to a bare {{ .Prompt }} template, silently breaking the model’s instruction format. The user has to manually extract the chat template from the GGUF, translate it into Go template syntax (which is different from Jinja), and write it into a Modelfile. Meanwhile, llama.cpp reads the embedded template and just uses it.
Modifying parameters is worse. If you want to change the temperature or system prompt on a model you pulled from Ollama’s registry, the workflow is: export the Modelfile with ollama show --modelfile, edit it, then run ollama create to build a new model entry. Users have reported that this process copies the entire model, 30 to 60 GB, to change one parameter. As one user described it: “The ‘modelfile’ workflow is a pain in the booty. It’s a dogwater pattern and I hate it. Some of these models are 30 to 60GB and copying the entire thing to change one parameter is just dumb.”
Compare this to llama.cpp, where parameters are command-line flags. Want a different temperature? Pass --temp 0.7. Different system prompt? Pass it in the API request. No files to create, no gigabytes to copy, no proprietary format to learn.
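To make the comparison concrete, here is a minimal sketch of the per-request approach against llama-server’s OpenAI-compatible endpoint; the port, prompt, and sampling values are illustrative assumptions, not recommendations.

```python
# Minimal sketch: changing temperature and system prompt per request against a
# locally running llama-server (OpenAI-compatible API). No Modelfile, no model copy.
# Assumes llama-server is already serving on http://localhost:8080.
import requests

payload = {
    "messages": [
        {"role": "system", "content": "You are a terse code reviewer."},
        {"role": "user", "content": "Summarize the risks in this patch: ..."},
    ],
    "temperature": 0.7,  # sampling changes are just request fields
    "max_tokens": 512,
}

r = requests.post("http://localhost:8080/v1/chat/completions", json=payload, timeout=120)
r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])
```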
The Modelfile also locks users into Ollama’s Go template syntax, which is a different language from the Jinja templates that model creators actually publish. LM Studio accepts Jinja templates directly. llama.cpp reads them from the GGUF. Only Ollama requires you to translate between template languages, and gets it wrong often enough that entire GitHub issues are dedicated to mismatched templates between Ollama’s library and the upstream GGUF metadata.
When a new model drops, say a new Qwen, Gemma, or DeepSeek variant, GGUFs typically appear on Hugging Face within hours, quantized by community members like Unsloth or Bartowski. With llama.cpp, you can run them immediately: llama-server -hf unsloth/Qwen3.5-35B-A3B-GGUF:Q4_K_M. One command, straight from Hugging Face, no intermediary.
With Ollama, you wait. Someone at Ollama has to package the model for their registry, choose which quantizations to offer (typically just Q4_K_M and Q8_0, no Q5, Q6, or IQ quants), convert the chat template to Go format, and push it. Until then, the model doesn’t exist in Ollama’s world unless you do the Modelfile dance yourself.
This creates a recurring pattern on r/LocalLLaMA: new model launches, people try it through Ollama, it’s broken or slow or has botched chat templates, and the model gets blamed instead of the runtime. A recent PSA post titled “If you want to test new models, use llama.cpp/transformers/vLLM/SGLang” documented how Qwen models showed problems with tool calls and garbage responses that “only happen with Ollama” due to their vendored backend and broken template handling. As one commenter put it: “Friends don’t let friends use ollama.”
The quantization limitation is particularly frustrating. Ollama only supports creating Q4_K_S, Q4_K_M, Q8_0, F16, and F32 quantizations. If you need Q5_K_M, Q6_K, or any IQ quant, formats that llama.cpp has supported for years, you’re out of luck unless you do the quantization yourself outside of Ollama. When a user asked about Q2_K support, the response was effectively “use a different tool.” For a project that markets itself as the easy way to run models, telling users to go elsewhere for basic quantization options is telling.
Ollama eventually added ollama run hf.co/{repo}:{quant} to pull directly from Hugging Face, which partially addresses the availability problem. But even then, the file gets copied into Ollama’s hashed blob storage, you still can’t share the GGUF with other tools, and the template detection issues still apply. The fundamental architecture remains: Ollama inserts itself as a middleman between you and your models, and that middleman is slower, less capable, and less compatible than the tools it sits on top of.
In late 2025, Ollama introduced cloud-hosted models alongside its local library. The tool that was synonymous with local, private inference started routing prompts to third-party cloud providers. Proprietary models like MiniMax appeared in the model list without clear disclosure that selecting them would send your data off-machine.
Users raised concerns about data routing, when you run a closed-source model like MiniMax-m2.7 through “Ollama Cloud,” your prompts may be forwarded to the external provider who actually hosts the model. Ollama’s own documentation says “we process your prompts and responses to provide the service but do not store or log that content,” but says nothing about what the third-party provider does with it. For models hosted by Alibaba Cloud, users noted there is no zero-data-retention guarantee.
This was compounded by CVE-2025-51471, a token exfiltration vulnerability that affects all Ollama versions. A malicious registry server can trick Ollama into sending its authentication token to an attacker-controlled endpoint during a normal model pull. The fix exists as a PR but took months to land. In a tool that built its brand on local privacy, a vulnerability that leaks credentials to arbitrary servers is not a minor issue, it’s an architectural philosophy problem.
All of this makes more sense when you look at the incentive structure. Ollama is a Y Combinator-backed (W21) startup, founded by engineers who previously built a Docker GUI that was acquired by Docker Inc. The playbook is familiar: wrap an existing open-source project in a user-friendly interface, build a user base, raise money, then figure out monetization.
The progression follows the pattern cleanly:
Minimize attribution: make the product look self-sufficient to investors.
Create lock-in: a proprietary model registry format and hashed filenames that don’t work with other tools.
The model registry is worth examining. Ollama stores downloaded models using hashed filenames in its own format. If you’ve been pulling models through Ollama for months, you can’t just point llama.cpp or LM Studio at those files without extra work. You can bring your own GGUFs to Ollama via a Modelfile, but it’s deliberately friction-filled to take them out. This is a form of vendor lock-in that most users don’t notice until they try to leave.
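The blobs are still GGUF files on disk, so escape is possible, just not advertised. Below is a hedged sketch of locating the model blob behind an Ollama tag so another runtime can read it; the store layout and the example model name are assumptions based on Ollama’s default configuration and may differ across versions and platforms.

```python
# Hedged sketch: find the GGUF blob behind an Ollama model so llama.cpp or
# LM Studio can reuse it. Paths assume Ollama's default store (~/.ollama/models)
# and the "llama3.2:latest" tag is only an example; both may vary by version/OS.
import json
from pathlib import Path

store = Path.home() / ".ollama" / "models"
manifest_path = store / "manifests" / "registry.ollama.ai" / "library" / "llama3.2" / "latest"

manifest = json.loads(manifest_path.read_text())
model_layer = next(l for l in manifest["layers"] if l["mediaType"].endswith("image.model"))
blob_path = store / "blobs" / model_layer["digest"].replace(":", "-")

print(f"GGUF blob for llama3.2:latest -> {blob_path}")
# e.g. symlink it somewhere other tools can find:
# (Path.home() / "models" / "llama3.2.gguf").symlink_to(blob_path)
```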
The tools Ollama wraps are directly accessible, and they’re not much harder to set up.
llama.cpp is the engine. It has an OpenAI-compatible API server (llama-server), a built-in web UI, full control over context windows and sampling parameters, and consistently better throughput than Ollama. In February 2026, Gerganov’s ggml.ai joined Hugging Face to ensure the long-term sustainability of the project. It’s truly community-driven, MIT-licensed, and under active development with 450+ contributors.
llama-swap handles multi-model orchestration, loading, unloading, and hot-swapping models on demand behind a single API endpoint. Pair it with LiteLLM and you get a unified OpenAI-compatible proxy that routes across multiple backends with proper model aliasing.
LM Studio gives you a GUI if that’s what you want. It uses llama.cpp under the hood, exposes all the knobs, and supports any GGUF model without lock-in. Jan is another open-source desktop app with a clean chat interface and local-first design. Msty offers a polished GUI with multi-model support and built-in RAG. koboldcpp is another option with a web UI and extensive configuration options.
Red Hat’s ramalama is worth a look too, a container-native model runner that explicitly credits its upstream dependencies front and center. Exactly what Ollama should have done from the start.
None of these tools require more than a few minutes to set up. The idea that Ollama is the only accessible option hasn’t been true for a long time.
Georgi Gerganov hacked together llama.cpp in an evening in March 2023 and kicked off a revolution in local AI. He and a community of hundreds of contributors have spent years making it possible to run increasingly powerful models on consumer hardware. That work is genuinely important, it’s the foundation that keeps local inference open and accessible.
Ollama wrapped that work in a nice CLI, raised VC money on the back of it, spent over a year refusing to credit it, forked it badly, shipped a closed-source app alongside it, and then pivoted the whole thing toward cloud services. At every decision point where they could have been good open-source citizens, they chose the path that made them look more self-sufficient to investors.
The local LLM ecosystem doesn’t need Ollama. It needs llama.cpp. The rest is packaging, and better packaging already exists.
...
Read the original on sleepingrobots.com »
Last week we learned about Anthropic’s Mythos, a new LLM so “strikingly capable at computer security tasks” that Anthropic didn’t release it publicly. Instead, only critical software makers have been granted access, providing them time to harden their systems.
We quickly blew through our standard stages of processing big AI claims: shock, existential fear, hype, skepticism, criticism, and (finally) moving on to the next thing. I encouraged people to take a wait-and-see approach, as security capabilities are tailor-made for impressive demos. Finding exploits is a clearly defined, verifiable search problem. You’re not building a complex system, but poking at one that exists. A problem well suited to throwing millions of tokens at.
Yesterday, the first 3rd party analysis landed, from the AI Security Institute (AISI), largely supporting Anthropic’s claims. Mythos is really good, “a step up over previous frontier models in a landscape where cyber performance was already rapidly improving.”
The entire report is worth reading, but I want to focus on the following chart, detailing the ability of different models to successfully complete a simulated, complex corporate network attack:
“The Last Ones” is “a 32-step corporate network attack simulation spanning initial reconnaissance through to full network takeover, which AISI estimates to require humans 20 hours to complete.” The lines are the average performance across multiple runs (10 runs for Mythos, Opus 4.6, and GPT-5.4), with the “max” lines representing the best of each batch. Mythos was the only model to complete the task, in 3 out of its 10 attempts.
This chart suggests an interesting security economy: to harden a system we need to spend more tokens discovering exploits than attackers spend exploiting them.
AISI budgeted 100M tokens for each attempt. That’s $12,500 per Mythos attempt, $125k for all ten runs. Worryingly, none of the models given a 100M budget showed signs of diminishing returns. “Models continue making progress with increased token budgets across the token budgets tested,” AISI notes.
If Mythos continues to find exploits so long as you keep throwing money at it, security is reduced to a brutally simple equation: to harden a system you need to spend more tokens discovering exploits than attackers will spend exploiting them.
You don’t get points for being clever. You win by paying more. It is a system that echoes cryptocurrency’s proof of work system, where success is tied to raw computational work. It’s a low temperature lottery: buy the tokens, maybe you find an exploit. Hopefully you keep trying longer than your attackers.
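A toy sketch of that calculus, using the per-attempt figure above; the exploit value and everything else here are illustrative assumptions, not numbers from the AISI report.

```python
# Toy model of the "outspend the attacker" calculus. The $12,500-per-100M-token
# figure comes from the discussion above; the exploit value is an illustrative guess.
token_price = 12_500 / 100_000_000   # dollars per token implied by one Mythos run
exploit_value = 250_000              # hypothetical market value of an exploit

# A rational attacker will burn tokens up to the exploit's value...
attacker_tokens = exploit_value / token_price

# ...so the defender's hardening floor is set by that same market value.
audit_runs_needed = attacker_tokens / 100_000_000
print(f"Attacker ceiling: {attacker_tokens:,.0f} tokens "
      f"(~{audit_runs_needed:.0f} hundred-million-token audit runs, ${exploit_value:,})")
```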
This calculus has a few immediate takeaways:
First, it strengthens the case for writing simple functionality yourself rather than importing dependencies. For those of you who aren’t exposed to AI maximalists, this statement feels absurd. But lately, after the LiteLLM and Axios supply chain scares, many have argued for reimplementing dependency functionality using coding agents.
Classical software engineering would have you believe that dependencies are good (we’re building pyramids from bricks), but imo this has to be re-evaluated, and it’s why I’ve grown increasingly averse to them, preferring to use LLMs to “yoink” functionality when it’s simple enough and possible.
If security is purely a matter of throwing tokens at a system, Linus’s law that, “given enough eyeballs, all bugs are shallow,” expands to include tokens. If corporations that rely on OSS libraries spend tokens to secure them, those libraries will likely end up more secure than your own budget allows. Certainly, this has complexities: cracking a widely used OSS package is inherently more valuable than hacking a one-off implementation, which incentivizes attackers to spend more on OSS targets.
Second, hardening will be an additional phase for agentic coders.
We’ve already been seeing developers break their process into two steps, development and code review, often using different models for each phase. As this matures, we’re seeing purpose-built tooling meeting this pattern. Anthropic launched a code review product that costs $15-20 per review.
If the above Mythos claims hold, I suspect we’ll see a three phase cycle: development, review, and hardening.
Review: Document, refactor, and other gardening tasks, async, applying best practices with each PR.
Hardening: Identify exploits, autonomously, until the budget runs out.
Critically, human input is the limiter for the first phase and money is the limiter for the last. This quality inherently makes them separate stages (why spend to harden before you have something?). Previously, security audits were rare, discrete, and inconsistent. Now we can apply them constantly, within an optimal (we hope!) budget.
Code remains cheap, unless it needs to be secure. Even if costs fall as inference is optimized, unless models reach the point of diminishing security returns, you still need to buy more tokens than attackers do. The cost is fixed by the market value of an exploit.
...
Read the original on www.dbreunig.com »
We present Darkbloom, a decentralized inference network. AI compute today flows through three layers of markup — GPU manufacturers to hyperscalers to API providers to end users. Meanwhile, over 100 million Apple Silicon machines sit idle for most of each day. We built a network that connects them directly to demand. Operators cannot observe inference data. The API is OpenAI-compatible. Our measurements show up to 70% lower costs compared to centralized alternatives. Operators retain 95% of revenue.
Idle hardware has near-zero marginal cost. That saving passes through to price. OpenAI-compatible API for chat, image generation, and speech-to-text. Every request is end-to-end encrypted.
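Since the API is OpenAI-compatible, existing clients should only need a different base URL. A hedged sketch is below; the endpoint and model id are placeholders, not documented values.

```python
# Hedged sketch: pointing a standard OpenAI client at a Darkbloom endpoint.
# The base_url and model id are placeholders, not documented values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.darkbloom.example/v1",  # placeholder endpoint
    api_key="YOUR_DARKBLOOM_KEY",
)

resp = client.chat.completions.create(
    model="example-model",  # placeholder model id
    messages=[{"role": "user", "content": "Summarize this paragraph: ..."}],
)
print(resp.choices[0].message.content)
```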
Open Console ↗
Your Mac already has the hardware. Operators keep 100% of inference revenue. Electricity cost on Apple Silicon runs $0.01–0.03 per hour depending on workload. The rest is profit.
Start Earning ↗
The AI compute market has three layers of margin.
NVIDIA sells GPUs to hyperscalers. AWS, Google, Azure, and CoreWeave mark them up and rent capacity to AI companies. AI companies mark them up again and charge end users per token. Each layer takes a cut. End users pay multiples of what the silicon actually costs to run.
This concentrates both wealth and access. A small number of companies control the supply. Everyone else rents.
Meanwhile, Apple has shipped over 100 million machines with serious ML hardware. Unified memory architectures. 273 to 819 GB/s memory bandwidth. Neural Engines. Machines capable of running 235-billion-parameter models. Most sit idle 18 or more hours a day. Their owners earn nothing from this compute.
That is not a technology problem. It is a marketplace problem.
The pattern is familiar. Airbnb connected idle rooms to travelers. Uber connected idle cars to riders. Rooftop solar turned idle rooftops into energy assets. In each case, distributed idle capacity undercut centralized incumbents on price because the marginal cost was near zero.
Darkbloom does this for AI compute. Idle Macs serve inference. Users pay less because there is no hyperscaler in the middle. Operators earn from hardware they already own. Unlike those other networks, the operator cannot see the user’s data.
95% of revenue goes to the hardware owner
Other decentralized compute networks connect buyers and sellers. That is the easy part.
The hard part is trust. You are sending prompts to a machine you do not own, operated by someone you have never met. Your company’s internal data. Your users’ conversations. Your competitive advantage, running on hardware in someone else’s house.
No enterprise will do this without guarantees stronger than a terms-of-service document.
Without verifiable privacy, decentralized inference does not work.
We eliminate every software path through which an operator could observe inference data. Four independent layers, each independently verifiable.
Requests are encrypted on the user’s device before transmission. The coordinator routes ciphertext. Only the target node’s hardware-bound key can decrypt.
Each node holds a key generated inside Apple’s tamper-resistant secure hardware. The attestation chain traces back to Apple’s root certificate authority.
The inference process is locked at the OS level. Debugger attachment is blocked. Memory inspection is blocked. The operator cannot extract data from a running process.
Every response is signed by the specific machine that produced it. The full attestation chain is published. Anyone can verify it independently.
Data is encrypted before it leaves your device. The operator sits outside that path; every route inward is eliminated.
The operator runs your inference. They cannot see your data.
Prompts are encrypted before they leave your machine. The coordinator routes traffic it cannot read. The provider decrypts inside a hardened process it cannot inspect. The attestation chain is public.
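As a generic illustration of that last step (and not Darkbloom’s actual protocol, which the paper describes), verifying a node’s signature over a response might look like this, assuming an Ed25519 key published in the attestation chain:

```python
# Generic illustration only: check that a response was signed by a node whose
# Ed25519 public key appears in a published attestation chain. This is not
# Darkbloom's documented verification flow.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def response_is_authentic(node_pubkey: bytes, response_body: bytes, signature: bytes) -> bool:
    """Return True if the signature over response_body verifies against the node's key."""
    key = Ed25519PublicKey.from_public_bytes(node_pubkey)
    try:
        key.verify(signature, response_body)
        return True
    except InvalidSignature:
        return False
```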
Read the paper ↗
...
Read the original on darkbloom.dev »
This is not an easy post to write.
When we started Cal.com, we believed deeply in open source. It’s a core principle we built this company around, and something we’ve been incredibly proud of.
Today, we are making the very difficult decision to move to closed source, and there’s one simple reason: security.
AI is changing everything. It’s transforming how we write content, build software, and operate day to day. But what’s talked about far less is how dramatically AI is changing the world of security.
In the past, exploiting an application required a highly skilled hacker with years of experience and a significant investment of time to find and exploit vulnerabilities. The reality is that humans don’t have the time, attention, or patience to find everything.
Today, AI can be pointed at an open source codebase and systematically scan it for vulnerabilities.
Being open source is increasingly like giving attackers the blueprints to the vault. When the structure is fully visible, it becomes much easier to identify weaknesses and exploit them.
In recent months, we’ve seen a wave of AI security startups productizing this capability. Each platform surfaces different vulnerabilities, making it difficult to establish a single, reliable source of truth for what is actually secure.
This uncertainty forced us to make a choice: remain open source and accept increasing risk to customer data, or move to closed source to reduce that risk. It’s not a perfect solution, but we have to do everything we can to protect our users.
At the same time, we still care deeply about open source. That’s why we are releasing a version of our codebase to the community under the MIT license as Cal.diy. While our production codebase has significantly diverged, including major rewrites of core systems like authentication and data handling, we want to ensure there is still a truly open version available for developers, hobbyists, and anyone who wants to explore and experiment.
The risk landscape is accelerating quickly. Advanced AI models are now capable of identifying and exploiting vulnerabilities at unprecedented speed. In one recent example, AI uncovered a 27-year-old vulnerability in the BSD kernel, one of the most widely used and security-focused open source projects, and generated working exploits in a matter of hours.
Continuing as open source would put our application, our customers, and the sensitive data we handle at significant risk. We are taking every step we can to reduce that risk and protect our users, and for now, that means moving to closed source despite how difficult that decision is.
We hope that one day we can return to open source as the security landscape evolves. But for now, we have to put our customers first.
...
Read the original on cal.com »
We are looking for guidance regarding an unexpected €54,000+ Gemini API charge that occurred within a few hours after enabling Firebase AI Logic on an existing Firebase project.
We created the project over a year ago and initially used it only for Firebase Authentication. Recently, we added a simple AI feature (generating a web snippet from a text prompt) and enabled Firebase AI Logic.
Shortly after enabling this, we experienced a sudden and extreme spike in Gemini API usage. The traffic was not correlated with our actual users and appeared to be automated. The activity occurred within a short overnight window and stopped once we disabled the API and rotated credentials.
We had a budget alert (€80) and a cost anomaly alert, both of which triggered with a delay of a few hours
By the time we reacted, costs were already around €28,000
The final amount settled at €54,000+ due to delayed cost reporting
This describes our issue in more detail:
Google API Keys Weren’t Secrets. But then Gemini Changed the Rules. ◆ Truffle…
Google spent over a decade telling developers that Google API keys (like those used in Maps, Firebase, etc.) are not secrets. But that’s no longer true.
We worked with Google Cloud support and provided logs and analysis. The charges were classified as valid usage because they originated from our project, and our request for a billing adjustment was ultimately denied.
This usage was clearly anomalous, not user-driven, and does not reflect intended or meaningful consumption of the service.
Has anyone encountered a similar issue after enabling Firebase AI Logic or Gemini?
Are there recommended safeguards beyond App Check, quotas, and moving calls server-side?
Is there any escalation path we may have missed for cases like this?
Any guidance or shared experience would be greatly appreciated.
Hey @zanbezi ! Sorry to hear about this. A few things:
We have billing account caps rolled out to users of the Gemini API (see: https://ai.google.dev/gemini-api/docs/billing#tier-spend-caps); tier 1 users can spend $250 a month and are then cut off by default (there is a 10-minute delay in all of the reporting)
We now support project spend caps; if you want to set a custom spend cap, you can also do that (I have my account set at $50 so I don’t spend too much accidentally when building; the same 10-minute delay applies here too): https://ai.google.dev/gemini-api/docs/billing#project-spend-caps
We are moving to disable the usage of unrestricted API keys in the Gemini API, should have more updates there soon.
We now generate Auth keys by default for new users (a more secure key type which didn’t exist when the Gemini API was originally created a few years ago) and will have more to share there soon.
You should generally avoid putting a key in client-side code; if it is exposed, even with the restrictions above, you can incur costs (see the server-side sketch a few notes below).
In many cases, we can automatically detect when a key is visible on the public web and shut down those keys automatically for security reasons (this happened to me personally, I accidentally pushed my API key to the public API docs and it was shut down in minutes).
By default, keys generated in Google AI Studio are restricted to just the Gemini API, no other services are enabled. However keys generated from other parts of Google Cloud have this cross service capability, you can double check keys and make sure they are restricted for just the resource you need.
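For anyone looking for a concrete starting point on the “keep keys server-side” advice above, here is a minimal sketch using the google-genai SDK behind a small Flask endpoint. The model name is just an example, and you would still want authentication, rate limiting, and App Check verification in front of it.

```python
# Minimal sketch of keeping the Gemini key server-side: clients call this endpoint,
# never the Gemini API directly. The model name is an example; add auth, rate
# limiting, and App Check verification before exposing anything like this publicly.
import os

from flask import Flask, jsonify, request
from google import genai

app = Flask(__name__)
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])  # key never leaves the server

@app.post("/generate-snippet")
def generate_snippet():
    prompt = request.get_json(force=True).get("prompt", "")[:2000]  # cap input size
    result = client.models.generate_content(model="gemini-2.0-flash", contents=prompt)
    return jsonify({"text": result.text})
```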
Pls email me and our team can take a look into this case (Lkilpatrick@google.com); we take this all very seriously and have been pushing hard to land all the features mentioned above and more.
We just started the prepaid billing rollout which means you have to pay ahead of time to use the Gemini API, this is rolled out to all new US billing accounts as of yesterday and rolling out globally right now. This is yet another way to give developers more control over their spending / costs and ensure you know what you are signing up for when using the Gemini API.
I hope this helps and sorry for the hassle on this experience, pls email me if there is more to chat about!
Thanks for the detailed response, we really appreciate it. It is good to see that additional safeguards (like spend caps) are being introduced.
I will reach out via email with the details so your team can take a closer look.
Thanks again for taking the time to respond.
Great to see you here Logan. This is the proper way to deal with a fiasco like this one.
...
Read the original on discuss.ai.google.dev »
10HN is also available as an iOS App
If you visit 10HN only rarely, check out the best articles from the past week.
If you like 10HN please leave feedback and share
Visit pancik.com for more.