10 interesting stories served every morning and every evening.
AlphaGenome: AI for better understanding the genome
Introducing a new, unifying DNA sequence model that advances regulatory variant-effect prediction and promises to shed new light on genome function — now available via API.
The genome is our cellular instruction manual. It’s the complete set of DNA which guides nearly every part of a living organism, from appearance and function to growth and reproduction. Small variations in a genome’s DNA sequence can alter an organism’s response to its environment or its susceptibility to disease. But deciphering how the genome’s instructions are read at the molecular level — and what happens when a small DNA variation occurs — is still one of biology’s greatest mysteries.
Today, we introduce AlphaGenome, a new artificial intelligence (AI) tool that more comprehensively and accurately predicts how single variants or mutations in human DNA sequences impact a wide range of biological processes regulating genes. This was enabled, among other factors, by technical advances allowing the model to process long DNA sequences and output high-resolution predictions.
To advance scientific research, we’re making AlphaGenome available in preview via our AlphaGenome API for non-commercial research, and planning to release the model in the future.
We believe AlphaGenome can be a valuable resource for the scientific community, helping scientists better understand genome function and disease biology, and ultimately drive new biological discoveries and the development of new treatments.
Our AlphaGenome model takes a long DNA sequence as input — up to 1 million letters, also known as base-pairs — and predicts thousands of molecular properties characterising its regulatory activity. It can also score the effects of genetic variants or mutations by comparing predictions of mutated sequences with unmutated ones.
Predicted properties include where genes start and where they end in different cell types and tissues, where they get spliced, the amount of RNA being produced, and also which DNA bases are accessible, close to one another, or bound by certain proteins. Training data was sourced from large public consortia including ENCODE, GTEx, 4D Nucleome and FANTOM5, which experimentally measured these properties covering important modalities of gene regulation across hundreds of human and mouse cell types and tissues.
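The API was still in preview at the time of writing, so the published client may look different. As a purely illustrative sketch (the module name, functions, track labels and tissue names below are hypothetical, not the real interface), a request for predictions over a long sequence might look roughly like this:

```python
# Hypothetical sketch of querying a sequence-to-function model through an API.
# The package, function names, track labels and tissue names are placeholders,
# not the actual AlphaGenome client; see the official documentation for real usage.
import alphagenome_client  # hypothetical package name

client = alphagenome_client.connect(api_key="YOUR_KEY")  # hypothetical call

sequence = "ACGT" * 250_000  # a dummy 1,000,000-letter DNA sequence for illustration

predictions = client.predict(
    sequence=sequence,
    requested_tracks=["rna_seq", "splice_junctions", "dna_accessibility"],
    tissues=["liver", "brain"],
)
# `predictions` would map each requested track and tissue to a per-position signal
# across the input window, analogous to the modalities described above.
```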
Animation showing AlphaGenome taking one million DNA letters as input and predicting diverse molecular properties across different tissues and cell types.
The AlphaGenome architecture uses convolutional layers to initially detect short patterns in the genome sequence, transformers to communicate information across all positions in the sequence, and a final series of layers to turn the detected patterns into predictions for different modalities. During training, this computation is distributed across multiple interconnected Tensor Processing Units (TPUs) for a single sequence.
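To make the shape of that pipeline concrete, here is a deliberately tiny sketch of the same pattern: convolutions for local motifs, a transformer for long-range communication, and one output head per modality. It is an illustrative toy in PyTorch, not DeepMind's implementation; the layer sizes and head names are invented for the example.

```python
# Toy illustration of the conv -> transformer -> per-modality heads pattern.
import torch
import torch.nn as nn

class ToyGenomeModel(nn.Module):
    def __init__(self, n_channels=256, n_attn_heads=8, n_layers=4,
                 output_heads=("rna_seq", "atac", "splice_sites")):
        super().__init__()
        # One-hot DNA (A, C, G, T) -> local pattern detectors.
        self.conv = nn.Sequential(
            nn.Conv1d(4, n_channels, kernel_size=15, padding=7),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=128),  # downsample long sequences before attention
        )
        # Transformer layers let distant positions exchange information.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=n_channels, nhead=n_attn_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # One small head per predicted modality.
        self.heads = nn.ModuleDict(
            {name: nn.Linear(n_channels, 1) for name in output_heads})

    def forward(self, one_hot_dna):  # shape: (batch, 4, sequence_length)
        x = self.conv(one_hot_dna)               # (batch, channels, positions)
        x = self.transformer(x.transpose(1, 2))  # (batch, positions, channels)
        return {name: head(x).squeeze(-1) for name, head in self.heads.items()}

model = ToyGenomeModel()
dna = torch.zeros(1, 4, 131_072)   # a one-hot encoded example sequence
predictions = model(dna)           # dict of per-position tracks, one per modality
```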
This model builds on our previous genomics model, Enformer, and is complementary to AlphaMissense, which specializes in categorizing the effects of variants within protein-coding regions. These regions cover 2% of the genome. The remaining 98%, called non-coding regions, are crucial for orchestrating gene activity and contain many variants linked to diseases. AlphaGenome offers a new perspective for interpreting these expansive sequences and the variants within them.
Our model analyzes up to 1 million DNA letters and makes predictions at the resolution of individual letters. Long sequence context is important for covering regions regulating genes from far away and base-resolution is important for capturing fine-grained biological details.
Previous models had to trade off sequence length and resolution, which limited the range of modalities they could jointly model and accurately predict. Our technical advances address this limitation without significantly increasing the training resources — training a single AlphaGenome model (without distillation) took four hours and required half of the compute budget used to train our original Enformer model.
By unlocking high resolution prediction for long input sequences, AlphaGenome can predict the most diverse range of modalities. In doing so, AlphaGenome provides scientists with more comprehensive information about the complex steps of gene regulation.
In addition to predicting a diverse range of molecular properties, AlphaGenome can efficiently score the impact of a genetic variant on all of these properties in a second. It does this by contrasting predictions of mutated sequences with unmutated ones, and efficiently summarising that contrast using different approaches for different modalities.
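The core of that scoring step is simple to express. The sketch below is a generic illustration of the reference-versus-alternate contrast; the reductions are examples of the kind of per-modality summaries one might use, not AlphaGenome's published scoring recipes.

```python
import numpy as np

def score_variant(predict, ref_seq: str, alt_seq: str) -> dict:
    """Contrast model predictions for a reference vs. mutated sequence.

    `predict` is any function mapping a DNA string to a dict of per-position
    NumPy arrays (one track per modality). The reductions below are illustrative.
    """
    ref, alt = predict(ref_seq), predict(alt_seq)
    scores = {}
    for modality in ref:
        delta = alt[modality] - ref[modality]
        if modality == "rna_seq":
            # Expression-like tracks: log fold change of total signal.
            scores[modality] = float(np.log2(alt[modality].sum() + 1)
                                     - np.log2(ref[modality].sum() + 1))
        else:
            # Splicing/accessibility-like tracks: largest local change anywhere.
            scores[modality] = float(np.abs(delta).max())
    return scores
```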
Many rare genetic diseases, such as spinal muscular atrophy and some forms of cystic fibrosis, can be caused by errors in RNA splicing — a process where parts of the RNA molecule are removed, or “spliced out”, and the remaining ends rejoined. For the first time, AlphaGenome can explicitly model the location and expression level of these junctions directly from sequence, offering deeper insights about the consequences of genetic variants on RNA splicing.
AlphaGenome achieves state-of-the-art performance across a wide range of genomic prediction benchmarks, such as predicting which parts of the DNA molecule will be in close proximity, whether a genetic variant will increase or decrease expression of a gene, or whether it will change the gene’s splicing pattern.
Bar graph showing AlphaGenome’s relative improvements on selected DNA sequence and variant effect tasks, compared against results for the current best methods in each category.
When producing predictions for single DNA sequences, AlphaGenome outperformed the best external models on 22 out of 24 evaluations. And when predicting the regulatory effect of a variant, it matched or exceeded the top-performing external models on 24 out of 26 evaluations.
This comparison included models specialized for individual tasks. AlphaGenome was the only model that could jointly predict all of the assessed modalities, highlighting its generality. Read more in our preprint.
AlphaGenome’s generality allows scientists to simultaneously explore a variant’s impact on a number of modalities with a single API call. This means that scientists can generate and test hypotheses more rapidly, without having to use multiple models to investigate different modalities.
Moreover, AlphaGenome’s strong performance indicates it has learned a relatively general representation of DNA sequence in the context of gene regulation. This makes it a strong foundation for the wider community to build upon. Once the model is fully released, scientists will be able to adapt and fine-tune it on their own datasets to better tackle their unique research questions.
Finally, this approach provides a flexible and scalable architecture for the future. By extending the training data, AlphaGenome’s capabilities could be extended to yield better performance, cover more species, or include additional modalities to make the model even more comprehensive.
It’s a milestone for the field. For the first time, we have a single model that unifies long-range context, base-level precision and state-of-the-art performance across a whole spectrum of genomic tasks.
AlphaGenome’s predictive capabilities could help several research avenues:
- Disease understanding: By more accurately predicting genetic disruptions, AlphaGenome could help researchers pinpoint the potential causes of disease more precisely, and better interpret the functional impact of variants linked to certain traits, potentially uncovering new therapeutic targets. We think the model is especially suitable for studying rare variants with potentially large effects, such as those causing rare Mendelian disorders.
- Synthetic biology: Its predictions could be used to guide the design of synthetic DNA with specific regulatory function — for example, only activating a gene in nerve cells but not muscle cells.
- Fundamental research: It could accelerate our understanding of the genome by assisting in mapping its crucial functional elements and defining their roles, identifying the most essential DNA instructions for regulating a specific cell type’s function.
For example, we used AlphaGenome to investigate the potential mechanism of a cancer-associated mutation. In an existing study of patients with T-cell acute lymphoblastic leukemia (T-ALL), researchers observed mutations at particular locations in the genome. Using AlphaGenome, we predicted that the mutations would activate a nearby gene called TAL1 by introducing a MYB DNA binding motif, which replicated the known disease mechanism and highlighted AlphaGenome’s ability to link specific non-coding variants to disease genes.
AlphaGenome will be a powerful tool for the field. Determining the relevance of different non-coding variants can be extremely challenging, particularly to do at scale. This tool will provide a crucial piece of the puzzle, allowing us to make better connections to understand diseases like cancer.
AlphaGenome marks a significant step forward, but it’s important to acknowledge its current limitations.
As with other sequence-based models, accurately capturing the influence of very distant regulatory elements, such as those over 100,000 DNA letters away, is still an ongoing challenge. Another priority for future work is further increasing the model’s ability to capture cell- and tissue-specific patterns.
We haven’t designed or validated AlphaGenome for personal genome prediction, a known challenge for AI models. Instead, we focused more on characterising the performance on individual genetic variants. And while AlphaGenome can predict molecular outcomes, it doesn’t give the full picture of how genetic variations lead to complex traits or diseases. These often involve broader biological processes, like developmental and environmental factors, that are beyond the direct scope of our model.
We’re continuing to improve our models and gathering feedback to help us address these gaps.
AlphaGenome is now available for non-commercial use via our AlphaGenome API. Please note that our model’s predictions are intended only for research use and haven’t been designed or validated for direct clinical purposes.
Researchers worldwide are invited to get in touch with potential use-cases for AlphaGenome and to ask questions or share feedback through the community forum.
We hope AlphaGenome will be an important tool for better understanding the genome and we’re committed to working alongside external experts across academia, industry, and government organizations to ensure AlphaGenome benefits as many people as possible.
Together with the collective efforts of the wider scientific community, we hope it will deepen our understanding of the complex cellular processes encoded in the DNA sequence and the effects of variants, and drive exciting new discoveries in genomics and healthcare.
We would like to thank Juanita Bawagan, Arielle Bier, Stephanie Booth, Irina Andronic, Armin Senoner, Dhavanthi Hariharan, Rob Ashley, Agata Laydon and Kathryn Tunyasuvunakool for their help with the text and figures. This work was done thanks to the contributions of the AlphaGenome co-authors: Žiga Avsec, Natasha Latysheva, Jun Cheng, Guido Novati, Kyle R. Taylor, Tom Ward, Clare Bycroft, Lauren Nicolaisen, Eirini Arvaniti, Joshua Pan, Raina Thomas, Vincent Dutordoir, Matteo Perino, Soham De, Alexander Karollus, Adam Gayoso, Toby Sargeant, Anne Mottram, Lai Hong Wong, Pavol Drotár, Adam Kosiorek, Andrew Senior, Richard Tanburn, Taylor Applebaum, Souradeep Basu, Demis Hassabis and Pushmeet Kohli.

We would also like to thank Dhavanthi Hariharan, Charlie Taylor, Ottavia Bertolli, Yannis Assael, Alex Botev, Anna Trostanetski, Lucas Tenório, Victoria Johnston, Richard Green, Kathryn Tunyasuvunakool, Molly Beck, Uchechi Okereke, Rachael Tremlett, Sarah Chakera, Ibrahim I. Taskiran, Andreea-Alexandra Muşat, Raiyan Khan, Ren Yi and the greater Google DeepMind team for their support, help and feedback.
...
Read the original on deepmind.google »
WASHINGTON (AP) — The U.S. economy shrank at a 0.5% annual pace from January through March as President Donald Trump’s trade wars disrupted business, the Commerce Department reported Thursday in an unexpected deterioration of earlier estimates.
First-quarter growth was weighed down by a surge of imports as U.S. companies and households rushed to buy foreign goods before Trump could impose tariffs on them. The Commerce Department previously estimated that the economy fell 0.2% in the first quarter. Economists had forecast no change in the department’s third and final estimate.
The January-March drop in gross domestic product — the nation’s output of goods and services — reversed a 2.4% increase in the last three months of 2024 and marked the first time in three years that the economy contracted. Imports expanded 37.9%, fastest since 2020, and pushed GDP down by nearly 4.7 percentage points.
Consumer spending also slowed sharply, expanding just 0.5%, down from a robust 4% in the fourth quarter of last year. It is a significant downgrade from the Commerce Department’s previous estimate.
Consumers have turned jittery since Trump started plastering big taxes on imports, anticipating that the tariffs will impact their finances directly.
And the Conference Board reported this week that Americans’ view of the U.S. economy worsened in June, resuming a downward slide that had dragged consumer confidence in April to its lowest level since the COVID-19 pandemic five years ago.
The Conference Board said Tuesday that its consumer confidence index slid to 93 in June, down 5.4 points from 98.4 last month. A measure of Americans’ short-term expectations for their income, business conditions and the job market fell 4.6 points to 69. That’s well below 80, the marker that can signal a recession ahead.
Former Federal Reserve economist Claudia Sahm said “the downward revision to consumer spending today is a potential red flag.” Sahm, now chief economist at New Century Advisors, noted that Commerce downgraded spending on recreation services and foreign travel — which could have reflected “great consumer pessimism and uncertainty.”
A category within the GDP data that measures the economy’s underlying strength rose at a 1.9% annual rate from January through March. It’s a decent number, but down from 2.9% in the fourth quarter of 2024 and from the Commerce Department’s previous estimate of 2.5% January-March growth.
This category includes consumer spending and private investment but excludes volatile items like exports, inventories and government spending.
And federal government spending fell at a 4.6% annual pace, the biggest drop since 2022.
In another sign that Trump’s policies are disrupting trade,
Trade deficits reduce GDP. But that’s just a matter of mathematics. GDP is supposed to count only what’s produced domestically, not stuff that comes in from abroad. So imports — which show up in the GDP report as consumer spending or business investment — have to be subtracted out to keep them from artificially inflating domestic production.
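For readers who want the bookkeeping spelled out, the standard expenditure identity makes that subtraction explicit (a textbook formula, not a figure from this report):

\[ \text{GDP} = C + I + G + (X - M) \]

Here C is consumer spending, I private investment, G government spending, X exports and M imports. An imported good counted inside C or I is cancelled by the subtracted M term, so foreign production does not add to measured domestic output; a quarter with an unusually large import surge, like this one, therefore drags the headline number down mechanically.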
The first-quarter import influx likely won’t be repeated in the April-June quarter and therefore shouldn’t weigh on GDP. In fact, economists expect growth to bounce back to 3% in the second quarter, according to a survey of forecasters by the data firm FactSet.
The first look at April-June GDP growth is due July 30.
This story has been corrected to show that the drop in federal spending was the biggest since 2022, not 1986.
...
Read the original on apnews.com »
The first Gemma model launched early last year and has since grown into a thriving Gemmaverse of over 160 million collective downloads. This ecosystem includes our family of over a dozen specialized models for everything from safeguarding to medical applications and, most inspiringly, the countless innovations from the community. From innovators like Roboflow building enterprise computer vision to the Institute of Science Tokyo creating highly-capable Japanese Gemma variants, your work has shown us the path forward.

Building on this incredible momentum, we’re excited to announce the full release of Gemma 3n. While last month’s preview offered a glimpse, today unlocks the full power of this mobile-first architecture. Gemma 3n is designed for the developer community that helped shape Gemma. It’s supported by your favorite tools including Hugging Face Transformers, llama.cpp, Google AI Edge, Ollama, MLX, and many others, enabling you to fine-tune and deploy for your specific on-device applications with ease.

This post is the developer deep dive: we’ll explore some of the innovations behind Gemma 3n, share new benchmark results, and show you how to start building today. Gemma 3n represents a major advancement for on-device AI, bringing powerful multimodal capabilities to edge devices with performance previously only seen in last year’s cloud-based frontier models.
- Multimodal by design: Gemma 3n natively supports image, audio, video, and text inputs and text outputs.
- Optimized for on-device: Engineered with a focus on efficiency, Gemma 3n models are available in two sizes based on effective parameters: E2B and E4B. While their raw parameter count is 5B and 8B respectively, architectural innovations allow them to run with a memory footprint comparable to traditional 2B and 4B models, operating with as little as 2GB (E2B) and 3GB (E4B) of memory.
- Groundbreaking architecture: At its core, Gemma 3n features novel components like the MatFormer architecture for compute flexibility, Per Layer Embeddings (PLE) for memory efficiency, LAuReL and AltUp for architectural efficiency, and new audio and MobileNet-v5 based vision encoders optimized for on-device use cases.
- Enhanced quality: Gemma 3n delivers quality improvements across multilinguality (supporting 140 languages for text and multimodal understanding of 35 languages), math, coding, and reasoning. The E4B version achieves an LMArena score over 1300, making it the first model under 10 billion parameters to reach this benchmark.
Achieving this leap in on-device performance required rethinking the model from the ground up. The foundation is Gemma 3n’s unique mobile-first architecture, and it all starts with MatFormer.

At the core of Gemma 3n is the MatFormer (🪆Matryoshka Transformer) architecture, a novel nested transformer built for elastic inference. Think of it like Matryoshka dolls: a larger model contains smaller, fully functional versions of itself. This approach extends the concept of Matryoshka Representation Learning from just embeddings to all transformer components.
During the MatFormer training of the 4B effective parameter (E4B) model, a 2B effective parameter (E2B) sub-model is simultaneously optimized within it, as shown in the figure above. This provides developers two powerful capabilities and use cases today:
1. Pre-extracted models: You can directly download and use either the main E4B model for the highest capabilities, or the standalone E2B sub-model which we have already extracted for you, offering up to 2x faster inference (see the loading sketch after the figure below).
2. Custom sizes with Mix-n-Match: For more granular control tailored to specific hardware constraints, you can create a spectrum of custom-sized models between E2B and E4B using a method we call Mix-n-Match. This technique allows you to precisely slice the E4B model’s parameters, primarily by adjusting the feed forward network hidden dimension per layer (from 8192 to 16384) and selectively skipping some layers. We are releasing the MatFormer Lab, a tool that shows how to retrieve these optimal models, which were identified by evaluating various settings on benchmarks like MMLU.
MMLU scores for the pre-trained Gemma 3n checkpoints at different model sizes (using Mix-n-Match)
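For developers who just want to try one of the pre-extracted checkpoints, a minimal sketch with the Hugging Face Transformers pipeline API might look like the following. The repository names are assumptions based on the E2B/E4B naming above, so check the official Gemma 3n model cards for the exact identifiers.

```python
# Minimal sketch using the Hugging Face Transformers pipeline API.
# The model ids below follow the E2B/E4B naming in the post and are assumptions;
# verify them against the official Gemma 3n model cards before use.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="google/gemma-3n-E2B-it",  # assumed id; the larger variant would be "google/gemma-3n-E4B-it"
    device_map="auto",               # place weights on the available accelerator or CPU
)

result = generator(
    "In one paragraph, explain what a nested (Matryoshka-style) transformer is.",
    max_new_tokens=128,
)
print(result[0]["generated_text"])
```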
Looking ahead, the MatFormer architecture also paves the way for elastic execution. While not part of today’s launched implementations, this capability allows a single deployed E4B model to dynamically switch between E4B and E2B inference paths on the fly, enabling real-time optimization of performance and memory usage based on the current task and device load.

Gemma 3n models incorporate Per-Layer Embeddings (PLE). This innovation is tailored for on-device deployment as it dramatically improves model quality without increasing the high-speed memory footprint required on your device’s accelerator (GPU/TPU). While the Gemma 3n E2B and E4B models have a total parameter count of 5B and 8B respectively, PLE allows a significant portion of these parameters (the embeddings associated with each layer) to be loaded and computed efficiently on the CPU. This means only the core transformer weights (approximately 2B for E2B and 4B for E4B) need to sit in the typically more constrained accelerator memory (VRAM).
With Per-Layer Embeddings, you can use Gemma 3n E2B while only having ~2B parameters loaded in your accelerator.
Processing long inputs, such as the sequences derived from audio and video streams, is essential for many advanced on-device multimodal applications. Gemma 3n introduces KV Cache Sharing, a feature designed to significantly accelerate time-to-first-token for streaming response applications.

KV Cache Sharing optimizes how the model handles the initial input processing stage (often called the “prefill” phase). The keys and values of the middle layer from local and global attention are directly shared with all the top layers, delivering a notable 2x improvement on prefill performance compared to Gemma 3 4B. This means the model can ingest and understand lengthy prompt sequences much faster than before.

Gemma 3n uses an advanced audio encoder based on the Universal Speech Model (USM). The encoder generates a token for every 160ms of audio (about 6 tokens per second), which are then integrated as input to the language model, providing a granular representation of the sound context. Among other things, this enables Automatic Speech Translation (AST): translating spoken language into text in another language.

We’ve observed particularly strong AST results for translation between English and Spanish, French, Italian, and Portuguese, offering great potential for developers targeting applications in these languages. For tasks like speech translation, leveraging Chain-of-Thought prompting can significantly enhance results. Here’s an example:
user
Transcribe the following speech segment in Spanish, then translate it into English:
At launch time, the Gemma 3n encoder is implemented to process audio clips up to 30 seconds. However, this is not a fundamental limitation. The underlying audio encoder is a streaming encoder, capable of processing arbitrarily long audios with additional long form audio training. Follow-up implementations will unlock low-latency, long streaming applications.

Alongside its integrated audio capabilities, Gemma 3n features a new, highly efficient vision encoder, MobileNet-V5-300M, delivering state-of-the-art performance for multimodal tasks on edge devices. Designed for flexibility and power on constrained hardware, MobileNet-V5 gives developers:
- Multiple input resolutions: Natively supports resolutions of 256x256, 512x512, and 768x768 pixels, allowing you to balance performance and detail for your specific applications.
- Broad visual understanding: Co-trained on extensive multimodal datasets, it excels at a wide range of image and video comprehension tasks.
- High throughput: Processes up to 60 frames per second on a Google Pixel, enabling real-time, on-device video analysis and interactive experiences.
This level of performance is achieved with multiple architectural innovations, including:
- An advanced foundation of MobileNet-V4 blocks (including Universal Inverted Bottlenecks and Mobile MQA).
- A significantly scaled up architecture, featuring a hybrid, deep pyramid model that is 10x larger than the biggest MobileNet-V4 variant.
- A novel Multi-Scale Fusion VLM adapter that enhances the quality of tokens for better accuracy and efficiency.
Benefiting from novel architectural designs and advanced distillation techniques, MobileNet-V5-300M substantially outperforms the baseline SoViT in Gemma 3 (trained with SigLip, no distillation). On a Google Pixel Edge TPU, it delivers a 13x speedup with quantization (6.5x without), requires 46% fewer parameters, and has a 4x smaller memory footprint, all while providing significantly higher accuracy on vision-language tasks. We’re excited to share more about the work behind this model. Look out for our upcoming MobileNet-V5 technical report, which will deep dive into the model architecture, data scaling strategies, and advanced distillation techniques.

Making Gemma 3n accessible from day one has been a priority. We’re proud to partner with many incredible open source developers to ensure broad support across popular tools and platforms, including contributions from teams behind AMD, Axolotl, Docker, Hugging Face, llama.cpp, LMStudio, MLX, NVIDIA, Ollama, RedHat, SGLang, Unsloth, and vLLM.

But this ecosystem is just the beginning. The true power of this technology is in what you will build with it. That’s why we’re launching the Gemma 3n Impact Challenge. Your mission: use Gemma 3n’s unique on-device, offline, and multimodal capabilities to build a product for a better world. With $150,000 in prizes, we’re looking for a compelling video story and a “wow” factor demo that shows real-world impact. Join the challenge and help build a better future.

Ready to explore the potential of Gemma 3n today? Here’s how:
- Experiment directly: Use Google AI Studio to try Gemma 3n in just a couple of clicks. Gemma models can also be deployed directly to Cloud Run from AI Studio.
- Download the models: Find the model weights on Hugging Face and Kaggle.
- Learn & integrate: Dive into our comprehensive documentation to quickly integrate Gemma into your projects or start with our inference and fine-tuning guides.
- Build with your favorite on-device AI tools: Google AI Edge Gallery/LiteRT-LLM, Ollama, MLX, llama.cpp, Docker, transformers.js and more.
- Use your favorite development tools: Leverage your preferred tools and frameworks, including Hugging Face Transformers and TRL, NVIDIA NeMo Framework, Unsloth, and LMStudio.
- Deploy your way: Gemma 3n offers multiple deployment options, including Google GenAI API, Vertex AI, SGLang, vLLM, and NVIDIA API Catalog.
...
Read the original on developers.googleblog.com »
In the nineteenth century, the invention of anesthesia was considered a gift from God. But post-operative pain relief has continued to rely on opioids, derivatives of opium, the addictive substance employed since ancient times. Although no other drug has managed to match the rapid, potent, and broadly effective relief delivered by opioids, their side effects have led to decades of addiction and overdose, leaving researchers keen to find a better solution.
This all changed in January 2025, when the FDA approved Vertex Pharmaceuticals’s Journavx (suzetrigine): the first non-opioid pain reliever suitable for treating post-surgery pain. Clinical trials found no signs of the problematic side effects associated with opioids: no drug abuse, tolerance, or withdrawal. But this was not an easy win: Vertex and other pharma companies spent decades searching for drugs like this to no avail.
Opioids are used primarily to treat nociceptive pain, pain caused by tissue damage from injury or disease. This damage activates nearby nociceptors: sensory neurons that signal physical or chemical harm. These nociceptors send signals up to the central nervous system — the brain and spinal cord — and the brain then creates a localized sensation of pain, drawing your attention to the threat.
Traditional opioids mimic opium, a compound found in the poppy plant that contains morphine. Opioids alleviate pain by acting on one of the three main opioid receptors, mu (μ) opioid receptors, which are distributed throughout the central nervous system, particularly in the brain. When opioids bind to the brain’s mu receptors, this suppresses incoming pain signals from the damaged site’s nociceptors, preventing the brain from creating the sensation of pain even when tissue damage is present.
Our bodies naturally produce their own opioids — such as endorphins, whose name is a contraction of ‘endogenous morphine’ — to briefly blunt pain during moments of stress or injury. However, these are far weaker and shorter-acting than prescription opioids since they degrade quickly, remain localized, and are released in short, controlled bursts. Prescription opioids, on the other hand, flood the brain with higher doses that linger for hours.
Crucially, opioids don’t just kill pain: they also incite pleasure. When the mu opioid receptors present in the reward center of the brain are activated, this reduces the secretion of a neurotransmitter called GABA, which works to inhibit dopamine-producing neurons. As GABA release declines, dopamine spikes, lighting up the reward center and inducing pleasure.
With the body’s natural opioids, this is fleeting and unproblematic. When properly prescribed, even synthetic opioids are no issue for most patients: under severe post-surgical pain, opioids mostly function to normalize disrupted brain function, dampening any pleasurable effect. But for some, whether due to genetics or inappropriate administration (e.g. a prescription that goes on after the pain’s source has been relieved), the intensity of prescription opioids produces a prolonged dopamine spike, along with a marked sensation of euphoria: a recipe for addiction.
With chronic use, the body’s natural opioid system becomes dysfunctional. Fewer natural opioids are produced and opioid receptors become desensitized. As a result, the patient develops a tolerance, requiring higher and higher doses to even feel normal.
The nineteenth century witnessed the creation of morphine, codeine, and heroin (which was sold over-the-counter), as well as the invention of the hypodermic syringe. By the turn of the century, 15 percent of all prescriptions dispensed in Boston were for opioids, which were used for everything from menstrual cramps to children’s coughs, and as many as 300,000 Americans, or 0.5 percent of the population, were opiate addicts. Anti-narcotics laws proliferated throughout the states, and the medical community expressed concerns about the liberal provision of addictive drugs. These mounting pressures led to the passage of the Harrison Narcotic Act in 1914, which made opium and opiates the first regulated substances in the United States.
Unlike opioids, which act within the central nervous system, Journavx does not meaningfully interact with the brain. Instead, it targets a specific sodium ion channel found almost exclusively on peripheral nociceptors, the pain-sensing neurons throughout your body. Ion channels, whether sodium, potassium, or calcium, are like tiny doors embedded on the neuron’s membrane: when a door opens, ions rush in or out and the neuron fires, sending an electrical signal to the next cell.
Three sodium channels are found primarily on nociceptors: NaV1.7, NaV1.8, and NaV1.9. Suzetrigine selectively blocks NaV1.8, which stops nociceptors from sending pain signals to the brain. Rather than preventing your brain from receiving pain signals, as opioids do, it prevents your neurons from transmitting them. In essence, Journavx works from the bottom up to alleviate pain, rather than the top down.
Critically, the NaV1.8 channel is largely absent from the central nervous system. This means that suzetrigine does not affect the brain, which means users do not experience the same euphoria that is triggered by opioids. This prevents addiction and abuse, as well as the depressive effects on breathing or heart rate typical with opioids.
At first glance, this may seem like a straightforward solution, especially given the urgent demand for non-opioid alternatives. So why did it take so long?
Unlike diseases with well-defined biological causes, pain is a broad symptom rooted in complex and overlapping pathways. Many of these are deeply intertwined with vital bodily functions like blood pressure, immune response, and respiration. Together, this makes it difficult to isolate a target that can be blocked without collateral damage.
A particularly good example of this predicament involves TRPV1, also known as the capsaicin receptor. It is an ion channel mainly found in nociceptors, and is responsible for the pain you feel when eating spicy foods. In clinical trials, TRPV1 inhibitors effectively alleviated pain, but researchers found that they unexpectedly disrupted thermoregulation, causing patients to experience hyperthermia, or overheating, with one trial participant sustaining a 104-degree Fahrenheit fever for hours.
Another example involves nerve growth factor inhibitors like tanezumab. Although tanezumab alleviated inflammatory pain from conditions like osteoarthritis, Phase III trials revealed an unfortunate side effect: rapidly progressive osteoarthritis. Researchers hypothesized that because patients felt so much better, they overused their arthritic joints, accelerating damage. Although further trials were conducted at lower doses and with restrictions, the FDA ultimately voted against its approval. Tanezumab’s story reflects a difficulty in developing painkillers: while pain can cause excessive suffering, it also serves as a vital warning sign that must be selectively maintained.
Vertex has historically focused on developing drugs targeting ion channels. These channels play a major role in cellular signaling, meaning that compounds that act upon them can produce large, rapid physiological effects. Ion channels are ‘really good drug targets’, Paul Negulescu, head of Vertex’s pain program, says, ‘They just require a lot of care and attention to how you measure them’.
The discovery of the NaV sodium channels, made independently in the early 2000s by two different researchers, opened a new frontier in pain research. Both observed that mutations affecting NaV1.7 caused abnormalities in the experience of pain, a major clue that pain might be mediated through that specific sodium channel.
Stephen Waxman, a professor of neurology, neuroscience, and pharmacology at Yale’s medical school, discovered that a community in Alabama had numerous individuals suffering from erythromelalgia or ‘Man on Fire’ syndrome. These individuals experienced mild warmth — such as from wearing a sweater or shoes — as intense burning pain. Waxman’s research tied this phenomenon to mutations in the SCN9A gene, which is involved in the production of NaV1.7 channels. Meanwhile, Geoff Woods, a clinical geneticist at St. James’s University Hospital in Leeds, uncovered a complementary discovery. He observed congenital insensitivity to pain within specific Pakistani communities, also tracing it back to mutations in the SCN9A gene.
This congenital insensitivity provided a particularly compelling genetic validation for a drug target, as the affected individuals were entirely normal except for their inability to feel pain, unlike prior similar cases. Related channels like NaV1.8 and NaV1.9 were also investigated by Woods’s team and found relevant for pain signaling.
But despite the initial enthusiasm surrounding these discoveries, researchers soon encountered significant obstacles: NaV1.7 inhibitors failed to relieve pain during clinical trials. Researchers eventually uncovered that the congenital absence of NaV1.7 did not eliminate pain signals but instead amplified the production of natural painkillers called enkephalins. They concluded that completely blocking the channel, which would be required to replicate this effect pharmaceutically, was impractical.
So researchers turned their attention to the other promising sodium channel: NaV1.8. Again, research began with setbacks: in 2015, it was discovered that individuals with Brugada syndrome, a disorder characterized by abnormal heart rhythms and sudden cardiac death, also had mutations in the gene encoding NaV1.8.
Despite this challenge, researchers still thought NaV1.8 had potential. Woods’s research genetically validated it, showing that mutations in NaV1.8 affect pain signaling. Researchers at the University of Alcalá confirmed that mice genetically engineered to lack NaV1.8 channels showed virtually no spontaneous nerve activity after injury — activity thought to underlie certain chronic pains. Additionally, NaV1.8’s almost exclusive presence in the peripheral nervous system (rather than in the brain) suggested that it might uniquely limit undesirable central side effects.
As Vertex researchers searched for NaV1.8 inhibitors, they made use of Negulescu’s E-VIPR technology, which enabled them to conduct more than 50,000 tests per day to identify compounds that blocked NaV1.8 without affecting other ion channels. This was essential because the human body contains nine known voltage-gated sodium channel types, each with a distinctive ‘personality’ — a unique pattern of rapid opening, closing, and voltage sensitivity — making high throughput key to pinpointing an appropriately selective drug.
But even with this tool, Negulescu described the iterative learning process as ‘painful’. Vertex spent a decade screening millions of compounds before finding a promising class of molecules. Another decade was spent on optimization, conducting tens of thousands of screenings to maximize potency and selectivity (a drug is selective if it binds only to the target proteins and nowhere else).
Vertex faced several failures in preclinical and clinical testing. Between 2018 and 2022, they terminated development for three generations of NaV1.8 inhibitors, VX-150, VX-128, and VX-961, due to dosing and tolerability issues. However, unlike previous attempts with NaV1.7, TRPV1, and nerve growth factor inhibitors, the pathway overall did not exhibit fatal flaws, and so research continued.
Eventually, this iterative process produced VX-548, which was discovered to be many times more selective and potent than earlier candidates. In 2022, two Phase II proof-of-concept studies yielded positive results. In 2024, Phase III trials validated VX-548’s efficacy in treating acute pain with minimal adverse effects. During this process, the FDA granted VX-548, now suzetrigine, Fast Track and Breakthrough Therapy designations, processes designed to accelerate the development and review of crucial pharmaceutical innovations.
On July 30, 2024, the FDA accepted Vertex’s New Drug Application, filing it under priority review. Exactly six months later, on January 30, 2025, it was approved, marking suzetrigine — sold under the brand name Journavx — as the first non-opioid analgesic for treating acute pain.
Journavx isn’t a silver bullet. It has not yet been tested or approved for treating chronic pain, from which over 20 percent of Americans suffer. Across its clinical trials, between 85 and 98 percent of participants were female. This reflects a broader pattern in painkiller trials, which often rely on surgical models like bunionectomy and abdominoplasty (‘tummy tuck’) — procedures overwhelmingly performed on women. And although the bipartisan ‘No Pain Act’ has, since 2022, required Medicare and other government health plans to cover this class of medication in outpatient surgical settings, private insurance coverage is still in flux: without insurance, a week’s worth of Journavx costs around $230, compared to $10–20 for a low-dose opioid-acetaminophen combination medication.
Journavx failed to outperform that opioid-acetaminophen combination in clinical trials. Todd Bertoch, an anesthesiologist involved in suzetrigine’s Phase III trials, explains that the drug likely won’t serve as an outright opioid replacement but as a first step on the journey to minimizing opioid usage. If paracetamol and ibuprofen are inadequate for pain relief, Journavx can now be prescribed as the next alternative treatment, instead of mild- to moderate-strength opioids.
It will almost certainly improve: Vertex’s scientists are continuing their decades-long project to iterate and screen for even more potent and selective NaV1.8 blockers. They are also investigating complementarities with NaV1.7 inhibitors. A Phase III clinical trial of suzetrigine for diabetic peripheral neuropathy, which involves chronic pain, is currently underway.
Journavx is the product of 27 years, billions of dollars, millions of molecules screened, dozens of monkeys and rats and data from over 2,400 surgical patients, all distilled into a single 50-mg blue tablet.
Vertex chose to keep funding and pushing forward through decades of work that industry professionals describe as ‘tedious’, ‘mind-numbing’, and ‘painstaking’, a slog driven by slow, incremental progress and frequent setbacks. In exchange, humanity now has its first non-opioid painkiller.
Michelle Ma studies economics at the University of Chicago.
...
Read the original on www.worksinprogress.news »
...
Read the original on oldbytes.space »
Snow emulates classic (Motorola 680x0-based) Macintosh computers. It features a graphical user interface to operate the emulated machine and provides extensive debugging capabilities. The aim of this project is to emulate the Macintosh on a hardware-level as much as possible, as opposed to emulators that patch the ROM or intercept system calls.
It currently emulates the Macintosh 128K, Macintosh 512K, Macintosh Plus, Macintosh SE, Macintosh Classic and Macintosh II.
The emulator is written in Rust and released as open source, licensed under the MIT license.
There is a limited online demo available (only the emulated machine, no user interface or other functionality from the full software).
To get set up or for further information, check the online documentation.
Currently, only bleeding edge builds are available. These get generated automatically as work progresses on the emulator.
* Bug reports can be filed on GitHub issues.
* For support or just a chat, join the #snow channel on MartyPC and Friends Discord.
...
Read the original on snowemu.com »
In 2024 and 2025, I served for six months as an international volunteer on a first-person view attack drone team in the Armed Forces of Ukraine. My team was deployed in the Donbas region, in one of the hottest sectors of the front. When I joined the team, I was excited to work with a cutting-edge tool. By the end of my deployment, I was a bit disillusioned. Let me tell you why.
First-person view drones are unmanned aerial vehicles with four propellers located at the four corners of the craft, roughly in the shape of a square of seven to 12 inches in length on each side. They are controlled by an operator wearing virtual-reality goggles that receive the image from the drone’s forward-facing camera (hence the name first-person view). The most common types of first-person view drones are single-use: They fly directly into their target, where they detonate an explosive charge of up to 1.5 kilograms. These drones are touted as a cheap and accessible solution that can give troops on the tactical level their own organic precision-strike capability. They can supposedly react quickly and strike moving targets or targets in difficult-to-reach locations, such as bunkers, basements, or inside buildings. Proponents of first-person view drones often repeat the claim that as much as 60 to 70 percent of all battlefield casualties in the Russo-Ukrainian War are now caused by drones. This statistic is probably broadly accurate, though it does not differentiate between casualties caused by first-person view drones and other types of uncrewed aerial systems.
Some authors, including experienced military officers writing in these pages, go even further and claim that first-person view drones will precipitate a revolution in how wars are fought, akin to the introduction of muskets. Among other things, they will make concealment and the massing of troops and equipment in the combat zone nearly impossible. Any concentration of troops or vehicles will supposedly be observed immediately and butchered by swarms of cheap, fast drones. Proponents of drones, especially in Silicon Valley, have claimed that drones might completely replace artillery.
Whether or not we believe these far-reaching claims, we’ve certainly all seen the videos on social media of these drones performing impressive, highly precise attacks. We’ve seen them striking a Russian tank on the move, flying through the open back hatch of an infantry fighting vehicle, or entering a building to surprise the enemy, sometimes literally, with their pants down. But those impressive strikes are rare exceptions. The cases when first-person view drones actually do that are few and far between.
During my time in Ukraine, I collected statistics on the success of our drone operations. I found that 43 percent of our sorties resulted in a hit on the intended target in the sense that the drone was able to successfully fly all the way to the target, identify it correctly, hit it, and the drone’s explosive charge detonated as it was supposed to. This number does not include instances when our higher command requested a sortie but we had to decline because we knew that we could not strike the target for reasons such as weather, technical problems, or electronic interference. If this type of pre-aborted mission is included in the total, the success rate drops to between 20 and 30 percent. On the face of it, this success rate is bad, but that is not the whole story.
I began to notice that the vast majority of our sorties were against targets that had already been struck successfully by a different weapons system, most commonly by a mortar or by a munition dropped by a reusable drone (in other words, not a first-person view drone). Put differently, the goal of the majority of our missions was to deliver the second tap in a double-tap strike against a target that had already been successfully prosecuted by a different weapons system. The proportion of missions when we successfully carried out a task that only a first-person view drone can fulfill — delivering a precision strike on a target that could not be hit by other means — was in the single-digit percent.
There are two reasons why these drones rarely successfully do what they were designed to do. The first has to do with how commanders choose to employ first-person view drones. Presumably, our commanders decided that they had first-person view drones as a capability, so they might as well use them, even if there were other weapons systems that could also do the job. There is a certain logic to this, and the commanders were not paying for the expended drones out of their own pockets. They were more focused on the immediate mission. While first-person view drones are cheap, they are usually not the cheapest option available to commanders. This is the problem with using them in double-tap strikes or for missions that can be achieved by other systems. One of these drone sorties costs about $500 in materiel. A mortar shell costs less than $100. A munition dropped from a reusable drone, usually also something like a modified mortar shell or 40-millimeter grenade, also costs less than $100.
The second reason why these drones rarely do what they were designed to do is technical. They are finicky, unreliable, hard to use, and susceptible to electronic interference. Few first-person view drones have night-vision capability. Those that do are in short supply and cost twice as much as the base model. In Ukraine, in the winter, it’s dark for 14 hours a day. Wind, rain, snow, and fog all mean a drone cannot fly.
A solid quarter of all these drones have some sort of technical fault that prevents them from taking off. This is usually discovered only when they are being prepped for launch. The most common is a fault in the radio receiver that receives inputs from the control panel, or in the video transmitter that transmits the signal to the operator’s virtual-reality goggles. Sometimes this fault can be fixed through a software update in the field. Often, it cannot. Many faulty drones are simply cannibalized for spare parts, because there is no better use for them. Even once a drone is airborne, batteries often die mid-flight. In about 10 percent of sorties, the drone hits the target, but its warhead does not detonate.
Once airborne, operating a first-person view drone successfully is not easy. These drones were originally designed to be toys for rich people. Before they were press-ganged into service as tools of war, they were used either in aerobatic displays or in races where a group of operators would compete in flying through an obstacle course. In either case, the drones were not meant to be easy to fly. They were meant to be highly maneuverable, but also unstable. First-person view drones cannot really hover, fly slowly, or linger above a target. The assumption among hobbyists is that enthusiasts will invest the time and money to become proficient at flying. As a result, training a highly proficient operator can take months. A standard, base-level course for Ukrainian drone pilots takes about five weeks. The quality of operators it prepares is questionable, and graduates of the course need extra on-the-job experience to become truly proficient. Most drone pilots I encountered did not go through this course. Instead, they learned to fly drones on the job. Even experienced operators routinely miss their targets and crash into trees, power lines, or other obstacles.
To keep costs down, the first-person view drones used by Ukrainian forces have no navigational aids, such as a compass, a GPS receiver (though it should be noted that using GPS often would not be possible anyway due to widespread GPS signal jamming), or an inertial navigation system. The operator relies on their knowledge of the local terrain and on verbal instructions from a navigator, who usually has access to the video from the first-person view drone itself and from other reconnaissance assets that are tracking the target.
But the greatest obstacle to the successful use of these drones by far is the unreliability of the radio link between the operator and the drone. One of the reasons why hitting a target at ground level with precision is difficult is that when first-person view drones get close to the ground, due to obstacles, they start to lose their radio connection to the operator, often located up to 10 kilometers away. In some cases, drones cannot attack a target if it is simply on the wrong side of a tall building or hill because the building or hill blocks the line of sight between the drone and the operator. Sometimes, the operator can work around the loss of signal close to the ground by climbing, pointing the drone at the target, and hoping inertia will take it to its target once they have lost control. When striking a small target like a doorway, a window, or the entrance to a basement, this degrades precision significantly.
Drones also operate in a cluttered segment of the electromagnetic spectrum. First-person view drones use unencrypted analog radio signals, and in hot parts of the front, as many as a dozen drone teams may be competing for use of a handful of frequencies (a consequence of using cheaper components). This results in the need for sophisticated de-confliction procedures that, quite simply, do not always work. Even when de-confliction works, sometimes a team must wait as long as half an hour for a frequency to become available before takeoff. If it does not work and two drones find themselves in the air on the same channel at the same time, they will interfere with each other’s signals, usually resulting in a crash. On top of that, the enemy’s drones also fly on the same frequencies, which can also result in interference and a crash. Interference from another drone, whether friendly or hostile, resulted in the failure of at least three percent of our missions.
In addition to interference and the physical limitations of radio communication, first-person view drones are also highly susceptible to electronic-warfare jamming. Both sides of the Russo-Ukrainian War make extensive use of jamming. When our side turned on its jammers, they usually informed us in advance. That meant our drones simply could not take off, sometimes for a period of several hours. About three percent of our sorties failed because we did not get advanced warning that our own jamming systems would be operational, causing our drones to fall out of the sky. On top of that, sometimes, even the best efforts at de-confliction were not enough, simply because Ukrainian infantry or individual vehicles are often equipped with small portable jammers. When they heard a drone, they simply activated the jammer without waiting to find out whether the drone was friendly or not.
Of course, when the other side activated its jammers, we got no advance warning whatsoever. Enemy electronic warfare downed a full 31 percent of our sorties. This number could have been lower, but for our command’s occasional stubborn insistence that we fly even though it was almost certain that enemy jammers were operating in the target area. When enemy jammers were operating, the enemy’s own drones also could not fly, putting them in the same dilemma that our side also suffered. Nevertheless, when jammers were available and switched on, first-person view operations became effectively impossible.
Some of the problems with first-person view drones will eventually be resolved as technology matures. Better production standards will ensure that a larger percentage of drones actually take off. In Ukraine, there are countless assembly lines that build drones from cheap, off-the-shelf components sourced from dubious suppliers. A single unit often sources its drones from numerous organizations, each with its own production processes. More standardization, better quality control, and less reliance on cheap components could improve reliability. Better transmitters and receivers that are more resistant to interference will improve the connection between drone and operator. Digital signal transmission and frequency hopping are starting to appear in some first-person view drones, though these are still rare. Putting re-translators that amplify the drone’s signal on a second drone that hovers somewhere between the operator and the first-person view drone can also improve the quality of the connection. Improved and standardized procedures for training operators would cut down the time needed to become proficient.
To be sure, the technology has already evolved since I left the battlefield. Today, some Ukrainian and Russian units are also using drones controlled by fiber-optic cable, rather than radio, though I had no personal experience with this type of drone in my unit. This technology is often touted as the next step in the evolution of drone warfare. It would seem to address some of the major problems with radio-controlled drones I experienced, and compared to radio-controlled drones, fiber-optic drones may indeed have a number of advantages. Fiber optics make jamming impossible and deconflicting frequencies unnecessary. The absence of an energy-guzzling radio transmitter can extend battery life and even allow for some innovative tactics, such as landing the drone next to a road and waiting for several hours until a vehicle passes by.
Fiber optic drones do, however, have a number of drawbacks that mean they might not fully replace radio-controlled drones. The wire that connects the drone to the operator limits the maneuverability of the drone. Snagging it on any kind of obstacle can result in a loss of control. Fiber-optic drones cannot really double back over their route or circle a target, as this could tangle their control wire and also result in a loss of control. As a result, fiber-optic drones are said to be even more difficult to fly than radio-controlled drones. Because of these limitations, several drone operators I spoke to actively resist using fiber-optic drones. Furthermore, though cost will probably come down, at present the cost of the cable means that a fiber-optic drone with 10 kilometers of cable costs about twice as much as a radio-controlled model of similar range. Finally, production capacities available to Ukraine for fiber-optic cables are, at present, fairly limited compared to radio-controlled drones, meaning they are chronically in short supply.
All that said, if a member of a NATO military were hypothetically to ask me whether NATO countries should acquire first-person view drone capabilities, based on my experience and given the current state of the technology, I would probably say no, whether they are radio-controlled or fiber-optic. The vast majority of first-person view drone missions can be completed more cheaply, effectively, or reliably by other assets. Furthermore, other authors have noted that drones still do not come close to matching the effects that can be achieved by massed artillery fires. Additionally, experts on artillery systems consistently note the greater reliability and range of artillery.
Scaling up drone use would also involve scaling up the drones’ logistical tail. This means more complicated and expensive logistics for drones that would compete for resources with other types of weapons. For the time being, first-person view drones are unlikely to fully replace other weapons systems. No military leader is yet seriously advocating doing away with artillery completely in favor of first-person view drones. This means that the military will have two competing logistical tails: one for first-person view drones and one for artillery.
For sophisticated NATO militaries, instead of investing heavily in the development of first-person view drone capabilities, I would, first of all, recommend ensuring that troops in the field have well-trained organic mortar support with an ample supply of ammunition. Mortars, like artillery, can’t be stopped by bad weather, jamming, or crowded frequencies. Nor can they be impeded by the dark. A well-trained mortar crew can reliably put rounds on a target in less than five minutes. Our first-person view sorties took about 15 minutes from the initial request to the moment the drone struck the target, and that was only when conditions were optimal. A mortar’s price per shot is lower than a first-person view drone. Drones can nominally have an advantage over mortars in range, but this is variable and depends on the terrain, the specific location of the mortars relative to the drone launch site, and the deployment of intelligence, surveillance, and reconnaissance assets that find the targets for drones or mortars. In practice, I don’t remember a single case when we struck a target that was beyond the range of mortars, and we certainly never struck a target that was beyond the range of artillery.
Secondly, for the rare cases when troops actually need tactical-level, organic precision-strike capability, and when actually carrying out such a strike is feasible, I would recommend something a little bit more high-end than a first-person view drone. NATO countries and their allies already produce high-quality loitering munitions, like the Switchblade. Such loitering munitions provide greater precision in day and night, more ease of use, and higher resistance to electronic interference than first-person view drones. They are also more expensive, but their cost, like that of first-person view drones, is coming down. The investment in quality seems to justify the greater expense, especially since, at most, one in ten first-person view sorties is a precision strike.
Jakub Jajcay is a former officer in the Armed Forces of the Slovak Republic, where he served in a number of elite units. He is currently working on his Ph.D. in the Department of Middle Eastern Studies of Charles University in Prague.
...
Read the original on warontherocks.com »
Same Sizer, Wiggle Out, Fill the Space, Hyphen Out, Hyphenator, Last is First, Ext. Word & Letter, Variable Gradient
Same Sizer
The “Same Sizer” script applies the principle of uniform sizing: it ensures that each word, regardless of length or letter count, occupies the same size.
Wiggle Out
Following a tradition seen in Ashkenazi Hebrew manuscripts and certain Quranic texts, this script rotates words that are too large to fit within a text block into the margin. The resulting curve can be adjusted to be more or less pronounced. It also offers a version with a straight-end finish.
Fill the Space
This script imitates a method used in certain manuscripts, where the space between the last word of a line and the end of the text block is filled with various elements ― such as a simple or wavy pen stroke, repetition of the last letter, punctuation marks, embellished slashes, full stops, etc. It allows you to fill this space with one or more glyphs of your choice, or by repeating the last letter of the line.
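The filling logic amounts to simple arithmetic; here is a minimal TypeScript sketch of the idea (illustrative only, not the InDesign script itself; the width inputs in points are assumed values):

// Given the width of a line, the width already used by its text, and the width
// of a chosen filler glyph, work out how many copies of that glyph fit in the gap.
function fillerCount(lineWidth: number, usedWidth: number, glyphWidth: number): number {
  const gap = lineWidth - usedWidth;
  if (gap <= 0 || glyphWidth <= 0) return 0;
  return Math.floor(gap / glyphWidth);
}

// Example: filling a 30 pt gap with a 12 pt-wide pen-stroke glyph yields 2 copies.
const filler = "~".repeat(fillerCount(300, 270, 12));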
Hyphen Out
This script hyphenates a word that overruns the line and places the second part outside the text frame. The size (in %) and alignment of the resulting part can be adjusted.
Hyphenator
The “Hyphenator” InDesign script enhances text flow and readability by avoiding word breaks: it reduces the size of the last letters in the final word of a line, ensuring they fit within the available space.
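In effect, the script must work out how much to scale the trailing letters so the word fits; here is a minimal TypeScript sketch of that calculation, with assumed width inputs (not the script’s actual code):

// headWidth: width of the letters kept at full size; tailWidth: natural width of
// the trailing letters to be reduced; availableWidth: space left on the line.
// Returns the scale factor (at most 1) to apply to the trailing letters.
function tailScale(headWidth: number, tailWidth: number, availableWidth: number): number {
  const room = availableWidth - headWidth;
  if (room <= 0 || tailWidth <= 0) return 1; // nothing sensible to shrink; leave as is
  return Math.min(1, room / tailWidth);
}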
Last is First
This script offers a preview of the word that will appear on the next line, a phenomenon seen in some Hebrew manuscripts.
Ext. Word & Letter
Frequently used in Hebrew manuscripts, particularly for copying biblical texts, this script expands the last letter or the last word of a line. To counter the 1000% maximum enlargement limit imposed by InDesign, we suggest selecting the vectorizing option so that the right-hand side of the frame is perfectly aligned.
Variable Gradient
The “Variable Gradient” script creates a gradient effect throughout a text block by calculating intermediate values between two extremes on a chosen axis. The result can be applied word by word or glyph by glyph.
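The interpolation it describes can be sketched in a few lines of TypeScript (an illustration of the arithmetic, not the script itself; a variable-font weight axis is assumed here):

// Linearly interpolate an axis value from `start` to `end` across `count` glyphs
// or words; element i receives the i-th intermediate value.
function axisValues(start: number, end: number, count: number): number[] {
  if (count <= 1) return [start];
  return Array.from({ length: count }, (_, i) => start + ((end - start) * i) / (count - 1));
}

// Example: a weight gradient from 100 to 900 across five glyphs yields [100, 300, 500, 700, 900].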
...
Read the original on alternativelayoutsystem.com »
How we made our AI code reviewer stop being so noisy

I’m Paul, cofounder of cubic—an “AI-native GitHub.” One of our core features is an AI code review agent that performs an initial review pass, catching bugs, anti-patterns, duplicated code, and similar issues in pull requests. When we first released this agent back in April, the main feedback we got was straightforward: it was too noisy.

Even small PRs often ended up flooded with multiple low-value comments, nitpicks, or outright false positives. Rather than helping reviewers, it cluttered discussions and obscured genuinely valuable feedback. We decided to take a step back and thoroughly investigate why this was happening.

After three major architecture revisions and extensive offline testing, we managed to reduce false positives by 51% without sacrificing recall. Many of these lessons turned out to be broadly useful—not just for code review agents but for designing effective AI systems in general.

Our initial architecture was straightforward but problematic. It looked clean in theory but quickly fell apart in practice:

Excessive false positives: The agent often mistook style issues for critical bugs, flagged resolved issues, and repeated suggestions our linters had already addressed.

Users lost trust: Developers quickly learned to ignore the comments altogether. When half the comments feel irrelevant, the truly important ones get missed.

Opaque reasoning: Understanding why the agent made specific calls was practically impossible. Even explicit prompts like “ignore minor style issues” had minimal effect.

We tried standard solutions—longer prompts, adjusting the model’s temperature, experimenting with sampling—but saw little meaningful improvement.

After extensive trial-and-error, we developed an architecture that significantly improved results and proved effective in real-world repositories. These solutions underpin the 51% reduction in false positives currently running in production.

We required the AI to explicitly state its reasoning before providing any feedback:

{
  "reasoning": "`cfg` can be nil on line 42; dereferenced without check on line 47",
  "finding": "Possible nil-pointer dereference",
  "confidence": 0.81
}

This approach:

Enabled us to clearly trace the AI’s decision-making process. If reasoning was flawed, we could quickly identify and exclude the pattern in future iterations.

Encouraged structured thinking by forcing the AI to justify its findings first, significantly reducing arbitrary conclusions.

Created a foundation to diagnose and resolve root causes behind other issues we faced.

Initially, the agent had extensive tooling—Language Server Protocol (LSP), static analysis, test runners, and more. However, explicit reasoning logs revealed most analyses relied on a few core tools, with extra complexity causing confusion and mistakes. We streamlined the toolkit to essential components only—a simplified LSP and a basic terminal. With fewer distractions, the agent spent more energy confirming genuine issues, significantly improving precision.

Initially, our instinct was to continuously add more rules into a single large prompt to handle edge cases. This rapidly became unsustainable and was largely ineffective, as the AI frequently overlooked many rules. Our breakthrough came from employing specialized micro-agents, each handling a narrowly defined scope:

Planner: Quickly assesses changes and identifies necessary checks.

Security Agent: Detects vulnerabilities such as injection or insecure authentication.

Specializing allowed each agent to maintain a focused context, keeping token usage efficient and precision high. The main trade-off was increased token consumption due to overlapping context, managed through effective caching strategies.

These architecture and prompt improvements led to meaningful results across hundreds of real pull requests from active open-source and private repositories. Specifically, over the past six weeks:

Median comments per pull request were cut by half, helping teams concentrate on genuinely important issues.

Teams reported notably smoother review processes, spending less time managing irrelevant comments and more time effectively merging changes.

Additionally, the reduced noise significantly improved developer confidence and engagement, making reviews faster and more impactful.

Explicit reasoning improves clarity. Require your AI to clearly explain its rationale first—this boosts accuracy and simplifies debugging.

Simplify the toolset. Regularly evaluate your agent’s toolkit and remove tools rarely used (less than 10% of tasks).

Specialize with micro-agents. Keep each AI agent tightly focused on a single task, reducing cognitive overload and enhancing precision.
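As a rough illustration of the reasoning-first output, here is a minimal TypeScript sketch of the finding shape from the JSON example above, plus a simple confidence gate; the field names mirror that example, while the function name and the 0.7 threshold are assumptions for illustration rather than cubic’s actual implementation:

interface Finding {
  reasoning: string;   // the model must justify the finding before stating it
  finding: string;     // the review comment shown to developers
  confidence: number;  // self-reported, from 0 to 1
}

// Assumed post-processing step: keep only findings that come with non-empty
// reasoning and clear an (illustrative) confidence threshold.
function keepActionable(findings: Finding[], threshold = 0.7): Finding[] {
  return findings.filter(f => f.reasoning.trim().length > 0 && f.confidence >= threshold);
}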
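And a minimal sketch of the micro-agent split, assuming hypothetical interfaces and routing logic (the agent roles come from the article; none of this is cubic’s actual code):

type AgentFinding = { reasoning: string; finding: string; confidence: number };

interface ReviewAgent {
  name: string;                                   // e.g. "planner" or "security"
  shouldRun(diff: string): boolean;               // cheap relevance check on the diff
  review(diff: string): Promise<AgentFinding[]>;  // focused review within a narrow scope
}

async function runReview(diff: string, agents: ReviewAgent[]): Promise<AgentFinding[]> {
  // Run only the agents relevant to this diff, then merge their findings;
  // the merged list can then be confidence-gated as in the sketch above.
  const relevant = agents.filter(a => a.shouldRun(diff));
  const results = await Promise.all(relevant.map(a => a.review(diff)));
  return results.flat();
}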
...
Read the original on mrge-home.framer.website »
My website (the one you’re reading right now) is mainly served by a single Rust binary. For far too long now, every time I wanted to make a change, I would:
Copy it to my server
This is… not ideal.
So instead, I’d like to switch to deploying my website with containers (be it Docker, Kubernetes, or otherwise), matching the vast majority of software deployed any time in the last decade.
The only issue is that fast Rust builds with Docker are not simple.
Rust in Docker, the simple way
To get your Rust program in a container, the typical approach you might find would be something like:
FROM rust:1.87-alpine3.22 AS builder
RUN apk add musl-dev
WORKDIR /workdir
COPY . .
# the "package" for my website is "web-http-server".
RUN cargo build --release --package web-http-server --target=x86_64-unknown-linux-musl

# Only include the binary in the final image
FROM alpine:3.20
COPY --from=builder /workdir/target/x86_64-unknown-linux-musl/release/web-http-server /usr/bin/web-http-server
ENTRYPOINT ["/usr/bin/web-http-server"]
Unfortunately, this will rebuild everything from scratch whenever there’s any change.
In my case, building from scratch takes about 4 minutes (including 10s to download the crates every time).
$ cargo build --release --target=x86_64-unknown-linux-musl --package web-http-server
Updating crates.io index
Downloading crates …
Downloaded anstream v0.6.18
Downloaded http-body v1.0.1
… many more lines …
Compiling web-http-server v0.1.0 (/workdir/web-http-server)
Finished `release` profile [optimized + debuginfo] target(s) in 3m 51s
Sure, it could be worse. But I’ve grown accustomed to speedy local builds, thanks to incremental compilation — I don’t want to wait that long on every tiny change!
Rust in Docker, with better caching
Thankfully, there’s a tool to help with this!
Luca Palmieri’s cargo-chef makes it easy to pre-build all of the dependencies as a separate layer in the docker build cache, so that changes in your codebase only trigger re-compilation of your codebase (and not your dependencies).
I’ll save the detailed explanation for Luca’s blog post, but broadly cargo-chef creates a simplified “recipe” file from the current workspace, which can be “cooked” to cache the dependencies without being invalidated by changes in the workspace.
My website pulls in a few hundred dependencies, so this should help!
FROM … AS planner
COPY . .
RUN cargo chef prepare --recipe-path=/workdir/recipe.json

FROM … AS cooker
# NOTE: changes to the project can produce the same "recipe",
# allowing this build stage to be cached.
COPY --from=planner /workdir/recipe.json recipe.json
RUN cargo chef cook --release --recipe-path=/workdir/recipe.json \
    --target=x86_64-unknown-linux-musl

# If recipe.json is the same, 'cooker' will be cached.
# All that's left is compiling the final binary.
FROM cooker AS builder
COPY . .
RUN cargo build --release --package web-http-server \
    --target=x86_64-unknown-linux-musl
Unfortunately though, it doesn’t give quite the speedup we’re looking for — most of the time is still spent compiling the final binary:
$ # Build dependencies
$ cargo chef cook --release …
Updating crates.io index
Downloading crates …
Compiling web-http-server v0.0.1 (/workdir/web-http-server)
Finished `release` profile [optimized + debuginfo] target(s) in 1m 07s
$ # Build the final binary, using cached dependencies
$ cargo build --release …
Compiling web-http-server v0.1.0 (/workdir/web-http-server)
Finished `release` profile [optimized + debuginfo] target(s) in 2m 50s
Weirdly, only 25% of the time is actually spent on the dependencies! As far as I could tell, my code isn’t doing anything fundamentally unreasonable. It’s ~7k lines of gluing together various larger dependencies (axum, reqwest, tokio-postgres, among others).
What’s rustc doing for all that time?
Following this excellent post by fasterthanlime, I first tried using cargo build --timings to get some more information:
$ cargo build --release --timings …
Compiling web-http-server v0.1.0 (/workdir/web-http-server)
Timing report saved to /workdir/target/cargo-timings/cargo-timing-20250607T192029.207407545Z.html
Finished `release` profile [optimized + debuginfo] target(s) in 2m 54s
In addition to that cargo-timing-<timestamp>.html file, there’s also a cargo-timing.html. We’ll just copy out the canonical version:
FROM cooker AS builder
COPY . .
RUN cargo build --timings --release --target=x86_64-unknown-linux-musl --package web-http-server
# NEW: Move the cargo timings to a known location
RUN mv target/cargo-timings/cargo-timing-*.html cargo-timing.html

FROM alpine:3.22
COPY --from=builder /workdir/target/x86_64-unknown-linux-musl/release/web-http-server /usr/bin/web-http-server
# NEW: Include it in the final image
COPY --from=builder /workdir/cargo-timing.html cargo-timing.html
And with a little bit of container wrangling…
id="$(docker container create
… we should be able to see what’s going on! Let’s have a look:
Oh. There’s not really much information there!
What’s going on here?
cargo build --timings shows a bunch of information about how long each crate took to compile. But here, we only care about the compilation time of the final crate!
That aside, this does help give us more accurate timing. Measuring outside the compiler adds some extra moving pieces, or requires searching the output of cargo build — so using cargo’s self-reported timings will make more precise analysis a bit easier, later on.
Just to check, the value here of 174.1s roughly matches the “2m 54s” we saw from the cargo build output.
Actually asking rustc this time
The post from fasterthanlime had one more tip we can use — rustc’s self-profiling feature, via the -Zself-profile flag.
Normally, you’d probably run something like:
RUSTC_BOOTSTRAP=1 cargo rustc --release -- -Z self-profile
Unfortunately, this won’t work here — the change in arguments will invalidate the cached dependencies from cargo chef cook, and there’s no equivalent way to pass additional rustc flags through cargo-chef.
Instead, we can funnel everything via the RUSTFLAGS environment variable:
# cargo chef:
RUSTC_BOOTSTRAP=1 RUSTFLAGS='-Zself-profile' cargo chef cook --release …
# final build:
RUSTC_BOOTSTRAP=1 RUSTFLAGS='-Zself-profile' cargo build --release …
This gives us files like web_http_server-, which we can move and extract from the image in the same way as we did for cargo-timing.html.
Actually using the profdata
The Rust folks maintain a suite of tools for exploring rustc’s self-profiling output, over in
...
Read the original on sharnoff.io »