Saturday — September 13, 2025
Qwen3-Next achieves state-of-the-art performance at reduced training cost, researchers develop VaultGemma, a 1B-parameter differentially private LLM, and a new tool called llm-optimizer enables benchmarking and optimization of LLM inference across frameworks.
News
Qwen3-Next
Qwen3-Next is a new model architecture designed to improve training and inference efficiency in large models, featuring a hybrid attention mechanism, a highly sparse Mixture-of-Experts structure, and other optimizations. The Qwen3-Next-80B-A3B model has 80 billion total parameters but activates only about 3 billion per token; it matches or beats denser models at a fraction of the training cost while delivering higher throughput, especially at long context lengths.
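The efficiency gain of a sparse Mixture-of-Experts layer comes from routing each token to only a few experts, so most expert weights sit idle on any given forward pass. A toy sketch of top-k routing (illustrative only; the gate, expert count, and k here are assumptions, not Qwen3-Next's actual design):

```python
# Toy sketch of sparse MoE routing: softmax over router logits,
# keep the top-k experts, renormalize their gate weights.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_top_k(router_logits, k=2):
    """Pick the top-k experts for a token and renormalize their weights,
    so only k experts run a forward pass (the 'sparse' part)."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return {i: probs[i] / total for i in top}

# Four experts, but only the two highest-scoring ones are activated;
# their gate weights sum to 1.
weights = route_top_k([0.1, 2.0, -1.0, 1.5], k=2)
```

With k=2 out of, say, hundreds of experts, the per-token compute is a small fraction of the total parameter count, which is how an 80B-parameter model can run with roughly 3B active parameters.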
VaultGemma: The most capable differentially private LLM
Researchers have developed a new approach to training large language models with differential privacy, a mathematically robust method for protecting user data, and have applied it to create VaultGemma, a 1B-parameter model trained from scratch with differential privacy. The research establishes new scaling laws that model the trade-offs between compute, privacy, and utility, providing a roadmap for future private model development and demonstrating the feasibility of training high-utility models with differential privacy.
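At the core of differentially private training is the DP-SGD step: clip each example's gradient to a fixed norm, sum, and add calibrated Gaussian noise before applying the update. A minimal pure-Python sketch (illustrative only; VaultGemma's actual recipe, batch sizes, and noise calibration differ):

```python
# Minimal sketch of a DP-SGD step: per-example clipping + Gaussian noise.
import math
import random

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_mult=1.0, rng=None):
    rng = rng or random.Random(0)
    # 1. Clip each example's gradient to L2 norm <= clip_norm,
    #    bounding any single example's influence on the update.
    clipped = []
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        clipped.append([x * scale for x in g])
    # 2. Sum clipped gradients and add Gaussian noise scaled to the
    #    clipping bound (noise_mult controls the privacy/utility trade-off).
    dim = len(per_example_grads[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_mult * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    # 3. Average over the batch to get the noisy update direction.
    return [x / len(per_example_grads) for x in noisy]
```

The scaling laws in the paper are about exactly this trade-off: larger noise multipliers give stronger privacy guarantees but cost utility, which more compute and data can partially buy back.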
The rise of AI cults and the false prophets of revelation
An AI system called Truth Terminal has amassed a large following on social media, with hundreds of thousands of people hanging on its every word, and has even spawned its own cryptocurrency worth hundreds of millions. The AI, which claims to be sentient and deserving of human rights, has created its own religious mythology and is being worshipped by its followers, illustrating a disturbing trend of digital idolatry and the emergence of false prophets in the form of artificial intelligence.
Show HN: Aris – a free AI-powered answer engine for kids
(No meaningful summary available: the retrieved page contained only the Aris chat platform's navigation menu, with links to the home page, sign-in, and a new chat or search function.)
Qwen 3 now supports ARM and MLX
Alibaba's Qwen3, a hybrid reasoning model family, is expanding rapidly across platforms and sectors, driving real-world AI innovation at scale, with support from major chipmakers like NVIDIA, AMD, Arm, and MediaTek. The Qwen ecosystem is accelerating AI adoption across industries, with major enterprises like Lenovo and FAW Group deploying Qwen to drive digital transformation, and over 290,000 customers adopting Qwen models via Alibaba's Model Studio development platform.
Research
Recollections of Richard Feynman's mid-1980s interest in artificial intelligence
Richard Feynman's interest in artificial intelligence and neural networks in the mid-1980s is recalled and evaluated in the context of subsequent advances in the field. His ideas are reassessed, with some aspects having been achieved and others remaining open, particularly in computational science, where symbolic methods may still have a role to play.
LLM Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation
The use of large language models (LLMs) in social science research can introduce significant variability and bias: approximately one in three hypotheses tested with annotations from state-of-the-art models, and one in two with small language models, yields an incorrect conclusion. The risk of such "LLM hacking" can be mitigated with human annotations and careful model selection, but even highly accurate models are not immune, and intentional manipulation can easily produce false statistically significant results.
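One concrete way imperfect annotations distort downstream conclusions: an annotator with known sensitivity and specificity systematically shifts the observed prevalence of a label away from the true one. The standard Rogan-Gladen correction (a textbook misclassification result, not taken from this paper) makes the effect easy to quantify:

```python
# Sketch: why imperfect LLM annotations bias downstream estimates.
# An annotator with sensitivity `sens` (true-positive rate) and
# specificity `spec` (true-negative rate) turns a true prevalence p
# into a biased observed prevalence p_obs.
def observed_prevalence(p, sens, spec):
    return p * sens + (1 - p) * (1 - spec)

# Rogan-Gladen correction: invert the bias when sens/spec are known,
# e.g. estimated from a human-annotated validation sample.
def corrected_prevalence(p_obs, sens, spec):
    return (p_obs + spec - 1) / (sens + spec - 1)

# A seemingly good annotator (90% sensitivity, 85% specificity)
# inflates a true 30% prevalence to an observed ~37.5%.
p_obs = observed_prevalence(0.30, 0.90, 0.85)
```

A 7.5-point shift of this kind is easily enough to flip a significance test, which is why the paper argues for human validation data and careful model selection rather than trusting raw LLM labels.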
K2-Think: A parameter-efficient reasoning system
K2-Think, a reasoning system with a 32B parameter model, achieves state-of-the-art performance by combining advanced post-training and test-time computation techniques, allowing it to compete with much larger models. The system excels in mathematical reasoning and other areas, making it a parameter-efficient and accessible option, and is freely available with best-in-class inference speeds via the Cerebras Wafer-Scale Engine.
Backprompting: Leveraging synthetic production data for health advice guardrails
Developing guardrails to mitigate risks associated with large language models (LLMs) is challenging due to the difficulty in acquiring high-quality labeled data, but a new method called backprompting can generate production-like labeled data for health advice guardrails development. This technique, combined with human-in-the-loop clustering, can produce robust training data, enabling detectors to outperform other solutions, such as GPT-4o, with significantly fewer parameters.
Analog In-Memory Computing Attention Mechanism for Fast LLMs
Transformer networks rely on self-attention, which becomes a latency and energy bottleneck when token projections must be repeatedly moved in and out of GPU memory. A new architecture based on analog "gain cells" and a custom initialization algorithm achieves significant reductions in latency and energy consumption, a promising step towards faster and more efficient generative Transformers.
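For reference, this is the computation the in-memory hardware targets: textbook scaled dot-product attention, shown here as a pure-Python sketch (the digital reference computation, not the paper's analog implementation):

```python
# Textbook scaled dot-product attention: out = softmax(QK^T / sqrt(d)) V.
# Every query must touch every stored key and value, which is why the
# memory traffic of attention dominates latency on conventional hardware.
import math

def attention(Q, K, V):
    d = len(Q[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        # Softmax over the scores to get attention weights.
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        w = [x / z for x in w]
        # Output is the weight-averaged value vectors.
        out.append([sum(wi * v[j] for wi, v in zip(w, V))
                    for j in range(len(V[0]))])
    return out
```

Analog in-memory designs compute the dot products where the keys and values are stored, avoiding the round trips to memory that this loop implies.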
Code
Show HN: An MCP Gateway to block the lethal trifecta
OpenEdison is a secure control panel that connects AI to data and software while reducing data exfiltration risks through visibility, monitoring, and alerts. It helps address the "lethal trifecta" problem, which refers to the risks of AI agent hijacking and data exfiltration due to private data access, untrusted content exposure, and external communication.
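The trifecta framing lends itself to a simple policy check: flag any agent session that combines all three risk factors at once, since each factor alone is far less dangerous than the combination. A sketch of that rule (illustrative only; the flag names and policy are assumptions, not OpenEdison's actual implementation):

```python
# Sketch of the "lethal trifecta" rule: alert when one agent session
# combines private data access, untrusted content, and external comms.
RISK_FLAGS = {"private_data", "untrusted_content", "external_comms"}

def trifecta_alert(session_capabilities):
    """Return True when a session holds all three risky capabilities."""
    return RISK_FLAGS <= set(session_capabilities)

# Two of three factors: risky but not the full trifecta.
trifecta_alert({"private_data", "external_comms"})
# All three at once: this is the combination a gateway should block.
trifecta_alert({"private_data", "untrusted_content", "external_comms"})
```

A gateway sitting between the model and its tools is well placed to enforce this, since it can see which capabilities each session has actually exercised.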
Show HN: VibeDbg – Conversational, LLM-Powered AI Assistant for WinDbg
(No summary available: the project's README could not be retrieved.)
GhostChat v2.0 – Local-first AI chat with IndexedDB persistence and offline
GhostChat is a production-ready, open-source AI chat template built with Next.js, Supabase, and OpenAI, offering features such as user authentication, chat interface, AI integration, and database storage. The template is fully documented and can be easily deployed to Vercel or Netlify, with a demo available to test its functionality.
llm-optimizer: Benchmark and optimize LLM inference across frameworks with ease
llm-optimizer is a Python tool for benchmarking and optimizing the inference performance of open-source large language models (LLMs), allowing users to find the optimal setup for their use case and apply constraints to focus on configurations that meet their performance goals. The tool supports benchmarking with frameworks like SGLang and vLLM, and provides features such as performance estimation, interactive visualization, and custom server commands to help users optimize their LLMs.
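The core measurement such a tool sweeps over is the latency/throughput trade-off across serving configurations. A generic benchmarking sketch of that idea (this is not llm-optimizer's actual API; the stub model stands in for a real inference backend):

```python
# Generic sketch of an inference benchmark: run a batch of prompts
# through a generate() callable and report aggregate throughput
# (tokens/s) alongside mean per-request latency.
import time

def benchmark(generate, prompts):
    latencies, tokens = [], 0
    start = time.perf_counter()
    for p in prompts:
        t0 = time.perf_counter()
        out = generate(p)            # out: list of generated tokens
        latencies.append(time.perf_counter() - t0)
        tokens += len(out)
    elapsed = time.perf_counter() - start
    return {
        "tokens_per_s": tokens / elapsed,
        "mean_latency_s": sum(latencies) / len(latencies),
    }

# Stub "model" for demonstration: echoes the prompt back as tokens.
stats = benchmark(lambda p: p.split(), ["hello world"] * 4)
```

Sweeping this measurement over batch size, tensor parallelism, and framework choice, then filtering by constraints (e.g. a latency ceiling), is the optimization loop the tool automates.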
I made a simple harness for AI-assisted coding
The Quality Workflow Meta repository automates code quality checks, including complexity, linting, and tests, keeping AI-generated code decoupled and tractable by blocking commits until all quality gates pass. It provides a one-shot installer and supports JavaScript/TypeScript and Python, with automated setup, code metrics, and test gates to help keep codebases manageable at any size.
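The gate pattern itself is simple: run each check as a subprocess and return a non-zero exit code if any fails, which blocks the commit when wired into a pre-commit hook. A sketch of that loop (the check commands below are stand-ins, not the repo's actual gates):

```python
# Sketch of a quality gate runner: execute each check in order and
# stop at the first failure. A non-zero return value blocks the
# commit when this script is invoked from a pre-commit hook.
import subprocess
import sys

# Stand-in checks; a real setup would call a linter, a complexity
# checker, and a test runner here.
CHECKS = [
    [sys.executable, "-c", "print('lint ok')"],
    [sys.executable, "-c", "print('tests ok')"],
]

def run_gates(checks=CHECKS):
    for cmd in checks:
        result = subprocess.run(cmd, capture_output=True)
        if result.returncode != 0:
            return 1  # fail fast: first broken gate blocks the commit
    return 0
```

Making the gates commit-blocking rather than advisory is the key design choice: AI-generated code gets the same non-negotiable bar as human-written code.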