Thursday — December 11, 2025

Llama-70B achieves 224x compression for transformer-free inference, Terrain Diffusion offers a diffusion-based successor to Perlin Noise, and LangGraph sees 737x faster checkpoints via Rust.

News

Size of Life

"Size of Life" is a project by Neal Agarwal, with illustrations by Julius Csotonyi. It visually explores the scale of various life forms, referencing examples such as a corpse flower, cat, and Alexandra birdwing.

Qwen3-Omni-Flash-2025-12-01：a next-generation native multimodal large model

Qwen3-Omni-Flash-2025-12-01 is a next-generation native multimodal LLM capable of processing text, images, audio, and video inputs, and generating real-time text and speech outputs. This upgraded version features greatly enhanced audio-visual interaction, strengthened system prompt control for fine-tuning persona and output, and more reliable multilingual compliance across 119 text and numerous speech languages. It achieves substantial performance improvements in text understanding (reasoning, code), speech understanding and synthesis (natural prosody), and deeper image and video comprehension, resolving issues like "intelligence drop" in spoken interactions.

McDonald's pulls AI Christmas ad after backlash

McDonald's Netherlands removed its AI-generated Christmas advert following significant online backlash. Viewers criticized the 45-second film for its "uncanny" characters and "poor editing," also raising concerns about job displacement in the creative industry. McDonald's stated this served as an "important learning" as they explore the effective use of AI, while other brands have experienced mixed public reception with generative AI in advertising.

Post-transformer inference: 224× compression of Llama-70B with improved accuracy

This paper introduces a method to eliminate transformers from inference while preserving or improving accuracy. It replaces a frozen 70B-parameter Llama-3.3-70B with a 256-dimensional meaning field extracted from internal activation layers. A lightweight compressor reduces these fields by 224x, achieving an average +1.81 pp accuracy gain. A 30M-parameter student model then regenerates these fields from raw text, enabling transformer-free inference with 60x higher throughput and only 0.35 pp average accuracy loss. The core insight is that task-aligned semantics in transformers reside in a low-rank manifold, leading to the establishment of Field Processing Units (FPUs) as a post-transformer compute primitive.

The AI-Education Death Spiral a.k.a. Let the Kids Cheat

The text posits that LLMs like ChatGPT are exposing a "death spiral" in education, where widespread student use of AI for assignments reveals the inherent pointlessness and lack of relevance in much academic work. The author argues that AI isn't the problem, but rather a stress test highlighting systemic design failures in an education system focused on busywork and compliance over genuine learning, critical thinking, or real-world problem-solving. This dynamic is predicted to devalue traditional credentials, necessitating a fundamental shift towards engaging, impactful, and agency-driven learning experiences that AI cannot automate.

Research

Episodic Memory Architectures for Accurate and Efficient Character AI

An LLM architecture is proposed to overcome the latency-depth trade-off in historical character embodiment, where simple RAG is shallow and multi-stage reflection is slow. This system employs offline data augmentation and efficient parallel retrieval from structured episodic memory, converting biographical data into enriched first-person memories. Achieving 0.52s prompt generation, it matches RAG performance on GPT-4 and significantly surpasses it on smaller LLMs, proving valuable for resource-constrained deployments. The structured memory also facilitates novel visualization tools for biographical analysis.

Nonlinear Quantum Mechanics and Artificial Intelligence

A criterion for relativistic covariance in nonlinear quantum field theory, recently proposed by GPT-5, is shown to inadvertently test Hamiltonian locality and be insensitive to nonlinearity. The authors recall and reformulate the correct criterion established by Gisin and Polchinski.

Terrain Diffusion: A Diffusion-Based Successor to Perlin Noise

Terrain Diffusion introduces an AI-era successor to procedural noise, leveraging diffusion models for realistic, coherent, and infinitely extensible world generation. It employs InfiniteDiffusion for seamless, real-time synthesis of boundless landscapes, a hierarchical diffusion model stack for multi-scale detail, and Laplacian encoding for output stability. Supported by an infinite-tensor framework and consistency distillation, it enables efficient, constant-memory generation of entire planets.

[Re:DeepSeek-OCR] Optical Context Compression Is Just (Bad) Autoencoding

DeepSeek-OCR's high-fidelity text reconstruction from vision tokens sparked interest in vision-based context compression for LLMs. However, this work demonstrates that simpler alternatives like mean pooling or learned hierarchical encoders match or surpass vision for reconstruction at similar compression ratios and outperform it for language modeling, where vision-based compression fails to beat truncation. The current excitement around optical context compression for LLMs outpaces the supporting evidence.

Systemization of Knowledge: Security and Safety Challenges in MCP

This SoK analyzes the Model Context Protocol (MCP), a standard for connecting LLMs to external data and tools, highlighting how its decoupling of context and execution blurs the line between epistemic errors and security breaches. It taxonomizes adversarial security threats (e.g., indirect prompt injection, tool poisoning) and epistemic safety hazards, demonstrating how MCP primitives (Resources, Prompts, Tools) can be weaponized. The paper surveys defenses like cryptographic provenance (ETDI) and runtime intent verification, providing a roadmap for securing the transition to autonomous agentic operating systems.

Code

Show HN: Cupcake – Better performance and security for coding agents (via OPA)

Cupcake is a policy enforcement layer for AI agents, ensuring deterministic rule-following and enhanced security without consuming LLM context. It intercepts agent actions, evaluating them against user-defined OPA Rego policies compiled to Wasm for fast, sandboxed execution. This system provides granular control, offers decisions like allow, modify, or block with feedback, and supports various AI coding agent harnesses and language bindings.

Show HN: Using WebMCP to make the CDP MCP server 90% more token efficient

WebMCP allows AI agents to directly access website functionality through structured, type-safe tool calls, bypassing screenshot parsing or DOM scraping. Websites register JavaScript functions as tools via navigator.modelContext, which Chrome DevTools MCP then exposes to AI clients for discovery and execution. This method significantly reduces token usage (up to 89%) and cost for LLM-based automation, leading to faster and more reliable agent interactions.

Show HN: Open-Source Excel AI Agent

This open-source Excel AI Agent project features an Excel MCP server offering ~30 tools for direct Excel file manipulation and an Excel AI Agent Runner. It facilitates LLM integration for tasks like an Excel assistant web app or a Slack bot. The repository also includes an evaluation harness to benchmark model performance on SpreadsheetBench, relying on Microsoft Excel for precise formula and formatting validation.

Show HN: We open-sourced our internal tool for scoring PRs with Claude AI

MergeMint is an open-source platform that utilizes AI, specifically Anthropic's Claude LLM, to automatically evaluate and score GitHub pull requests. It analyzes PR diffs, linked issues, and commit messages to classify changes by component and severity, then calculates a configurable score. This system provides real-time feedback via PR comments, powers developer leaderboards, and offers product insights for bug bounty programs or contributor recognition.

Show HN: LangGraph profiling – 737x Faster Checkpoints via Rust (PyO3)

Fast-LangGraph offers high-performance Rust accelerators for LangGraph applications, addressing common bottlenecks in production AI agent workloads. It provides both automatic patching for transparent speedups (up to 2.8x E2E) and explicit Rust components for maximum performance, including RustSQLiteCheckpointer (up to 700x faster checkpointing), @cached for LLM response caching (10x+), and langgraph_state_update for efficient state merging (13-46x). This allows for significantly faster state management, checkpoint operations, and reduced LLM API costs while maintaining full API compatibility.