Monday — February 16, 2026
David Greene sues Google over NotebookLM voice cloning, Aletheia solves open Erdős conjectures, and "Just Say No" simulates substance-driven personas for LLMs.
Interested in AI engineering? Let's talk
News
Two different tricks for fast LLM inference
Anthropic and OpenAI have implemented "fast mode" using divergent technical strategies: Anthropic utilizes low-batch-size inference to serve the full Opus 4.6 model at ~170 tokens/sec, while OpenAI leverages specialized Cerebras hardware to achieve >1000 tokens/sec. Anthropic’s approach prioritizes model capability by reducing batching-related latency on standard GPUs, whereas OpenAI’s implementation relies on a smaller, distilled "Spark" model to fit within the SRAM constraints of wafer-scale chips. Consequently, Anthropic offers the original model at higher speeds, while OpenAI provides significantly higher throughput at the cost of reduced reasoning performance.
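To see why batch size trades per-user speed against aggregate throughput, here is a toy decode-time model in Python; the constants are illustrative assumptions, not measured numbers from either provider.

```python
# Toy model of the decode-time batching trade-off behind "fast mode".
# All constants are illustrative assumptions, not vendor-measured numbers.

def step_time_ms(batch_size: int,
                 weight_read_ms: float = 5.0,      # fixed cost: streaming weights each step
                 per_seq_ms: float = 0.35) -> float:  # marginal cost per sequence (KV reads, compute)
    """Time for one decode step serving `batch_size` concurrent requests."""
    return weight_read_ms + per_seq_ms * batch_size

for b in (1, 4, 16, 64, 256):
    step = step_time_ms(b)
    per_request_tps = 1000.0 / step          # tokens/sec seen by each user
    aggregate_tps = b * per_request_tps      # tokens/sec across the whole batch
    print(f"batch={b:>3}  per-request={per_request_tps:6.1f} tok/s  "
          f"aggregate={aggregate_tps:8.1f} tok/s")

# Small batches maximize per-user speed (Anthropic's low-batch approach);
# large batches maximize hardware utilization and aggregate throughput.
```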
Microgpt is a GPT you can visualize in the browser
Microgpt is a character-level transformer designed for name generation, utilizing standard architecture components like multi-head attention, RMSNorm, and residual connections. It employs cross-entropy loss and gradient descent to learn patterns such as consonant-vowel alternation. While significantly smaller than modern LLMs, it demonstrates fundamental principles including QKV attention mechanisms, hierarchical layer processing, and token sampling strategies.
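For a sense of those pieces, here is a minimal NumPy sketch of a pre-norm attention block with RMSNorm, a causal mask, and a residual connection; the shapes and weights are toy values, not microgpt's actual code.

```python
import numpy as np

# Minimal single-head causal self-attention block with RMSNorm and a residual
# connection: the ingredients the summary lists, not microgpt's actual code.

def rmsnorm(x, eps=1e-5):
    # Scale each token vector to unit RMS (no mean subtraction, unlike LayerNorm).
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def causal_self_attention(x, Wq, Wk, Wv):
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv                    # QKV projections
    scores = q @ k.T / np.sqrt(d)                       # scaled dot-product
    scores = np.where(np.tril(np.ones((T, T))) == 1, scores, -1e9)  # causal mask
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over past tokens
    return weights @ v

rng = np.random.default_rng(0)
d_model, seq_len = 16, 8
x = rng.normal(size=(seq_len, d_model))                 # one character sequence, embedded
Wq, Wk, Wv = (rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(3))

out = x + causal_self_attention(rmsnorm(x), Wq, Wk, Wv) # pre-norm block + residual
print(out.shape)  # (8, 16)
```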
Radio host David Greene says Google's NotebookLM tool stole his voice
Former NPR host David Greene is suing Google, alleging that the NotebookLM "Audio Overview" feature replicates his voice without authorization or compensation. The lawsuit highlights escalating legal challenges surrounding synthetic voice cloning and the use of proprietary audio data in training generative AI models.
AI is going to kill app subscriptions
LLM-assisted development has drastically lowered the barrier to entry for app creation, leading to a surge in clones and the collapse of subscription pricing for local software. As development costs approach zero, pricing power shifts toward cost-plus models for server-side apps and free or one-time fees for local tools. Apple is facilitating this transition by integrating AI directly into Xcode, prioritizing ecosystem growth and niche market coverage over developer margins.
DjVu and its connection to Deep Learning (2023)
DjVu was developed by Deep Learning pioneers Yann LeCun, Léon Bottou, and Yoshua Bengio as a high-efficiency alternative to PDF for scanned documents. It utilizes advanced compression techniques such as wavelet-based IW44, JB2 symbol clustering for text, and the ZP-coder for arithmetic coding. The format's development is closely linked to Lush, a Lisp-based environment used by the creators for early neural network research and development.
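As a rough illustration of the JB2 idea, the sketch below stores each distinct glyph bitmap once and encodes repeats as (shape id, position) references; the real codec also matches near-identical shapes and entropy-codes the result with the ZP-coder.

```python
# Toy illustration of JB2-style symbol clustering: identical glyph bitmaps go
# into a shape dictionary once, and each occurrence on the page becomes a
# (shape_id, x, y) reference. Conceptual sketch only, not the DjVu codec.

def cluster_symbols(glyphs):
    """glyphs: list of (bitmap_bytes, x, y) extracted from a scanned page."""
    shapes = {}        # bitmap -> shape id
    placements = []    # (shape_id, x, y)
    for bitmap, x, y in glyphs:
        shape_id = shapes.setdefault(bitmap, len(shapes))
        placements.append((shape_id, x, y))
    return list(shapes), placements

page = [(b"\x1f\x11\x1f", 10, 20),   # an "e"
        (b"\x0e\x11\x0e", 18, 20),   # an "o"
        (b"\x1f\x11\x1f", 26, 20)]   # another "e", reuses shape 0
shapes, placements = cluster_symbols(page)
print(len(shapes), placements)       # 2 [(0, 10, 20), (1, 18, 20), (0, 26, 20)]
```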
Research
Retrieval-Aware Distillation for Transformer-SSM Hybrids
Retrieval-aware distillation converts pretrained Transformers into hybrid SSM-attention models by selectively preserving "Gather-and-Aggregate" (G&A) heads. Keeping only the ~2% of attention heads identified as critical recovers over 95% of teacher performance on retrieval tasks, significantly outperforming hybrids that retain denser attention. This sparse configuration allows for an 8x reduction in SSM state dimension, resulting in a model that is 5–6x more memory-efficient while closing the Transformer-SSM performance gap.
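A sketch of the head-selection step, under the assumption that heads are ranked by some retrieval-importance score; the scoring metric and threshold here are stand-ins, not the paper's exact procedure.

```python
import numpy as np

# Sketch of the head-selection idea: score every attention head for retrieval
# importance, keep only the top ~2% as attention, and mark the rest for
# replacement by SSM blocks. The scores here are random placeholders.

def select_ga_heads(head_scores: np.ndarray, keep_fraction: float = 0.02):
    """head_scores: (n_layers, n_heads) retrieval-importance scores."""
    flat = head_scores.ravel()
    n_keep = max(1, int(round(keep_fraction * flat.size)))
    threshold = np.sort(flat)[-n_keep]
    return head_scores >= threshold          # True -> keep as attention head

rng = np.random.default_rng(0)
scores = rng.random((32, 32))                # e.g. 32 layers x 32 heads
mask = select_ga_heads(scores)
print(mask.sum(), "of", mask.size, "heads kept as attention; the rest become SSM")
```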
Large Language Model Reasoning Failures
This survey categorizes LLM reasoning failures into embodied and non-embodied (informal and formal) types, further classifying them as fundamental, application-specific, or robustness-related. It analyzes root causes and mitigation strategies for these systemic weaknesses to guide the development of more reliable models. A curated GitHub repository of research works is provided to support ongoing efforts in the field.
Towards Autonomous Mathematics Research
Aletheia is a math research agent powered by Gemini Deep Think that applies a novel inference-time scaling law to end-to-end solution generation, verification, and revision. It extends AI capabilities from Olympiad-level problems to PhD-level research, demonstrating success in autonomous paper generation, human-AI collaborative proofs, and solving four open questions from the Erdős conjectures database. The work also introduces frameworks for quantifying AI autonomy and novelty in mathematical research through standardized interaction cards.
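The generate-verify-revise pattern can be pictured as the loop below, where the call budget is the inference-time scaling knob; the function names and parameters are placeholders, not Aletheia's actual interface.

```python
# Generic generate -> verify -> revise loop of the kind the summary describes.
# `propose`, `verify`, and `revise` stand in for model calls; nothing here is
# Aletheia's actual interface.

def solve(problem: str, propose, verify, revise, budget: int = 8):
    """Spend up to `budget` verify/revise rounds; a larger budget means more
    inference-time compute (the scaling knob)."""
    draft = propose(problem)
    for _ in range(budget):
        ok, critique = verify(problem, draft)
        if ok:
            return draft
        draft = revise(problem, draft, critique)
    return None  # no verified solution within budget

# Toy usage with stub callables:
propose = lambda p: "draft proof"
verify = lambda p, d: ("revised" in d, "missing a lemma")
revise = lambda p, d, c: d + " (revised)"
print(solve("toy problem", propose, verify, revise))
```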
Code
Klaw.sh – Kubernetes for AI agents
klaw is a Go-based orchestration platform that provides a kubectl-style interface for managing, monitoring, and scaling enterprise AI agents. It features namespace isolation for secure secret and tool management, built-in cron scheduling, and a distributed architecture for multi-node scaling. The tool integrates with over 300 LLMs and allows for agent control via both a CLI and Slack.
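The namespace-isolation idea can be sketched conceptually as follows; this is illustrative Python, not klaw's Go implementation or its CLI syntax.

```python
from dataclasses import dataclass, field

# Conceptual sketch of namespace isolation for agents: each namespace owns its
# own secrets and tool allow-list, and an agent can only read what its
# namespace grants. Purely illustrative; klaw itself is a Go platform driven
# by a kubectl-style CLI.

@dataclass
class Namespace:
    name: str
    secrets: dict = field(default_factory=dict)
    tools: set = field(default_factory=set)

@dataclass
class Agent:
    name: str
    namespace: Namespace
    schedule: str | None = None   # cron expression, e.g. "0 9 * * 1-5"

    def resolve_secret(self, key: str) -> str:
        # Agents may only read secrets from their own namespace.
        return self.namespace.secrets[key]

prod = Namespace("prod", secrets={"LLM_API_KEY": "placeholder"}, tools={"search", "slack"})
reporter = Agent("daily-report", namespace=prod, schedule="0 9 * * 1-5")
print(reporter.resolve_secret("LLM_API_KEY"))
```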
I gave my AI drugs
"Just Say No" is a collection of persona-driven commands for Claude Code and Codex CLI that modify LLM behavior through specialized system prompts. By simulating various "substances" (e.g., /adderall, /lsd), the tool alters the model's cognitive axis, communication style, and problem-solving heuristics across three intensity levels. Implementation involves Markdown-based prompt injection, with specific commands utilizing dynamic context integration via git metadata to influence output.
GPU Perpetual Futures Prototype
Compex is a perpetual futures platform for GPU compute pricing that enables ML labs and cloud providers to hedge volatility via a tradeable H200 spot price index. The system features a high-performance Rust backend with an event-driven architecture for real-time index calculation, funding rate mechanics, and market state persistence. The MVP integrates live data from Vast.ai and utilizes a terminal-style Next.js frontend to visualize implied depreciation and demand signals.
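For intuition, here is a generic perp-style funding calculation applied to a GPU index; the clamp, interval, and prices are common conventions and made-up values, so Compex's actual mechanics may differ.

```python
# Sketch of perp-style funding mechanics on a GPU compute index: when the
# perpetual trades above the H200 spot index, longs pay shorts, and vice
# versa. Parameters are generic conventions, not necessarily Compex's.

def funding_rate(mark_price: float, index_price: float, cap: float = 0.0075) -> float:
    """Premium of the perp over the spot index, clamped to +/- cap per interval."""
    premium = (mark_price - index_price) / index_price
    return max(-cap, min(cap, premium))

def funding_payment(position_usd: float, rate: float) -> float:
    """Positive means a long position pays this amount for the interval."""
    return position_usd * rate

index = 2.10   # $/GPU-hour composite H200 spot index (e.g. built from live Vast.ai quotes)
mark = 2.16    # perp trading at a premium -> market expects tighter supply
rate = funding_rate(mark, index)
print(f"funding rate {rate:+.4%}; a $100k long pays ${funding_payment(100_000, rate):,.2f} this interval")
```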
Clawlet – AI agent with built-in semantic memory, one binary
Clawlet is a lightweight, dependency-free AI agent distributed as a single binary without CGO or external runtimes. It features built-in hybrid semantic memory search using SQLite and sqlite-vec to index local Markdown files for context retrieval. The tool supports major LLM providers and local backends like Ollama, with integrations for Telegram, WhatsApp, Discord, Slack, and cron-based task scheduling.
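A minimal Python sketch of the SQLite + sqlite-vec retrieval pattern the summary describes (Clawlet itself ships as a single binary; the toy embeddings, dimension, and schema here are assumptions).

```python
import sqlite3
import sqlite_vec  # pip install sqlite-vec

# Store note embeddings in a vec0 virtual table and run a KNN query for
# retrieval. Toy 4-dim vectors stand in for real embeddings.
db = sqlite3.connect(":memory:")
db.enable_load_extension(True)
sqlite_vec.load(db)
db.enable_load_extension(False)

db.execute("CREATE VIRTUAL TABLE notes USING vec0(embedding float[4])")
db.execute("CREATE TABLE note_text(id INTEGER PRIMARY KEY, body TEXT)")

notes = {1: ("standup at 10am", [0.9, 0.1, 0.0, 0.0]),
         2: ("renew TLS cert",  [0.0, 0.8, 0.2, 0.0]),
         3: ("gift ideas",      [0.1, 0.0, 0.1, 0.9])}
for note_id, (body, emb) in notes.items():
    db.execute("INSERT INTO notes(rowid, embedding) VALUES (?, ?)",
               (note_id, sqlite_vec.serialize_float32(emb)))
    db.execute("INSERT INTO note_text(id, body) VALUES (?, ?)", (note_id, body))

query = sqlite_vec.serialize_float32([0.85, 0.15, 0.0, 0.0])  # e.g. "what's on this morning?"
rows = db.execute(
    "SELECT rowid, distance FROM notes WHERE embedding MATCH ? ORDER BY distance LIMIT 2",
    (query,)).fetchall()
for rowid, dist in rows:
    body = db.execute("SELECT body FROM note_text WHERE id = ?", (rowid,)).fetchone()[0]
    print(f"{body!r}  (distance {dist:.3f})")
```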
GAIA – open-source, proactive AI assistant to manage your digital life
GAIA is an open-source, self-hostable proactive AI assistant designed to automate digital workflows and reduce cognitive load across platforms like Gmail, Slack, and Notion. It utilizes graph-based memory to maintain context across tasks and projects, transforming standard todos into automated mini-workflows. The system features a unified productivity hub and an integration marketplace, allowing for extensive customization of personal and professional maintenance tasks.
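A toy sketch of what graph-based memory buys you, with tasks, emails, and people as nodes and context retrieval as a neighborhood walk; this is illustrative only, not GAIA's actual data model.

```python
from collections import defaultdict

# Toy graph-based memory: entities (tasks, emails, projects, people) are nodes,
# typed edges link them, and "context for a task" is its graph neighborhood.

class MemoryGraph:
    def __init__(self):
        self.edges = defaultdict(list)   # node -> [(relation, other_node)]

    def link(self, a: str, relation: str, b: str):
        self.edges[a].append((relation, b))
        self.edges[b].append((f"inverse:{relation}", a))

    def context(self, node: str, depth: int = 2):
        """Collect everything reachable within `depth` hops of a node."""
        seen, frontier = {node}, [node]
        for _ in range(depth):
            frontier = [b for n in frontier for _, b in self.edges[n] if b not in seen]
            seen.update(frontier)
        return seen - {node}

g = MemoryGraph()
g.link("task:book-flights", "part_of", "project:offsite")
g.link("email:venue-quote", "relates_to", "project:offsite")
g.link("project:offsite", "owned_by", "person:alex")
print(g.context("task:book-flights"))  # the project, its owner, and the related email
```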