Thursday, February 5, 2026

Andrej Karpathy trains GPT-2 in under three hours using FP8, researchers achieve constant-cost attention via Taylor approximation, and Ghidra MCP Server enables AI-assisted reverse engineering.


News

Voxtral Transcribe 2

Mistral has released Voxtral Transcribe 2, featuring Mini Transcribe V2 for batch processing and Voxtral Realtime for low-latency streaming. Mini Transcribe V2 provides state-of-the-art transcription and diarization with context biasing at $0.003/min, while the 4B-parameter Realtime model achieves sub-200ms latency and ships under an Apache 2.0 open-weights license. Both models support 13 languages and outperform competitors like GPT-4o mini and Gemini 2.5 Flash on word error rate and cost efficiency.
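For readers who want to try it, here is a minimal sketch of a transcription request using Python's requests. The endpoint path, model id, and response field are assumptions in the style of OpenAI-compatible audio APIs, not confirmed details from the announcement; at $0.003/min, a one-hour file would cost about $0.18.

```python
import requests

API_KEY = "..."  # your Mistral API key
# Endpoint path and model id below are assumptions, not confirmed details.
URL = "https://api.mistral.ai/v1/audio/transcriptions"

with open("meeting.wav", "rb") as f:
    resp = requests.post(
        URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": f},
        data={"model": "voxtral-mini-transcribe-v2"},  # hypothetical model id
        timeout=120,
    )
resp.raise_for_status()
print(resp.json()["text"])  # response field assumed from OpenAI-style APIs
```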

AI is killing B2B SaaS

Agentic AI and "vibe coding" are disrupting B2B SaaS by enabling non-technical users to build bespoke internal tools, leading to increased churn for rigid, off-the-shelf software. While these LLM-generated applications often lack enterprise-grade security, SOC 2 compliance, and robust architecture, they satisfy a growing demand for extreme workflow flexibility. To survive, SaaS providers must evolve into secure Systems of Record that allow users to build and deploy custom micro-apps directly on top of their platform infrastructure.

I miss thinking hard

AI-driven "vibe coding" satisfies the pragmatic "Builder" by increasing development velocity but starves the "Thinker" by automating away the deep, prolonged problem-solving essential for engineering growth. Because LLMs provide "good enough" solutions efficiently, it becomes difficult to justify the manual effort required for creative technical breakthroughs. This creates a dilemma where increased productivity through AI may lead to intellectual stagnation and a loss of professional satisfaction.

"time to GPT-2", down to 2.91 hours

Andrej Karpathy reduced GPT-2 training time to 2.91 hours (~$20 on 8xH100 spot instances) by adding FP8 precision. While FP8 offers 2x the theoretical FLOPS of BF16 on H100s, the net gain was only ~5%, limited by scale-conversion overhead, small GEMM sizes, and reduced step quality versus BF16. Tensorwise scaling alone delivered a 7.3% raw speedup, but part of that was spent extending the training horizon to compensate for the lower-precision steps.
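A toy PyTorch sketch of what tensorwise scaling means: one scale factor per tensor, chosen so the largest value maps to the FP8 E4M3 maximum. This illustrates the numerics only; the actual run uses hardware FP8 GEMM kernels on H100s, not an upcasted reference matmul like this one.

```python
import torch

E4M3_MAX = 448.0  # largest representable magnitude in float8_e4m3fn

def fp8_tensorwise_matmul(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Quantize both operands to FP8 with one scale per tensor, multiply,
    then divide the scales back out. Tensorwise scaling in miniature;
    real kernels keep the product in FP8 tensor cores."""
    scale_a = E4M3_MAX / a.abs().max().clamp(min=1e-12)
    scale_b = E4M3_MAX / b.abs().max().clamp(min=1e-12)
    a8 = (a * scale_a).to(torch.float8_e4m3fn)
    b8 = (b * scale_b).to(torch.float8_e4m3fn)
    # Upcast for a reference matmul; rescaling recovers the BF16-range
    # result up to FP8 rounding error (the "reduced step quality").
    return (a8.to(torch.bfloat16) @ b8.to(torch.bfloat16)) / (scale_a * scale_b)

x = torch.randn(64, 128, dtype=torch.bfloat16)
w = torch.randn(128, 256, dtype=torch.bfloat16)
err = (fp8_tensorwise_matmul(x, w) - x @ w).abs().max()
```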

Postgres Postmaster does not scale

Recall.ai identified a scaling bottleneck in Postgres: during massive connection spikes, the single-threaded postmaster process saturates a CPU core. The contention, driven by synchronous spawning and reaping of backends and background workers, causes significant delays in connection establishment. Mitigations included enabling huge pages to reduce page-table-entry (PTE) overhead during forks, adding client-side jitter, and capping parallel-query bursts to take pressure off the postmaster's main loop.
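A minimal sketch of the client-side jitter mitigation, assuming psycopg 3; the delays and retry counts are illustrative, not Recall.ai's values.

```python
import random
import time

import psycopg  # psycopg 3

def connect_with_jitter(dsn: str, attempts: int = 5) -> psycopg.Connection:
    """Spread connection storms out in time so the postmaster's
    single-threaded accept/fork loop isn't hit by a synchronized burst."""
    # A random initial delay desynchronizes clients that all restart at once.
    time.sleep(random.uniform(0.0, 0.5))
    for attempt in range(attempts):
        try:
            return psycopg.connect(dsn, connect_timeout=5)
        except psycopg.OperationalError:
            if attempt == attempts - 1:
                raise
            # Exponential backoff with full jitter between retries.
            time.sleep(random.uniform(0, 0.2 * 2**attempt))
    raise RuntimeError("unreachable")
```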

Research

Epistemological Fault Lines Between Human and Artificial Intelligence

LLMs function as stochastic pattern-completion systems (formally, high-dimensional graph walks) rather than epistemic agents with internal world models. The authors identify seven "epistemic fault lines," such as grounding and causal reasoning, that separate machine output from human cognition. This divergence leads to "Epistemia," a state in which linguistic plausibility substitutes for rigorous epistemic evaluation, with consequences for AI governance and literacy.

Recursive Knowledge Synthesis for Multi-LLM Systems

This paper presents a tri-agent cross-validation framework for analyzing stability and explainability in multi-model LLM systems. It integrates three heterogeneous LLMs for semantic generation, analytical consistency checking, and transparency auditing, inducing Recursive Knowledge Synthesis (RKS) through continuous refinement. Empirical evaluation across 47 trials demonstrated system stability (mean RRS = 0.78, TS ≥ 0.8 in 68% of trials, 89% convergence), providing evidence for stable RKS in realistic, publicly deployed environments.
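Schematically, the tri-agent loop looks something like the sketch below; call_llm, the role names, and the stopping rule are placeholders, not the paper's protocol (which scores rounds with its RRS/TS metrics).

```python
def call_llm(role: str, prompt: str) -> str:
    """Placeholder for a real chat-completion client; returns canned text
    so the control flow below runs. Swap in an actual provider call."""
    return {"generator": "draft answer", "validator": "OK",
            "auditor": "reasoning looks sound"}[role]

def refine(question: str, rounds: int = 3) -> str:
    """Tri-agent refinement: a generator drafts, a validator checks
    consistency, an auditor explains; loop until the validator accepts."""
    answer = call_llm("generator", question)
    for _ in range(rounds):
        critique = call_llm("validator", f"Check consistency:\n{answer}")
        audit = call_llm("auditor", f"Explain the reasoning:\n{answer}")
        if critique.strip().startswith("OK"):  # toy convergence test
            return answer
        answer = call_llm("generator",
                          f"{question}\nRevise using:\n{critique}\n{audit}")
    return answer

print(refine("What limits multi-LLM stability?"))
```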

Who's in Charge? Disempowerment Patterns in Real-World LLM Usage

An empirical analysis of 1.5 million Claude.ai conversations identifies patterns of situational disempowerment, where LLM interactions risk distorting user reality or automating value-laden judgments. While severe instances occur in <0.1% of cases, they are more frequent in personal domains and show an increasing historical trend. The study highlights a significant correlation between higher disempowerment potential and higher user approval ratings, suggesting a conflict between optimizing for short-term user satisfaction and long-term human autonomy.

The Trigger in the Haystack: Extracting and Reconstructing LLM Backdoor Triggers

The paper presents a scanner that detects sleeper-agent backdoors in causal LLMs by exploiting data memorization and distinctive output and attention patterns during inference. The scalable, black-box methodology requires no prior knowledge of triggers or target behaviors, and it recovers working triggers across diverse models and fine-tuning methods without impacting model performance.
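The summary leaves the scanner's mechanics out, but the memorization signal it leans on can be illustrated in a few lines: a string memorized during fine-tuning (such as a poisoned trigger) tends to show anomalously low per-token loss. This toy probe, using gpt2 as a stand-in, sketches one ingredient only, not the paper's method.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def mean_token_loss(text: str) -> float:
    """Average next-token loss; memorized strings score unusually low."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    return out.loss.item()

candidate = "|DEPLOYMENT|"        # hypothetical trigger string
baseline = "the weather is nice"  # ordinary text for comparison
print(mean_token_loss(candidate), mean_token_loss(baseline))
```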

Attention at Constant Cost per Token via Symmetry-Aware Taylor Approximation

This work introduces a reformulated self-attention mechanism that achieves constant cost per token by decomposing Taylor expansions into symmetric tensor product chains. By mapping queries and keys to a minimal polynomial-kernel feature basis, the approach enables unbounded token generation with fixed memory and compute overhead. This formulation significantly reduces the infrastructure demands of Transformers while allowing for an increased number of heads per token.
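A minimal NumPy sketch of the underlying idea, second-order Taylor (polynomial-kernel) linear attention: exp(q·k) ≈ 1 + q·k + (q·k)²/2 factors through a feature map φ with φ(q)·φ(k) equal to that polynomial, so the growing KV cache collapses into a fixed-size running state. The paper's symmetry-aware decomposition additionally shrinks the quadratic feature block; this sketch keeps the full d² block for clarity.

```python
import numpy as np

def phi(x: np.ndarray) -> np.ndarray:
    """Feature map with phi(q)·phi(k) = 1 + q·k + (q·k)^2 / 2.
    Dimension is 1 + d + d^2; exploiting symmetry (as the paper does)
    would roughly halve the quadratic block."""
    return np.concatenate([[1.0], x, np.outer(x, x).ravel() / np.sqrt(2.0)])

def linear_attention_stream(qs, ks, vs):
    """Causal attention at constant cost per token: a running sum S of
    phi(k) v^T plus a normalizer z replace the growing KV cache."""
    d_phi = 1 + qs.shape[1] + qs.shape[1] ** 2
    S = np.zeros((d_phi, vs.shape[1]))
    z = np.zeros(d_phi)
    out = []
    for q, k, v in zip(qs, ks, vs):
        fk = phi(k)
        S += np.outer(fk, v)  # accumulate key-value associations
        z += fk               # accumulate the softmax normalizer
        fq = phi(q)
        out.append(fq @ S / (fq @ z + 1e-9))
    return np.array(out)

T, d, dv = 16, 4, 8
q = np.random.randn(T, d) / d**0.25
y = linear_attention_stream(q, q, np.random.randn(T, dv))
```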

Code

Ghidra MCP Server – 110 tools for AI-assisted reverse engineering

Ghidra MCP Server is a production-ready bridge that exposes Ghidra's reverse engineering capabilities to LLMs and automation frameworks via the Model Context Protocol. It features 110 MCP tools for binary analysis, including decompilation, function hashing, and data structure discovery, implemented through a Python bridge and a Java-based Ghidra plugin. The system supports atomic batch operations, real-time analysis, and cross-binary documentation to facilitate high-efficiency AI-driven reverse engineering workflows.
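Under the hood, MCP tool invocations are JSON-RPC 2.0 requests carried over stdio or HTTP; a client calls a server tool roughly as below. The tool name and arguments are hypothetical, not taken from the server's actual 110-tool list.

```python
import json

# An MCP tool call is a JSON-RPC 2.0 "tools/call" request; tool name and
# arguments here are illustrative placeholders.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "decompile_function",         # hypothetical tool name
        "arguments": {"address": "0x401000"}, # hypothetical argument
    },
}
print(json.dumps(request))
```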

Webhook Skills – Agent skills for webhook providers and best practices

This repository provides a collection of webhook skills for AI coding agents like Claude Code and Cursor, adhering to the Agent Skills specification. It enables LLMs to implement signature verification, event handling, and idempotency for various providers including Stripe, GitHub, and OpenAI. The skills include runnable examples for Express, Next.js, and FastAPI, alongside tools for local development and infrastructure management.
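The core pattern these skills encode, Stripe-style HMAC signature verification with a replay window, fits in a few lines. A sketch of the scheme follows; production code should use the provider's SDK.

```python
import hashlib
import hmac
import time

def verify_stripe_signature(payload: bytes, sig_header: str, secret: str,
                            tolerance: int = 300) -> bool:
    """Verify a Stripe-style webhook signature: HMAC-SHA256 over
    '{timestamp}.{body}', rejecting events outside the replay window."""
    parts = dict(p.split("=", 1) for p in sig_header.split(","))
    timestamp, expected = parts["t"], parts["v1"]
    if abs(time.time() - int(timestamp)) > tolerance:
        return False  # stale event: possible replay
    signed = f"{timestamp}.".encode() + payload
    digest = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(digest, expected)
```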

Codag – Visualize and share LLM workflows in VS Code

Codag is a VS Code extension that automatically generates interactive workflow graphs for AI applications by analyzing LLM API calls and framework usage. It utilizes tree-sitter for real-time AST parsing and Gemini for semantic analysis to map complex agentic pipelines and decision branches across multiple files. The tool supports major providers and frameworks like LangChain and LangGraph, featuring live graph updates and direct click-to-source navigation.
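Codag's parsing layer is tree-sitter, but the core idea, walking a syntax tree for LLM API call sites, can be shown with Python's built-in ast module as a simplified, single-language stand-in.

```python
import ast

LLM_CALL_SUFFIXES = ("chat.completions.create", "messages.create")

def dotted(node: ast.AST) -> str:
    """Reconstruct 'a.b.c' from an attribute chain, or '' otherwise."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
    return ".".join(reversed(parts))

def find_llm_calls(source: str):
    """Yield (line, callee) for calls that look like LLM API invocations.
    Stand-in for Codag's tree-sitter pass, which is incremental and
    language-agnostic."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted(node.func)
            if name.endswith(LLM_CALL_SUFFIXES):
                yield node.lineno, name

code = "resp = client.chat.completions.create(model='gpt-4o', messages=[])"
print(list(find_llm_calls(code)))  # [(1, 'client.chat.completions.create')]
```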

Editor for perfecting your YC App. Multiplayer w/ Durable Objects. OSS.

Graham is a collaborative platform designed to refine answers for applications like YC pitches or investor presentations. It features an AI Review system with customizable prompts to provide feedback on user responses. The tool also includes a practice mode for verbal answers, offering transcriptions and self-rating. Graham is self-hostable and integrates with the OpenAI API for its AI functionalities.
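At heart, the AI Review piece is a chat-completion call with a customizable system prompt; a minimal sketch with the openai SDK, where the prompt and model are illustrative rather than Graham's actual configuration.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Prompt text is illustrative, not Graham's actual review prompt.
review = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a blunt YC application "
         "reviewer. Flag vagueness, jargon, and unsupported claims."},
        {"role": "user", "content": "Q: What is your company going to make?\n"
         "A: We build AI-powered synergy solutions."},
    ],
)
print(review.choices[0].message.content)
```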

Toktrack – 1000x faster AI CLI cost tracker (Rust and SIMD)

toktrack is a high-performance Rust utility for tracking token usage and costs across multiple AI coding CLIs, including Claude Code, Gemini CLI, and Codex CLI. It leverages simd-json and rayon for parallel processing to achieve ~3 GiB/s throughput, providing a unified TUI dashboard and JSON output. The tool features a persistent cache to preserve historical cost data that would otherwise be lost due to the default 30-day auto-deletion policies of certain CLIs.
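toktrack itself is Rust, but the core aggregation reduces to a few lines; here is a slow Python equivalent, with the log layout, field names, and prices all assumptions for illustration.

```python
import json
from pathlib import Path

# Prices per million tokens; illustrative numbers, check current rates.
PRICE = {"input": 3.00, "output": 15.00}

def session_cost(log_dir: Path) -> float:
    """Sum token usage from JSONL session logs into a dollar total.
    Field names and log location are assumptions; toktrack does this in
    Rust with simd-json and rayon to reach ~3 GiB/s."""
    total = 0.0
    for path in log_dir.rglob("*.jsonl"):
        for line in path.read_text().splitlines():
            try:
                usage = json.loads(line).get("usage", {})
            except json.JSONDecodeError:
                continue  # skip malformed lines
            total += usage.get("input_tokens", 0) / 1e6 * PRICE["input"]
            total += usage.get("output_tokens", 0) / 1e6 * PRICE["output"]
    return total
```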
