Monday January 19, 2026

Tauformer reduces KV-cache overhead by 50% using topological attention, Verbalized Sampling boosts LLM diversity by 2.1x, and VAM Seek × AI enables 30-minute video analysis for just $0.003.

Interested in AI engineering? Let's talk

News

Starting from scratch: Training a 30M Topological Transformer

Tauformer is a topological transformer architecture that replaces standard dot-product attention with Laplacian-derived scalar energies ("taumodes") computed via a bounded Rayleigh quotient. By computing attention as distances in this scalar space, the model cuts KV-cache overhead by roughly 50%, storing only values plus a compact key-side scalar stream. Initial training of a 30M-parameter TauGPT converges effectively, with decreasing cross-entropy loss tracking lower manifold energy, consistent with the idea of epiplexity: structure that bounded learners can more readily exploit.
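The cache-size claim follows from replacing d-dimensional keys with one scalar per token. A minimal sketch of the idea, with all names and details assumed for illustration (this is not the paper's implementation):

```python
import numpy as np

def taumode_attention(x, L, W_q, W_k, W_v, eps=1e-8):
    """Illustrative sketch: attention via Laplacian-derived scalar energies.

    Each token's key projection is reduced to a single scalar (a bounded
    Rayleigh quotient against a graph Laplacian L), so the cache holds
    values plus one float per token instead of a full key matrix.
    """
    q, k, v = x @ W_q, x @ W_k, x @ W_v              # (T, d) projections

    def rayleigh(z):
        # Bounded Rayleigh quotient z^T L z / z^T z, lies in [lam_min, lam_max].
        return (z @ L @ z) / (z @ z + eps)

    tau_q = np.array([rayleigh(row) for row in q])   # (T,) query-side scalars
    tau_k = np.array([rayleigh(row) for row in k])   # (T,) key-side scalar stream

    # Attention as negative squared distance in the scalar tau space.
    logits = -(tau_q[:, None] - tau_k[None, :]) ** 2
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ v                                     # (T, d) output
```

Under this scheme the per-token cache is v (d floats) plus tau_k (1 float), versus k + v (2d floats) for standard attention, which is where the roughly 50% reduction comes from.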

Lume 0.2 – Build and Run macOS VMs with unattended setup

Lume is an open-source VM runtime and CLI designed for running macOS and Linux VMs on Apple Silicon using Apple's native Virtualization Framework. It features an HTTP API and headless execution, facilitating automated CI/CD, sandboxing, and the development of AI agents via the Cua Computer SDK. By providing hardware-accelerated environments, Lume enables LLMs to interact with macOS through programmatic input simulation and screenshots.

Tired of AI, people are committing to the analog lifestyle in 2026

A growing "analog lifestyle" movement is emerging as a backlash against the ubiquity of generative AI and the proliferation of "AI slop." Driven by fatigue from automated content and a desire to reclaim cognitive agency, users are increasingly adopting offline hobbies and "dumb" hardware to mitigate the mental impact of constant digital engagement. This shift reflects a broader effort to decouple personal data from AI training ecosystems and restore tangible, human-centric experiences.

Beats, a web-based drum machine

"beats" is a web-based drum machine inspired by Teenage Engineering Pocket Operators, built using Tone.js and Stimulus.js. The application features a sequencer for standard percussion tracks with functionality for BPM adjustment, pattern persistence, and URL-based sharing.

I quit coding years ago. AI brought me back

The text outlines the mathematical foundations of compound interest, focusing on the exponential growth formula $A = P(1 + r/n)^{nt}$ and the Rule of 72 heuristic. It provides structured time-series data and comparative analysis of compounding frequencies, offering a deterministic logic base for LLMs performing financial modeling or RAG-driven fintech applications.
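The two formulas mentioned are straightforward to express directly; a minimal sketch (function names are my own):

```python
def compound(P, r, n, t):
    """Amount after t years: A = P * (1 + r/n)**(n*t),
    with annual rate r compounded n times per year."""
    return P * (1 + r / n) ** (n * t)

def rule_of_72_years(rate_percent):
    """Rule of 72 heuristic: approximate doubling time in years."""
    return 72 / rate_percent

# $1,000 at 6% compounded monthly roughly doubles in 12 years,
# matching the Rule of 72 estimate of 72 / 6 = 12.
```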

Research

VaultGemma: A Differentially Private LLM

VaultGemma 1B is a 1B-parameter LLM in the Gemma family trained from scratch with differential privacy on the Gemma 2 pretraining data mixture, providing formal guarantees that limit what the model can memorize about any individual training example. Its open release makes it one of the largest publicly available differentially private LLMs to date.
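Differentially private training typically builds on the DP-SGD mechanism (per-example gradient clipping plus calibrated Gaussian noise). A toy sketch of that mechanism for intuition; this is a generic illustration, not VaultGemma's actual training recipe:

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm=1.0,
                noise_multiplier=1.0, lr=0.1, rng=None):
    """One DP-SGD step: clip each example's gradient to clip_norm,
    average, then add Gaussian noise scaled by noise_multiplier so no
    single example dominates the update."""
    rng = rng if rng is not None else np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    g_bar = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(clipped),
                       size=g_bar.shape)
    return params - lr * (g_bar + noise)
```

The privacy budget (epsilon, delta) is then accounted across steps as a function of the noise multiplier, batch sampling rate, and number of iterations.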

Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity

Mode collapse in post-training alignment is driven by typicality bias in preference data, where human annotators systematically favor familiar outputs. Verbalized Sampling (VS) is a training-free inference strategy that mitigates this by prompting LLMs to generate multiple responses alongside a verbalized probability distribution. VS improves diversity by up to 2.1x across creative and open-ended tasks while maintaining safety and factual integrity, with performance gains scaling alongside model capability.
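Since VS is a prompting strategy, the machinery is just a prompt template plus parsing of the verbalized distribution. A minimal sketch, where the line format `p=<prob> | <response>` is my assumption rather than the paper's exact template:

```python
import random
import re

VS_PROMPT = (
    "Generate {k} different responses to the task below. For each, also "
    "verbalize the probability you would assign it, one per line in the "
    "form 'p=0.25 | <response>'.\n\nTask: {task}"
)

def parse_and_sample(model_output, rng=None):
    """Parse 'p=<prob> | <text>' lines from the model's output and sample
    one response in proportion to its verbalized probability."""
    rng = rng if rng is not None else random.Random(0)
    items = []
    for line in model_output.splitlines():
        m = re.match(r"\s*p=([0-9.]+)\s*\|\s*(.+)", line)
        if m:
            items.append((float(m.group(1)), m.group(2).strip()))
    total = sum(p for p, _ in items)
    r, acc = rng.uniform(0, total), 0.0
    for p, text in items:
        acc += p
        if r <= acc:
            return text
    return items[-1][1]
```

Sampling from the verbalized distribution, rather than taking the single most typical completion, is what recovers the diversity lost to mode collapse.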

Reverse Engineering the ESP32-C3 Wi-Fi Drivers for Static Worst-Case Analysis

This research enables Wi-Fi-based reactive intermittent computing for batteryless devices by reverse-engineering closed-source drivers to facilitate full-stack Worst-Case Energy Consumption (WCEC) analysis. By integrating an energy-aware networking stack on a RISC-V ESP32-C3 platform and extending static analysis tools, the authors provide predictable bounds for atomic transactions, ensuring forward progress during power failures.

Perelman's Proof of the Poincaré Conjecture: A Nonlinear PDE Perspective

The paper's abstract was unavailable in the source, so no substantive summary can be provided.

From Code Foundation Models to Agents and Applications

This work synthesizes the code LLM lifecycle, covering data curation, pre-training, SFT, RL, and autonomous agent paradigms. It evaluates general-purpose and specialized models like DeepSeek-Coder and QwenCoder, bridging the gap between academic benchmarks and real-world deployment needs such as security and codebase context. The study includes empirical analysis of scaling laws, model architectures, and hyperparameter sensitivity to guide practical implementation.

Code

Figma-use – CLI to control Figma for AI agents

figma-use is a CLI and MCP server that enables LLMs to manipulate Figma using JSX and token-efficient terminal commands. It leverages LLMs' proficiency in React to render complex layouts and uses Figma's internal multiplayer protocol for high-speed node creation. The tool includes a SKILL.md reference for seamless integration with AI agents like Claude Code and supports advanced operations like design diffing and variable binding.

GibRAM – an in-memory ephemeral GraphRAG runtime for retrieval

GibRAM is an ephemeral, in-memory knowledge graph server designed to enhance RAG workflows by combining vector search with graph-based traversal. It stores entities, relationships, and document chunks in RAM with configurable TTL, enabling graph-aware retrieval that captures context missed by semantic similarity alone. The platform includes a modular Python SDK for indexing and querying, allowing developers to swap out chunkers, extractors, and embedders.
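The core retrieval pattern, vector search followed by expansion along graph edges, can be sketched in a few lines. The class and method names below are invented for illustration and are not GibRAM's SDK:

```python
import math
import time

class ToyGraphRAG:
    """Toy in-memory store in the spirit of GibRAM: cosine search over
    chunks, then 1-hop expansion along entity edges, with a TTL."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.chunks = {}   # id -> (embedding, text, created_at)
        self.edges = {}    # id -> set of related chunk ids

    def add(self, cid, embedding, text, related=()):
        self.chunks[cid] = (embedding, text, time.time())
        self.edges.setdefault(cid, set()).update(related)

    @staticmethod
    def _cos(a, b):
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return num / den if den else 0.0

    def query(self, embedding, top_k=1):
        now = time.time()
        live = {c: v for c, v in self.chunks.items() if now - v[2] < self.ttl}
        ranked = sorted(live, key=lambda c: self._cos(embedding, live[c][0]),
                        reverse=True)[:top_k]
        hits = set(ranked)
        for c in ranked:
            # Graph expansion: pull in 1-hop neighbors that similarity
            # search alone would have missed.
            hits |= self.edges.get(c, set()) & live.keys()
        return [live[c][1] for c in sorted(hits)]
```

The graph expansion step is what surfaces related context whose embedding is dissimilar to the query, the gap the summary describes.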

30min video analysis for $0.003 via frame-tiling and Vision API

VAM Seek × AI proposes a cost-effective method for video analysis using LLMs by compressing an entire video into a single grid image. This client-side approach generates an 8x6 frame grid, enabling an LLM to analyze visual content with one API call, achieving ~600x cost savings compared to traditional frame-by-frame processing. While currently visual-only, future enhancements include adaptive resolution, allowing the AI to request higher-granularity grids for detailed analysis, and Whisper integration to combine visual context with timestamped audio transcripts for comprehensive video search.
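The grid layout reduces to simple index arithmetic: sample rows × cols frames evenly across the video and map each to a cell. A minimal sketch of that bookkeeping (function names are my own):

```python
def tile_index(frame_idx, cols=8):
    """Map the i-th sampled frame to its (row, col) cell in the grid."""
    return divmod(frame_idx, cols)

def sampling_plan(video_seconds, rows=6, cols=8):
    """Sample rows*cols frames evenly across the video; return each
    frame's grid cell and timestamp, ready for client-side compositing."""
    n = rows * cols
    step = video_seconds / n
    return [(tile_index(i), i * step) for i in range(n)]
```

For a 30-minute video this yields 48 frames (one every 37.5 s) composited into a single image, so one Vision API call covers the whole video instead of hundreds of per-frame calls.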

LlmSHAP – Multi-threaded input importance for prompts and RAG context

llmSHAP is a multi-threaded explainability framework that leverages Shapley values to attribute contributions of input features to LLM-based outputs. It supports both text and multimodal inputs (including images), offering features like exact Shapley computation, generation caching, permanent context pinning, and pluggable similarity metrics for robust analysis.
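Exact Shapley computation over prompt segments is the classic sum over coalitions. A self-contained sketch with a toy value function standing in for an LLM-quality metric (this mirrors the standard formula, not llmSHAP's internals):

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value_fn):
    """Exact Shapley attribution: value_fn maps a frozenset of features
    (e.g. prompt segments kept in context) to a scalar score, and each
    feature's phi averages its marginal contribution over all coalitions."""
    n = len(features)
    phi = {f: 0.0 for f in features}
    for f in features:
        rest = [g for g in features if g != f]
        for k in range(n):
            for S in combinations(rest, k):
                S = frozenset(S)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[f] += weight * (value_fn(S | {f}) - value_fn(S))
    return phi
```

Exact computation needs 2^n evaluations of value_fn, which is why caching generated outputs and running evaluations in parallel, as llmSHAP does, matters in practice.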

RqLui – A free open-source webui for Rqlite written in Quasar

rqLui is a high-performance web UI for managing RQLite distributed databases, built with Vue 3, Quasar, and TypeScript. It features a Web Worker-based architecture to handle large-scale data operations (1M+ rows) through chunked processing and concurrent fetching, ensuring UI responsiveness. The tool provides deep integration with the RQLite API, supporting configurable consistency levels, transaction batching, and optimized associative data formatting.
