Saturday — March 21, 2026
Wikipedia bans wholesale LLM content generation, a new paper argues transformers are Bayesian networks, and Rover turns web interfaces into AI agents with a single script tag.
Interested in AI engineering? Let's talk
News
OpenCode – Open source AI coding agent
OpenCode is an open-source AI coding agent available via terminal, IDE, and desktop interfaces, featuring integrated LSP support and multi-session capabilities. It supports over 75 LLM providers, including local models and existing GitHub Copilot or ChatGPT Plus accounts, while maintaining a privacy-first architecture that stores no user data. The platform also offers "Zen," a curated selection of benchmarked models specifically optimized for coding agent performance.
MacBook M5 Pro and Qwen3.5 = Local AI Security System
HomeSec-Bench evaluates LLMs on specialized home security workflows, including tool use, event deduplication, and security triage. Benchmarking on a MacBook Pro M5 via llama.cpp shows Qwen3.5-9B achieving a 93.8% pass rate, trailing GPT-5.4 by only 4.1 points while incurring zero API costs and keeping all data private. Notably, local MoE variants like Qwen3.5-35B achieved a lower time-to-first-token (435 ms) than top-tier cloud models, highlighting the efficiency of local inference for domain-specific tasks.
AI (2014)
Despite historical skepticism, AGI represents a transformative frontier driven by the hypothesis that a single general-purpose algorithm can replicate biological learning. Significant challenges remain in modeling the emergent complexity of neural systems and distinguishing task-specific optimization from artificial consciousness or creativity. The text questions whether creativity is an emergent property of learning or whether a permanent division of labor will persist between human cognition and machine execution.
I made an email app inspired by Arc browser
Define is an AI-integrated email and productivity platform designed to centralize messages, calendar events, files, and media transcriptions into a unified, agent-ready interface. The architecture focuses on minimizing context switching by making all data types easily parsable for LLMs and autonomous agents. Key features include an AI-powered composer and smart folders tailored for modern, high-velocity workflows.
Wikipedia RFC on banning LLM contributions
Wikipedia has adopted an updated WP:NEWLLM policy that prohibits using LLMs for wholesale content generation or rewriting, citing the high verification costs of AI-generated "slop" compared to the low cost of producing it. The policy permits constrained use cases like human-reviewed copyediting and translation but emphasizes that editors remain fully responsible for hallucinations or policy violations. It also includes safeguards against false positives, cautioning editors not to flag contributions as LLM-generated based purely on linguistic style.
Research
The Missing Memory Hierarchy: Demand Paging for LLM Context Windows
Pichay is a demand paging system for LLM context windows that addresses structural waste by implementing a virtual memory hierarchy. Operating as a transparent proxy, it manages eviction, page fault detection, and working-set pinning to reduce context consumption by up to 93% with a minimal fault rate. The system demonstrates that context management challenges—such as attention degradation and cost scaling—are fundamentally virtual memory problems solvable through classical techniques like demand paging and multi-level memory hierarchies.
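The core mechanism can be sketched with a classical LRU pager applied to context "pages." This is a minimal illustration under assumptions, not Pichay's actual API: class and method names (`ContextPager`, `touch`) are hypothetical, and a real system would detect faults from model output rather than explicit lookups.

```python
from collections import OrderedDict

class ContextPager:
    """Minimal LRU demand pager over context pages (hypothetical sketch)."""
    def __init__(self, resident_limit, pinned=()):
        self.resident_limit = resident_limit  # max pages kept in the prompt
        self.pinned = set(pinned)             # working set that is never evicted
        self.resident = OrderedDict()         # page_id -> text, in LRU order
        self.swapped = {}                     # evicted pages, held off-prompt
        self.faults = 0                       # page faults serviced so far

    def touch(self, page_id, text=None):
        """Reference a page, swapping it back in on a fault."""
        if page_id in self.resident:
            self.resident.move_to_end(page_id)  # mark most recently used
        else:
            if page_id in self.swapped:          # page fault: bring it back
                text = self.swapped.pop(page_id)
                self.faults += 1
            self.resident[page_id] = text
            self._evict_if_needed()
        return self.resident[page_id]

    def _evict_if_needed(self):
        while len(self.resident) > self.resident_limit:
            for victim in self.resident:         # oldest unpinned page
                if victim not in self.pinned:
                    self.swapped[victim] = self.resident.pop(victim)
                    break
            else:
                break                            # everything resident is pinned
```

Pinning the system prompt while letting stale retrieval results page out is exactly the working-set discipline the paper frames as a virtual memory problem.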
Quantum Computing and Artificial Intelligence: Status and Perspectives
This white paper explores the intersection of quantum computing and AI, detailing how quantum computing can advance AI solutions and how classical AI can empower quantum technologies like computing and sensing. It proposes a long-term research agenda to understand their mutual benefits, addressing challenges such as aligning developments with quantum hardware, optimizing resource consumption, and advancing hybrid software engineering.
Transformers Are Bayesian Networks
This paper proposes that transformers are Bayesian networks, demonstrating through five formal proofs and experimental validation that sigmoid transformers implement weighted loopy belief propagation, with each layer performing one round of BP. It further shows transformers can implement exact BP, that their attention/FFN structure aligns with Pearl's gather/update algorithm, and argues that verifiable inference requires a finite concept space, making hallucination a structural consequence rather than a scaling bug.
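The layer-per-round claim can be made concrete with plain sum-product belief propagation on a toy graph. The sketch below is generic BP on a three-node binary chain, not the paper's weighted sigmoid variant; `psi`, `phi`, and the update loop are illustrative.

```python
import numpy as np

# Toy pairwise MRF: binary chain x0 - x1 - x2.
psi = np.array([[2.0, 1.0], [1.0, 2.0]])   # pairwise potential favouring agreement
phi = [np.array([3.0, 1.0]),               # unary evidence: x0 leans to state 0
       np.array([1.0, 1.0]),
       np.array([1.0, 1.0])]
edges = [(0, 1), (1, 2)]

# Directed messages m[(i, j)] from node i to node j, initialised uniform.
m = {(i, j): np.ones(2) for a, b in edges for i, j in [(a, b), (b, a)]}

def bp_round(m):
    """One synchronous sum-product round -- the paper's analogue of one layer."""
    new = {}
    for (i, j) in m:
        inc = np.ones(2)
        for (k, tgt) in m:                 # gather messages into i, excluding j
            if tgt == i and k != j:
                inc = inc * m[(k, tgt)]
        msg = psi.T @ (phi[i] * inc)       # sum over x_i
        new[(i, j)] = msg / msg.sum()
    return new

for _ in range(5):                          # five "layers" = five BP rounds
    m = bp_round(m)

def marginal(i):
    b = phi[i] * np.prod([m[(k, j)] for (k, j) in m if j == i], axis=0)
    return b / b.sum()
```

On this tree the rounds converge to exact marginals: evidence at `x0` propagates so that `marginal(2)` leans toward state 0, mirroring how stacked attention layers diffuse evidence across tokens.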
SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels
SOL-ExecBench is a benchmark of 235 CUDA kernel optimization problems for NVIDIA Blackwell GPUs, shifting evaluation from software baselines to hardware-grounded Speed-of-Light (SOL) bounds derived via the SOLAR pipeline. It introduces a SOL Score to quantify the efficiency gap closed by candidate kernels across diverse AI architectures and data types like FP8 and NVFP4. To ensure robust evaluation of agentic optimizers, the suite includes a sandboxed harness featuring GPU clock locking, L2 cache clearing, and static analysis to prevent reward-hacking.
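A gap-closed score of this kind has a natural form: the fraction of the distance from a baseline kernel's runtime to the hardware bound that a candidate recovers. The function below is a plausible reading, not the benchmark's published formula; the name `sol_score` and the clamping behaviour are assumptions.

```python
def sol_score(t_baseline_us, t_candidate_us, t_sol_us):
    """Fraction of the baseline-to-speed-of-light gap closed by a candidate
    kernel (hypothetical formula; SOL-ExecBench's exact definition may differ).
    Times are microseconds; lower is faster."""
    gap = t_baseline_us - t_sol_us
    if gap <= 0:
        raise ValueError("speed-of-light bound must be faster than the baseline")
    closed = t_baseline_us - t_candidate_us
    return max(0.0, min(1.0, closed / gap))   # clamp regressions and overshoot
```

For example, a candidate at 40 µs against a 100 µs baseline and a 20 µs SOL bound closes 0.75 of the gap. Grounding the denominator in hardware rather than a software baseline is what keeps the score stable as reference implementations improve.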
Matrix Valued Residuals
The Residual Matrix Transformer (RMT) replaces the standard residual stream with an outer-product memory matrix, allowing the stream size to scale independently of compute and model size. RMT achieves equivalent loss to standard transformers with 58% fewer FLOPs and 25% fewer parameters while demonstrating superior downstream performance and improved variance propagation.
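The outer-product idea can be sketched in a few lines: layers write key-value associations into a matrix stream, and readers project it with a query. This is a minimal illustration of the mechanism, not the paper's parameterization; dimensions and function names are assumptions.

```python
import numpy as np

d_k, d_v = 8, 16   # matrix stream is d_k x d_v: capacity grows as a product,
                   # decoupled from the per-layer vector width

def write(M, key, value):
    """Add one rank-1 (outer product) association to the matrix residual stream."""
    return M + np.outer(key, value)

def read(M, query):
    """Linear readout: project the matrix memory with a query vector."""
    return query @ M

M = np.zeros((d_k, d_v))
k1, v1 = np.random.randn(d_k), np.random.randn(d_v)
M = write(M, k1, v1)
out = read(M, k1)   # querying with the stored key recovers a scaled copy of v1
```

Because `M = k1 v1^T`, reading with `k1` returns `(k1 . k1) * v1` exactly; with many stored pairs the readout becomes a similarity-weighted mixture, which is the associative-memory view of the residual stream.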
Code
Tiny pixel characters for Cursor AI agents
Cursouls is a VS Code extension that provides ambient awareness for AI agents through a pixel-art "cafe" interface. It visualizes the real-time status and lifecycle events of agents from providers like Cursor, Claude Code, and Codex using character animations. Built on @agentprobe/core, it offers a zero-config way to monitor multiple concurrent agent tasks, failures, and requests for input without parsing terminal logs.
LiteParse, a fast open-source document parser for AI agents
LiteParse is an open-source, local document parsing tool designed for fast spatial text extraction and bounding box generation without cloud dependencies. It supports PDF, Office documents, and images, utilizing built-in Tesseract.js or external HTTP OCR servers like EasyOCR and PaddleOCR. The tool provides JSON/text outputs and high-quality screenshots, making it ideal for providing structured data and visual context to LLM agents.
Rover – turn any web interface into an AI agent with one script tag
Rover is an open-source engine that enables AI agents to interact directly with the DOM and W3C accessibility tree for millisecond-latency task execution. It bypasses traditional RAG pipelines and vision-based approaches by running natively in the browser, Electron, or via the Agent Task Protocol (ATP). Developers can trigger autonomous actions through script tags, deep links, or a REST API for machine-to-machine workflows.
Agent Use Interface (AUI) – let users bring their own AI agent
Agent Use Interface (AUI) is a lightweight XML schema that enables LLM agents to discover and construct URL-parameter-driven tasks, such as searches, filters, and form pre-fills. By serving a catalog at /agents/aui.xml, sites provide agents with a machine-readable map of base paths and query parameters to translate user intent into actionable links. AUI is designed as a narrow alternative to OpenAPI or MCP, focusing specifically on unidirectional URL construction for web and native app experiences.
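A catalog of this shape might look like the fragment below. The element and attribute names here are illustrative, not taken from the AUI spec:

```xml
<!-- Hypothetical /agents/aui.xml catalog; names are illustrative. -->
<aui>
  <task name="search-products" base="/shop/search">
    <param name="q"    type="string" description="Free-text query"/>
    <param name="sort" type="enum"   values="price,rating,newest"/>
  </task>
  <task name="prefill-contact" base="/contact">
    <param name="subject" type="string"/>
  </task>
</aui>
```

Given such a catalog, an agent could translate "find cheap headphones" into the link `/shop/search?q=headphones&sort=price` without any server-side round trip, which is the narrow, unidirectional contract AUI targets.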
CopySpeak – A lightweight tool for quick AI text-to-speech
CopySpeak is a lightweight Windows desktop application that leverages multiple AI Text-to-Speech engines to vocalize clipboard content, triggered by double-copy, hotkey, or manual input. It integrates diverse TTS backends, including CPU-optimized ONNX inference (Kitten TTS), local CLI engines (Piper, Kokoro), and cloud APIs (OpenAI TTS, ElevenLabs TTS). Developed with Rust (Tauri v2) and Svelte 5, the app features real-time HUD waveform visualization, persistent history, and advanced text sanitization.