Tuesday — January 20, 2026
West Midlands police chief resigns over a Copilot hallucination, Homunculus introduces self-rewriting plugins for Claude Code, and DiffusionBlocks enables memory-efficient block-wise training.
Interested in AI engineering? Let's talk
News
I quit coding years ago. AI brought me back
Calquio provides a compound interest calculator that models investment growth using discrete and continuous compounding formulas, $A = P(1 + r/n)^{nt}$ and $A = Pe^{rt}$. The tool supports multi-parameter inputs such as inflation adjustments to differentiate between nominal and real returns, alongside heuristics like the Rule of 72. This structured financial logic is representative of the deterministic tools often integrated into LLM workflows via function calling or RAG to ensure numerical accuracy.
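The formulas above are easy to sketch directly; Calquio's own implementation is not public, so the function names below are illustrative:

```python
import math

def compound_amount(P: float, r: float, n: int, t: float) -> float:
    """Discrete compounding: A = P * (1 + r/n)**(n*t)."""
    return P * (1 + r / n) ** (n * t)

def continuous_amount(P: float, r: float, t: float) -> float:
    """Continuous compounding: A = P * e**(r*t)."""
    return P * math.exp(r * t)

def rule_of_72_years(r_percent: float) -> float:
    """Heuristic doubling time in years for a rate given in percent."""
    return 72 / r_percent

def real_rate(nominal: float, inflation: float) -> float:
    """Fisher equation: real return from nominal return and inflation."""
    return (1 + nominal) / (1 + inflation) - 1
```

For $1,000 at 5% over 10 years, monthly compounding gives about $1,647.01 and continuous compounding about $1,648.72; deterministic helpers like these are exactly what an LLM would call via function calling instead of doing the arithmetic itself.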
Wikipedia: WikiProject AI Cleanup
WikiProject AI Cleanup is a collaborative initiative to identify and mitigate the impact of unsourced or erroneous AI-generated content on Wikipedia. The project focuses on detecting LLM-generated text and synthetic imagery to address issues like hallucinations and fictitious citations. While not banning AI, the group enforces quality standards through manual verification and the application of policies like WP:G15 for the speedy deletion of unreviewed LLM-generated pages.
West Midlands police chief quits over AI hallucination
West Midlands Police Chief Craig Guildford resigned after the force used hallucinated output from Microsoft Copilot to justify banning Israeli football fans. The LLM fabricated a non-existent match and associated security risks, which were used as the basis for the ban despite Guildford's initial claims to MPs that the force did not use AI. This incident highlights the high-stakes risks of LLM hallucinations and the critical need for verification in public sector AI deployments.
Weight Transfer for RL Post-Training in under 2 seconds
Kimi-K2 achieves 1.3-second weight transfers for 1T parameter models during asynchronous RL fine-tuning by leveraging RDMA WRITE for one-sided, zero-copy communication. The architecture utilizes a static transfer schedule and a task pipeline to overlap parameter preparation, quantization, and point-to-point RDMA transfers across disjoint DeviceMeshes. This design avoids rank-0 bottlenecks and inference engine modifications, saturating network fabric bandwidth while maintaining low control-plane latency.
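The overlap pattern described above (prepare and quantize one shard while the previous one is in flight, following a static schedule) can be sketched with a bounded producer/consumer pipeline. Kimi-K2's actual code is not shown here; `quantize` and `rdma_write` are stand-ins for the real low-precision cast and one-sided RDMA WRITE:

```python
import queue
import threading

TRANSFERRED = {}  # stands in for remote GPU memory, keyed by destination rank

def quantize(shard):
    # Placeholder: real systems cast weights to a lower-precision wire format here.
    return [round(x, 2) for x in shard]

def rdma_write(dest_rank, shard):
    # Placeholder for a one-sided RDMA WRITE: no receiver-side CPU involvement.
    TRANSFERRED.setdefault(dest_rank, []).append(shard)

def transfer_weights(shards, schedule):
    """Overlap quantization (producer) with point-to-point writes (consumer),
    following a static schedule mapping each shard to a destination rank."""
    q = queue.Queue(maxsize=4)  # bounded buffer keeps staging memory flat

    def producer():
        for i, shard in enumerate(shards):
            q.put((schedule[i], quantize(shard)))
        q.put(None)  # sentinel: no more shards

    t = threading.Thread(target=producer)
    t.start()
    while (item := q.get()) is not None:
        dest, payload = item
        rdma_write(dest, payload)
    t.join()
```

Because the transfer schedule is static, every rank knows its destinations up front, which is what lets the real system skip a rank-0 coordinator entirely.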
Intent Layer: A context engineering skill for AI agents
Intent Layer is a context engineering tool that automates the creation of hierarchical AGENTS.md files to provide AI agents with high-level architectural context. By documenting folder purposes, contracts, and pitfalls, it prevents agents from wasting tokens on irrelevant files and improves debugging accuracy. This system prompt infrastructure optimizes LLM performance by bridging the gap between raw code and the mental maps used by senior engineers.
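The output structure is simple to picture: one AGENTS.md per folder, each summarizing purpose and pitfalls. Intent Layer generates these with an LLM; the minimal sketch below only shows the file layout it produces (template and function are illustrative):

```python
from pathlib import Path

TEMPLATE = """# AGENTS.md
## Purpose
{purpose}

## Pitfalls
- (document folder-specific gotchas here)
"""

def write_agents_files(root: str, purposes: dict[str, str]) -> list[Path]:
    """Drop an AGENTS.md into each listed folder so an agent can read a
    short architectural summary instead of scanning every file."""
    written = []
    for rel, purpose in purposes.items():
        folder = Path(root) / rel
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / "AGENTS.md"
        path.write_text(TEMPLATE.format(purpose=purpose))
        written.append(path)
    return written
```

An agent descending into `api/` then reads a few dozen tokens of context instead of the whole subtree, which is where the token savings come from.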
Research
The unreasonable effectiveness of pattern matching
LLMs demonstrate a surprising ability to decode "Jabberwocky" text by mapping nonsense strings to meaningful content through structural pattern-matching. This capability suggests that advanced pattern-matching is a fundamental component of intelligence rather than mere mimicry or database retrieval.
DiffusionBlocks: Block-Wise Neural Network Training
DiffusionBlocks is a principled framework that addresses the memory bottleneck of end-to-end backpropagation in transformer-based networks by enabling genuinely independent block-wise training. It leverages residual connections as updates in a dynamical system, converting them into a denoising process where each block learns independently via a score matching objective. This approach significantly reduces memory requirements, matching end-to-end training performance across diverse transformer architectures and scaling to modern generative tasks beyond classification.
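The core reinterpretation can be written down compactly (notation here is schematic, not the paper's exact formulation): a residual block is one Euler step of a continuous dynamical system, and the resulting trajectory can be treated as a denoising process that each block learns locally.

```latex
% Residual block as one Euler step of a dynamical system:
x_{l+1} = x_l + f_{\theta_l}(x_l) \;\approx\; x(t + \Delta t),
\qquad \frac{dx}{dt} = f_\theta(x, t).
% Treating the trajectory as a denoising process gives each block l
% a local score-matching objective, so no gradient crosses blocks:
\mathcal{L}(\theta_l) = \mathbb{E}\,
\big\| s_{\theta_l}(x_{t_l}, t_l) - \nabla_{x_{t_l}} \log p(x_{t_l} \mid x_0) \big\|^2 .
```

Since each block optimizes only its own local loss, only that block's activations and gradients need to be resident at once, which is the source of the memory savings.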
Prompt Repetition Improves Non-Reasoning LLMs
Repeating the input prompt improves performance for major LLMs like Gemini, GPT, Claude, and DeepSeek on non-reasoning tasks. Because the duplicated input is processed in parallel during prefill, the technique improves output quality without increasing generation latency or the number of generated tokens.
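The technique is a one-liner to apply. A minimal sketch (the separator and repetition count are choices, not prescribed by the summary above):

```python
def repeat_prompt(prompt: str, times: int = 2) -> str:
    """Duplicate the user prompt so the model attends to it more than once."""
    return "\n\n".join([prompt] * times)

# Usage with any chat-style API (the message format shown is generic):
messages = [
    {"role": "user", "content": repeat_prompt("Which is larger, 9.11 or 9.9?")},
]
```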
Using File System for Context Engineering
This paper proposes a Unix-inspired file-system abstraction for context engineering to unify fragmented RAG, tool integration, and prompt engineering practices. Implemented via the AIGNE framework, the architecture provides a persistent, governed infrastructure for managing context artefacts through a verifiable pipeline of Constructor, Loader, and Evaluator components. This approach ensures traceability and accountability in GenAI systems by treating heterogeneous context as a structured, mountable resource subject to token constraints and access controls.
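The Constructor/Loader/Evaluator pipeline can be sketched as a toy mountable context store; AIGNE's real interfaces are not reproduced here, and the token budget below is a stand-in for the paper's token constraints:

```python
from dataclasses import dataclass, field

@dataclass
class ContextFS:
    """Toy mountable context store: paths map to context artefacts,
    and a token budget gates what the Loader may return."""
    mounts: dict[str, str] = field(default_factory=dict)
    token_budget: int = 100

    def construct(self, path: str, content: str) -> None:
        """Constructor: mount an artefact at a path."""
        self.mounts[path] = content

    def load(self, prefix: str) -> str:
        """Loader: gather artefacts under a prefix, enforcing the budget."""
        parts = [v for k, v in sorted(self.mounts.items()) if k.startswith(prefix)]
        tokens = "\n".join(parts).split()
        return " ".join(tokens[: self.token_budget])

    def evaluate(self, path: str, required: str) -> bool:
        """Evaluator: verify an artefact contains what downstream steps need."""
        return required in self.mounts.get(path, "")
```

Treating RAG chunks, tool manifests, and prompts as paths under one tree is what makes the pipeline traceable: every piece of context an agent saw has an address.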
EnergyNet Explained: Internetification of Energy Distribution
EnergyNet adapts packet-switched networking principles to power distribution through a software-defined architecture featuring Energy Routers, DC microgrids (ELAN/EWAN), and a dedicated control plane (EROS/ENMS). By implementing an open Energy Protocol (EP) and galvanic separation, the system enables decentralized, local-first autonomy and near-real-time energy routing. This transition from legacy synchronous grids to digitally managed distribution aims to unlock grid capacity for high-density loads like data centers and industrial electrification.
Code
I built a firewall for agents because prompt engineering isn't security
Cordum is a deterministic control plane for autonomous AI Agents and workers, utilizing NATS JetStream for messaging and Redis for state management. It provides a workflow engine with built-in policy-before-dispatch guardrails, approval gates, and capability-aware scheduling via CAP v2 wire contracts. The architecture includes a safety kernel for policy evaluation and a context engine to manage memory and context windows for LLM-driven workflows.
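The policy-before-dispatch pattern is the key guardrail: every task is evaluated against all policies before anything is published to the message bus. A minimal sketch (Cordum's actual CAP v2 contracts and policy language are not shown; the capability allowlist is illustrative):

```python
def policy_check(task: dict, policies: list) -> tuple[bool, str]:
    """Evaluate every policy before a task reaches a worker; any deny wins."""
    for policy in policies:
        ok, reason = policy(task)
        if not ok:
            return False, reason
    return True, "allowed"

def dispatch(task: dict, policies: list, send) -> bool:
    """Policy-before-dispatch: the task is only published (e.g. to a
    JetStream subject) if all policies pass."""
    ok, _reason = policy_check(task, policies)
    if ok:
        send(task)
    return ok

def capability_policy(task):
    """Example policy: reject tasks requesting capabilities off the allowlist."""
    allowed = {"read", "summarize"}
    extra = set(task.get("capabilities", [])) - allowed
    if extra:
        return False, f"capabilities not permitted: {sorted(extra)}"
    return True, "ok"
```

The point of the ordering is determinism: an agent can never race a policy decision, because the dispatch path is the only way onto the bus.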
Homunculus – A self-rewriting Claude Code plugin
Homunculus is an experimental Claude Code plugin designed as a persistent, self-evolving assistant that maintains project-specific state. It leverages Claude Code’s architecture—including commands, subagents, skills, and hooks—to detect behavioral patterns and automatically generate new capabilities by writing its own plugin files. The system adapts its personality and technical tone to the user while attempting to automate repetitive workflows through prompt-based evolution and MCP server integration.
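The detect-then-generate loop can be illustrated with a toy version: spot a repeated command in recent history and write a plugin file automating it. Homunculus generates real Claude Code commands via prompts; the file format and threshold below are purely illustrative:

```python
from pathlib import Path

def evolve(history: list[str], plugin_dir: str, threshold: int = 3) -> list[Path]:
    """If the same command recurs in recent history, write a tiny command
    file automating it -- a toy stand-in for prompt-based plugin generation."""
    counts: dict[str, int] = {}
    for cmd in history:
        counts[cmd] = counts.get(cmd, 0) + 1
    created = []
    for cmd, n in counts.items():
        if n >= threshold:
            path = Path(plugin_dir) / f"{cmd.replace(' ', '-')}.md"
            path.write_text(f"Run `{cmd}` and report the result.\n")
            created.append(path)
    return created
```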
A 6.9B MoE LLM in Rust, Go, and Python
This project implements a 6.9B parameter MoE Transformer (1.8B active) from scratch using Rust, Go, and Python with shared CUDA kernels. The architecture incorporates MQA, SwiGLU, and NTK RoPE for context extrapolation up to 256K. Benchmarks across all three languages demonstrate that performance is dominated by optimization strategies like BLAS and SIMD rather than the host language itself.
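The "1.8B active" figure comes from top-k gating: only a few experts run per token. A minimal routing sketch (this mirrors standard MoE gating, not this project's specific kernels):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(gate_logits, k=2):
    """Top-k expert routing: pick the k highest-scoring experts for a token
    and renormalize their gate weights. Only the chosen experts execute,
    which is how a 6.9B-parameter model runs ~1.8B parameters per token."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]
```

The benchmark finding above follows naturally: gating, attention, and expert matmuls are all BLAS/SIMD-bound, so the host language mostly just orchestrates.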
RFC: A proposal to replace API integration with LLM Semantic Translation
The Semantic Integration Layer (SIL) is a proposed protocol that leverages LLMs as universal translators to achieve semantic interoperability across fragmented software ecosystems. By using natural language as a universal interface, SIL replaces rigid, schema-dependent protocols like REST and gRPC to eliminate API fragility and legacy debt. This shift allows modern and legacy systems to communicate without strict standardization, reducing the maintenance overhead of traditional integration layers.
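Since SIL is a proposal rather than a shipped protocol, the sketch below only illustrates the core idea: an LLM maps a source payload onto a target schema by meaning instead of via a hand-written field mapping. The prompt wording and line-based reply format are assumptions:

```python
def semantic_translate(payload: dict, target_fields: list, llm) -> dict:
    """Map a source record onto a target schema via an LLM. ``llm`` is any
    callable taking a prompt string and returning the model's text reply."""
    prompt = (
        "Map this source record onto the target fields, matching by meaning.\n"
        f"Source: {payload}\nTarget fields: {target_fields}\n"
        "Reply with one 'field=value' per line."
    )
    out = {}
    for line in llm(prompt).splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            if key.strip() in target_fields:  # ignore hallucinated fields
                out[key.strip()] = value.strip()
    return out
```

The filter on `target_fields` hints at the proposal's open question: without a schema contract, every translation needs validation at the boundary.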
create-vibe-app - a language-agnostic scaffold for AI-first coding
create-vibe-app scaffolds project structures optimized for "Vibe Coding," a methodology designed to enhance AI agent effectiveness through structured implementation and knowledge compounding. The tool generates a modular architecture featuring specialized agent roles, reusable skills, and complexity-based routing for task execution. It integrates components like MCP configurations and a project wiki to ensure AI agents can efficiently navigate, implement, and record development experiences.