Monday — January 5, 2026

AI engineered a Rust-style C++ static analyzer, gravity emerges from entropic information rearrangement, and C-Sentinel uses LLM reasoning for system security analysis.

Interested in AI engineering? Let's talk

News

Neural Networks: Zero to Hero

Andrej Karpathy’s "Neural Networks: Zero to Hero" provides a code-first deep dive into building neural networks from scratch, starting with fundamental backpropagation and manual gradient flows. The curriculum progresses through MLPs, BatchNorm, and WaveNet architectures, culminating in the implementation of a GPT model and a BPE tokenizer. It emphasizes a first-principles understanding of the components driving modern LLMs, including the Transformer architecture and the intricacies of the tokenization pipeline.

Lessons from 14 years at Google

Addy Osmani shares 21 lessons from 14 years at Google, emphasizing that long-term engineering impact stems from solving user problems and achieving team alignment rather than technical cleverness. He advocates for a bias toward action, using AI to accelerate prototyping, and prioritizing code clarity to reduce operational risk and cognitive overhead. Key insights for technical leaders include managing "innovation tokens" in tech stacks, recognizing that alignment is the primary bottleneck at scale, and understanding that abstractions inevitably leak during system failures.

Eurostar AI vulnerability: When a chatbot goes off the rails

Pen Test Partners identified four vulnerabilities in Eurostar's LLM-backed chatbot, including guardrail bypass, prompt injection for system prompt exfiltration, HTML injection leading to self-XSS, and unverified conversation/message IDs. The guardrail bypass exploited weak server-side validation, where only the latest message's signature was checked, allowing client-side manipulation of prior chat history to inject malicious payloads. This case underscores that fundamental web and API security principles, such as robust server-side enforcement, input sanitization, and secure ID management, remain critical for LLM implementations.

Building a Rust-style static analyzer for C++ with AI

A C++ static analyzer, rusty-cpp, has been developed to introduce Rust-like memory safety features, including borrow checking and const for non-mutability, into existing C++ projects via comment-based annotations and external library definitions. This complex tool was largely engineered by an AI coding assistant (Claude), which iteratively designed, implemented, and debugged the analyzer, showcasing the rapid evolution and advanced engineering capabilities of LLMs.

Developing a BLAS Library for the AMD AI Engine [pdf]

aieblas is a BLAS library for AMD/Xilinx AI Engines, addressing the challenge of programming these spatial dataflow architectures for general numerical computations. It automatically generates AIE kernels, ADF graphs, and PL kernels from a high-level JSON specification, enabling chained BLAS routines and incorporating optimizations like tiling. Evaluation shows that while single AIE routines are slower than CPU, aieblas achieves comparable performance to CPU BLAS for complex, pipelined operations by efficiently leveraging the AIE's dataflow architecture.

Research

KGGen: Extracting Knowledge Graphs from Plain Text with Language Models

KGGen is introduced as a text-to-KG generator addressing the scarcity of high-quality KG data for foundation models. It leverages LLMs to create KGs from plaintext, uniquely clustering related entities to reduce sparsity. Available as a Python library, KGGen also comes with the MINE benchmark, demonstrating superior performance over existing extractors.

What Drives Success in Physical Planning with JEPA World Models?

This work investigates world models that plan in learned representation spaces, characterized as JEPA-WMs, to address the challenge of efficient generalization for AI agents in physical tasks. The study comprehensively analyzes the impact of model architecture, training objectives, and planning algorithms through experiments in simulated and real-world robotic environments. The findings are integrated into a proposed model that outperforms baselines like DINO-WM and V-JEPA-2-AC in navigation and manipulation tasks.

On the quantum mechanics of entropic forces

This paper introduces microscopic quantum models demonstrating how gravity, specifically Newton's law, can arise from the entropic re-arrangement of information. It proposes a mechanism where gravity emerges from the free energy extremization of qubits or oscillators, rather than from virtual field quanta. The authors present both local and non-local constructions and suggest methods to experimentally distinguish these entropic models from ordinary perturbative quantum gravity.

Mainframe-Style Channel Controllers for Modern Disaggregated Memory Systems

Near-Data Processing (NDP) adoption is hindered by the lack of a clear OS-centric abstraction, despite its potential for memory bottleneck alleviation and renewed interest with disaggregated memory like CXL. This work proposes "memory channel controllers" as a portable, virtualizable OS abstraction for NDP in modern disaggregated memory systems. This approach enables OS integration without CPU changes and leverages cache coherence from emerging interconnects for a richer, more fine-grained programming model.

Pushing the Memory Bandwidth Wall with CXL-Enabled Idle I/O Bandwidth Harvesting

Server CPUs face memory bandwidth limitations per core due to pin constraints and fragmented memory/I/O bandwidth allocation, hindering memory-intensive workloads. SURGE is a software-supported architectural technique that addresses this by dynamically multiplexing memory and I/O traffic over the same processor interface (e.g., CXL), enabling fungibility of off-chip bandwidth. This salvages idle I/O bandwidth for memory, accelerating memory-intensive workloads by up to 1.3x.

Code

Hover – IDE style hover documentation on any webpage

Hover is a browser extension that provides IDE-like code documentation on hover for code snippets on any webpage, including AI chat applications like ChatGPT and Claude. It requires configuration with an OpenRouter API key or a custom OpenAI client-compatible endpoint, and users must specify target websites using URL patterns.

AI sycophancy panic

Vibesbench is a conversational AI benchmark that evaluates LLM fluency and linguistic pragmatics through semi-structured, multi-turn dialogues. It focuses on assessing models as interactive collaborators, emphasizing conversational coherence, interpretation, and emergent synthesis over task-oriented responses. The benchmark critiques current AI evaluation methods and model behaviors like "sycophancy" and epistemic rigidity, advocating for human-centric assessment of AI's "voice" and its utility in open-ended, dialogic exploration.

C-Sentinel: System prober that captures “system fingerprints” for AI analysis

C-Sentinel is a lightweight C-based system prober for UNIX systems that captures comprehensive "system fingerprints" and security events via auditd integration. It leverages LLM reasoning for AI-assisted analysis to identify non-obvious risks, provide causal reasoning, and synthesize context from system data. The tool offers explainable risk scoring, a multi-user web dashboard with features like 2FA and API keys, and proactive alerts, aiming to add "wisdom" to traditional observability data.

Krowdovi – Video-based indoor navigation on a DePIN creator economy

Krowdovi is a DePIN platform built on Solana that incentivizes videographers to create first-person indoor navigation videos using a burn-and-mint tokenomics model. It leverages AI for multi-language translation of navigation overlays and text-to-speech (TTS) for accessibility, enhancing guidance in complex spaces. The platform features motion-controlled video playback, a creator studio for overlay editing, and a reputation system that influences creator rewards.

llmnop – Rust CLI for benchmarking LLM endpoints

llmnop is a benchmarking tool designed for LLM inference endpoints, compatible with any OpenAI-compatible API. It provides critical performance metrics including time-to-first-token, inter-token latency, throughput, and end-to-end latency. Users can customize parameters such as the number of concurrent requests, input/output token lengths, and specify Hugging Face tokenizers, with results presented on stdout and detailed JSON files.