Friday — May 1, 2026
PyTorch Lightning is compromised by Shai-Hulud malware, proxy models cut AI Query costs by over 100x, and TRiP offers a complete transformer engine in C from scratch.
Interested in AI engineering? Let's talk
News
The Zig project's rationale for their anti-AI contribution policy
Zig enforces a strict ban on LLM-generated contributions, prioritizing the long-term development of human contributors over the immediate value of code submissions. This "contributor poker" philosophy treats PR reviews as an investment in a developer's growth, which the project argues is undermined when LLMs are used to bridge the gap between a contributor's actual expertise and the code's quality. Consequently, high-performance optimizations from AI-heavy projects like Bun remain in forks rather than being upstreamed to the core Zig compiler.
Shai-Hulud Themed Malware Found in the PyTorch Lightning AI Training Library
PyPI versions 2.6.2 and 2.6.3 of the lightning deep learning framework have been compromised in a supply chain attack targeting AI and LLM development environments. Upon import, an obfuscated JavaScript payload exfiltrates credentials from local files, CI/CD runners, and major cloud providers while attempting to poison downstream npm packages. The malware establishes persistence by abusing Claude Code hooks and VS Code tasks to execute a Bun-based dropper in infected repositories.
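A quick way to audit a local environment against the reported releases (a minimal sketch; the compromised version set is the only detail taken from the advisory above, and the remediation advice is generic rather than from the maintainers):

```python
# Minimal check for the compromised "lightning" PyPI releases reported above (2.6.2, 2.6.3).
# The version set is the only fact taken from the advisory; the rest is generic hygiene.
from importlib.metadata import PackageNotFoundError, version

COMPROMISED = {"2.6.2", "2.6.3"}

try:
    installed = version("lightning")
except PackageNotFoundError:
    installed = None

if installed in COMPROMISED:
    print(f"lightning {installed}: matches a known-compromised PyPI release.")
    print("Reinstall a clean version, rotate cloud/CI credentials, and audit Claude Code hooks and VS Code tasks.")
elif installed:
    print(f"lightning {installed}: not in the known-compromised set.")
else:
    print("lightning is not installed in this environment.")
```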
Mike: open-source legal AI
Mike is an open-source, self-hostable legal AI platform designed as a transparent alternative to proprietary tools like Harvey. It features a RAG-enabled assistant for document analysis with verbatim citations, matter-scoped workspaces, and automated tabular data extraction. The system supports BYO keys for models like Claude and Gemini, allowing for custom prompt engineering and data residency within a firm's own infrastructure without per-seat licensing fees.
DataCenter.FM – background noise app featuring the sound of the AI bubble
DataCenter.FM is an interactive audio generator that simulates the ambient environment of an AI data center. It allows users to manipulate variables such as GPU load, server count, cooling intensity, and power generation to experience the auditory profile of large-scale compute infrastructure. The tool also features thematic status indicators for power usage, temperature, and "sentience" levels.
Claude.ai and API unavailable [fixed]
Anthropic has resolved a service outage that rendered claude.ai and the Claude API unavailable on April 30, 2026. The incident affected the entire Claude ecosystem, including the Console, Claude Code, and Claude for Government, with full restoration achieved within approximately 30 minutes.
Research
Estimating Black-Box LLM Parameter Counts via Factual Capacity
IKPs estimate LLM parameter counts by measuring factual storage capacity, offering a more precise alternative to inference-based estimation. Calibrated on 89 open-weight models ($R^2 = 0.917$), the benchmark shows that total parameters, rather than active parameters, determine knowledge capacity in MoE models. Results indicate that factual capacity scales log-linearly with parameters and does not compress over time, rejecting the "Densing Law" and providing a stable metric for scaling frontier models.
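To make the calibration idea concrete, here is a hypothetical sketch of how such an estimator could work, assuming "log-linear" means a linear fit in log-log space; the calibration numbers below are placeholders, not the paper's data or coefficients.

```python
# Hypothetical sketch: calibrate a factual-capacity -> parameter-count estimator.
# Assumes a linear relation between log10(capacity) and log10(total parameters);
# all numbers are placeholders, not values from the paper.
import numpy as np

# (total parameters, measured factual capacity) for open-weight calibration models
calibration = np.array([
    [7e9,   1.2e5],
    [13e9,  2.1e5],
    [70e9,  9.8e5],
    [405e9, 4.9e6],
])

log_params = np.log10(calibration[:, 0])
log_capacity = np.log10(calibration[:, 1])

# Fit log10(params) = a * log10(capacity) + b, then invert it for a black-box model.
a, b = np.polyfit(log_capacity, log_params, deg=1)

def estimate_parameters(measured_capacity: float) -> float:
    """Estimate total parameter count from a black-box model's measured factual capacity."""
    return 10 ** (a * np.log10(measured_capacity) + b)

print(f"Estimated parameters: {estimate_parameters(3.0e6):.3g}")
```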
Beyond 80/20: High-Entropy Minority Tokens Drive Effective RL for LLM Reasoning
RLVR enhances LLM reasoning by primarily optimizing a small subset of high-entropy "forking tokens" that determine CoT pathways. Restricting policy gradient updates to these critical tokens (top 20%) matches or significantly outperforms full-gradient updates on Qwen3 models, demonstrating a strong scaling trend. These findings suggest that RLVR efficacy is concentrated in key decision-making points rather than the entire token sequence.
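A minimal sketch of the core idea, restricting the policy-gradient loss to the top 20% highest-entropy tokens; the tensor shapes and the advantage signal are assumptions for illustration, not the paper's implementation.

```python
# Sketch: restrict a token-level policy-gradient loss to high-entropy "forking" tokens.
# Shapes and the advantage signal are illustrative; this is not the paper's code.
import torch

def masked_pg_loss(logits, logprobs_taken, advantages, top_frac=0.20):
    """
    logits:          (batch, seq, vocab) raw model outputs
    logprobs_taken:  (batch, seq) log-prob of the sampled token at each position
    advantages:      (batch, seq) per-token advantage estimates (e.g. from verifiable rewards)
    """
    # Per-token entropy of the policy distribution.
    logp = torch.log_softmax(logits, dim=-1)
    entropy = -(logp.exp() * logp).sum(dim=-1)              # (batch, seq)

    # Keep only the top `top_frac` highest-entropy tokens in each sequence.
    k = max(1, int(top_frac * entropy.shape[-1]))
    threshold = entropy.topk(k, dim=-1).values[..., -1:]    # k-th largest entropy per sequence
    mask = (entropy >= threshold).float()

    # Standard REINFORCE-style loss, averaged over the selected tokens only.
    loss = -(mask * advantages * logprobs_taken).sum() / mask.sum().clamp_min(1.0)
    return loss
```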
Performance Analysis of AI Query Approximation Using Lightweight Proxy Models
AI Queries, SQL extensions that use LLMs to blend structured and unstructured data, offer powerful semantic reasoning but become expensive when the LLM is invoked frequently. This paper evaluates an approximation approach that uses cheap, accurate proxy models over embedding vectors to achieve a >100x cost and latency reduction for semantic filtering and significant gains for semantic ranking. These proxy models maintain or improve accuracy across large datasets. Architectures for OLAP (BigQuery) and HTAP (AlloyDB) are presented, alongside techniques to accelerate proxy model training.
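A minimal sketch of the proxy idea under simple assumptions: a lightweight classifier over precomputed embeddings stands in for the LLM on a semantic filter predicate, with the LLM used only to label a small training sample. The library choice and threshold are my own, not the paper's.

```python
# Sketch: approximate an LLM-backed semantic filter with a cheap proxy over embeddings.
# scikit-learn supplies the proxy; the embeddings and LLM-provided labels are assumed to exist.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_proxy(embeddings: np.ndarray, llm_labels: np.ndarray) -> LogisticRegression:
    """Fit a lightweight proxy on a small, LLM-labeled sample of rows."""
    proxy = LogisticRegression(max_iter=1000)
    proxy.fit(embeddings, llm_labels)
    return proxy

def semantic_filter(proxy, all_embeddings: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Score every row's embedding with the proxy instead of invoking the LLM per row."""
    scores = proxy.predict_proba(all_embeddings)[:, 1]
    return scores >= threshold   # boolean mask usable as a WHERE predicate
```

Usage pattern: label a small sample of rows with the LLM, train the proxy once, then filter the full table at embedding-lookup cost rather than per-row LLM cost.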
Fast GPU Linear Algebra via Compile Time Expression Fusion
Bandicoot is a C++ GPU linear algebra toolkit prioritizing ease of use and efficiency, offering API compatibility with Armadillo. It leverages template metaprogramming to generate fused GPU kernels at compile time, eliminating runtime overhead and saturating memory bandwidth. Empirical results demonstrate Bandicoot's superior performance compared to toolkits like PyTorch, TensorFlow, and JAX.
Agentic Harness Engineering
Agentic Harness Engineering (AHE) automates the optimization of coding-agent environments through a closed-loop system utilizing component, experience, and decision observability. By transforming harness edits into falsifiable contracts, AHE achieved a 7.3pp gain on Terminal-Bench 2 and outperformed human-designed baselines. The resulting harnesses generalize across model families and benchmarks, with performance gains primarily driven by improvements to tools, middleware, and memory rather than system prompts.
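To illustrate what "harness edits as falsifiable contracts" might look like mechanically, here is a hypothetical sketch; the field names, metric, and threshold are mine, not the paper's.

```python
# Hypothetical sketch of a harness edit expressed as a falsifiable contract:
# a named change, a prediction about a benchmark metric, and an accept/reject check.
from dataclasses import dataclass
from typing import Callable

@dataclass
class HarnessContract:
    edit: str                    # e.g. "add a scratchpad memory tool to the agent"
    hypothesis: str              # what the edit is predicted to improve and why
    metric: Callable[[], float]  # runs the benchmark and returns a score (e.g. a pass rate)
    baseline: float              # score measured before the edit
    min_gain: float              # smallest improvement that counts as confirmation

    def evaluate(self) -> bool:
        """Return True if the edit is kept, False if the contract is falsified and rolled back."""
        score = self.metric()
        return (score - self.baseline) >= self.min_gain
```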
Code
Kanwas, open-source shared context board for teams and agents
Kanwas is a multiplayer workspace that provides a shared context board for teams and AI agents to collaborate on documents, evidence, and tool calls. It features a Git-backed markdown filesystem for version control and portability, allowing users to stream agent outputs into a shared timeline and export structured artifacts to local repos for use with coding agents. The platform leverages Yjs for real-time collaboration and supports integration with Anthropic and OpenAI APIs.
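A minimal sketch of what streaming an agent output into a Git-backed markdown timeline could look like; the file layout, commit convention, and function name are assumptions, not Kanwas's actual implementation.

```python
# Hypothetical sketch: append an agent's output to a markdown timeline file and commit it.
# The directory layout and commit message are illustrative, not Kanwas's own.
import subprocess
from datetime import datetime, timezone
from pathlib import Path

def append_to_timeline(repo: Path, agent: str, output: str) -> None:
    timeline = repo / "timeline.md"
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with timeline.open("a", encoding="utf-8") as f:
        f.write(f"\n## {stamp} [{agent}]\n\n{output}\n")
    subprocess.run(["git", "-C", str(repo), "add", "timeline.md"], check=True)
    subprocess.run(["git", "-C", str(repo), "commit", "-m", f"timeline: {agent} update"], check=True)
```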
TRiP – a complete transformer engine in C built from scratch just by me
TRiP is a lightweight, from-scratch C engine for Transformer models, supporting inference, training, and multimodal vision for architectures like Llama 2, Gemma, and PaliGemma. Designed for educational deep-dives into LLM internals, it implements tensor operations, backpropagation with AdamW, and BPE tokenization without external frameworks. The project features a minimal codebase that handles SafeTensors, various weight types, and RAM-optimized inference via mmap.
Agent that refuses to run commands without human approval
Fewshell is a self-hosted, cross-platform SSH copilot that integrates LLMs into terminal workflows for DevOps and MLOps. It prioritizes security through a human-in-the-loop model where every AI-generated command requires manual approval, and sensitive secrets are redacted from the LLM context. The architecture supports a BYOM approach with providers like OpenAI and Ollama, synchronizing persistent sessions across mobile and desktop via secure SSH tunnels.
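A minimal sketch of the human-in-the-loop pattern described above: redact secrets before context reaches the model, and require explicit approval before anything runs. The redaction patterns and prompt flow are assumptions, not Fewshell's code.

```python
# Sketch of a human-in-the-loop command gate: redact secrets from the LLM context,
# then require explicit approval before executing the model's proposed command.
# Patterns and flow are illustrative, not Fewshell's implementation.
import re
import subprocess

SECRET_PATTERNS = [
    re.compile(r"(?i)(api[_-]?key|token|password)\s*[=:]\s*\S+"),
]

def redact(text: str) -> str:
    """Strip likely secrets from terminal context before it is sent to the model."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

def run_with_approval(proposed_command: str) -> None:
    """Show the model's suggestion and execute it only after explicit confirmation."""
    print(f"Proposed command:\n  {proposed_command}")
    if input("Run this command? [y/N] ").strip().lower() != "y":
        print("Skipped.")
        return
    subprocess.run(proposed_command, shell=True, check=False)

# Usage: send redact(terminal_context) to the model, then pass its suggestion to run_with_approval().
```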
Task Manager for AI Agents (MCP, Opensource)
AgentRQ is an agent-human collaboration platform that utilizes MCP to integrate LLMs like Claude directly into task management workflows. Built with a Go/Fiber backend and a Vue.js 3 frontend, the architecture features real-time SSE synchronization and a "Supervisor" CoreMCP for global multi-workspace orchestration. The platform enables agents to autonomously manage tasks, handle attachments, and communicate with human operators through a standardized suite of MCP tools.
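For readers unfamiliar with MCP's tool surface, here is a hedged sketch of what exposing a task-management action as an MCP tool roughly looks like; the tool name and schema are hypothetical, and only the name/description/inputSchema shape follows the common MCP tool-listing convention rather than AgentRQ's actual definitions.

```python
# Hypothetical MCP-style tool descriptor plus a dispatcher for a task-management action.
# The tool name and schema fields are illustrative, not AgentRQ's real tool suite.
CREATE_TASK_TOOL = {
    "name": "create_task",
    "description": "Create a task in the current workspace and assign it to an agent or human.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "assignee": {"type": "string"},
            "workspace_id": {"type": "string"},
        },
        "required": ["title", "workspace_id"],
    },
}

def handle_tool_call(name: str, arguments: dict) -> dict:
    """Route a tool call from the agent to the backend (stubbed here)."""
    if name == "create_task":
        # A real server would hit the task API and push an SSE update to connected clients.
        return {"status": "created", "task": arguments}
    raise ValueError(f"unknown tool: {name}")
```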
Yiitap – AI-native Notion-style block editor
Yiitap is an AI-native, block-based WYSIWYG editor built on Tiptap and ProseMirror, supporting both Vue and React. It features built-in AI integration for content generation, native Markdown support, and over 15 custom extensions. The framework is designed for developers building AI-powered writing tools, knowledge bases, and LLM chat interfaces.