Wednesday — May 13, 2026
Unitree launches a $537k rideable transformer robot, a massive audit reveals a surge in hallucinated citations across scientific papers, and Needle distills Gemini tool calling into a 26M model.
News
Amazon employees are "tokenmaxxing" due to pressure to use AI tools
Amazon developers are "tokenmaxxing" (artificially inflating token consumption) to meet internal mandates requiring 80% weekly AI tool adoption. Using an internal agentic framework called MeshClaw, employees automate redundant tasks to climb leaderboards that track LLM usage metrics. Despite corporate assurances, staff report perverse incentives and security concerns about autonomous agents deploying code and triaging email.
Bambu Lab is abusing the open source social contract
Bambu Lab has threatened legal action against the developer of an OrcaSlicer fork that enabled local printing by bypassing the company's mandatory cloud infrastructure. Bambu alleges the fork used "impersonation attacks" via falsified metadata, despite the developer utilizing upstream AGPLv3 code from Bambu Studio. This conflict highlights the growing friction between proprietary cloud-locked ecosystems and open-source community autonomy, mirroring similar governance and ownership debates currently surrounding LLMs and AI tooling.
Reimagining the mouse pointer for the AI era
Google DeepMind is reimagining the mouse pointer as an AI-enabled tool powered by Gemini that understands visual and semantic context. By capturing the intent behind physical gestures, this system allows users to interact with on-screen elements using natural shorthand like "this" and "that" without leaving their current workflow. The technology transforms pixels into structured, actionable entities and is being integrated into Chrome and the Googlebook laptop experience.
Unitree GD01: China's $537k rideable transformer robot is now in production
Unitree Robotics has launched production of the GD01, a $537,000 pilot-operated mech suit capable of transforming between bipedal and quadrupedal locomotion. The 500kg platform is currently exclusive to the Chinese market as Unitree pursues a $610 million IPO to become China's first publicly listed humanoid robotics company. While the hardware demonstrates significant structural strength, key technical specifications regarding autonomous range and battery runtime have yet to be disclosed.
Agentic interface for mainframes and COBOL
Hopper is an agentic development environment for z/OS mainframes that leverages AI agents to automate TN3270 navigation, JCL generation, and VSAM querying. It streamlines the development lifecycle by parsing JES return codes and SDSF logs to diagnose abends and manage CICS deployments via single-prompt workflows. The platform integrates a native terminal with modern dev tools and offers enterprise features including MCP server access and VPC deployment.
Research
FairyFuse: Multiplication-Free LLM Inference on CPUs via Fused Ternary Kernels
FairyFuse is an inference system for ternary LLMs that replaces floating-point multiplications with masked additions and subtractions on commodity CPUs. By fusing sub-GEMVs into a single AVX-512 loop, it achieves a 29.6x kernel speedup and outperforms llama.cpp Q4_K_M by 1.24x. The system delivers 32.4 tokens per second on Intel Xeon hardware while maintaining near-lossless quality compared to FP16.
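To make the multiplication-free idea concrete, here is a minimal pure-Python sketch of a matrix-vector product over ternary weights. It is not FairyFuse's fused AVX-512 kernel; the function name `ternary_gemv` and the sample values are assumptions chosen for illustration. It only shows why weights restricted to -1, 0, and +1 let every multiply become an add, a subtract, or a skip.

```python
def ternary_gemv(weights, x):
    """Compute y = W @ x for W with entries in {-1, 0, +1} without multiplying."""
    y = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # +1 weight: add the activation
            elif w == -1:
                acc -= xi      # -1 weight: subtract the activation
            # 0 weight: the element is masked out and contributes nothing
        y.append(acc)
    return y


W = [[1, 0, -1],
     [-1, -1, 0]]
x = [0.5, 2.0, -1.0]
y = ternary_gemv(W, x)  # [1.5, -2.5]
```

A vectorized kernel would presumably replace the branches with bit masks over whole SIMD lanes, but the arithmetic it performs is the same.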
LLM Hallucinations in the Wild
An audit of 111 million references across 2.5 million papers reveals a significant surge in hallucinated citations following widespread LLM adoption, with over 146,000 non-existent references projected for 2025. These errors are most prevalent in AI-heavy fields and manuscripts showing linguistic signatures of AI-assisted writing, often reinforcing gender and prestige biases by disproportionately crediting prominent male scholars. Current moderation and publication safeguards are failing to contain this influx, posing a systemic threat to the reliability and equity of scientific knowledge production.
AI Agents Under EU Law
This paper maps the EU regulatory landscape for AI agents, integrating the EU AI Act with frameworks like the CRA and GDPR. It introduces a taxonomy of nine deployment categories and a twelve-step compliance architecture to address technical challenges such as runtime behavioral drift, multi-party transparency, and human oversight. The study concludes that high-risk agents with untraceable drift currently fail AI Act requirements, necessitating exhaustive inventories of external actions, data flows, and connected systems.
RegexPSPACE: Regex LLM Benchmark
This study introduces a benchmark built on two PSPACE-complete regex problems, RegexEQ and RegexMin, to probe the space-complexity limits of LLMs and large reasoning models (LRMs). By requiring massive search space exploration, the benchmark reveals performance bottlenecks and failure patterns like verbosity in current models. It provides a rigorous framework for evaluating how finite context windows and space complexity constrain advanced reasoning capabilities.
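As a rough illustration of why regex equivalence forces so much search, the sketch below compares two patterns by brute force over every string up to a fixed length. This is not the benchmark's method and not a decision procedure: agreement up to `max_len` only suggests equivalence. The helper name `bounded_equivalent` and its defaults are assumptions for the example.

```python
import re
from itertools import product


def bounded_equivalent(pattern_a, pattern_b, alphabet="ab", max_len=6):
    """Compare two regexes on every string over `alphabet` up to `max_len`.

    Returns (True, None) if they agree everywhere checked, otherwise
    (False, first_counterexample).
    """
    a = re.compile(pattern_a)
    b = re.compile(pattern_b)
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(a.fullmatch(s)) != bool(b.fullmatch(s)):
                return False, s
    return True, None


same = bounded_equivalent("(a|b)*", "(a*b*)*")   # (True, None)
diff = bounded_equivalent("a*b*", "(a|b)*")      # (False, "ba")
```

With a two-letter alphabet and `max_len=6` the loop already checks 127 strings, and the count doubles with each extra character, which gives a feel for the search-space growth the summary describes.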
Deterministic Fully-Static Whole-Binary Translation Without Heuristics
Elevator is a static binary translator that converts x86-64 executables to AArch64 without requiring source code or debug information. It handles code-data ambiguity by exhaustively translating all feasible byte interpretations into separate control flow paths, producing deterministic, self-contained binaries that eliminate the need for a runtime JIT. While this approach increases code size, it enables pre-deployment validation and achieves performance comparable to QEMU's user-mode emulation.
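The code-data ambiguity can be illustrated with a toy example. The sketch below uses a made-up three-opcode instruction set (not x86-64) and decodes an instruction at every byte offset where a valid one could start, so the same byte can appear both as an operand and as an opcode. It is only an assumption-laden picture of what considering every feasible interpretation means; Elevator's actual translation scheme is not described in enough detail here to reproduce.

```python
# Hypothetical toy ISA: opcode -> (mnemonic, total instruction length in bytes)
TOY_ISA = {0x01: ("nop", 1), 0x02: ("load", 2), 0x03: ("jump", 3)}


def decode_all_offsets(code):
    """Decode one instruction at every offset where a complete, valid one starts."""
    decoded = {}
    for offset in range(len(code)):
        entry = TOY_ISA.get(code[offset])
        if entry is None:
            continue                      # byte is not a known opcode
        name, length = entry
        if offset + length > len(code):
            continue                      # instruction would run past the end
        operands = bytes(code[offset + 1:offset + length])
        decoded[offset] = (name, operands)
    return decoded


code = bytes([0x02, 0x01, 0x03, 0x01, 0x02])
interpretations = decode_all_offsets(code)
# Offset 0 reads byte 0x01 as the operand of "load";
# offset 1 reads that same byte as a standalone "nop".
```

Keeping every such interpretation, rather than guessing which one is real, is what makes the output larger, consistent with the code-size cost noted above.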
Code
Needle: We Distilled Gemini Tool Calling into a 26M Model
Needle is a 26M parameter Simple Attention Network distilled from Gemini 3.1, optimized for high-efficiency single-shot function calling on edge devices. The architecture employs an encoder-decoder framework with ZCRMSNorm and GQA, achieving inference speeds of 1200 tokens/sec and prefill speeds of 6000 tokens/sec. It is specialized for tool-use tasks, where it outperforms larger models like Qwen-0.6B and Granite-350m, and is designed for local finetuning and deployment on consumer hardware.
Statewright – Visual state machines that make AI agents reliable
Statewright is a deterministic Rust-based engine that implements state machine guardrails to control tool access for AI agents. By partitioning workflows into discrete phases with restricted toolsets, it reduces the model's reasoning space and prevents common failure modes like read-loop death spirals. It integrates via MCP with agents like Claude Code and Cursor, demonstrating significant performance improvements on SWE-bench tasks, especially for local models.
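The phase-gating idea can be sketched in a few lines. The example below is written in Python for brevity and is not Statewright's Rust engine or its API; the class `PhaseGuard`, the phase names, and the tool names are all hypothetical. It shows a state machine in which each phase exposes only a fixed toolset and only declared transitions are allowed.

```python
class PhaseGuard:
    """Hypothetical guard: each phase exposes only a fixed set of tools."""

    def __init__(self, tools_by_phase, transitions, start):
        self.tools_by_phase = tools_by_phase  # phase -> allowed tool names
        self.transitions = transitions        # phase -> phases reachable next
        self.phase = start

    def allows(self, tool):
        """True if the tool may be called in the current phase."""
        return tool in self.tools_by_phase[self.phase]

    def advance(self, next_phase):
        """Move to next_phase, rejecting transitions the machine does not define."""
        if next_phase not in self.transitions.get(self.phase, set()):
            raise ValueError(f"illegal transition: {self.phase} -> {next_phase}")
        self.phase = next_phase


guard = PhaseGuard(
    tools_by_phase={
        "explore": {"read_file", "search"},
        "edit": {"read_file", "write_file"},
        "verify": {"run_tests"},
    },
    transitions={"explore": {"edit"}, "edit": {"verify"}, "verify": {"edit"}},
    start="explore",
)
blocked_early = guard.allows("write_file")   # False: no writes while exploring
guard.advance("edit")
allowed_later = guard.allows("write_file")   # True: writes open up in "edit"
```

In this sketch the "edit" phase no longer offers `search`, which is one way a restricted toolset could cut off an endless read loop.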
Agent FM – local, open-source radio for Claude Code and Codex agents
Agent FM is a macOS application that provides ambient audio monitoring for AI coding agents like Claude Code and Codex. It converts agent activity into real-time "radio stations," surfacing progress, blockers, and errors via local narration using Gemini or OpenAI APIs. This allows developers to monitor multiple agent sessions through a "Global Mix" without manually reviewing terminal transcripts.
Graft – semantic memory for AI agents, without the LLM
Graft is a local-first, C11-based persistent graph memory and semantic cache designed to provide AI agents and microservices with long-term memory across sessions. It leverages SQLite, sqlite-vec, and llama.cpp with BGE-M3 embeddings to offer hybrid search and millisecond-latency retrieval via a daemon/CLI architecture. By implementing a verified semantic cache and graph-based edges, it reduces LLM token costs and context noise without requiring external SaaS dependencies or API keys.
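A semantic cache of this kind can be pictured as a lookup keyed by embedding similarity rather than exact text. The sketch below is a minimal in-memory version in Python, not Graft's C11 implementation. It uses hand-written two-dimensional vectors in place of BGE-M3 embeddings and a plain list in place of SQLite and sqlite-vec, and the class name `SemanticCache` and the 0.9 threshold are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Hypothetical cache: return a stored value when a query embedding is close enough."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, value) pairs

    def put(self, embedding, value):
        self.entries.append((embedding, value))

    def get(self, embedding):
        best_score, best_value = 0.0, None
        for stored, value in self.entries:
            score = cosine(embedding, stored)
            if score > best_score:
                best_score, best_value = score, value
        # Only a sufficiently similar entry counts as a hit.
        return best_value if best_score >= self.threshold else None


cache = SemanticCache()
cache.put([1.0, 0.0], "answer A")
hit = cache.get([0.99, 0.05])   # near-duplicate query reuses "answer A"
miss = cache.get([0.0, 1.0])    # unrelated query returns None
```

Each hit is a request that never has to reach the LLM, which is where the token savings described above would come from.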
Send Cold Emails with AI Agents
The SalesBlink Cold Email Outreach Skill enables AI agents to automate end-to-end sales workflows via a REST API, supporting integrations with Claude Code, Cursor, Gemini CLI, and MCP. It allows LLMs to manage leads, build email sequences, and monitor deliverability through standardized tool calls. The package includes an evals.json file for benchmarking agent performance across complex outreach and data management tasks.