Saturday — May 2, 2026

Uber exhausts its 2026 AI budget on Claude Code, Tessera optimizes inference for heterogeneous GPUs, and Superkube rewrites Kubernetes in Rust.

News

Grok 4.3

Grok 4.3 is a multimodal LLM from xAI featuring a 1M token context window and native reasoning capabilities. It supports function calling, structured outputs, and prompt caching, with pricing at $1.25/1M input and $2.50/1M output tokens. The platform provides REST, gRPC, and WebSocket APIs, alongside integrated tools for RAG, code execution, and MCP.
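To put the quoted rates in perspective, here is a minimal cost sketch. The two rates come from the summary above; the helper name and the example token counts are assumptions made for illustration, and the calculation ignores any prompt-caching discount.

```python
# Rates quoted in the summary above, in USD per 1M tokens.
INPUT_USD_PER_MTOK = 1.25
OUTPUT_USD_PER_MTOK = 2.50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request (hypothetical helper, no caching discount)."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Assumed example: a prompt that fills the 1M-token window plus a 4,000-token reply.
cost = estimate_cost_usd(1_000_000, 4_000)
print(f"${cost:.2f}")
```

Under these assumptions, one full-context request costs a little over a dollar, almost all of it from input tokens.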

Uber torches 2026 AI budget on Claude Code in four months

Uber exhausted its entire 2026 AI budget in just four months due to the rapid internal adoption of Claude Code and Cursor. With 95% of engineers using these tools and 70% of committed code now originating from AI, monthly API costs have reached up to $2,000 per developer. While Cursor usage has plateaued, Claude Code’s multi-step capabilities have driven a surge in consumption that far exceeded initial financial forecasts.

AI uses less water than the public thinks

Quantitative analysis indicates that AI data center water consumption in California, primarily for evaporative cooling, accounts for a negligible 0.055% to 0.7% of the state's total human water use. The analysis used LLMs to generate preliminary estimates, illustrating their usefulness for quick, physics-based quantitative assessments in policy discussions. Local water scarcity in arid regions remains a concern for specific new facilities, but the aggregate water footprint of AI infrastructure is currently minimal compared to sectors like agriculture.

Spotify adds 'Verified' badges to distinguish human artists from AI

Spotify is implementing a "Verified" badge system to distinguish human artists from AI-generated personas using heuristics such as social media integration, listener telemetry, and physical-world signals like touring and merchandise. The initiative aims to mitigate the proliferation of "content farms" and synthetic music projects by prioritizing artists with established cultural contributions. However, critics note that the badge verifies the authenticity of the artist's identity rather than providing granular content-level labeling for AI-assisted production.

AI CAD Harness

Adam Fusion is an AI copilot that utilizes agents to natively drive Autodesk Fusion 360 CAD workflows. It can be deployed via CLI scripts or manual bundle installation into the local AddIns directory. Once activated through the Fusion Add-Ins manager, the extension provides a docked chat interface for agentic CAD manipulation.

Research

Xmemory: Benchmarking Structured AI Memory Against RAG and Hybrid RAG

Traditional AI memory relies on RAG-style retrieval, which is often insufficient for production tasks that require exact state management and relational updates. The paper proposes a schema-grounded architecture that shifts interpretation to the write path through iterative extraction and validation gates. The authors' system, xmemory, significantly outperforms frontier LLM baselines on memory benchmarks, which suggests that structured schemas matter more than retrieval scale for maintaining a reliable system of record.
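The write-path idea can be sketched roughly as follows. This is not xmemory's implementation: the schema, the `validate` gate, the `write_memory` loop, and the `toy_extract` stub (standing in for an LLM extraction call) are all hypothetical names chosen for illustration.

```python
# Hypothetical schema: field name -> required Python type.
SCHEMA = {"user_id": str, "field": str, "value": str}


def validate(record: dict) -> list:
    """Validation gate: return schema violations (an empty list means accept)."""
    errors = []
    for name, typ in SCHEMA.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], typ):
            errors.append(f"wrong type for {name}")
    errors.extend(f"unknown field: {n}" for n in record if n not in SCHEMA)
    return errors


def write_memory(text, extract, store, max_attempts=3) -> bool:
    """Iteratively extract a record, feeding gate errors back until it validates."""
    feedback = []
    for _ in range(max_attempts):
        record = extract(text, feedback)
        feedback = validate(record)
        if not feedback:
            # Keyed upsert: a new fact replaces the old state instead of piling up.
            store[(record["user_id"], record["field"])] = record["value"]
            return True
    return False


def toy_extract(text, feedback):
    """Stand-in for an LLM call: the first pass omits a field, the retry repairs it."""
    record = {"user_id": "u1", "field": "city"}
    if feedback:
        record["value"] = text.split()[-1]
    return record


store = {}
first_ok = write_memory("u1 moved to Berlin", toy_extract, store)
second_ok = write_memory("u1 moved to Lisbon", toy_extract, store)
print(store)
```

The second write overwrites the first under the same key, which is the exact-state behavior that plain retrieval over appended text chunks does not guarantee.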

Agentic Harness Engineering

Agentic Harness Engineering (AHE) introduces a closed-loop system that automates the otherwise manual design of harnesses, the layer that mediates tool interaction for coding agents. It relies on three observability pillars (component, experience, and decision) and treats each harness edit as a falsifiable contract, which lets the harness evolve autonomously. AHE significantly boosts agent performance on benchmarks such as Terminal-Bench 2 and SWE-bench Verified. The evolved harnesses transfer well and encode general engineering experience in tools and memory rather than in prompt-level strategy.

C8s: A Confidential Kubernetes Architecture

C8s is a confidential computing architecture for Kubernetes, built on hardware TEEs (e.g., AMD SEV-SNP, Intel TDX), that provides cryptographically rooted confidentiality, integrity, and verifiability for clusters. It establishes an attestation-rooted trust boundary around confidential VMs and remains compatible with managed Kubernetes services where the control plane cannot be attested. This enables secure deployment of sensitive AI/LLM workloads, including model inference and training or fine-tuning on proprietary data, and protects both the workloads and their model weights from infrastructure operators and cloud providers.

Embedded Rust or C Firmware? Lessons from an Industrial Microcontroller Use Case

An industrial case study comparing Rust and C for IoT firmware development finds that Rust matches C in execution speed and memory footprint. The Rust-based Ariel OS provides a more efficient runtime than traditional bare-metal C stacks, establishing Rust as a viable alternative for performance-critical microcontroller systems.

Tessera: Unlocking Heterogeneous GPUs Through Kernel-Granularity Disaggregation

Tessera is a kernel-level disaggregation system designed to optimize large model inference on heterogeneous GPU clusters by mapping individual kernels to hardware based on their specific resource demands. It combines PTX-based dependency analysis, pipelined execution to overlap communication, and workload-aware scheduling to achieve up to 2.3x higher throughput and up to 1.6x better cost efficiency. Unlike previous coarse-grained methods, Tessera generalizes across model architectures and can outperform homogeneous high-end GPU configurations.
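To illustrate what per-kernel placement means, the sketch below uses a simple roofline-style estimate to send each kernel to the GPU type with the lowest estimated dollar cost. It is not Tessera's scheduler: it ignores dependencies and communication, and every hardware figure, kernel size, and name in it is a made-up assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    tflops: float        # peak compute in TFLOP/s (hypothetical figure)
    gbps: float          # memory bandwidth in GB/s (hypothetical figure)
    usd_per_hour: float  # rental price (hypothetical figure)


@dataclass(frozen=True)
class Kernel:
    name: str
    gflop: float   # arithmetic work per launch
    gbytes: float  # memory traffic per launch


def est_seconds(k: Kernel, g: Gpu) -> float:
    """Roofline-style estimate: the slower of the compute and memory bounds."""
    return max(k.gflop / (g.tflops * 1000), k.gbytes / g.gbps)


def place(kernels, gpus):
    """Assign each kernel to the GPU type with the lowest estimated cost."""
    return {
        k.name: min(gpus, key=lambda g: est_seconds(k, g) * g.usd_per_hour).name
        for k in kernels
    }


gpus = [
    Gpu("gpu_hi", tflops=1000, gbps=2000, usd_per_hour=4.0),
    Gpu("gpu_lo", tflops=200, gbps=1500, usd_per_hour=1.5),
]
kernels = [
    Kernel("matmul", gflop=2000, gbytes=1),           # compute-bound
    Kernel("decode_attention", gflop=10, gbytes=20),  # memory-bound
]
placement = place(kernels, gpus)
print(placement)
```

With these invented numbers the compute-bound kernel lands on the expensive GPU and the memory-bound kernel on the cheaper one, which is the intuition behind mixing hardware within a single model's execution.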

Code

Loopsy, a way for terminals and AI agents on different machines to talk

Loopsy is a self-hosted platform that enables remote control of terminal-based AI agents and shells via a mobile interface. It utilizes a Cloudflare Worker relay to bridge a laptop daemon and mobile client through WebSockets, providing a full PTY with persistent sessions and voice input. Additionally, Loopsy includes an MCP server for local network peer discovery, allowing LLM agents to perform cross-machine execution, file transfers, and shared state management.

Destiny – Claude Code's fortune teller skill

Destiny is a Claude Code plugin that provides personalized fortune-telling by combining deterministic local computation with LLM-based interpretation. It uses a Python-based engine to calculate birth charts and I-Ching hexagrams from user data, which Claude then translates into generative prose. The tool operates entirely locally and integrates directly into the Claude Code CLI via the /plugin system.

Superkube - Rewriting Kubernetes in Rust

Superkube is a minimal, single-binary Kubernetes-compatible control plane written in Rust that integrates the API server and node agent into a single process. It supports core workloads, networking, and scheduling via standard kubectl commands, utilizing SQLite or PostgreSQL for state management. The architecture enables multi-master HA through database-level leases and supports container runtimes like Docker and libcontainer, though it currently lacks full CNI networking and RBAC enforcement.
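A database-level lease for leader election can be sketched with Python's built-in sqlite3 module. This shows the general technique only, not Superkube's schema or code: the `leases` table, its columns, and the `try_acquire` helper are assumptions, and the clock is passed in explicitly to keep the example deterministic.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical leases table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE leases (name TEXT PRIMARY KEY, holder TEXT, expires_at REAL)"
    )
    return conn


def try_acquire(conn, name: str, holder: str, now: float, ttl: float = 10.0) -> bool:
    """Take or renew the lease if it is free, expired, or already held by us."""
    with conn:  # one transaction: commit on success, roll back on error
        # Ensure the row exists; a fresh row starts out already expired.
        conn.execute(
            "INSERT OR IGNORE INTO leases (name, holder, expires_at) VALUES (?, ?, 0)",
            (name, holder),
        )
        cur = conn.execute(
            "UPDATE leases SET holder = ?, expires_at = ? "
            "WHERE name = ? AND (holder = ? OR expires_at <= ?)",
            (holder, now + ttl, name, holder, now),
        )
        return cur.rowcount == 1


conn = open_store()
results = [
    try_acquire(conn, "leader", "node-a", now=100),  # lease is free
    try_acquire(conn, "leader", "node-b", now=105),  # still held by node-a
    try_acquire(conn, "leader", "node-b", now=111),  # node-a's lease has expired
    try_acquire(conn, "leader", "node-a", now=112),  # now held by node-b
]
print(results)
```

Because the holder check and the update happen in a single conditional statement, two replicas racing for the same lease cannot both succeed. A real deployment would also need to account for clock skew between nodes.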

Friday Studio AI runtime: Turn prompts, skills, & tools into reliable config

Friday Studio is an AI agent orchestration platform that manages autonomous agents through a daemon-controlled workspace lifecycle. Agents are triggered by HTTP, CLI, or cron signals and execute within sessions providing MCP tool access. The platform supports Python-based "user" agents for deterministic tasks and "LLM" agents for complex reasoning, integrated into a structured design-to-ship workflow.

MemHub, Turn Your GPT/Claude/Gemini History into LLM-Wiki Mindmap

XTrace MemHub converts chat histories from LLMs like ChatGPT, Claude, and Gemini into a structured Markdown "LLM-Wiki mindmap." It extracts AI memory and context, storing it in an encrypted vector database, and organizes it for tools like Obsidian. Users can export their chat data via a Chrome extension, process it in MemHub, and then download a ZIP of Markdown files for personal knowledge management systems.