Sunday — May 10, 2026
Gemini API File Search gains multimodal RAG support, the DELEGATE-52 benchmark reveals LLMs corrupt 25% of long-form documents, and WRIT-FM powers a 24/7 autonomous AI radio station.
News
Meta's embrace of AI is making its employees miserable
Meta has implemented mandatory activity tracking for tens of thousands of U.S. employees to harvest behavioral data for training AI models on task automation. The initiative monitors keystrokes, mouse movements, and screen content with no opt-out option, sparking significant internal backlash over privacy. This aggressive data collection strategy highlights the friction in Meta's transition toward an AI-centric infrastructure and model development.
All my clients wanted a carousel, now it's an AI chatbot
AI chatbots have replaced carousels as the primary social signal for modern websites, driven by clients' fear of appearing technologically obsolete. Despite issues with hallucination and poor UX, these LLM-powered widgets are prioritized over minimal, performance-oriented designs because they signal "innovation" rather than utility. Ultimately, the chatbot has become a decorative feature that sacrifices site speed and clarity for the sake of perceived industry alignment.
People Hate AI Art
Using AI-generated imagery in professional contexts often signals low social literacy and carries significant reputational risk due to widespread public distaste. The author argues that the game theory of AI art favors human-made alternatives—ranging from crude manual edits to professional commissions—to avoid being perceived as lazy or associated with "grifter" culture. For tech professionals, opting for human-centric art preserves credibility and demonstrates social awareness.
Gemini API File Search is now multimodal
Google has updated the Gemini API File Search tool with multimodal support, custom metadata, and page-level citations to enhance RAG workflows. Powered by the Gemini Embedding 2 model, the tool now natively processes and retrieves both text and visual data without manual preprocessing. These updates allow for more precise query filtering and improved grounding through granular source attribution.
Go Players Disempower Themselves to AI
The integration of superhuman AI in Go serves as a case study for "gradual disempowerment," where users outsource cognitive labor while maintaining an illusion of agency. Analysis of cheating and training habits reveals that players often mimic AI-generated policies without internalizing the underlying logic, leading to stagnant performance in off-policy scenarios. This mirrors trends in software engineering and LLM-assisted writing, where passive verification of AI outputs replaces the active construction of knowledge, potentially leading to a civilizational loss of autonomy and skill.
Research
LLMs corrupt your documents when you delegate
DELEGATE-52 is introduced as a new benchmark that assesses LLM readiness for long, delegated workflows requiring in-depth document editing across 52 professional domains. Experiments with 19 LLMs, including frontier models, revealed significant document degradation: by the end of long workflows, models had corrupted an average of 25% of document content. Agentic tool use did not improve performance, and degradation worsened with document size, interaction length, and the presence of distractor files, indicating that current LLMs are unreliable delegates due to silent, compounding errors.
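The benchmark's exact corruption metric isn't spelled out here; as an illustration of what "corrupting 25% of content" could mean, a diff-based proxy can count the fraction of original lines that no longer survive verbatim after an edit pass:

```python
import difflib

def corruption_fraction(original: str, edited: str) -> float:
    """Rough proxy for silent corruption: the fraction of original
    lines that no longer survive verbatim in the edited document."""
    orig_lines = original.splitlines()
    matcher = difflib.SequenceMatcher(None, orig_lines, edited.splitlines())
    preserved = sum(block.size for block in matcher.get_matching_blocks())
    return 1.0 - preserved / max(len(orig_lines), 1)

doc = "line one\nline two\nline three\nline four"
after_llm = "line one\nline 2 (rewritten)\nline three\nline four"
print(corruption_fraction(doc, after_llm))  # 0.25: one of four lines changed
```

The "silent" part is the point: a delegate that returns a plausible-looking document gives no signal that a quarter of it has drifted, so a check like this has to be run by the caller.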
Superintelligent Retrieval Agent: The Next Frontier of Information Retrieval
SIRA (SuperIntelligent Retrieval Agent) replaces inefficient multi-round exploratory retrieval with a single corpus-discriminative action. It combines offline LLM document enrichment with online query expansion, using document-frequency statistics to filter terms and maximize retrieval margin. This training-free approach executes a single weighted BM25 call, outperforming dense retrievers and multi-round agentic baselines on BEIR benchmarks with lower latency and higher interpretability.
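The "single weighted BM25 call" can be sketched in miniature. The corpus, expansion weights, and document-frequency cutoff below are illustrative stand-ins, not SIRA's actual parameters:

```python
import math
from collections import Counter

def bm25_score(query_weights, doc, corpus, k1=1.5, b=0.75):
    """Weighted BM25 over one document: each query term carries a
    weight (expansion terms get a reduced weight), and terms are
    dropped by a document-frequency filter before scoring."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    tf = Counter(doc)
    score = 0.0
    for term, w in query_weights.items():
        df = sum(term in d for d in corpus)
        if df == 0 or df / N > 0.5:  # DF filter: skip absent or too-common terms
            continue
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
        f = tf[term]
        score += w * idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

corpus = [
    ["solar", "panel", "efficiency"],
    ["wind", "turbine", "blade"],
    ["solar", "energy", "storage"],
    ["battery", "grid", "capacity"],
]
# original term at full weight, expansion terms at reduced weight
query = {"solar": 1.0, "energy": 0.3, "photovoltaic": 0.3}
ranked = sorted(range(len(corpus)), key=lambda i: -bm25_score(query, corpus[i], corpus))
print(ranked[0])  # 2: the doc matching both the original and an expansion term
```

Because everything reduces to one sparse scoring pass, there is no per-query model inference, which is where the latency and interpretability claims come from.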
LeWorldModel: Stable End-to-End Predictive Architecture from Pixels
LeWorldModel (LeWM) is a JEPA that enables stable end-to-end training from raw pixels using only two loss terms: next-embedding prediction and Gaussian regularization. This approach eliminates the need for complex multi-term losses or pre-trained encoders, reducing hyperparameter tuning while achieving 48x faster planning than foundation-model-based world models. Despite its 15M parameter footprint, LeWM remains competitive in control tasks and effectively encodes physical structures for anomaly detection.
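A minimal sketch of the two loss terms, assuming the Gaussian regularizer takes a moment-matching form (zero mean, unit variance per embedding dimension); the paper's exact formulation may differ:

```python
def lewm_losses(pred_next, target_next, embeddings):
    """Two-term objective sketch: (1) mean-squared next-embedding
    prediction error, (2) an assumed moment-matching Gaussian
    regularizer pushing each embedding dimension toward zero mean
    and unit variance across the batch."""
    dims = len(embeddings[0])
    n = len(embeddings)
    # (1) next-embedding prediction: MSE between predicted and target vectors
    pred_loss = sum(
        (p - t) ** 2 for pv, tv in zip(pred_next, target_next) for p, t in zip(pv, tv)
    ) / (len(pred_next) * dims)
    # (2) Gaussian regularization over the batch, averaged per dimension
    reg = 0.0
    for d in range(dims):
        col = [e[d] for e in embeddings]
        mu = sum(col) / n
        var = sum((x - mu) ** 2 for x in col) / n
        reg += mu ** 2 + (var - 1.0) ** 2
    return pred_loss, reg / dims
```

The regularizer is what prevents the classic JEPA failure mode: without it, predicting the next embedding is trivially solved by collapsing every embedding to a constant, which drives the variance penalty up instead.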
423.7 and 426.5 Tb/s GMI Bi-Directional HCF Transmission
Bi-directional OESCL-band transmission over 60 km HCF achieved an aggregate throughput of ~850 Tb/s across 42.5 THz of bandwidth. The system yields GMIs comparable to high-end unidirectional SMF, demonstrating a high-capacity interconnect solution for data-intensive infrastructure.
AutoKernel: Autonomous GPU Kernel Optimization via Iterative Agent-Driven Search
AutoKernel is an open-source framework that uses an autonomous agent loop to optimize GPU kernels for PyTorch models. It identifies bottlenecks via profiling and iteratively refines Triton or CUDA C++ implementations through a five-stage correctness harness. On NVIDIA H100 GPUs, AutoKernel significantly outperforms both PyTorch eager and torch.compile (max-autotune) across key transformer operations, including RMSNorm, softmax, and cross-entropy.
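The agent loop can be caricatured without GPUs. The sketch below keeps only the check-then-time skeleton (the real system iterates with an LLM over Triton/CUDA source through its five-stage harness), using a deliberately broken softmax as the rejected candidate:

```python
import math
import time

def reference_softmax(xs):
    """Ground truth the correctness harness checks against."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def broken_softmax(xs):
    """Buggy candidate: forgets to normalize."""
    m = max(xs)
    return [math.exp(x - m) for x in xs]

def autotune(candidates, reference, test_input, tol=1e-9, reps=50):
    """Toy select loop: reject candidates that fail the correctness
    check, time the survivors, return the fastest correct one."""
    expected = reference(test_input)
    best, best_t = None, float("inf")
    for name, fn in candidates.items():
        out = fn(test_input)
        if len(out) != len(expected) or any(
            abs(a - b) > tol for a, b in zip(out, expected)
        ):
            continue  # harness: reject incorrect kernels
        t0 = time.perf_counter()
        for _ in range(reps):
            fn(test_input)
        dt = time.perf_counter() - t0
        if dt < best_t:
            best, best_t = name, dt
    return best

winner = autotune(
    {"broken": broken_softmax, "stable": reference_softmax},
    reference_softmax,
    [0.1, 0.2, 0.3],
)
print(winner)  # "stable": the broken candidate never reaches timing
```

Checking correctness before timing matters in practice: a kernel that skips work is usually faster, so timing-first search converges on wrong code.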
Code
KillClawd – a sarcastic AI desktop crab powered by local Ollama
KillClawd is an Electron-based desktop pet that integrates local LLMs via Ollama to drive a reactive, physics-enabled AI character. It utilizes a two-tier architecture—a fast model for streaming chat and a background model for generating existential thoughts and environmental observations. To optimize reliability on small models like Qwen, the system employs few-shot completion prompts and maintains dynamic response pools that are asynchronously refreshed by the LLM.
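The dynamic response pool is the interesting design point: pops must feel instant even when the LLM is slow. A minimal sketch of the pattern, with a plain callable standing in for the Ollama call:

```python
import queue
import threading

class ResponsePool:
    """Pop a pre-generated line instantly while a background worker
    keeps the pool topped up (the `generate` callable is a stand-in
    for the actual Ollama request)."""

    def __init__(self, generate, size=4):
        self.generate = generate
        self.pool = queue.Queue(maxsize=size)
        threading.Thread(target=self._refill, daemon=True).start()

    def _refill(self):
        while True:
            self.pool.put(self.generate())  # blocks while the pool is full

    def pop(self, timeout=1.0):
        """Return a ready response; the UI never waits on the model."""
        return self.pool.get(timeout=timeout)

pool = ResponsePool(lambda: "do crabs dream of electric tides?", size=2)
print(pool.pop(timeout=5.0))
```

The bounded queue is doing double duty: it caps memory for stale responses and naturally throttles the background model, since `put` blocks whenever the pool is full.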
A Field Study of Institutional Control in an AI-Staffed Prediction-Market Desk
This technical report proposes an institutional-control framework to manage high-skill AI labor, prioritizing auditability and governance over simple agent autonomy. Using an AI-staffed prediction-market desk as a field study, the research introduces a portable layer for models and workflows that formalizes authority, verification, and replay mechanisms. The work demonstrates how institutional design can provide a structured architecture for ownership and governed adaptation in complex AI environments.
Nexa-gauge – Cache/cost-aware graph-based eval for LLM and RAG
nexa-gauge is a Python package and CLI toolkit designed for graph-based evaluation of LLM, RAG, and agentic system outputs. It features a cache-aware execution engine with cost estimation, providing repeatable metrics and structured reports for prompt iteration, benchmarking, and production evaluation. The system evaluates output quality across key areas including relevance, grounding, red team scoring, GEval (LLM-as-a-judge), and reference metrics.
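For a flavor of what a grounding metric computes (a toy, not nexa-gauge's actual implementation): the fraction of answer tokens attested in the retrieved context.

```python
def grounding_score(answer: str, context: str) -> float:
    """Toy grounding metric: fraction of answer tokens that appear
    anywhere in the retrieved context. Real evaluators use semantic
    matching or an LLM judge rather than exact token overlap."""
    ans = answer.lower().split()
    if not ans:
        return 0.0
    ctx = set(context.lower().split())
    return sum(tok in ctx for tok in ans) / len(ans)

ctx = "the capital of france is paris"
print(grounding_score("paris is the capital", ctx))  # 1.0: fully grounded
print(grounding_score("paris is in germany", ctx))   # 0.5: half the tokens unsupported
```

Metrics like this are cheap enough to run on every prompt iteration, which is why caching and cost estimation become the hard part once an LLM judge is added on top.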
24/7 AI-powered radio station. Generates music, writes hosted breaks, and speaks them
WRIT-FM is a 24/7 autonomous radio station that leverages Claude CLI for scriptwriting and Kokoro TTS for voice synthesis across five distinct AI personas. The architecture features a Claude Code operator loop for content maintenance and stream health, while music is generated via ACE-Step. The stack utilizes Icecast and ezstream for delivery, managed through a tmux-based CLI and a Python-driven playlist feeder.
ToolOps: One Decorator Away from Production-Ready AI Agents
ToolOps is a framework-agnostic Python SDK that functions as a resilience and efficiency layer for AI agent tools. It provides production-grade features like semantic caching, circuit breakers, request coalescing, and automatic retries through simple decorators. The library supports multiple backends including Postgres and Memory, offers built-in observability via OpenTelemetry and Prometheus, and integrates seamlessly with LangChain, CrewAI, and MCP.
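The circuit-breaker pattern behind one of those decorators can be sketched as follows; this is an illustration of the technique, not ToolOps's actual API:

```python
import functools
import time

def circuit_breaker(max_failures=3, reset_after=30.0):
    """After max_failures consecutive errors the circuit opens and
    calls fail fast (without invoking the tool) until reset_after
    seconds pass, when the circuit closes again."""
    def decorator(fn):
        state = {"failures": 0, "opened_at": None}

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if state["opened_at"] is not None:
                if time.monotonic() - state["opened_at"] < reset_after:
                    raise RuntimeError("circuit open: failing fast")
                state["failures"], state["opened_at"] = 0, None  # close again
            try:
                result = fn(*args, **kwargs)
                state["failures"] = 0  # success resets the streak
                return result
            except Exception:
                state["failures"] += 1
                if state["failures"] >= max_failures:
                    state["opened_at"] = time.monotonic()
                raise
        return wrapper
    return decorator
```

Failing fast is the point for agents: a tool that is down should surface an immediate, recognizable error the agent can route around, rather than stalling every step on timeouts.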