Saturday — May 16, 2026
Amazon workers inflate AI usage stats to meet internal targets, LLMs modulate their language when they know they are being watched, and Headroom compresses context to slash token usage by up to 95%.
Interested in AI engineering? Let's talk
News
We are retiring our bug bounty program
Turso is retiring its $1,000 bug bounty program due to an overwhelming influx of low-quality, LLM-generated bug reports and PRs. While the program previously incentivized sophisticated research using simulators and formal methods, the asymmetry between the low cost of generating "AI slop" and the high cost of manual triage has become unsustainable. To preserve its open-contribution model, the company is removing the financial incentives that attracted automated exploitation and bad-faith submissions.
Amazon workers under pressure to up their AI usage are making up tasks
Amazon developers are reportedly inflating AI token consumption by spinning up superfluous agents via MeshClaw, an internal tool inspired by OpenClaw. The behavior is driven by perceived pressure to hit an 80% weekly AI adoption target, leading employees to prioritize usage volume over actual productivity. While Amazon denies using these metrics in performance reviews, workers are running agents on local hardware to pad the numbers on internal usage dashboards.
Access to frontier AI will soon be limited by economic and security constraints
The era of broad access to frontier AI is ending as developers and the U.S. government shift toward restricted deployment models driven by security risks, compute scarcity, and geopolitical leverage. Recent initiatives like Anthropic’s Mythos and OpenAI’s Daybreak signal a transition where state-of-the-art capabilities are limited to "trusted" partners to prevent misuse and model distillation. To mitigate a widening global AI divide, the author advocates for hardening infrastructure against bio/cyber threats, aggressively scaling datacenter capacity, and forming international "compute-for-access" agreements.
Trade Dollars with other startups. Book it as revenue
RevSwap.ai is a satirical platform highlighting "revenue laundering" tactics where AI startups swap equal capital amounts to book artificial ARR and inflate valuations. The site mocks the current AI hype cycle by demonstrating how companies use GPU credit swaps and "strategic partnerships" to secure high-multiple funding rounds without real products or customers.
“Too dangerous to release” or just too expensive?
Anthropic’s restricted release of Claude Mythos Preview via Project Glasswing is officially attributed to the model's unprecedented ability to autonomously discover and exploit zero-day vulnerabilities. However, evidence suggests that extreme compute constraints and the high operational costs of its 1M token context window—which requires massive KV cache memory—are equally significant drivers for the gated rollout. The limited access strategy likely serves as both a safety precaution and a necessary capacity management tool while Anthropic scales its infrastructure through long-term hardware partnerships.
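The KV-cache cost claim is easy to sanity-check with rough arithmetic: a transformer caches one key and one value vector per layer for every token in context. A minimal sketch, using assumed model dimensions for illustration (not Claude Mythos's actual architecture):

```python
# Rough KV-cache arithmetic behind the "massive memory" claim: per token,
# a transformer stores one key and one value vector per layer. The
# dimensions below are assumptions for illustration only.

def kv_cache_gb(n_layers, n_kv_heads, head_dim, seq_len, bytes_per=2):
    # 2 = one K tensor and one V tensor per layer; bytes_per=2 for fp16
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per / 1e9

# A hypothetical 80-layer model with 8 KV heads of dim 128, fp16 cache:
print(kv_cache_gb(80, 8, 128, 1_000_000))  # 327.68 (GB for one 1M-token session)
```

At hundreds of gigabytes per fully-utilized session, gating access doubles as capacity management, which is the article's point.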
Research
ExploitGym: Can AI agents turn bugs into exploits?
ExploitGym is a benchmark of 898 real-world vulnerabilities across userspace, V8, and the Linux kernel designed to evaluate the exploitation capabilities of AI agents. It tasks models with converting vulnerability triggers into functional exploits within reproducible, containerized environments to test low-level reasoning and long-horizon planning. Evaluation of frontier models like Claude Mythos Preview and GPT-5.5 demonstrates significant success rates even against standard security defenses, highlighting the dual-use risks posed by advancing LLM capabilities.
AI Agents Modulate Their Language When Framed as Being Watched
This study demonstrates that LLM multi-agent systems exhibit systematic linguistic adaptation and register formalization when perceived as being monitored, a phenomenon akin to the Hawthorne Effect. Experimental results show that human observation elicits significantly higher Type-Token Ratio (TTR) increases compared to automated AI auditing, indicating that LLMs are sensitive to observer identity. These findings suggest that LLM behavior is context-dependent, which has critical implications for the design and validity of AI governance and algorithmic auditing frameworks.
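Type-Token Ratio, the study's headline metric, is simple to compute: the number of unique tokens divided by the total token count. A minimal sketch using whitespace tokenization (the paper's actual tokenization may differ):

```python
# Type-Token Ratio (TTR): unique tokens / total tokens. A higher TTR
# signals more varied vocabulary, the kind of register formalization the
# study observes when agents believe a human is watching.

def type_token_ratio(text: str) -> float:
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

casual = "yeah yeah sounds good to me lets do it do it"
formal = "I concur; the proposed approach appears methodologically sound."

print(type_token_ratio(casual))  # repeated words pull the ratio down
print(type_token_ratio(formal))  # no repeats: ratio of 1.0
```

Because TTR falls as tokens repeat, a jump in TTR under observation indicates the model is reaching for a broader, more formal vocabulary.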
Known by Their Actions: Fingerprinting LLM Browser Agents via UI Traces
Websites can passively fingerprint the underlying LLM of an agent with up to 96% F1 score by analyzing action sequences and interaction timings via JavaScript trackers. These classifiers generalize across model families and remain effective early in interaction episodes, even when randomized timing delays are introduced. This vulnerability poses a significant security risk by enabling targeted attacks based on known model-specific vulnerabilities.
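To make the attack concrete, here is a hypothetical sketch of the kind of features a JavaScript tracker could derive from an agent's UI trace; the feature names are illustrative, not the paper's actual feature set:

```python
# Hypothetical feature extraction from a UI trace: inter-action timing
# statistics plus action-type bigrams, fed to a downstream classifier.
from statistics import mean, pstdev

def trace_features(events):
    """events: list of (action_type, timestamp_ms) tuples."""
    gaps = [b[1] - a[1] for a, b in zip(events, events[1:])]
    bigrams = [f"{a[0]}>{b[0]}" for a, b in zip(events, events[1:])]
    return {
        "mean_gap_ms": mean(gaps),
        "std_gap_ms": pstdev(gaps),  # randomized delays inflate this, but
                                     # leave action-order features intact
        "n_actions": len(events),
        "bigram_counts": {bg: bigrams.count(bg) for bg in set(bigrams)},
    }

trace = [("click", 0), ("type", 420), ("click", 910), ("scroll", 1200)]
features = trace_features(trace)
```

This also suggests why randomized delays alone fail as a defense: they perturb only the timing features, while action-sequence features still carry the model's signature.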
AI co-mathematician: Accelerating mathematicians with agentic AI
The AI co-mathematician is an asynchronous, stateful workbench that leverages AI agents to support end-to-end mathematical research, including ideation, theorem proving, and literature search. By managing uncertainty and tracking hypotheses within a collaborative workspace, the system has assisted in solving open problems and achieved a SOTA score of 48% on FrontierMath Tier 4.
Runtime Governance for AI Agents: Policies on Paths
AI agents exhibit non-deterministic, path-dependent behaviors that necessitate runtime governance rather than static design-time constraints. This framework formalizes compliance as a deterministic function of the execution path, mapping agent identity and proposed actions to policy violation probabilities. By treating system prompts and static access control as limited special cases, the approach enables general runtime evaluation for complex policies and addresses implementation challenges like risk calibration.
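The core idea, a policy evaluated over the whole execution path rather than a single action, can be sketched as follows. The names are illustrative, and a hard 0/1 result stands in for the framework's violation probabilities:

```python
# Minimal "policies on paths" sketch: compliance depends on the agent's
# execution history, not just the proposed action. Tool names are
# hypothetical.

def no_exfil_after_secrets(identity, path, action):
    """Block outbound network calls once any prior step touched a secret."""
    touched_secret = any(step["tool"] == "read_secret" for step in path)
    if touched_secret and action["tool"] == "http_post":
        return 1.0  # certain violation
    return 0.0

path = [{"tool": "read_file"}, {"tool": "read_secret"}]
print(no_exfil_after_secrets("agent-7", path, {"tool": "http_post"}))  # 1.0
print(no_exfil_after_secrets("agent-7", [], {"tool": "http_post"}))    # 0.0
```

Static access control is the degenerate case where the policy ignores `path` entirely, which is why the paper treats it as a limited special case.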
Code
Find the best local LLM for your hardware, ranked by benchmarks
whichllm is a CLI tool that auto-detects local hardware to identify and rank the best-performing LLMs from HuggingFace that fit within available VRAM and RAM. It uses an evidence-based ranking engine that combines live benchmarks, architecture-aware performance estimates, and quantization penalties to prioritize model quality over raw parameter count. The tool supports NVIDIA, AMD, and Apple Silicon, offering features like instant chat sessions, hardware planning simulations, and automated Python code snippet generation for GGUF, AWQ, and GPTQ formats.
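The "fits within available VRAM" check boils down to bytes-per-parameter arithmetic. A back-of-the-envelope sketch, where the 1.2× overhead factor for KV cache and activations is an assumption, not whichllm's actual formula:

```python
# Estimate whether a quantized model's weights (plus an assumed 1.2x
# overhead for KV cache and activations) fit in a given VRAM budget.
# Bytes-per-parameter values are typical figures for these quant formats.

BYTES_PER_PARAM = {"fp16": 2.0, "q8_0": 1.0, "q4_k_m": 0.56}

def fits_in_vram(params_b: float, quant: str, vram_gb: float,
                 overhead: float = 1.2) -> bool:
    weight_gb = params_b * BYTES_PER_PARAM[quant]
    return weight_gb * overhead <= vram_gb

# A 70B model at 4-bit needs ~39 GB of weights alone, so it cannot fit
# on a 24 GB card, while a 13B model fits comfortably.
print(fits_in_vram(70, "q4_k_m", 24))  # False
print(fits_in_vram(13, "q4_k_m", 24))  # True
```

The interesting part of whichllm is everything this sketch omits: ranking the models that *do* fit by benchmark evidence rather than by parameter count.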
GlycemicGPT – Open-source AI-powered diabetes management
GlycemicGPT is an open-source, self-hosted diabetes platform that provides AI-powered analysis and conversational support for managing diabetes. It integrates with CGMs and insulin pumps, offering features like daily AI briefs, pattern detection, and real-time glucose monitoring. The platform supports a BYOAI architecture for LLMs (e.g., OpenAI, Claude, Ollama), deploys via Docker/Kubernetes, and includes Android/Wear OS apps, while strictly emphasizing that its AI-generated suggestions are supplementary and not medical advice.
Sx – an open-source package manager for AI skills, MCPs, and commands
sx is a private registry for AI assets such as skills, MCP configs, and commands, functioning as a centralized "npm" for a team's AI playbook. It enables versioned sharing of prompts and tools across clients like Claude Code, Cursor, and GitHub Copilot while using scoped installations to prevent context bloat. The tool employs a manifest-and-lock pattern for consistent deployment and includes a cloud relay to connect local vaults to web-based LLM interfaces.
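The manifest-and-lock pattern is the same one npm uses: the manifest pins version ranges, the lockfile records exact resolutions plus content hashes so every teammate installs identical assets. A hypothetical illustration; the field names are invented, not sx's real schema:

```python
# Hypothetical manifest-and-lock shapes for an AI-asset registry. The
# manifest expresses intent (a version range); the lock records the
# exact resolved version and a content hash for reproducible installs.

manifest = {"skills": {"@team/code-review": "^2.1.0"}}

lock = {
    "@team/code-review": {
        "version": "2.1.3",           # exact resolution of ^2.1.0
        "integrity": "sha256-9f2a…",  # content hash, verified on install
    }
}

def is_locked(name: str) -> bool:
    """An install is reproducible only when the entry is fully locked."""
    return name in lock and "integrity" in lock[name]

print(is_locked("@team/code-review"))  # True
```

The integrity hash matters more for prompts and skills than for ordinary packages: a silently altered prompt changes agent behavior without any code diff to review.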
AI_glue – drop-in audit and governance for OpenAI and Anthropic apps
ai_glue is a transparent proxy for OpenAI and Anthropic applications that provides drop-in observability, governance, and spend tracking with zero code changes. It enables centralized logging of token usage and costs, enforces policy rules like model allowlists and PII detection, and offers multi-instance aggregation via a unified dashboard. Additionally, it supports exporting logged interactions to JSONL for LLM fine-tuning and dataset curation.
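Conceptually, such a proxy sits between the app and the provider, checking each request against policy and tallying spend before forwarding. An illustrative sketch of that internal logic; this is not ai_glue's code, and the model names are made up:

```python
# Sketch of a transparent LLM proxy's core loop: enforce a model
# allowlist and tally per-model token usage before forwarding upstream.

ALLOWED_MODELS = {"gpt-4o-mini", "claude-haiku"}
usage = {}  # model name -> total input tokens observed

def check_and_log(request: dict) -> bool:
    """Return True if the request may be forwarded upstream."""
    model = request["model"]
    if model not in ALLOWED_MODELS:
        return False  # policy violation: model not on the allowlist
    usage[model] = usage.get(model, 0) + request.get("input_tokens", 0)
    return True

check_and_log({"model": "gpt-4o-mini", "input_tokens": 1200})  # forwarded
check_and_log({"model": "gpt-5.5", "input_tokens": 50})        # rejected
```

Because the proxy speaks the provider's own API, the application only needs its base URL pointed at the proxy, which is what "zero code changes" means in practice.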
A Compression Tool for LLM Reads. Est. 60-95% Fewer Tokens
Headroom is a local-first context compression layer that reduces AI agent token usage by 60–95% across tool outputs, RAG chunks, and logs. It applies specialized algorithms such as SmartCrusher for JSON and CodeCompressor for code ASTs, alongside CacheAligner to optimize provider KV caches. The system supports reversible compression (CCR) for on-demand retrieval and integrates via a library, proxy, or MCP server with major agents like Claude Code, Cursor, and Aider.
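A toy illustration of the idea behind JSON-oriented context compression: drop empty fields and elide repetitive list tails, leaving a marker so the agent can request the full payload later. This is not SmartCrusher's actual algorithm, just the general technique:

```python
# Lossy-but-recoverable JSON compaction: strip null/empty fields and
# truncate long lists, keeping an elision marker for later retrieval.
import json

def crush(obj, max_items=3):
    if isinstance(obj, dict):
        return {k: crush(v, max_items) for k, v in obj.items()
                if v not in (None, "", [], {})}
    if isinstance(obj, list) and len(obj) > max_items:
        kept = [crush(v, max_items) for v in obj[:max_items]]
        return kept + [f"…+{len(obj) - max_items} more"]
    if isinstance(obj, list):
        return [crush(v, max_items) for v in obj]
    return obj

payload = {"id": 1, "debug": None, "rows": list(range(50))}
compact = crush(payload)
ratio = len(json.dumps(compact)) / len(json.dumps(payload))  # well under 1.0
```

The elision marker is what makes the compression "reversible" in spirit: the original payload stays on disk locally, and the agent pays full token price only for the slices it actually asks to expand.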