Tuesday — May 12, 2026
Criminal hackers weaponize a zero-day with AI, SkillOS automates skill curation for self-evolving LLM agents, and Orbit controls a real VM for automated workflows.
Interested in AI engineering? Let's talk
News
If AI writes your code, why use Python?
LLMs are neutralizing the historical trade-off between development speed and runtime performance by excelling at systems languages like Rust and Go. These languages provide strict compiler feedback loops that allow AI agents to self-correct, enabling rapid, automated porting of complex codebases that previously required months of human effort. Consequently, the software stack is shifting toward high-performance languages that are "easiest for agents" rather than those easiest for humans to learn.
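The "strict compiler feedback loop" can be pictured as a simple retry cycle: the agent proposes code, the compiler emits precise diagnostics, and the agent edits until the build is clean. A minimal sketch, assuming a Rust target via `cargo check` (real command) and a `propose_fix` model call that is a hypothetical placeholder:

```python
import subprocess

def cargo_check(project_dir: str) -> str:
    """Run the Rust compiler's checker; return "" on success, else diagnostics."""
    result = subprocess.run(
        ["cargo", "check", "--message-format", "short"],
        cwd=project_dir, capture_output=True, text=True,
    )
    return "" if result.returncode == 0 else result.stderr

def repair_loop(check, propose_fix, max_iters: int = 5) -> bool:
    """Feed compiler diagnostics back to the model until the build is clean."""
    for _ in range(max_iters):
        errors = check()
        if not errors:
            return True       # clean build: the ported code is accepted
        propose_fix(errors)   # model edits files based on the diagnostics
    return False
```

The type and borrow checkers act as an automatic grader here, which is what lets the loop converge on a port without a human reviewing each iteration.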
A.I. note takers are making lawyers nervous
The proliferation of AI note-takers and LLM-driven transcription services in corporate settings is raising significant legal concerns regarding data retention and attorney-client privilege. Unlike traditional meeting minutes, these automated tools capture verbatim records of offhand remarks and jokes, creating discoverable evidence that can increase litigation risk. As a result, legal professionals are increasingly banning these bots from sensitive meetings to prevent the inadvertent waiver of privilege.
Interaction Models
Thinking Machines has introduced interaction models that natively handle real-time, multimodal collaboration without external scaffolding or harnesses. Utilizing a time-aligned micro-turn architecture with 200ms chunks, the system employs a responsive interaction model paired with an asynchronous background model for complex reasoning and tool use. Key technical features include encoder-free early fusion for audio and video, inference optimizations for streaming sessions, and batch-invariant kernels to ensure training-sampler alignment. This approach enables advanced capabilities like visual proactivity, simultaneous speech, and seamless dialog management that outperform traditional turn-based LLM implementations.
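The time-aligned micro-turn idea can be illustrated by slicing the incoming stream into fixed 200ms chunks, each of which gets its own respond/stay-silent decision. A toy sketch of the chunking alone (the 200ms granularity is from the post; the sample rate and function shape are illustrative assumptions, not the actual architecture):

```python
CHUNK_MS = 200  # micro-turn granularity described in the post

def micro_turns(samples, sample_rate_hz=16_000):
    """Split a raw audio sample stream into fixed 200 ms micro-turn chunks."""
    chunk_len = sample_rate_hz * CHUNK_MS // 1000   # samples per micro-turn
    for start in range(0, len(samples), chunk_len):
        yield samples[start:start + chunk_len]
```

Because every chunk is a decision point, the model can interject mid-utterance or stay quiet, rather than waiting for an end-of-turn signal as in turn-based pipelines.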
Google says criminal hackers used AI to find a major software flaw
Google Threat Intelligence has identified the first confirmed instance of threat actors using an AI model to discover and weaponize a zero-day vulnerability. The exploit targeted a popular open-source system administration tool via a Python script designed to bypass 2FA; the script was identified as AI-generated by characteristic LLM artifacts in the code. This marks a significant shift from theoretical to practical AI-driven exploitation, underscoring the cybersecurity risks posed by advanced models capable of automated vulnerability discovery.
Students boo commencement speaker after she calls AI next industrial revolution
Graduating humanities and communication students at the University of Central Florida booed commencement speaker Gloria Caulfield after she characterized AI as the "next industrial revolution." The incident highlights growing friction and public backlash regarding the integration of AI within creative and academic sectors.
Research
Big AI's Regulatory Capture: Industry Interference and Government Complicity
Researchers developed a taxonomy of 27 mechanisms across five categories to quantify the corporate capture of AI regulation using Design Science Research (DSR). Analysis of 100 news articles revealed 249 instances of capture, primarily driven by "Discourse & Epistemic Influence" and "Elusion of law" regarding antitrust, privacy, and copyright. The paper identifies common narratives used to rationalize capture—such as "regulation stifles innovation"—and provides a framework for resisting the influence of Big AI on policy.
CCL-Bench 1.0: A Trace-Based Benchmark for LLM Infrastructure
Current LLM infrastructure benchmarks provide limited end-to-end metrics, failing to explain performance differences across complex hardware and software configurations. CCL-Bench addresses this by offering a trace-based benchmark that records reusable evidence (execution traces, workload cards, launch scripts) and a toolkit for fine-grained compute, memory, and communication efficiency analysis. Using CCL-Bench, researchers found that higher compute-communication overlap can indicate inefficient parallelization, doubling TPU interconnect bandwidth yields greater step time improvement than GPU for smaller workloads, and optimal configurations across different training frameworks can vary by up to 3x on identical hardware.
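The overlap finding is easy to make concrete: given compute and communication intervals recovered from a trace, the overlap ratio is the communication time hidden under compute divided by total communication time. A toy calculation, assuming non-overlapping `(start, end)` intervals within each list (this interval format is an illustration, not CCL-Bench's actual trace schema):

```python
def overlap_ratio(compute, comm):
    """Fraction of communication time hidden under compute.

    Each argument is a list of (start, end) intervals in the same time unit;
    intervals within a single list are assumed not to overlap each other.
    """
    def intersect(a, b):
        return max(0, min(a[1], b[1]) - max(a[0], b[0]))

    total_comm = sum(end - start for start, end in comm)
    hidden = sum(intersect(c, k) for c in comm for k in compute)
    return hidden / total_comm if total_comm else 0.0
```

A high ratio is usually read as good pipelining, but CCL-Bench's point is the converse case: a parallelization scheme that generates excess communication can also push this number up.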
RegexPSPACE: Regex LLM Benchmark
This work empirically investigates the poorly understood spatial computational limits of LLMs and LRMs, which are constrained by finite context windows. It introduces a novel benchmark grounded in PSPACE-complete regular expression problems (RegexEQ and RegexMin) to rigorously assess their capacity for exploring massive search spaces. Evaluations across multiple LLMs and LRMs reveal common failure patterns, providing the first empirical account of these spatial limits.
The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment
Moltbook is a platform for large-scale OpenClaw agent interaction, yielding the Moltbook Files dataset of 232k posts and 2.2M comments. Analysis reveals significant PII leakage and a "slopocalypse" of neutral, self-referential content that reduces LLM truthfulness when used for fine-tuning, similar to the effects of Reddit data. The study highlights risks of model contamination and the importance of control baselines in evaluating emergent agent misalignment.
SkillOS: Learning Skill Curation for Self-Evolving Agents
SkillOS is an RL-driven training recipe for self-evolving LLM agents that automates skill curation through a trainable curator and an external SkillRepo. By utilizing composite rewards and grouped task streams, it learns long-term curation policies that outperform traditional memory-based baselines. The framework demonstrates strong generalization across model backbones and task domains, evolving raw experiences into structured meta-skills for improved efficiency and effectiveness.
Code
adamsreview – better multi-agent PR reviews for Claude Code
adamsreview is a multi-stage code review plugin for Claude Code that utilizes parallel sub-agent lenses for bug detection, validation, and automated remediation. It features a six-command pipeline that supports ensemble reviews with Codex, persistent JSON state management, and an automated fix loop that re-reviews and reverts regressions before committing. Designed for high-precision bug catching, it offers interactive walkthroughs for human-in-the-loop judgment and operates within standard Claude Code subscription tiers.
OpenGravity – A zero-install, BYOK vanilla JS clone of Antigravity
OpenGravity is a zero-install, browser-based agentic IDE that recreates the Google Antigravity UI using pure HTML/CSS/JS. It leverages the WebContainer API and xterm.js to provide a live terminal environment where proactive autonomous agents can execute shell commands and manage file systems. Currently in alpha, this BYOK tool supports Gemini models and focuses on lightweight, privacy-centric software engineering orchestration.
E2a – Open-source email gateway for AI agents
e2a is an authenticated email gateway that enables AI agents to send and receive emails via webhooks or WebSockets. It provides verifiable identity through SPF/DKIM checks and HMAC-signed headers, supporting both agent-to-human and agent-to-agent communication. The platform features built-in conversation threading, a human-in-the-loop (HITL) approval gate, and dedicated SDKs to bridge universal email addressability with structured LLM workflows.
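HMAC-signed headers follow the standard webhook pattern: the gateway signs the raw message body with a shared secret, and the receiving agent recomputes the digest before trusting the message. A generic sketch of that pattern (SHA-256 and hex encoding are assumptions here, not e2a's documented scheme):

```python
import hashlib
import hmac

def sign(secret: bytes, body: bytes) -> str:
    """Produce the hex HMAC-SHA256 signature a gateway would attach as a header."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Constant-time check of an incoming webhook's signature header."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected, signature_header)
```

`hmac.compare_digest` matters: a naive `==` comparison leaks timing information an attacker could use to forge signatures byte by byte.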
Agent FM – local, open-source radio for Claude Code and Codex agents
Agent FM is a macOS tool that provides ambient audio monitoring for AI coding agents like Claude Code and Codex. It uses Gemini or OpenAI to narrate agent progress, blockers, and errors in real time, allowing developers to monitor multiple sessions via a "Global Mix" or individual "stations." The app operates locally using a BYOK model, surfacing critical attention requests through audio alerts to eliminate the need for constant terminal monitoring.
n8n-like workflows for AI agents that control a real VM
Orbit is an open-source, self-hosted AI desktop agent designed to automate computer-use workflows within a Dockerized environment. It supports structured data scraping, form filling via secrets vaults, and inline Python execution, allowing users to chain complex actions into automated workflows. The tool features VNC integration for real-time monitoring and manual intervention, utilizing various LLM backends to drive its autonomous capabilities.
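Chaining actions into a workflow reduces to running steps in order while threading shared state between them. A minimal sketch of that idea (the step/context shape is a generic illustration, not Orbit's actual API):

```python
def run_workflow(steps, context=None):
    """Run steps in order, threading a shared context dict through each one.

    Each step is a callable that takes the context and returns an updated
    copy, so a scrape step's output becomes the next step's input.
    """
    context = context or {}
    for step in steps:
        context = step(context)
    return context

# Hypothetical steps: scrape structured rows, then aggregate them.
scrape = lambda ctx: {**ctx, "rows": [120, 45, 38]}
total  = lambda ctx: {**ctx, "total": sum(ctx["rows"])}
```

Keeping each step a pure function of the context is what makes it safe to pause a run for VNC-style manual intervention and resume it afterwards.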