Thursday — May 28, 2026
YouTube begins automatically labeling AI-generated videos, FuzzingBrain V2 identifies 29 zero-day vulnerabilities, and CoreTex introduces a biomimetic AI harness with neuroanatomy-inspired memory.
News
I'm Tired of Talking to AI
The author highlights a growing frustration with the "dead internet" phenomenon, where human interactions on platforms like GitHub and Reddit are increasingly replaced by unverified LLM outputs. This trend extends to professional environments, where individuals blindly forward AI-generated content without verifying its accuracy, leading to a breakdown in meaningful technical and business discourse.
YouTube to automatically label AI-generated videos
YouTube is updating its AI disclosure framework by moving labels for photorealistic or meaningfully altered content to more prominent positions, such as video overlays and descriptions. The platform is also introducing automated detection systems to apply labels when creators fail to disclose AI usage, utilizing internal signals and C2PA metadata. While creators can appeal most automated labels, disclosures are permanent for content generated via YouTube’s native AI tools or verified through C2PA standards.
Training our own AI models
PostHog is transitioning toward "self-driving" products by training proprietary models on internal user data to automate manual analysis and reduce token costs. Key initiatives include scaling session replay analysis, implementing synthetic user testing to predict friction pre-production, and leveraging behavioral data for conversion optimization. The company will perform all training in-house using anonymized data, bypassing third-party providers to maintain privacy and data sovereignty.
AI tools are only as good as your judgment
To prevent the atrophy of engineering judgment, developers should shift from passive consumption to "adversarial use" of AI. Rather than accepting LLM outputs wholesale, engineers should interrogate generated code for edge cases, implicit assumptions, and security risks. This "generate, interrogate, revise" loop ensures that AI serves as a tool for sharpening technical judgment rather than a replacement for it.
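Here is a hypothetical illustration of one pass through the "generate, interrogate, revise" loop. The draft function stands in for plausible assistant output. The `interrogate` helper probes edge cases the draft silently assumes away, such as empty input or a one-shot iterator. The revision addresses each finding. All names are illustrative and are not taken from the article.

```python
# "Generate": a plausible first draft of the kind an assistant might produce.
def mean_draft(xs):
    return sum(xs) / len(xs)


# "Interrogate": probe implicit assumptions and record unhelpful failures.
def interrogate(fn):
    """Return findings for edge cases where fn fails with an unexpected error."""
    findings = []
    probes = {
        "empty list": [],
        "generator": (x for x in [1.0, 2.0, 3.0]),  # iterables have no len()
    }
    for name, arg in probes.items():
        try:
            fn(arg)
        except ValueError:
            pass  # an explicit, documented rejection counts as acceptable
        except Exception as exc:
            findings.append(f"{name}: {type(exc).__name__}")
    return findings


# "Revise": fix each finding explicitly instead of accepting the draft wholesale.
def mean_revised(xs):
    values = list(xs)  # accept any iterable, not only sequences
    if not values:
        raise ValueError("mean of empty input is undefined")
    return sum(values) / len(values)
```

Running `interrogate(mean_draft)` surfaces a `ZeroDivisionError` on empty input and a `TypeError` on a generator. The revised version passes both probes.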
Why AI Agents Cannot Change Software Systems
LLMs excel at additive code generation via pattern-matching but lack the causal reasoning required for transformative system maintenance. They struggle to preserve system invariants and manage temporal dependencies across complex, interdependent codebases, making them currently unreliable for autonomous, system-aware PR delivery. Consequently, AI agents should be viewed as assistants that elevate the human role to high-level architectural judgment and intent rather than replacing engineering expertise.
Research
Multi-Agent LLM System for Automated Vulnerability Discovery and Reproduction
FuzzingBrain V2 is a multi-agent system designed to address LLM limitations in vulnerability detection, such as high false positive rates and suboptimal localization granularity. It integrates OSS-Fuzz for fuzzer-reproducible reporting and introduces "Suspicious Point" control-flow abstractions alongside MCP-based analysis tools to improve reasoning over complex cross-function dependencies. The system achieved a 90% detection rate on the AIxCC 2025 dataset and identified 29 confirmed zero-day vulnerabilities in real-world projects.
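One way to see why fuzzer-reproducible reporting reduces false positives is to require every LLM-proposed finding to include a proof-of-concept input, and to keep only the findings whose input actually crashes the target. The sketch below applies that idea to a toy parser. It is not FuzzingBrain's implementation. `Candidate`, `toy_target`, and the triage functions are hypothetical, and an uncaught Python exception stands in for a sanitizer-detected crash under an OSS-Fuzz harness.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical LLM-proposed finding: a suspected location plus a PoC input."""
    location: str
    poc_input: bytes


def toy_target(data: bytes) -> int:
    """Toy parser with a planted bug: it trusts a length byte taken from the input."""
    if not data:
        return 0
    length = data[0]
    payload = data[1:]
    # Bug: indexes payload[length - 1] without checking that it exists.
    return payload[length - 1] if length else 0


def reproduces(target, candidate: Candidate) -> bool:
    """True if running the PoC input crashes the target (any uncaught exception)."""
    try:
        target(candidate.poc_input)
    except Exception:
        return True
    return False


def confirmed(target, candidates):
    """Report only findings that reproduce; unreproduced claims are dropped."""
    return [c for c in candidates if reproduces(target, c)]
```

Given three candidates, only the one whose input declares a length longer than its payload survives triage. The other two are discarded as unconfirmed, even if the model flagged them confidently.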
Spreadsheet-RL: Advancing LLM Agents on Realistic Spreadsheet Tasks
Spreadsheet-RL is a reinforcement learning fine-tuning framework designed to train LLM-based agents for complex, multi-step spreadsheet tasks within a realistic Excel environment. It introduces Spreadsheet Gym, a Python-based sandbox for multi-turn RL, and the Domain-Spreadsheet benchmark for evaluating specialized workflows in finance and supply chain management. By utilizing an automated data collection pipeline and domain-specific evaluation, Spreadsheet-RL significantly improves Pass@1 performance on both general and specialized spreadsheet benchmarks.
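For reference, Pass@1 is usually computed with the unbiased pass@k estimator popularized by code-generation benchmarks: for a task with `n` sampled attempts of which `c` succeed, pass@k = 1 − C(n−c, k) / C(n, k), averaged over tasks. For k = 1 this reduces to c/n. The summary does not say which estimator Spreadsheet-RL uses, so treat this as the common convention rather than the paper's exact metric.

```python
from math import comb
from statistics import mean


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task: n attempts sampled, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct attempt
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k: int = 1) -> float:
    """Average pass@k over tasks; results is an iterable of (n, c) pairs."""
    return mean(pass_at_k(n, c, k) for n, c in results)
```

With one attempt per task, Pass@1 is simply the fraction of tasks solved on the first try.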
Harness Sensitivity Is Non-Monotone Across LLM Agent Tiers
A study of 432 runs across four model tiers refutes the assumption that higher-capability LLMs require less structural guidance. Results demonstrate that harness sensitivity is non-monotone and model-specific: Gemini 2.5 Flash performance dropped with increased harness verbosity, while Qwen3.5-122B (reasoning model) excelled under strict constraints. The research introduces a failure taxonomy showing that format violations dominate high-tier failures, while low-capability models primarily struggle with file selection.
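To make "non-monotone" concrete, define a tier's harness sensitivity as the change in success rate when moving from a minimal to a strict harness. Then check whether that quantity moves in a single direction as you step through tiers ordered by capability. The assumption being refuted is a steady decline. This framing, including the minimal-versus-strict contrast and the function names, is our own sketch and not the study's metric.

```python
from typing import Sequence


def harness_sensitivity(strict_rate: float, minimal_rate: float) -> float:
    """Success-rate change from a minimal to a strict harness (positive means it helps)."""
    return strict_rate - minimal_rate


def is_monotone(values: Sequence[float]) -> bool:
    """True if the values never increase, or never decrease, along the sequence."""
    pairs = list(zip(values, values[1:]))
    return all(a >= b for a, b in pairs) or all(a <= b for a, b in pairs)
```

Pass per-tier sensitivities to `is_monotone`, ordered from the lowest tier to the highest. A `False` result means that knowing a model's tier does not tell you whether more scaffolding will help it.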
Tool-schema compression enables agentic RAG under constrained context budgets
Agentic RAG systems face a resource conflict: tool schemas compete with retrieved context for space in the context window. A systematic study of 14 models shows that TSCG compression (44-50% token savings) mitigates context overflow, yielding a 20.5-percentage-point exact-match (EM) gain at an 8K context, where standard JSON schemas fail. Scaling tests show that compressed schemas support more than 800 tools, compared with about 494 for JSON, which the authors argue makes schema compression essential infrastructure for context-constrained deployments.
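The resource conflict is simple budget arithmetic. Every token reserved for retrieved passages is unavailable for tool schemas, so shrinking each schema raises the number of tools that fit. The sketch assumes that schema cost scales linearly with the number of tools and that nothing else in the prompt changes. Both functions are hypothetical helpers, not part of TSCG.

```python
def tools_that_fit(window: int, reserved: int, tokens_per_schema: float) -> int:
    """Tool schemas that fit after reserving tokens for retrieved context and the answer."""
    available = max(window - reserved, 0)
    return int(available // tokens_per_schema)


def capacity_after_compression(json_capacity: float, savings: float) -> float:
    """Tools that fit in the same budget once each schema shrinks by `savings` (0..1)."""
    if not 0 <= savings < 1:
        raise ValueError("savings must be in [0, 1)")
    return json_capacity / (1.0 - savings)
```

Under that linear assumption, the reported figures hang together: 494 / (1 − 0.44) ≈ 882 and 494 / (1 − 0.50) = 988, both consistent with "more than 800" tools.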
PIMMUR: Can LLMs Simulate Human Collective Behavior?
A systematic audit of 39 LLM-based social simulations identifies six methodological principles (PIMMUR) whose violation undermines validity. The study finds that 89.7% of the audited studies (35 of 39) violate these principles, often through excessive prompt control or because the model recognizes the experimental context. Reproductions show that many "emergent" collective behaviors are artifacts of these violations, suggesting that current LLM simulations may capture model biases rather than authentic human social dynamics.
Code
Ripgrep AI Policy
ripgrep (rg) is a high-performance, Rust-based search tool that recursively scans directories while automatically respecting .gitignore rules and skipping binary files. It uses SIMD optimizations and parallel directory iteration to significantly outperform traditional tools like grep and ag. For AI and LLM applications, it serves as an efficient utility for large-scale codebase indexing, data preprocessing, and RAG pipeline integration.
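For pipeline integration, `rg --json` emits one JSON message per line, with types such as `begin`, `match`, `end`, and `summary`. That format is easier to consume than parsing plain-text output. The sketch below assumes ripgrep is installed and on `PATH`, and that match messages carry `path.text`, `line_number`, and `lines.text`. The helper names are hypothetical.

```python
import json
import shutil
import subprocess
from typing import Iterable, Iterator


def parse_rg_json(lines: Iterable[str]) -> Iterator[tuple[str, int, str]]:
    """Yield (path, line_number, line_text) for each 'match' message from `rg --json`."""
    for raw in lines:
        if not raw.strip():
            continue
        msg = json.loads(raw)
        if msg.get("type") != "match":
            continue  # skip begin/end/summary messages
        data = msg["data"]
        path = data["path"].get("text")
        text = data["lines"].get("text")
        if path is None or text is None:
            continue  # non-UTF-8 data arrives base64-encoded under "bytes"; skipped here
        yield path, data["line_number"], text.rstrip("\n")


def search(pattern: str, root: str = ".") -> list[tuple[str, int, str]]:
    """Run ripgrep and collect matches; .gitignore rules apply by default."""
    if shutil.which("rg") is None:
        raise RuntimeError("ripgrep (rg) not found on PATH")
    proc = subprocess.run(["rg", "--json", pattern, root], capture_output=True, text=True)
    # rg exits 0 on matches, 1 on no matches, and 2 on error (possibly with partial output).
    if proc.returncode == 2 and not proc.stdout:
        raise RuntimeError(proc.stderr)
    return list(parse_rg_json(proc.stdout.splitlines()))
```

Each `(path, line_number, text)` tuple can then be chunked or embedded downstream.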
CoreTex – An Open-Source, Unix-like, biomimetic, flat-file AI Harness
CoreTex is a UNIX-inspired, biomimetic agentic control plane and knowledge engine built on a flat-file architecture with neuroanatomy-inspired components. It features a 5-tier memory stack using SQLite FTS5 for vectorless search and "Cerebellar" muscle memory for zero-token task execution. The system provides a secure, multi-agent environment for code execution via Deno/WASM sandboxing and integrates deeply with Obsidian vaults and external sensory peripherals.
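"Vectorless search" with SQLite FTS5 means keyword retrieval ranked by BM25, with no embeddings. A minimal sketch using Python's built-in `sqlite3` follows, assuming the bundled SQLite was compiled with FTS5, as most modern builds are. The table name and columns are hypothetical and do not reflect CoreTex's actual schema.

```python
import sqlite3


def build_index(docs: dict[str, str]) -> sqlite3.Connection:
    """Index flat-file notes (path -> body) in an in-memory FTS5 table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE notes USING fts5(path, body)")
    conn.executemany("INSERT INTO notes (path, body) VALUES (?, ?)", docs.items())
    return conn


def search(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[str]:
    """Return matching paths, best first; FTS5's `rank` column defaults to BM25."""
    rows = conn.execute(
        "SELECT path FROM notes WHERE notes MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [path for (path,) in rows]
```

Because the index is a single SQLite file (or an in-memory table), it fits a flat-file design: no separate vector database or embedding calls are needed at query time.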
Claude Code's $200 plan is a 17× subsidy on the raw API
coral-ai provides tools for high-throughput LLM agent inference, optimizing token economics by paying for context once rather than on every turn. It addresses the significant cost of re-reading input tokens by offering components for build-time context preparation (clean, chunk, embed, enrich, hydrate), an efficient gRPC GPU embedding server, and memory backends for agent frameworks like CrewAI and LangChain. Its investment_analyst swarm layer demonstrates parallel agents running over private data.
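The cost of re-reading input tokens is easy to underestimate. If every turn re-sends a large shared context along with the growing conversation, the total number of billed input tokens grows with the number of turns times the context size. The deliberately simplified model below ignores caching discounts and output tokens. It compares that pattern with paying for the shared context once. Both functions are hypothetical and are not coral-ai's accounting.

```python
def naive_input_tokens(context: int, per_turn: int, turns: int) -> int:
    """Each turn re-sends the shared context plus the whole conversation so far."""
    total, history = 0, 0
    for _ in range(turns):
        history += per_turn
        total += context + history
    return total


def pay_once_input_tokens(context: int, per_turn: int, turns: int) -> int:
    """The shared context is billed once; turns still re-send the growing conversation."""
    total, history = context, 0
    for _ in range(turns):
        history += per_turn
        total += history
    return total
```

For a 50,000-token context, 1,000 new tokens per turn, and 20 turns, the naive pattern bills 1,210,000 input tokens, while billing the context once comes to 260,000.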
theta: a humble approach to harness-agnostic configuration
Theta is a Rust CLI and package manager for agent configurations defined by theta-spec. It allows developers to resolve, lock, and materialize agent resources—including rules, MCP tools, and skills—into a local .theta/ directory. The tool facilitates interoperability by "casting" these configurations to and from various harnesses such as Claude Code, Cursor, and GitHub Copilot.
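The "lock" step in a resolve/lock/materialize workflow typically pins each resolved resource to a content hash, so that later materialization can detect drift. The following is a generic lockfile sketch in that spirit. It is not theta's lockfile format or theta-spec. The resource names and helper functions are hypothetical.

```python
import hashlib


def lock(resources: dict[str, bytes]) -> dict[str, str]:
    """Pin each resolved resource (rule, MCP tool, skill) to the SHA-256 of its content."""
    return {name: hashlib.sha256(body).hexdigest() for name, body in sorted(resources.items())}


def drifted(resources: dict[str, bytes], lockfile: dict[str, str]) -> list[str]:
    """Names that are missing or whose content no longer matches the lockfile."""
    current = lock(resources)
    return sorted(name for name in lockfile if current.get(name) != lockfile[name])
```

A materialize step could refuse to write into a local directory such as `.theta/` while `drifted` returns anything. Content hashing makes that check independent of file timestamps or paths.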
VAEN – Package and import portable AI coding-agent Harnesses
VAEN is a portable CLI for packaging and importing agentic coding setups into OCI-backed .agent archives. It bundles instructions, skills, and MCP declarations defined in an agent.yaml manifest while ensuring secrets and credentials are never transported. The tool provides a standardized workflow to validate, build, and import these configurations into repositories for clients like Codex, Claude Code, and Copilot.