Sunday — April 26, 2026
OpenAI launches a $25,000 bio-risk bug bounty for GPT-5.5, the DELEGATE-52 benchmark reveals LLMs corrupt 25% of delegated documents, and HATS enables AI agents to debate using the Six Thinking Hats method.
Interested in AI engineering? Let's talk
News
Open source memory layer so any AI agent can do what Claude.ai and ChatGPT do
Stash is an open-source persistent memory layer for AI agents that utilizes PostgreSQL and pgvector to provide long-term, cross-session context. Unlike RAG, which focuses on document retrieval, Stash employs a background consolidation pipeline to synthesize raw episodes into structured facts, causal relationships, and hierarchical namespaces. It is MCP-native and model-agnostic, allowing agents to track goals, learn from failures, and maintain a self-model across any OpenAI-compatible backend.
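The episode-to-fact consolidation idea can be sketched in a few lines: raw per-session episodes accumulate, then a background pass promotes them into structured facts under namespaces. The `Episode`/`MemoryStore` names and the `goal:` convention below are invented for illustration, not Stash's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    text: str
    session: str

@dataclass
class MemoryStore:
    episodes: list = field(default_factory=list)
    facts: dict = field(default_factory=dict)  # namespace -> list of facts

    def record(self, ep: Episode):
        """Append a raw episode; cheap, happens during the conversation."""
        self.episodes.append(ep)

    def consolidate(self):
        """Background pass: synthesize raw episodes into structured facts."""
        for ep in self.episodes:
            if ep.text.startswith("goal:"):
                self.facts.setdefault("goals", []).append(ep.text[5:].strip())
        self.episodes.clear()
```

In the real system the consolidation step would be LLM-driven and the stores backed by PostgreSQL/pgvector rather than in-memory lists.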
GPT‑5.5 Bio Bug Bounty
OpenAI has launched a Bio Bug Bounty program for GPT-5.5 to identify universal jailbreaks related to biological risks. Vetted researchers are challenged to find a single prompt that bypasses moderation for five specific bio safety questions within Codex Desktop, with a top reward of $25,000. The initiative requires an NDA and aims to strengthen red teaming and safety safeguards for frontier LLMs.
Agents Aren't Coworkers, Embed Them in Your Software
Current AI agents often impose high cognitive load by mimicking human conversation rather than functioning as "calm technology." To improve efficiency, agents should be embedded into software using machine-friendly patterns like CLIs, declarative specs, and reconciliation loops. Integrating agents with Change Data Capture (CDC) and incremental processing engines allows them to react to precise event streams, eliminating the need for expensive polling and reducing token consumption.
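The reconciliation-loop pattern the piece advocates reduces to: diff a desired spec against observed state, emit only the actions needed, and trigger that diff from change events rather than polling. A minimal sketch with illustrative names, not tied to any specific engine:

```python
def reconcile(desired: dict, actual: dict) -> list:
    """Diff desired spec against observed state; emit only the needed actions."""
    actions = []
    for key, want in desired.items():
        if actual.get(key) != want:
            actions.append(("set", key, want))
    for key in actual:
        if key not in desired:
            actions.append(("delete", key))
    return actions

def on_change_event(event: dict, desired: dict, actual: dict) -> list:
    """CDC-style trigger: apply the observed change, then reconcile.
    The agent wakes only when an event arrives -- no polling, no wasted tokens."""
    actual[event["key"]] = event["value"]
    return reconcile(desired, actual)
```

An agent embedded this way consumes a precise event stream and acts declaratively, instead of re-reading the world each turn through a chat transcript.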
Do I belong in tech anymore?
The author attributes their resignation to the "psychic toll" of pervasive AI integration, citing workflows where AI-generated code, reviews, and documentation bypass human oversight. This trend toward "vibe coding" and automated summaries erodes institutional knowledge and devalues the necessary friction of human communication in engineering. Ultimately, the essay critiques the industry's shift toward AI-driven shortcuts, arguing that prioritizing speed over craftsmanship leads to a loss of professional ideals and ethical accountability.
California Coastal Community Must Reject CBP's AI-Powered Surveillance Tower
CBP is proposing the installation of an Anduril Sentry tower in San Clemente, CA, as part of its AST program. The system utilizes computer vision, radar, and long-range optics to autonomously detect, identify, and track objects of interest, such as humans and vehicles, without operator intervention. Technical and privacy concerns center on the system's 360-degree viewshed covering residential areas and the indefinite retention of imagery data used to train proprietary AI models.
Research
Memory in the Age of AI Agents
This survey addresses fragmentation in agent memory research by proposing a unified framework based on forms (token-level, parametric, latent), functions (factual, experiential, working), and dynamics. It distinguishes agent memory from RAG and context engineering while consolidating existing benchmarks and open-source frameworks. The work also outlines future research directions, including memory automation, RL integration, and multimodal systems, to establish memory as a first-class primitive for agentic intelligence.
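The survey's form/function axes can be pictured as tags on a memory item; a toy sketch (the axis values come from the survey's framework, the class itself is invented):

```python
from dataclasses import dataclass

FORMS = {"token-level", "parametric", "latent"}
FUNCTIONS = {"factual", "experiential", "working"}

@dataclass(frozen=True)
class MemoryEntry:
    """A memory item classified along the survey's form and function axes."""
    content: str
    form: str       # how the memory is represented
    function: str   # what role it plays for the agent

    def __post_init__(self):
        # reject tags outside the taxonomy
        if self.form not in FORMS or self.function not in FUNCTIONS:
            raise ValueError("unknown form or function")
```

Treating these axes as first-class schema, rather than ad hoc prompt text, is the kind of unification the survey argues for.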
Adding Compilation Metadata to Binaries to Make Disassembly Decidable
This paper proposes augmenting binary executables with compiler-intent metadata to enable reliable lifting into recompilable high-level representations. By capturing instruction and memory boundaries, the tool facilitates more accurate analysis and instrumentation without impacting runtime performance. The resulting metadata is 17% the size of DWARF and has been validated on real-world C/C++ binaries.
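To see why recorded boundaries make disassembly decidable, consider a toy lifter: with explicit instruction-start offsets shipped alongside the bytes, splitting the code stream is a lookup rather than an inference problem. The format below is invented for illustration, not the paper's actual metadata encoding.

```python
def lift(code: bytes, boundaries: list) -> list:
    """Split raw code bytes into instructions using recorded start offsets."""
    ends = boundaries[1:] + [len(code)]          # each instruction ends where the next begins
    return [code[s:e] for s, e in zip(boundaries, ends)]
```

Without such metadata, a disassembler must heuristically guess where instructions begin, which is exactly the undecidability the paper targets.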
LLMs Corrupt Your Documents When You Delegate
DELEGATE-52 is a benchmark designed to evaluate LLM reliability in delegated document editing across 52 professional domains. Testing 19 LLMs reveals that even frontier models corrupt an average of 25% of document content over long workflows, with errors compounding silently over time. Performance degradation is exacerbated by document size and interaction length, and agentic tool use fails to mitigate these reliability issues.
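A crude way to quantify the silent-corruption failure mode: measure what fraction of the original document's lines fail to survive an edit pass. This is an illustrative metric using stdlib `difflib`, not DELEGATE-52's actual scoring.

```python
import difflib

def corruption_rate(original: str, edited: str) -> float:
    """Fraction of original lines lost or altered after a delegated edit."""
    orig_lines = original.splitlines()
    sm = difflib.SequenceMatcher(None, orig_lines, edited.splitlines())
    preserved = sum(block.size for block in sm.get_matching_blocks())
    return 1 - preserved / max(len(orig_lines), 1)
```

Running a check like this after every delegated edit is one way to catch the compounding errors the benchmark observes, since each pass silently rewrites a little more of the document.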
Decoupled DiLoCo for Resilient Distributed Pre-Training
Decoupled DiLoCo improves LLM training goodput by replacing synchronous SPMD paradigms with an asynchronous, multi-learner framework. By utilizing a central synchronizer with minimum quorums and adaptive grace windows, it mitigates stalls from stragglers and hardware failures to achieve zero global downtime. The method maintains competitive performance across dense and MoE architectures while significantly increasing efficiency in large-scale, failure-prone environments.
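The quorum-plus-grace-window idea can be sketched as a synchronizer decision rule: apply an outer update as soon as enough learners report, or once the grace window closes, so stragglers never stall the run. Parameter names here are illustrative, not from the paper.

```python
def maybe_aggregate(updates: list, min_quorum: int, elapsed: float, grace_window: float):
    """Return the averaged update if quorum is met or the grace window closed, else None."""
    if len(updates) < min_quorum and elapsed < grace_window:
        return None  # keep waiting; learners continue training locally
    if not updates:
        return None  # nothing arrived even after the window; skip this round
    dim = len(updates[0])
    # average whatever pseudo-gradients arrived; stragglers simply miss the round
    return [sum(u[i] for u in updates) / len(updates) for i in range(dim)]
```

Contrast with synchronous SPMD, where one failed or slow worker blocks the global step; here the decision rule guarantees the synchronizer always makes progress.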
Foundational aspects of spinor structures and exotic spinors (2025)
This review examines the topological conditions governing the existence and uniqueness of spinor structures, specifically focusing on "exotic" spinors arising from non-equivalent spacetime topologies. The authors derive a topologically corrected Dirac operator to analyze the physical implications of exotic spinor dynamics and survey current research directions in the field.
Code
A Karpathy-style LLM wiki your agents maintain (Markdown and Git)
WUPHF is a collaborative multi-agent framework that creates a shared "office" environment for AI roles like CEO, PM, and engineers. It utilizes a "shared brain" architecture consisting of per-agent notebooks and a git-native markdown wiki for persistent organizational memory. The system optimizes LLM efficiency through push-driven agent wakes and fresh sessions per turn to maximize prompt caching. It supports Claude Code and Codex providers, integrates with Telegram and OpenClaw, and includes extensible action providers for real-world task execution.
AI agents that argue with each other to improve decisions
HATS is a multi-agent orchestration framework that implements the Six Thinking Hats methodology to facilitate structured disagreement and debate among LLMs. By assigning agents specific roles—such as risk assessment, factual analysis, or creative brainstorming—the system mitigates common LLM issues like sycophancy and overconfidence. The platform includes a TypeScript-based dashboard featuring real-time 3D avatar meetings with TTS, a project-scoped Kanban board, and extensive tool integration via MCP.
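At its core, the hat-per-agent pattern means prompting models under different role constraints and keeping the critiques separate, so no single sycophantic answer dominates. A minimal sketch; the prompts and the `ask` callable are hypothetical stand-ins, not HATS's API.

```python
HAT_PROMPTS = {
    "white": "State only verifiable facts about the proposal.",   # factual analysis
    "black": "List risks and failure modes of the proposal.",     # risk assessment
    "green": "Suggest creative alternatives to the proposal.",    # brainstorming
}

def debate(proposal: str, ask) -> dict:
    """Run one structured round: each hat critiques the proposal independently.
    `ask` is any callable that sends a prompt to an LLM and returns its reply."""
    return {hat: ask(f"{prompt}\nProposal: {proposal}")
            for hat, prompt in HAT_PROMPTS.items()}
```

Because the black-hat agent is *required* to surface risks, disagreement is structural rather than left to chance, which is the framework's answer to overconfidence.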
Bunny Agent – Build Coding Agent SaaS via Native AI SDK UI
Bunny Agent is a multi-model coding agent and SDK built on Pi Coding Agent, designed for CLI workflows, remote sandboxing, and embedding into custom AI products. It features native AI SDK UI stream output for seamless integration with Vercel AI SDK and supports various LLM providers including Claude, Gemini, and OpenAI. The platform includes a pre-built tool harness for bash execution and file operations, offering high performance on the GAIA benchmark with support for persistent, isolated cloud sandboxes via Sandock, E2B, or Daytona.
Memweave CLI – search your AI agent's memory from the shell
memweave is an async-first Python library for AI agent memory that uses Markdown files as the source of truth and SQLite for indexing. It features a hybrid search pipeline combining BM25 keyword ranking and semantic vector search with built-in support for temporal decay and MMR re-ranking. The library is zero-infrastructure, supports offline keyword search, and includes a utility for LLM-driven fact extraction from conversation history.
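MMR re-ranking, one stage of memweave's pipeline, balances a candidate's query relevance against its redundancy with already-selected results. A generic sketch of the standard formula over precomputed similarities, not memweave's implementation:

```python
def mmr(query_sim: list, doc_sims: list, k: int, lam: float = 0.5) -> list:
    """Maximal Marginal Relevance: pick k docs trading relevance vs. redundancy.
    query_sim[i]  : similarity of doc i to the query.
    doc_sims[i][j]: similarity between docs i and j.
    lam           : 1.0 = pure relevance, 0.0 = pure diversity."""
    selected, candidates = [], list(range(len(query_sim)))
    while candidates and len(selected) < k:
        best = max(
            candidates,
            key=lambda i: lam * query_sim[i]
            - (1 - lam) * max((doc_sims[i][j] for j in selected), default=0.0),
        )
        selected.append(best)
        candidates.remove(best)
    return selected
```

With near-duplicate memories the second pick skips the redundant twin, which is exactly the behavior you want when BM25 and vector search both surface the same fact.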
ShadowPEFT – Centralized and Detachable Parameter-Efficient Fine-Tuning
ShadowPEFT is a PEFT framework that augments a frozen LLM with a parallel, detachable "Shadow" network to inject learned corrections into decoder layers. It supports both implicit architectures and explicit, smaller pretrained models, enabling modular deployment and cross-architecture adaptation. The framework is architecture-agnostic for decoder-only transformers and integrates with the Hugging Face ecosystem, though it requires disabling the KV cache for full-sequence processing.
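The detachable-correction idea can be shown without any ML framework: a frozen base transform plus an optional parallel shadow whose output is added in. A pure-Python sketch; ShadowPEFT's real implementation injects into transformer decoder layers rather than scalar functions.

```python
class ShadowLayer:
    """Frozen base transform with an optional, detachable shadow correction."""

    def __init__(self, base, shadow=None):
        self.base = base      # frozen pretrained transform (never updated)
        self.shadow = shadow  # trainable correction; None means detached

    def __call__(self, x):
        y = self.base(x)
        if self.shadow is not None:
            y += self.shadow(x)  # correction runs in parallel, added to the output
        return y
```

Setting `shadow = None` recovers the vanilla frozen model exactly, which is what makes the adaptation modular and deployable separately from the base weights.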