Thursday — June 25, 2026

OpenAI unveils its Jalapeño custom chip, Wikipedia edits are found to shape LLM values, and the Anthropic Cybersecurity Skills library offers 817 structured cybersecurity skills for agents.

News

OpenAI unveils its first custom chip, built by Broadcom

OpenAI has unveiled Jalapeño, a custom inference processor co-developed with Broadcom to improve performance-per-watt and reduce reliance on Nvidia GPUs. Designed with assistance from OpenAI's own models, the chip targets real-time inference workloads to lower operational costs and increase efficiency. This move enables full-stack vertical integration, allowing OpenAI to optimize hardware, kernels, and memory systems specifically for its LLM deployments.

RubyLLM: A Ruby framework for all major AI providers

RubyLLM is a unified Ruby framework providing a consistent interface for major AI providers, including OpenAI, Anthropic, and local models via Ollama. It supports complex workflows such as RAG, AI agents, and tool calling, alongside native capabilities for vision, audio transcription, and structured JSON output. The library features deep Rails integration via ActiveRecord and a built-in chat UI, streamlining the development of LLM-powered applications.

Anthropic says Alibaba illicitly extracted Claude AI model capabilities

Anthropic has accused Alibaba and its Qwen AI lab of executing a massive model distillation campaign to illicitly extract capabilities from Claude. The operation involved over 28.8 million exchanges via 25,000 fraudulent accounts, aimed at accelerating Chinese AI development toward Anthropic’s Mythos Preview performance levels. This incident follows similar extraction efforts by DeepSeek and Moonshot AI, contributing to increased U.S. regulatory scrutiny and export restrictions on advanced LLMs.

45°C cooling design cuts data center water use to near zero

NVIDIA’s Rubin architecture introduces the first 100% liquid-cooled AI infrastructure, utilizing the DSX reference design to eliminate fans and traditional air-cooling. By supporting coolant temperatures up to 45°C, the system enables chiller-less heat rejection via dry coolers, significantly reducing energy consumption and achieving near-zero water usage. This transition addresses the thermal challenges of high-density compute, allowing for a 3x increase in rack density while improving overall data center efficiency.

Reid Hoffman says SpaceX 'not an AI company', xAI 'complete train wreck'

Reid Hoffman characterizes xAI as a "train wreck" following a total cofounder exodus and labels SpaceX’s AI strategy as an attempt to buy relevance through acquisitions like Cursor rather than native innovation. He criticizes the U.S. government’s recent export controls on Anthropic’s Fable and Mythos models as unprincipled and "autocratic" regulatory intervention. Hoffman remains bullish on OpenAI and Anthropic as foundational AI utilities while transitioning from the Microsoft board to focus on Manas AI, a startup leveraging AI for drug discovery.

Research

IatroBench: Pre-Registered Evidence of Iatrogenic Harm from AI Safety Measures

IatroBench evaluates "identity-contingent withholding" in frontier models, revealing that safety-trained LLMs often refuse clinical information to patients while providing it to physicians, even when the underlying facts are identical. The study identifies a significant "decoupling gap" in models like Claude 3 Opus, where omission harm is triggered by the absence of professional or epistemic signals rather than by missing credentials. Crucially, standard LLM-based judges fail to detect these omissions, demonstrating that automated evaluation layers inherit the same safety-driven biases introduced during training.

Submodular Context Selection as a Pluggable Engine for LLM Agents

LLM agents face context window overflow from accumulated conversation turns, memory entries, and tool outputs. The prevalent fix, recency truncation, is topic-blind: it discards relevant older information while retaining verbose, irrelevant recent material, which significantly impairs agents that need long-term recall. Existing alternatives such as RAG and context compression do not address the core issue, which is selecting by relevance from the context the agent has already pooled at the moment the prompt is assembled.
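The summary above does not describe the paper's algorithm, so the following is only a minimal sketch of what relevance-based selection under a token budget can look like, assuming a simple coverage objective (which is monotone and submodular) and greedy selection by marginal gain per token. `ContextItem`, `terms`, and `select_context` are hypothetical names, and word overlap is a crude stand-in for whatever relevance model a real engine would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextItem:
    """One pooled entry: a conversation turn, memory entry, or tool output."""
    text: str
    tokens: int  # token cost, assumed precomputed by the caller and > 0


def terms(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for an embedding-based relevance model."""
    return {w.strip(".,!?:;").lower() for w in text.split()} - {""}


def select_context(items: list[ContextItem], query: str, budget: int) -> list[ContextItem]:
    """Greedily pick items by marginal query-term coverage per token, within budget."""
    query_terms = terms(query)
    covered: set[str] = set()
    chosen: list[int] = []
    spent = 0
    while True:
        best_index, best_ratio = None, 0.0
        for i, item in enumerate(items):
            if i in chosen or spent + item.tokens > budget:
                continue
            # Marginal gain: query terms this item adds beyond what is already covered.
            gain = len((terms(item.text) & query_terms) - covered)
            ratio = gain / item.tokens
            if ratio > best_ratio:
                best_index, best_ratio = i, ratio
        if best_index is None:  # nothing left that fits and adds coverage
            break
        chosen.append(best_index)
        covered |= terms(items[best_index].text) & query_terms
        spent += items[best_index].tokens
    # Return the picks in their original (chronological) order.
    return [items[i] for i in sorted(chosen)]


# Illustrative pool, oldest entry first; the contents are made up.
pool = [
    ContextItem("The deploy key for staging lives in vault path alpha", 10),
    ContextItem("Weather is nice today and the coffee was good", 9),
    ContextItem("Long verbose tool output about unrelated log rotation", 50),
    ContextItem("Staging deploy failed earlier due to a missing key", 9),
]
picked = select_context(pool, "staging deploy key vault", budget=20)
```

In this toy pool the oldest entry is the only one selected, because it alone covers every query term and the later entries add no further coverage. A recency-based cutoff would never consider it on those grounds.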

DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference

DualPath addresses KV-Cache I/O bottlenecks in disaggregated LLM inference by introducing a dual-path loading mechanism that utilizes idle decoding engine NICs. By loading KV-Cache into decoding engines and transferring it to prefill engines via RDMA, the system bypasses storage bandwidth saturation and improves throughput by up to 1.96x for agentic workloads while maintaining SLOs.

Wikipedia advocacy shapes LLM values

Research demonstrates that small-scale, coordinated Wikipedia editing significantly influences LLM behavior because Wikipedia is weighted heavily in training datasets. Using gradient-based data attribution methods like TrackStar and MAGIC on Llama 3.1 and 3.2, researchers found that targeted edits by the Pro-Animal Wikipedians (PAW) group disproportionately drive model responses to specific animal welfare queries. Fine-tuning validation confirmed that these edits measurably reduce perplexity on the targeted subjects, indicating that niche data interventions can shape LLM outputs.
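For readers unfamiliar with the perplexity measure mentioned above: it is the exponential of the average negative log-probability a model assigns to each token, so a lower value means the model finds the text less surprising. The sketch below uses the standard definition; the function name and the log-probabilities are illustrative only and are not taken from the study.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """exp of the mean negative natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Made-up per-token probabilities for the same sentence under two models.
before = [math.log(p) for p in (0.20, 0.10, 0.25)]  # hypothetical base model
after = [math.log(p) for p in (0.40, 0.30, 0.50)]   # hypothetical fine-tuned model

ppl_before = perplexity(before)
ppl_after = perplexity(after)
```

Here `ppl_after` is smaller than `ppl_before` because the second model assigns higher probability to every token.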

Qwen-AgentWorld: Language World Models for General Agents

Qwen-AgentWorld (35B and 397B) are language world models trained on 10M trajectories to simulate agentic environments across 7 domains using long chain-of-thought (CoT) reasoning. Developed via a three-stage pipeline of continued pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL), these models outperform frontier LLMs on the new AgentWorldBench. They serve both as scalable simulators for agentic RL and as unified foundation models where world-model training improves downstream agent performance.

Code

Lelu – gate OpenAI agent actions on confidence and prompt injection

Lelu is an authorization engine for AI agents that secures actions through a multi-layered pipeline featuring prompt injection detection, confidence gating via LLM log-probs, and OPA/Rego policy evaluation. It enables human-in-the-loop (HITL) workflows for high-risk decisions and includes an OAuth token vault and identity management for non-human identities (NHI). The engine integrates with major frameworks like LangChain and the Vercel AI SDK to provide audited, risk-aware agent execution.
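To make confidence gating concrete, here is a minimal sketch of the general idea, assuming the provider returns natural-log probabilities for the tokens of a proposed action. This is not Lelu's API: `Decision`, `mean_token_probability`, `gate`, and both thresholds are hypothetical, and a real deployment would pair this with the injection detection and policy evaluation described above.

```python
import math
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REVIEW = "review"  # hand off to a human (HITL)
    DENY = "deny"


def mean_token_probability(token_logprobs: list[float]) -> float:
    """Geometric mean of token probabilities; 0.0 when no log-probs are available."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def gate(token_logprobs: list[float], high_risk: bool,
         allow_at: float = 0.9, review_at: float = 0.6) -> Decision:
    """Hypothetical thresholds: deny low confidence, review mid confidence or high risk."""
    confidence = mean_token_probability(token_logprobs)
    if confidence < review_at:
        return Decision.DENY  # fails closed, including when log-probs are missing
    if confidence < allow_at or high_risk:
        return Decision.REVIEW
    return Decision.ALLOW


# Illustrative log-probs for a confidently generated tool call.
confident = [math.log(0.99)] * 3
routine = gate(confident, high_risk=False)
sensitive = gate(confident, high_risk=True)
```

Routing high-risk actions to review even at high confidence reflects the human-in-the-loop behavior the summary describes. The exact cutoffs are a design choice and would need tuning per model.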

Anthropic-Cybersecurity-Skills: 817 structured cybersecurity skills for AI agents

Anthropic Cybersecurity Skills is an open-source library of 817 structured cybersecurity skills designed to provide AI agents with senior-level analyst workflows. Built on the agentskills.io standard, it features a progressive disclosure architecture that optimizes token consumption by allowing agents to scan YAML frontmatter before loading full Markdown procedures. The library covers 29 domains and is uniquely mapped across six industry frameworks, including MITRE ATT&CK, ATLAS, and NIST AI RMF, ensuring cross-platform compatibility with tools like Claude Code and GitHub Copilot.
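The progressive disclosure idea can be sketched as a two-step read: scan only the frontmatter of each skill file, then load the full Markdown body for the skills that match. The code below is an illustration under assumptions, not the library's own tooling: it assumes one `SKILL.md` per skill directory, flat `key: value` frontmatter with a `description` field, and hypothetical helper names. Real YAML should go through a proper YAML parser.

```python
import tempfile
from pathlib import Path


def read_frontmatter(path: Path) -> dict[str, str]:
    """Read only the leading '---' delimited block and stop before the body."""
    meta: dict[str, str] = {}
    with path.open(encoding="utf-8") as fh:
        if fh.readline().strip() != "---":
            return meta
        for line in fh:
            if line.strip() == "---":
                break
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()
    return meta


def load_body(path: Path) -> str:
    """Load the full Markdown procedure that follows the frontmatter."""
    text = path.read_text(encoding="utf-8")
    parts = text.split("---", 2)
    if text.startswith("---") and len(parts) == 3:
        return parts[2].lstrip("\n")
    return text


def find_skills(root: Path, keyword: str) -> list[Path]:
    """Cheap pass: match on frontmatter descriptions without reading any body."""
    hits = []
    for path in sorted(root.glob("*/SKILL.md")):
        description = read_frontmatter(path).get("description", "")
        if keyword.lower() in description.lower():
            hits.append(path)
    return hits


# Build two made-up skills in a temporary directory and look one up.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    samples = {
        "triage-phishing": "Triage a reported phishing email",
        "rotate-keys": "Rotate leaked API credentials",
    }
    for name, description in samples.items():
        (root / name).mkdir()
        (root / name / "SKILL.md").write_text(
            f"---\nname: {name}\ndescription: {description}\n---\n"
            "# Steps\n1. Collect the message headers.\n",
            encoding="utf-8",
        )
    matches = find_skills(root, "phishing")
    procedure = load_body(matches[0])
```

Only the matching skill's body is ever loaded, which is where the token savings come from when a library holds hundreds of procedures.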

AI-website-cloner-template: Clone any website using AI coding agents

This template enables AI coding agents to reverse-engineer websites into Next.js 16 codebases using a multi-phase pipeline of reconnaissance, design token extraction, and parallelized component construction. Optimized for Claude Code (Opus 4.7) but compatible with agents like Cursor and Aider, it automates the generation of component specs and asset management to reconstruct sites with high fidelity. The underlying stack leverages React 19, Tailwind CSS v4, and shadcn/ui.

A durable filesystem layer for AI agents

SmolFS provides durable workspace folders for AI agents, allowing data to persist across short-lived runtimes and different processes. Built with a Rust core and offering Python and TypeScript SDKs, it enables agents to mount workspaces as local directories backed by local storage or cloud infrastructure like S3 and Redis. The system manages the full volume lifecycle, including mounting, flushing, and unmounting, to ensure consistent state for agentic workflows.

Omnigent: Open-source meta harness for agents

Omnigent is an open-source AI agent framework and meta-harness that provides a unified orchestration layer for agents like Claude Code, Codex, and custom YAML-defined models. It enables cross-device session syncing, multi-agent collaboration, and secure execution via cloud sandboxes like E2B and Modal. The platform features a robust governance engine for enforcing policies, spend caps, and tool-use approvals across diverse model providers and gateways.