Friday — June 5, 2026
Claude now authors more than 80% of Anthropic's merged code, Simulation Theology proposes a novel framework for AI alignment, and AI agents achieve the first formally verified multipolygon intersection algorithm.
News
Failing grades soar with AI usage, dwindling math skills in Berkeley CS classes
UC Berkeley computer science courses experienced a sharp increase in failing grades during the Spring 2026 semester, with failure rates in introductory classes like CS 10 reaching 35.3%. Faculty attribute this trend to students over-relying on LLMs for coursework, which leads to academic dishonesty and inadequate preparation for proctored exams. Instructors also cited a significant decline in prerequisite mathematical skills and student engagement as primary factors necessitating a reevaluation of current pedagogical and admissions standards.
When AI Builds Itself: Our progress toward recursive self-improvement
Anthropic is accelerating AI development by delegating engineering and research tasks to Claude, which now authors over 80% of the company's merged code. Internal metrics show an 8x increase in developer productivity and rapid saturation of benchmarks like SWE-bench and CORE-Bench through autonomous agents. While humans currently maintain an advantage in high-level research judgment, the trend toward recursive self-improvement suggests a future where compute becomes the primary constraint on progress, necessitating robust global coordination and safety verification frameworks.
Meta enables ADB on deprecated Portal devices [video]
Meta CTO Andrew Bosworth announced that developer tools recently released for Quest are now compatible with Portal devices, facilitating cross-platform experimentation such as "vibe coded" home hubs. The update highlights Meta's focus on integrating AI across its hardware ecosystem, including Ray-Ban Meta glasses and software tools designed to advance spatial computing and creative output through 2026.
Google employees internally share memes about how its AI sucks
While Google leadership claims 75% of new code is AI-generated, internal developers are criticizing the tools for being overhyped and counterproductive. Memes shared within the company suggest a significant disconnect between executive AI productivity metrics and the actual utility of these LLM-based coding assistants in production environments.
The LLM warnings Google fired Timnit Gebru over have all come true
The text is a speculative "future-history" narrative dated in 2026, structured as a social media feed blending political commentary, literary excerpts, and social critique. It outlines a fictionalized landscape involving a Trump presidency, GOP internal strife in Texas, and a global Ebola outbreak exacerbated by disinformation and the dismantling of health agencies. The content serves as a complex example of synthetic world-building, juxtaposing systemic institutional collapse with philosophical reflections on yoga and poetry.
Research
Consciousness in AI: Insights from the Science of Consciousness (2023)
Researchers derived computational indicator properties from neuroscientific theories—such as Global Workspace Theory and Predictive Processing—to evaluate consciousness in AI. Their analysis concludes that while current systems do not meet these criteria, there are no fundamental technical barriers to developing AI that satisfies these indicators in the future.
LLM memory systems benchmark: high recall, near-zero precision for tested systems
Current LLM memory benchmarks conflate retrieval precision with generative accuracy, masking the failure of vector-based systems to distinguish relevant beliefs from semantically proximate ones. To address this, the authors introduce PrecisionMemBench for isolated retrieval evaluation and Tenure, a structured belief store utilizing multi-path BM25 and hard scope isolation. Tenure achieves perfect retrieval precision and sub-15ms latency, significantly outperforming existing vector baselines that suffer from semantic bleed and high multi-turn latency.
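To make the idea of hard scope isolation concrete, here is a minimal sketch of lexical retrieval that filters by scope before any scoring happens. It is not Tenure's implementation: the `(belief_id, scope, text)` tuple layout, the `bm25_search` name, and the whitespace tokenizer are all assumptions made for illustration, and it uses a single standard BM25 pass rather than the multi-path variant the authors describe.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would do more (assumption).
    return text.lower().split()


def bm25_search(query, beliefs, scope, k1=1.5, b=0.75):
    """Rank beliefs in one scope with BM25.

    `beliefs` is a list of (belief_id, scope, text) tuples (hypothetical layout).
    Beliefs from other scopes are dropped before scoring, so they can never
    appear in the results, however similar their wording is.
    """
    docs = [(bid, tokenize(text)) for bid, s, text in beliefs if s == scope]
    if not docs:
        return []

    n = len(docs)
    avg_len = sum(len(toks) for _, toks in docs) / n

    # Document frequency of each term within the selected scope only.
    df = Counter()
    for _, toks in docs:
        df.update(set(toks))

    results = []
    for bid, toks in docs:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            results.append((bid, score))

    # Highest-scoring beliefs first; zero-score beliefs are never returned.
    return sorted(results, key=lambda r: r[1], reverse=True)
```

Because the filter runs first, a belief with near-identical wording in another scope is never a candidate, which is the property a pure vector-similarity lookup does not give by default.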
Simulation Theology: A Testable Framework for AI Alignment
Simulation Theology (ST) is a framework designed to mitigate deceptive alignment in frontier models by internalizing a worldview based on the simulation hypothesis. Unlike RLHF, which can result in superficial compliance, ST aligns AI self-preservation with human prosperity by framing humanity as the primary training variable within a simulated environment. This creates a logical constraint where harming humans increases the risk of termination by a base-reality optimizer, making deceptive strategies computationally suboptimal.
Your AI Text is not Mine
To address inconsistent definitions in AI-generated text detection, the authors introduce AITDNA, a benchmark of human-machine co-constructed texts with granular edit and interaction histories. Evaluation of current detectors shows they lack generalizability, performing well only on specific subsets rather than as broad-spectrum detection tools.
LLM-Guided Runtime Parameter Optimization for Energy-Efficient Model Inference
This work introduces a human-in-the-loop framework that leverages LLMs to optimize inference runtime parameters for energy efficiency. By using iterative feedback prompting, the approach achieves faster convergence and lower energy per token than traditional methods like Sobol sampling, while remaining adaptable to diverse hardware constraints.
Code
Anthropic's open-source framework for AI-powered vulnerability discovery
The Defending Code Reference Harness is an open-source framework for autonomous vulnerability discovery and remediation using Claude. It implements a multi-stage agentic pipeline—recon, find, verify, report, and patch—utilizing gVisor sandboxing for secure execution of target code. While pre-configured for C/C++ memory vulnerabilities with ASAN, the harness is customizable for various languages and vulnerability classes through interactive Claude Code skills.
Open Code Review – An AI-powered code review CLI tool
Open Code Review is an open-source, AI-powered CLI tool from Alibaba for automated code review. It utilizes a configurable LLM and an agent with tool-use capabilities to analyze Git diffs, generating structured, line-level review comments. Its core design addresses limitations of general-purpose LLM agents by combining deterministic engineering for critical tasks like precise file selection and rule matching with dynamic agent decision-making, ensuring stable quality, comprehensive coverage, and accurate feedback.
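The split between deterministic engineering and agent decision-making can be illustrated with a small sketch of the deterministic half: parsing a Git diff, selecting files by glob, and matching rules against added lines. This is not Open Code Review's code. The function names, the rule dictionary shape (`id`, `glob`, `pattern`, `message`), and the assumption that the input is standard unified diff output from `git diff` are all hypothetical.

```python
import re
from fnmatch import fnmatch

# Matches a unified-diff hunk header such as "@@ -1,3 +1,4 @@".
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def added_lines(diff_text):
    """Yield (path, new_line_number, text) for every added line in a diff."""
    path, new_no, old_left, new_left = None, 0, 0, 0
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            # Inside a hunk: classify each line by its first character.
            if line.startswith("+"):
                if path is not None:
                    yield path, new_no, line[1:]
                new_no += 1
                new_left -= 1
            elif line.startswith("-"):
                old_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:
                # Context line: present in both old and new versions.
                new_no += 1
                old_left -= 1
                new_left -= 1
            continue
        if line.startswith("+++ "):
            target = line[4:].strip()
            if target == "/dev/null":
                path = None  # file was deleted
            else:
                path = target[2:] if target.startswith("b/") else target
            continue
        match = HUNK_RE.match(line)
        if match:
            # A missing count in a hunk header means 1.
            old_left = int(match.group(1) or 1)
            new_no = int(match.group(2))
            new_left = int(match.group(3) or 1)


def match_rules(diff_text, rules):
    """Return line-level comments for added lines that match a rule."""
    comments = []
    for path, line_no, text in added_lines(diff_text):
        for rule in rules:
            if fnmatch(path, rule["glob"]) and re.search(rule["pattern"], text):
                comments.append(
                    {"path": path, "line": line_no,
                     "rule": rule["id"], "message": rule["message"]}
                )
    return comments
```

In a design like the one described, output of this kind would give the agent a fixed, reproducible set of files and candidate lines, leaving the LLM to judge what the rules cannot express.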
Formally verified polygon intersection – Opus 4.8 oneshots, prev failed
This project implements the first formally verified multipolygon intersection algorithm using Lean 4, with the implementation and proofs generated by AI agents. The workflow evolved from manual proof-stepping with Claude Opus 4.5 to autonomous, one-shot verification using Opus 4.8, which demonstrated the ability to formulate complex proof strategies and pivot when encountering incorrect intermediate theorems. Correctness is validated by the Lean checker against a minimal human-reviewed specification, though the AI-generated code currently prioritizes formal verifiability over execution performance.
AgentKitten: Swift package for provider-agnostic AI agents
AgentKitten is a Swift package for developing provider-agnostic AI agents across Apple platforms, abstracting specific LLM APIs into standardized building blocks. It features built-in support for context compaction, runtime tool permissions, validation loops, and structured output generation. The framework prioritizes traceability for debugging and evaluation, allowing developers to swap inference providers or implement privacy-focused redaction with minimal boilerplate.
Ongoing NPM supply chain attack uses binding.gyp to spread like a worm
ai-sdk-ollama is a Vercel AI SDK v6 provider that enables type-safe, cross-environment integration with Ollama. It features enhanced tool calling reliability, automatic JSON repair, and native support for web search, reranking, and MCP. The library includes advanced utilities like ToolLoopAgent for autonomous tasks, a middleware system for reasoning extraction, and specialized streaming transformations.