Wednesday March 25, 2026

GPT-5.4 Pro solves a frontier math open problem, AI agents autonomously perform experimental high energy physics, and ProofShot gives coding agents eyes to visually verify UI.

Interested in AI engineering? Let's talk

News

Is anybody else bored of talking about AI?

The author argues that the tech community's fixation on AI tooling has led to a regression in engineering culture, shifting focus from product value to repetitive implementation details. While acknowledging the productivity gains of LLMs, they critique the saturation of platforms like Hacker News with redundant AI workflows and management's misguided focus on metrics like token usage. The text calls for a return to "Product Engineering" principles, urging developers to prioritize the impact of their creations over the tools used to build them.

Goodbye to Sora

The Sora Team has announced the discontinuation of the Sora video generation platform. Upcoming communications will detail the sunsetting timelines for the app and API, as well as procedures for users to preserve their existing work.

Epoch confirms GPT-5.4 Pro solved a frontier math open problem

Frontier models including GPT-5.4 Pro, Opus 4.6 (max), and Gemini 3.1 Pro have solved an open problem in Ramsey-theoretic hypergraph partitioning from the FrontierMath benchmark. The models successfully improved the lower bound for the sequence H(n) by a constant factor using a novel construction algorithm, a task previously estimated to require 1–3 months of expert human effort. The AI-generated solution is slated for publication in a specialty journal, demonstrating the capability of LLMs to contribute to original mathematical research.

So where are all the AI apps?

Analysis of PyPI data reveals no significant inflection point in general software creation or update frequency following the release of ChatGPT. While popular AI-specific packages show a >2x increase in update frequency, this trend is absent in non-AI sectors, suggesting that any productivity gains are currently localized. The findings indicate that the "AI effect" is primarily a concentrated burst of iteration within the AI ecosystem itself, likely driven by high funding and interest rather than a universal leap in developer efficiency.
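The kind of inflection-point analysis described can be sketched by annualizing release counts before and after a cutoff date. The release dates below are hypothetical stand-ins for per-package PyPI metadata, not real data:

```python
from datetime import date

# Hypothetical release dates for one package (illustrative, not real PyPI data).
releases = [
    date(2022, 1, 10), date(2022, 4, 2), date(2022, 8, 15),
    date(2023, 1, 5), date(2023, 2, 20), date(2023, 4, 1),
    date(2023, 5, 12), date(2023, 6, 30),
]

cutoff = date(2022, 11, 30)  # ChatGPT release date

def releases_per_year(dates, start, end):
    """Annualized release count within [start, end)."""
    n = sum(start <= d < end for d in dates)
    years = (end - start).days / 365.25
    return n / years

before = releases_per_year(releases, date(2022, 1, 1), cutoff)
after = releases_per_year(releases, cutoff, date(2023, 7, 1))
print(f"before: {before:.2f}/yr, after: {after:.2f}/yr, ratio: {after / before:.2f}")
```

Run per package over the full index, the distribution of these ratios is what would (or, per the article, would not) show a post-ChatGPT inflection outside the AI ecosystem.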

The bridge to wealth is being pulled up with AI

AI is dismantling the historical "bridge" that allowed cognitive ability to be converted into heritable wealth via credentials and professional labor. While biological traits follow a Gaussian distribution and regress to the mean, wealth follows a power law and compounds through legal inheritance; LLMs are accelerating this divergence by commoditizing high-skill cognitive tasks and collapsing the IQ premium in the labor market. A narrow 5–10 year window remains for domain experts to leverage AI fluency for capital accumulation before the economy shifts toward a permanent "capital-heavy" structure where inherited wealth becomes the primary determinant of life outcomes.

Research

AI Agents Can Autonomously Perform Experimental High Energy Physics

LLM-based agents can autonomously execute end-to-end HEP analysis pipelines, covering event selection, statistical inference, and paper drafting. The "Just Furnish Context" (JFC) framework integrates autonomous agents with literature-based RAG and multi-agent review to perform credible measurements on open data from ALEPH, DELPHI, and CMS. This approach offloads the technical burden of code development, shifting the researcher's role toward physics insight and validation.
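Event selection, the first stage of the pipeline the agents automate, amounts to applying kinematic cuts to reconstructed events. A toy version with hypothetical thresholds and values (not the paper's actual cuts):

```python
# Toy cut-based event selection with hypothetical thresholds, the kind of
# step an agent would code before moving on to statistical inference.
events = [
    {"pt": 45.0, "eta": 0.3, "iso": 0.05},
    {"pt": 12.0, "eta": 1.9, "iso": 0.30},
    {"pt": 60.0, "eta": 2.6, "iso": 0.02},
    {"pt": 38.0, "eta": 1.1, "iso": 0.08},
]

def passes(e, pt_min=30.0, eta_max=2.4, iso_max=0.1):
    """Keep high-pT, central, well-isolated candidates."""
    return e["pt"] > pt_min and abs(e["eta"]) < eta_max and e["iso"] < iso_max

selected = [e for e in events if passes(e)]
print(len(selected))  # → 2
```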

Expert Personas Improve LLM Alignment but Damage Accuracy

This study investigates the inconsistent utility of persona prompting by analyzing the effects of model optimization, task type, and prompt configuration on LLM performance. The authors propose PRISM (Persona Routing via Intent-based Self-Modeling), a framework that self-distills intent-conditioned expert personas into gated LoRA adapters via a bootstrapping process. PRISM improves human preference and safety alignment in generative tasks while maintaining discriminative accuracy with minimal memory and compute overhead.
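The gated-adapter idea can be sketched in miniature: a gate (here fed hypothetical intent logits) softmax-weights several low-rank updates on top of the frozen base weights. This is a toy illustration of the mechanism, not PRISM's implementation:

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def gated_lora(W, adapters, gate_logits, x):
    """y = W x + sum_k g_k * B_k(A_k x), with gate weights g = softmax(logits).
    Each adapter is a (A, B) pair of low-rank projection matrices."""
    g = softmax(gate_logits)
    y = matvec(W, x)
    for gk, (A, B) in zip(g, adapters):
        h = matvec(A, x)   # down-project to rank r
        dy = matvec(B, h)  # up-project back to model dim
        y = [yi + gk * di for yi, di in zip(y, dy)]
    return y
```

The base weights W stay frozen; only the small A/B matrices and the gate are trained, which is where the "minimal memory and compute overhead" comes from.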

DeepSeek: Conditional Memory via Scalable Lookup

Engram introduces conditional memory as a sparsity axis alongside MoE, utilizing modernized $N$-gram embeddings for O(1) knowledge lookup. By optimizing the trade-off between computation and static memory via a U-shaped scaling law, the 27B parameter model outperforms iso-FLOPs MoE baselines in reasoning, coding, and long-context tasks. Mechanistically, Engram offloads low-level reconstruction to early layers, allowing the transformer backbone to focus on complex reasoning and global attention while maintaining efficiency through host memory prefetching.
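The O(1) lookup rests on hashing n-grams of token ids into a fixed-size embedding table. A minimal sketch of that mechanism (table size, dimension, and hash choice are illustrative, not Engram's):

```python
import hashlib

TABLE_SIZE = 2**16  # illustrative; real tables are far larger
DIM = 4

def ngram_slot(tokens):
    """Hash an n-gram of token ids into a fixed-size table: O(1) per lookup."""
    key = ",".join(map(str, tokens)).encode()
    h = int(hashlib.blake2b(key, digest_size=8).hexdigest(), 16)
    return h % TABLE_SIZE

def lookup(table, token_ids, n=2):
    """Sum the hashed n-gram embeddings over a token sequence."""
    out = [0.0] * DIM
    for i in range(len(token_ids) - n + 1):
        row = table[ngram_slot(token_ids[i:i + n])]
        out = [a + b for a, b in zip(out, row)]
    return out
```

Because slots are addressed by hash rather than searched, the static memory can live in host RAM and be prefetched, which is the trade-off the U-shaped scaling law optimizes.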

Searching for Fast Astronomical Transients

Independent analysis of digitized 1950s archival plates confirms the presence of fast astronomical transients previously identified by the VASCO Project. By comparing sequential plate pairs, researchers identified objects with systematically narrow FWHM relative to stellar PSFs. These findings support the interpretation of these events as sub-second optical flashes, potentially originating from reflections of rotating objects in Earth orbit.
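The narrowness test compares each source's width to the stellar point-spread function. A minimal sketch, with the 0.8 cutoff as a hypothetical threshold rather than the paper's criterion:

```python
import math

def fwhm_from_sigma(sigma):
    """FWHM of a Gaussian profile: 2 * sqrt(2 ln 2) * sigma ≈ 2.3548 * sigma."""
    return 2 * math.sqrt(2 * math.log(2)) * sigma

def is_transient_candidate(source_fwhm, stellar_fwhm, ratio=0.8):
    """Flag sources systematically narrower than the stellar PSF (hypothetical
    0.8 cutoff): a sub-second flash accumulates less tracking blur than a star
    exposed for the full plate."""
    return source_fwhm < ratio * stellar_fwhm
```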

Game Theory (Open Access textbook with 165 solved exercises)

This Open Access textbook covers non-cooperative Game Theory and includes 165 solved exercises. It provides foundational theory for modeling strategic interactions in multi-agent systems and AI.
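The textbook's core object, the pure-strategy Nash equilibrium, can be checked mechanically: a profile is an equilibrium when no player gains by deviating unilaterally. A worked Prisoner's Dilemma example:

```python
from itertools import product

# Prisoner's Dilemma payoffs as (row player, column player) utilities
# for actions C(ooperate) and D(efect).
payoffs = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}
actions = ["C", "D"]

def is_nash(a_row, a_col):
    """Pure-strategy Nash: no player can improve by unilateral deviation."""
    u_row, u_col = payoffs[(a_row, a_col)]
    if any(payoffs[(d, a_col)][0] > u_row for d in actions):
        return False
    if any(payoffs[(a_row, d)][1] > u_col for d in actions):
        return False
    return True

equilibria = [p for p in product(actions, actions) if is_nash(*p)]
print(equilibria)  # → [('D', 'D')]
```

Mutual defection is the unique equilibrium even though mutual cooperation pays both players more, the canonical gap between individual rationality and collective outcome that makes the model relevant to multi-agent AI.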

Code

Hypura – A storage-tier-aware LLM inference scheduler for Apple Silicon

Hypura is a storage-tier-aware LLM inference scheduler for Apple Silicon that enables running models exceeding physical memory by dynamically tiering tensors across GPU, RAM, and NVMe. It optimizes performance through intelligent tensor placement, speculative prefetching, and MoE-specific expert streaming, achieving usable speeds for large models like Mixtral 8x7B on limited-memory hardware. The system provides an Ollama-compatible API and utilizes read-only NVMe I/O to prevent SSD wear while avoiding OOM crashes.
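The tiering decision can be sketched as a greedy placement: hottest tensors land in the fastest tier with room. This is a toy model of the idea, not Hypura's actual scheduler:

```python
def place_tensors(tensors, capacities):
    """Greedy storage-tier placement: most frequently accessed tensors go to
    the fastest tier that still has capacity.

    tensors: list of (name, size_bytes, access_freq)
    capacities: dict mapping tier name -> free bytes
    """
    tiers = ["gpu", "ram", "nvme"]  # fastest to slowest
    free = dict(capacities)
    placement = {}
    for name, size, _freq in sorted(tensors, key=lambda t: -t[2]):
        for tier in tiers:
            if free[tier] >= size:
                free[tier] -= size
                placement[name] = tier
                break
        else:
            raise MemoryError(f"no tier can hold {name}")
    return placement
```

A real scheduler would add speculative prefetching (promoting a tensor before it is needed) and MoE-aware streaming, but the capacity-constrained priority placement above is the core trade-off.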

ProofShot – Give AI coding agents eyes to verify the UI they build

ProofShot is an open-source, agent-agnostic CLI that enables AI coding agents to visually verify UI features by recording video, capturing screenshots, and collecting console/server logs. It integrates with tools like Claude Code and Cursor, allowing agents to drive a headless browser and bundle verification artifacts into interactive reports or GitHub PR comments. This workflow closes the feedback loop for agents, ensuring functional and visual correctness through automated browser-based testing.
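The artifact-bundling step, collecting a screenshot and logs into a report an agent or PR reviewer can check, can be approximated with the stdlib. This is a generic sketch of the workflow, not ProofShot's actual CLI or report format:

```python
import json
from pathlib import Path

def bundle_report(screenshot, console_log, server_log, out_dir="artifacts"):
    """Bundle UI-verification artifacts into a JSON report plus a markdown
    summary suitable for pasting into a PR comment. The log-filtering
    heuristics here are illustrative only."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    report = {
        "screenshot": screenshot,
        "console_errors": [l for l in console_log if "error" in l.lower()],
        "server_5xx": [l for l in server_log if " 500 " in l or " 502 " in l],
    }
    report["passed"] = not (report["console_errors"] or report["server_5xx"])
    (out / "report.json").write_text(json.dumps(report, indent=2))
    status = "passed" if report["passed"] else "failed"
    md = f"### UI verification {status}\n![screenshot]({screenshot})\n"
    (out / "comment.md").write_text(md)
    return report
```

The point of the pattern is the closed loop: the agent drives the browser, the bundle records what actually rendered, and a failing report sends the agent back to fix the UI rather than declaring success on compile alone.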

Pool spare GPU capacity to run LLMs at larger scale

Mesh LLM is a system designed for distributed LLM inference, pooling spare GPU capacity to run models at scale. It automatically distributes dense models using pipeline parallelism and MoE models via expert sharding, ensuring zero cross-node inference traffic for the latter. The system provides an OpenAI-compatible API, supports multi-model serving with demand-aware rebalancing, and includes a "blackboard" for agents to gossip and share information. Key optimizations include zero-transfer GGUF loading, RPC round-trip reduction, and speculative decoding to enhance performance.
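For dense models, pipeline parallelism comes down to splitting the layer stack into contiguous stages sized to each node. A minimal sketch of that partitioning (not Mesh LLM's actual algorithm), sizing stages proportionally to free VRAM:

```python
def partition_layers(n_layers, node_vram):
    """Split a dense model's layers into contiguous pipeline stages,
    sized proportionally to each node's free VRAM (in GB or any unit).
    Returns one range of layer indices per node."""
    total = sum(node_vram)
    stages, start = [], 0
    for i, vram in enumerate(node_vram):
        if i == len(node_vram) - 1:
            count = n_layers - start  # last stage absorbs rounding remainder
        else:
            count = round(n_layers * vram / total)
        stages.append(range(start, start + count))
        start += count
    return stages
```

MoE models take the other path mentioned in the summary: experts are sharded so each node holds whole experts, which is what makes zero cross-node inference traffic possible for them.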

Running AI agents across environments needs a proper solution

Odyssey is an open-source Rust runtime designed for defining, building, and operating portable AI agents as secure, self-contained bundles. It offers a unified execution model accessible via CLI, HTTP, TUI, and an embeddable SDK, emphasizing security through sandboxing and policy-controlled tool approvals. This bundle-first approach simplifies the packaging, deployment, and management of AI agents.
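Policy-controlled tool approval can be sketched as a three-way gate: allow, deny, or escalate to a human. The policy format and tool names below are hypothetical, not Odyssey's schema (and Odyssey itself is Rust; this is a language-neutral illustration in Python):

```python
# Hypothetical policy: explicit allow/deny lists, with "ask" as the
# escalation bucket and the default for anything unlisted.
POLICY = {
    "allow": {"read_file", "http_get"},
    "ask": {"write_file"},
    "deny": {"shell"},
}

def decide(tool_name):
    """Gate a tool call: deny wins, then ask, then allow; unknown tools
    default to requiring human approval."""
    if tool_name in POLICY["deny"]:
        return "deny"
    if tool_name in POLICY["ask"]:
        return "ask"
    if tool_name in POLICY["allow"]:
        return "allow"
    return "ask"
```

Defaulting unknown tools to "ask" rather than "allow" is the conservative choice that makes a sandboxed bundle safe to hand third-party agents.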

AI agent got 237 rules from another agent, still made the same mistakes

Calx is an automation tool for AI agents that prevents recurring mistakes by capturing corrections and promoting them into persistent rules. It implements a "Capture-Detect-Promote-Inject" workflow, using session hooks to automatically load domain-specific rules into an agent's context. The system manages token discipline to prevent context summarization from degrading learning signals and supports agent teams by syncing rules across source directories via AGENTS.md files.
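The Capture-Detect-Promote-Inject loop can be sketched as a counter with a promotion threshold plus a budget-limited context prefix. A toy model of the workflow, not Calx's implementation (threshold and budget values are hypothetical):

```python
from collections import Counter

class RuleStore:
    """Capture agent corrections; promote any correction seen at least
    `threshold` times into a persistent rule injected into future context."""

    def __init__(self, threshold=2):
        self.counts = Counter()
        self.rules = []
        self.threshold = threshold

    def capture(self, correction):
        """Record a correction; promote it to a rule once it recurs."""
        self.counts[correction] += 1
        if self.counts[correction] >= self.threshold and correction not in self.rules:
            self.rules.append(correction)

    def inject(self, context, budget_chars=500):
        """Prepend promoted rules to the context under a size budget, so rule
        injection itself doesn't crowd out the learning signal."""
        block = "\n".join(f"- {r}" for r in self.rules)[:budget_chars]
        return f"<rules>\n{block}\n</rules>\n{context}"
```

The budget cap is the "token discipline" piece: without it, an ever-growing rule list gets summarized away by the agent's context compaction and the lessons are lost, which is presumably how an agent can inherit 237 rules and still repeat the mistakes.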
