Tuesday — May 26, 2026

Pope Leo XIV calls for "disarming" AI in a new encyclical, LLM-driven formal proof search resolves nine Erdős problems, and GPT-4.1 exhibits human-like bias when guessing numbers.

News

Using AI to write better code more slowly

LLMs can be leveraged for high-quality, methodical development rather than just rapid "slop" generation. By utilizing multi-agent review workflows and model-to-model debate, developers can effectively identify and validate critical bugs while minimizing hallucinations. This approach prioritizes codebase health and architectural understanding over raw velocity, using agents to "grill" PRs and document complex logic.
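The "find, then validate" pattern can be sketched as a filter. Several reviewer agents propose candidate bugs, and only findings that a separate validator independently confirms survive. This is a minimal, hypothetical Python sketch: `Finding`, `review_pr`, and the stub reviewers are invented stand-ins for real model calls, and simple voting plus validation is a simplification of model-to-model debate, not the author's actual workflow.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    claim: str


# Hypothetical signatures: a reviewer maps a diff to candidate findings,
# and a validator returns True only if it independently confirms one.
Reviewer = Callable[[str], List[Finding]]
Validator = Callable[[str, Finding], bool]


def review_pr(diff: str, reviewers: List[Reviewer], validator: Validator,
              min_votes: int = 1) -> List[Finding]:
    votes: Dict[Finding, int] = {}
    for reviewer in reviewers:
        # Count each reviewer at most once per distinct finding.
        for finding in set(reviewer(diff)):
            votes[finding] = votes.get(finding, 0) + 1
    # Keep findings raised by enough reviewers AND confirmed by the validator;
    # the second check is what filters out hallucinated bugs.
    return [f for f, n in votes.items() if n >= min_votes and validator(diff, f)]
```

In practice each stub would wrap a different model or prompt, and the validator would be asked to reproduce or refute a finding rather than simply agree with it.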

Magnifica Humanitas

The text examines the intersection of AI, robotics, and digitalization with human dignity, critiquing a technocratic paradigm that prioritizes efficiency and profit over the common good. It advocates for robust governance, algorithmic transparency, and human-in-the-loop accountability to mitigate risks such as disinformation, structural inequality, and the automation of lethal force. By emphasizing integral human development, the document calls for "disarming AI" to prevent it from becoming a tool for monopolistic dominance or new forms of digital slavery.

Pope Leo XIV says AI must serve humanity, not the powerful few

Pope Leo XIV’s encyclical "Magnifica Humanitas" advocates for the "disarming" of AI, calling for stricter international regulation to prevent military and economic dominance from undermining human agency. Developed with input from Anthropic co-founder Chris Olah, the document critiques the concentration of power in Big Tech and warns against "data colonialism" and the risks of transhumanism. It emphasizes the necessity of embedding ethical frameworks during the design phase, maintaining human-in-the-loop oversight for lethal systems, and ensuring transparency in algorithmic decision-making.

Norway's 2 petabytes of Huawei flash storage and LLM training

Norway’s National Library is developing a sovereign Norwegian LLM using its 20 PB digital archive to preserve local linguistic and cultural nuances. The infrastructure uses 2 PB of Huawei OceanStor Dorado flash storage and Nvidia DGX H200 systems for data cleaning and normalization, with final training run on the Sigma2 Olivia supercomputer's 448 GPUs. Key technical challenges include optimizing PB-scale data transfer from high-latency preservation archives to low-latency AI pipelines and building custom evaluation frameworks that cover Norwegian’s many dialects and its two written standards, Bokmål and Nynorsk.
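A back-of-the-envelope estimate shows why PB-scale transfer is a challenge. The sketch below computes wall-clock time to move data over a single network link. The link speeds and the default 80% efficiency factor are illustrative assumptions, not figures from the National Library's setup.

```python
def transfer_hours(petabytes: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Hours to move `petabytes` (decimal: 1 PB = 10**15 bytes) over a link of
    `link_gbps` gigabits per second, sustained at `efficiency` of line rate."""
    bits = petabytes * 10**15 * 8
    usable_bits_per_second = link_gbps * 10**9 * efficiency
    return bits / usable_bits_per_second / 3600


# Illustrative link speeds only.
for gbps in (10, 100, 400):
    print(f"2 PB over {gbps} Gb/s: {transfer_hours(2, gbps):,.1f} h")
```

At a perfect 100 Gb/s, 2 PB takes about 44 hours; at 80% efficiency it is closer to 56 hours, which helps explain the emphasis on staging data on fast storage close to the GPUs.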

Uber’s COO says it’s getting harder to justify money spent on tokenmaxxing

Uber COO Andrew Macdonald reported difficulty justifying the high costs of AI, noting that increased token consumption has not yielded a proportional rise in useful consumer features. After the company reportedly exhausted its 2026 Claude Code budget early, Uber has slowed hiring to reallocate capital toward these AI investments. This shift reflects growing internal skepticism regarding the ROI of aggressive AI integration and the industry trend of "tokenmaxxing," the push to maximize token consumption.

Research

Advancing mathematics research with AI-driven formal proof search

Researchers evaluated LLM-driven formal proof generation in Lean to solve open mathematical problems, successfully resolving 9 Erdős problems and 44 OEIS conjectures. The study demonstrates that autonomous agents combining LLM generation with automated verification can advance research in fields like combinatorics and algebraic geometry. Findings highlight the impact of agent architecture on the cost-efficiency of formal proof search for complex problems.
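The core loop of such an agent pairs an untrusted generator with a trusted checker. Candidates are proposed, the verifier (Lean's kernel in the study) accepts or rejects them, and a budget caps cost. The Python sketch below shows that loop on a toy problem. The `propose` iterable stands in for the LLM and a primality check stands in for Lean; the names are hypothetical and this is not the researchers' agent.

```python
from typing import Callable, Iterable, Optional, Tuple


def search(propose: Iterable[int], verify: Callable[[int], bool],
           budget: int) -> Tuple[Optional[int], int]:
    """Return (first verified candidate or None, number of verifier calls used)."""
    calls = 0
    for candidate in propose:
        if calls >= budget:
            break
        calls += 1
        # Only the verifier's verdict counts, never the generator's confidence.
        if verify(candidate):
            return candidate, calls
    return None, calls


def is_prime(n: int) -> bool:
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


# Toy claim: "n*n + n + 41 is prime for every n >= 0" (false).
# Any n that makes the value composite refutes it.
def refutes(n: int) -> bool:
    return not is_prime(n * n + n + 41)
```

A naive generator that tries 0, 1, 2, ... needs 41 verifier calls to find the counterexample n = 40, while a better-informed generator can succeed in one. Correctness rests entirely on the verifier, but generator quality determines cost, which mirrors the finding that agent architecture drives cost-efficiency.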

SkillOpt: Executive Strategy for Self-Evolving Agent Skills

SkillOpt introduces a systematic text-space optimizer for agent skills, training them as external states of a frozen agent rather than relying on hand-crafted or loosely revised methods. It employs a separate optimizer model to iteratively refine a single skill document through bounded text edits, accepting only those that strictly improve a held-out validation score, ensuring stable training with zero inference-time overhead. SkillOpt significantly outperforms existing approaches across various LLMs, benchmarks, and execution harnesses, achieving substantial accuracy gains and demonstrating strong transferability of optimized skills.
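The acceptance rule described above amounts to greedy hill climbing over a text document. The sketch below is a minimal Python illustration under stated assumptions: `propose` stands in for the optimizer model, `validate` for the held-out score, and edit size is measured as characters touched by a `difflib` diff. None of these names or the toy hint list come from the SkillOpt paper.

```python
import difflib
from typing import Callable, Tuple


def edit_size(old: str, new: str) -> int:
    """Characters touched by the diff from old to new (replace/insert/delete spans)."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return sum(max(i2 - i1, j2 - j1)
               for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal")


def optimize_skill(skill: str, propose: Callable[[str], str],
                   validate: Callable[[str], float], steps: int,
                   max_edit_chars: int) -> Tuple[str, float]:
    best = validate(skill)
    for _ in range(steps):
        candidate = propose(skill)
        if edit_size(skill, candidate) > max_edit_chars:
            continue  # reject edits larger than the allowed bound
        score = validate(candidate)
        if score > best:  # accept only strict improvements on held-out data
            skill, best = candidate, score
    return skill, best


# Hypothetical toy stand-ins: the "optimizer" appends one missing hint,
# and the "validation score" counts how many hints the skill contains.
HINTS = ("Check units.", "Cite sources.", "Verify output.")


def toy_propose(skill: str) -> str:
    for hint in HINTS:
        if hint not in skill:
            return f"{skill} {hint}"
    return skill


def toy_validate(skill: str) -> float:
    return float(sum(hint in skill for hint in HINTS))
```

Because rejected candidates never replace the current document, the validation score can only rise or stay flat, and the final skill is just text, so it adds nothing at inference time beyond being read by the frozen agent.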

ThriftAttention: Selective Mixed Precision for Long-Context FP4 Attention

ThriftAttention addresses the quality degradation of FP4 quantization in long-context LLMs by leveraging the non-uniform impact of quantization error across attention blocks. It employs a two-stage hybrid approach that computes a small fraction (~5%) of high-importance query-key blocks in FP16 while processing the remainder in FP4, merging results via online softmax. This method recovers approximately 89% of the FP4-to-FP16 performance gap, maintaining high inference efficiency even as sequence lengths scale.
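The merge step relies on the standard online-softmax identity. Each block's partial attention can be summarized by its running max m, normalizer l, and unnormalized output, and two such summaries combine exactly after rescaling by exp(m_i - m). The pure-Python sketch below shows that identity with a crude FP4 stand-in (E2M1 magnitudes with a per-vector scale, values left unquantized). The `hi_precision_blocks` parameter is hypothetical and block selection is out of scope; this is not ThriftAttention's kernel.

```python
import math
from typing import List, Optional, Sequence, Set, Tuple

Vector = List[float]
State = Tuple[float, float, Vector]  # (running max m, normalizer l, unnormalized output)

# Non-negative magnitudes representable in the FP4 E2M1 format.
FP4_LEVELS = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def quantize_fp4(xs: Sequence[float]) -> Vector:
    """Crude FP4 stand-in: scale so max |x| maps to 6.0, then round to the nearest level."""
    scale = (max(abs(x) for x in xs) or 1.0) / 6.0
    out: Vector = []
    for x in xs:
        level = min(FP4_LEVELS, key=lambda lv: abs(abs(x) / scale - lv))
        out.append(math.copysign(level * scale, x))
    return out


def partial_attention(q: Sequence[float], keys: Sequence[Sequence[float]],
                      values: Sequence[Sequence[float]]) -> State:
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    acc = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(len(values[0]))]
    return m, sum(weights), acc


def merge(a: State, b: State) -> State:
    """Combine two partial softmax states exactly by rescaling to a shared max."""
    (m1, l1, acc1), (m2, l2, acc2) = a, b
    m = max(m1, m2)
    c1, c2 = math.exp(m1 - m), math.exp(m2 - m)
    return m, l1 * c1 + l2 * c2, [x * c1 + y * c2 for x, y in zip(acc1, acc2)]


def mixed_precision_attention(q, key_blocks, value_blocks,
                              hi_precision_blocks: Set[int]) -> Vector:
    state: Optional[State] = None
    for i, (keys, values) in enumerate(zip(key_blocks, value_blocks)):
        if i in hi_precision_blocks:
            # High-precision path (Python floats stand in for FP16).
            part = partial_attention(q, keys, values)
        else:
            # Low-precision path: quantized q and k; values kept unquantized for simplicity.
            part = partial_attention(quantize_fp4(q), [quantize_fp4(k) for k in keys], values)
        state = part if state is None else merge(state, part)
    if state is None:
        raise ValueError("no blocks")
    _, l, acc = state
    return [x / l for x in acc]
```

With every block in `hi_precision_blocks`, the result matches ordinary softmax attention up to floating-point error. That property lets a hybrid kernel mix precisions per block without changing the merge logic.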

Continual Speaker Identity Unlearning with Minimal Interference

CORTIS is a framework for continual speaker identity unlearning in zero-shot text-to-speech (ZS-TTS) that prevents previously erased voices from reviving during sequential unlearning updates. It uses Fisher-information-based parameter masking to localize updates, and orthogonal projection against the subspaces of prior updates to keep earlier erasures intact without access to old data. Evaluated on VoiceBox, CORTIS maintains identity suppression across long request sequences and significantly outperforms existing sequential unlearning methods.
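The two constraints described above can be combined in a few lines of linear algebra. In this hypothetical pure-Python sketch, a Fisher-score mask keeps only the top fraction of parameters, and Gram-Schmidt projection removes any component along earlier unlearning updates. Masking the prior updates as well keeps the result both inside the mask and orthogonal to every prior update. The top-k masking rule, `keep_fraction`, and all function names are illustrative assumptions, not CORTIS's exact procedure.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(vectors: Sequence[Sequence[float]]) -> List[Vector]:
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > 1e-12:  # skip directions already spanned
            basis.append([wi / norm for wi in w])
    return basis


def fisher_mask(fisher_diag: Sequence[float], keep_fraction: float) -> Vector:
    """1.0 for the top `keep_fraction` of parameters by Fisher score, else 0.0."""
    k = max(1, round(len(fisher_diag) * keep_fraction))
    order = sorted(range(len(fisher_diag)), key=lambda i: fisher_diag[i], reverse=True)
    top = set(order[:k])
    return [1.0 if i in top else 0.0 for i in range(len(fisher_diag))]


def constrained_update(grad: Sequence[float], fisher_diag: Sequence[float],
                       prior_updates: Sequence[Sequence[float]],
                       keep_fraction: float = 0.5) -> Vector:
    mask = fisher_mask(fisher_diag, keep_fraction)

    def masked(v: Sequence[float]) -> Vector:
        return [x * m for x, m in zip(v, mask)]

    # Masking the priors too keeps every vector in the masked subspace, so the
    # result is zero outside the mask and orthogonal to the original priors.
    g = masked(grad)
    for b in orthonormal_basis([masked(p) for p in prior_updates]):
        c = dot(g, b)
        g = [gi - c * bi for gi, bi in zip(g, b)]
    return g
```

Projection needs only the stored prior update directions, not the audio of previously erased speakers, which is how an approach like this can avoid access to old data.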

LLMs require curated context for reliable political fact-checking

Evaluation of 15 LLMs on 6,000 PolitiFact claims reveals that standard models and reasoning variants perform poorly, with web search offering only moderate improvements. In contrast, a curated RAG system using high-quality summaries increased macro F1 by 233% on average. These results suggest that providing models with curated context is more effective for automated fact-checking than relying on reasoning or general web access.
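Because the headline number is a relative change in macro F1, a short worked example helps. Macro F1 averages per-class F1 with equal weight per class, and a 233% increase means the new score is about 3.33 times the old one. The baseline and improved scores used below (0.15 and 0.50) are made-up illustrative values, not figures from the study, and the helper names are hypothetical.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = set(y_true) | set(y_pred)
    total = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total += 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return total / len(labels)


def relative_increase_pct(old: float, new: float) -> float:
    return (new - old) / old * 100


# Hypothetical scores: 0.15 -> 0.50 is a ~233% relative increase (about 3.33x).
print(round(relative_increase_pct(0.15, 0.50), 1))  # prints 233.3
```

Equal weighting matters for fact-checking because rare verdict labels count as much as common ones, so a model that ignores minority classes scores poorly even with decent accuracy.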

Code

GPT Guesses Between 1 and 100

An experiment involving 10,000 API calls to GPT-4.1 at temperature 1.0 shows that the model inherits human-like non-uniformity when generating "random" numbers. It exhibits pronounced spikes at 37, 73, and 42 while almost entirely avoiding multiples of 10. Notably, 69 is under-represented, which suggests that safety guardrails moderate the learned human distribution; the results support the view that LLM outputs reflect a biased, post-processed version of patterns in their training data rather than uniform random draws.
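This kind of analysis can be reproduced offline once the guesses are collected. The sketch below tallies guesses and computes a chi-square statistic against a uniform distribution over 1 to 100; with 10,000 calls, a uniform model would expect 100 hits per number. It also reports the top spikes and the share of multiples of 10. The API-calling step is deliberately left out, and `uniformity_report` is a hypothetical helper, not code from the original experiment.

```python
from collections import Counter
from typing import Dict, Iterable


def uniformity_report(guesses: Iterable[int], low: int = 1, high: int = 100) -> Dict[str, object]:
    counts = Counter(g for g in guesses if low <= g <= high)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no guesses in range")
    values = range(low, high + 1)
    expected = n / len(values)  # e.g. 10,000 guesses over 100 values -> 100 each
    chi2 = sum((counts[v] - expected) ** 2 / expected for v in values)
    tens = sum(counts[v] for v in values if v % 10 == 0)
    return {
        "chi2": chi2,  # compare against a chi-square distribution with `dof` degrees of freedom
        "dof": len(values) - 1,
        "top3": counts.most_common(3),
        "multiples_of_10_share": tens / n,  # 0.10 under a uniform distribution
    }
```

Under uniformity, multiples of 10 should make up exactly 10% of guesses, so a share near zero is a strong signal of non-uniformity on its own, even before looking at the chi-square value.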

OpenBrief – Local-first video downloader/summarizer

OpenBrief is an open-source Tauri v2 desktop application that converts video and audio into grounded summaries and interactive briefings. It features local transcription using Whisper and Qwen3-ASR, TTS for audio playback, and RAG-based chat functionality compatible with OpenAI, Anthropic, and DeepSeek. The project is structured as a pnpm/Turborepo workspace and includes a roadmap for local LLM support and semantic video embeddings.
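A common way to keep chat answers grounded in a transcript is to chunk timestamped segments into windows and return the timestamps alongside the retrieved text. That way, every answer can point back to a moment in the video. The sketch below is a hypothetical, dependency-free illustration that uses simple word overlap instead of embeddings. It is not OpenBrief's implementation, and `Segment`, `chunk_segments`, and `retrieve` are invented names.

```python
import re
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Segment:
    start: float  # seconds
    end: float
    text: str


def chunk_segments(segments: List[Segment], max_seconds: float = 60.0) -> List[Segment]:
    """Merge consecutive transcript segments into windows spanning at most max_seconds."""
    chunks: List[Segment] = []
    for seg in segments:
        if chunks and seg.end - chunks[-1].start <= max_seconds:
            last = chunks[-1]
            chunks[-1] = Segment(last.start, seg.end, f"{last.text} {seg.text}")
        else:
            chunks.append(Segment(seg.start, seg.end, seg.text))
    return chunks


def words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def retrieve(chunks: List[Segment], question: str, k: int = 2) -> List[Tuple[float, float, str]]:
    """Rank chunks by word overlap with the question; return (start, end, text) for citation."""
    q = words(question)
    ranked = sorted(chunks, key=lambda c: len(q & words(c.text)), reverse=True)
    return [(c.start, c.end, c.text) for c in ranked[:k]]
```

A real system would swap the overlap score for embedding similarity, but the timestamp bookkeeping that makes answers citable stays the same.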

My biggest solo-project: Game engine with its own programming language

ArcadeMaker is an open-source, cross-platform 2D game engine and IDE built in C# that utilizes a custom DSL called Exp for game logic. Currently powered by MonoGame with plans for KNI-engine web support, the project aims to provide a GameMaker 8-style workflow for manual game development. Despite the prevalence of LLMs in modern software engineering, the author emphasizes a hand-coded architecture to facilitate fundamental programming education and creative autonomy.

Code-mapper: Free CLI tool to reduce LLM token usage on any codebase

code-mapper helps LLMs understand codebases by generating a compact PROJECT_CONTEXT.md file, significantly reducing token cost compared to scanning raw files (e.g., a 78% reduction for a 4,000-line codebase). The structured output includes the file structure, Mermaid class and module dependency diagrams, and a symbol index with function signatures, all designed for efficient LLM consumption. It parses Python via its AST and other languages via regular expressions, and can be integrated with LLM agents.
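For Python, a symbol index with function signatures can be built from the standard library's `ast` module alone. The sketch below is a hypothetical illustration of that idea, not code-mapper's actual implementation, and its output format is invented. It requires Python 3.9+ for `ast.unparse`.

```python
import ast
from typing import List, Tuple


def symbol_index(source: str, filename: str = "<module>") -> List[str]:
    """One line per class or function: '<file>:<line> <kind> <name>(<params>) -> <ret>'."""
    tree = ast.parse(source, filename=filename)
    entries: List[Tuple[int, str]] = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
            params = ast.unparse(node.args)  # parameters with annotations and defaults
            returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
            entries.append((node.lineno, f"{kind} {node.name}({params}){returns}"))
        elif isinstance(node, ast.ClassDef):
            entries.append((node.lineno, f"class {node.name}"))
    # ast.walk is breadth-first, so sort back into source order.
    return [f"{filename}:{lineno} {sig}" for lineno, sig in sorted(entries)]
```

An index like this gives a model the shape of a module (names, parameters, return types) in a few lines, without function bodies, which is where most of the token savings in this approach would come from.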

Cate v1.0 is out: The infinite canvas workspace for developers

Cate is an Electron-based spatial IDE that provides an infinite canvas for organizing code editors, terminals, and browsers within persistent workspaces. It features a Monaco-powered editor, integrated Git management, and a flexible docking system for complex project layouts. For AI-driven workflows, Cate includes built-in tools for bootstrapping agents such as Claude Code and Cursor, alongside an MCP server editor for configuring and validating Model Context Protocol integrations.
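Many MCP clients read server definitions from a JSON object keyed by `mcpServers`. Each entry either launches a local process via `command` with optional `args` and `env`, or, in some clients, points at a remote `url`. The sketch below is a hypothetical validator for that common shape. Cate's own schema and checks may differ, the specific rules here are assumptions, and `some-server` is a placeholder package name.

```python
import json
from typing import List


def validate_mcp_config(text: str) -> List[str]:
    """Return a list of problems; an empty list means the config passed these checks."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} (line {exc.lineno})"]
    servers = config.get("mcpServers") if isinstance(config, dict) else None
    if not isinstance(servers, dict):
        return ["missing or non-object 'mcpServers'"]
    problems: List[str] = []
    for name, spec in servers.items():
        if not isinstance(spec, dict):
            problems.append(f"{name}: entry must be an object")
            continue
        command = spec.get("command")
        url = spec.get("url")
        has_command = isinstance(command, str) and command.strip() != ""
        has_url = isinstance(url, str) and url.startswith(("http://", "https://"))
        if not (has_command or has_url):
            problems.append(f"{name}: needs a non-empty 'command' or an http(s) 'url'")
        args = spec.get("args", [])
        if not (isinstance(args, list) and all(isinstance(a, str) for a in args)):
            problems.append(f"{name}: 'args' must be a list of strings")
        env = spec.get("env", {})
        if not (isinstance(env, dict) and all(isinstance(v, str) for v in env.values())):
            problems.append(f"{name}: 'env' must map names to strings")
    return problems


# Hypothetical example config with a placeholder server package.
example = '{"mcpServers": {"fs": {"command": "npx", "args": ["-y", "some-server"]}}}'
print(validate_mcp_config(example))  # prints []
```

Collecting every problem instead of stopping at the first lets an editor UI highlight all misconfigured servers at once.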