Thursday January 29, 2026

Arcee AI launches the 400B Trinity Large MoE model, a "team of rivals" multi-agent architecture intercepts over 90% of LLM errors, and TuringDB claims up to 200x speedups over Neo4j for GraphRAG.

Interested in AI engineering? Let's talk

News

Microsoft forced me to switch to Linux

The author migrated from Windows to CachyOS due to increasing OS instability, intrusive Copilot integration, and the replacement of native system components with React Native wrappers. He attributes the decline in Windows quality to Microsoft’s heavy reliance on LLM-generated code and forced AI features, which he characterizes as "slop." For technical workflows, he argues that Linux now provides a more stable environment for development and low-latency audio via PipeWire, outperforming the current AI-bloated Windows ecosystem.

Please don't say mean things about the AI I just invested a billion dollars in

This satirical piece parodies the defensive rhetoric of major AI investors, mocking pleas to suppress criticism of the technology's societal impact. It highlights the tension between massive capital investment and the ethical externalities of LLMs, including copyright infringement, environmental costs, and the proliferation of deepfakes. The text critiques the industry's tendency to prioritize ROI over the systemic harms associated with rapid generative AI deployment.

UK Government’s ‘AI Skills Hub’ was delivered by PwC for £4.1M

The UK government paid PwC £4.1 million to develop the "AI Skills Hub," a platform intended to provide AI training to 10 million workers by 2030. The site has faced heavy criticism for its poor UI, lack of original content, and legal inaccuracies, such as citing US "fair use" instead of UK "fair dealing" regarding AI intellectual property. Despite the high cost, the hub primarily functions as a directory of external links and suffers from significant accessibility and functional bugs.

Jellyfin LLM/"AI" Development Policy

Jellyfin’s new development policy prohibits LLM-generated text in communications and PR descriptions to ensure contributors maintain a deep understanding of their submissions. Code contributions must avoid "vibe coding" and "slop," requiring authors to manually review, test, and explain all changes without relying on LLM-generated justifications. The project emphasizes code quality and personal accountability, mandating that developers be able to implement review feedback themselves rather than piping it back through an LLM.

Trinity Large: An open 400B sparse MoE model

Arcee AI has launched Trinity Large, a 400B sparse MoE model that activates 13B parameters per token through 4-of-256 expert routing. Trained on 17T tokens across 2048 Nvidia B300 GPUs, it uses momentum-based expert load balancing and z-loss regularization for inference efficiency and training stability. It ships in three variants, Preview (instruct), Base (full pretraining), and TrueBase (raw 10T-token checkpoint), and supports a native 512K-token context window.
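
The two training tricks named above are compact enough to sketch. Below is a minimal, illustrative PyTorch version of 4-of-256 top-k routing with a z-loss term and a momentum-tracked load estimate; it shows the mechanism only and is not Arcee's implementation (dimensions, momentum, and loss weighting are placeholders).

```python
import torch
import torch.nn.functional as F

NUM_EXPERTS, TOP_K = 256, 4  # 4-of-256 routing, as described above

def route(hidden, router_weight):
    """Select TOP_K experts per token and compute a z-loss.

    hidden:        (tokens, d_model) activations entering the MoE layer
    router_weight: (d_model, NUM_EXPERTS) router projection
    """
    logits = hidden @ router_weight                     # (tokens, 256)
    # z-loss penalizes large router logits, keeping routing numerically stable
    z_loss = torch.logsumexp(logits, dim=-1).pow(2).mean()
    probs = F.softmax(logits, dim=-1)
    topk_probs, topk_idx = probs.topk(TOP_K, dim=-1)    # 4 experts per token
    gates = topk_probs / topk_probs.sum(-1, keepdim=True)  # renormalize the 4 gates
    return topk_idx, gates, z_loss

# Momentum-based load tracking (sketch): a running estimate of per-expert load
# that a balancing term can push back toward uniform.
expert_load = torch.zeros(NUM_EXPERTS)

def update_load(topk_idx, momentum=0.99):
    counts = torch.bincount(topk_idx.flatten(), minlength=NUM_EXPERTS).float()
    expert_load.mul_(momentum).add_(counts / counts.sum(), alpha=1 - momentum)

# toy usage
h, w = torch.randn(32, 1024), torch.randn(1024, NUM_EXPERTS)
idx, gates, zl = route(h, w)
update_load(idx)
print(idx.shape, gates.shape, float(zl))
```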

Research

If You Want Coherence, Orchestrate a Team of Rivals

This paper proposes a multi-agent architecture that utilizes a "team of rivals" approach with strict role boundaries and opposing incentives to mitigate LLM fallibility. By decoupling reasoning from execution via a remote code executor, agents generate code and receive only summarized outputs, preventing context contamination. This orchestration of specialized agents—including planners, executors, and critics—achieves over 90% internal error interception while maintaining acceptable latency and cost tradeoffs.
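
The division of labor is easy to picture in code. The sketch below shows the decoupled loop, with planner, executor, and critic as separate roles and execution summarized by a remote sandbox; `llm` and `run_remote` are hypothetical stand-ins, not the paper's implementation.

```python
# Sketch of a "team of rivals" loop (not the paper's code): planner, executor,
# and critic are separate roles with opposing incentives, and generated code
# runs in a remote sandbox that returns only a summary, so raw outputs never
# contaminate the reasoning context.

def llm(role: str, prompt: str) -> str:
    """Hypothetical per-role model call."""
    return "Approve" if role == "critic" else f"[{role} output for: {prompt[:40]}...]"

def run_remote(code: str) -> str:
    """Hypothetical sandbox: execute elsewhere, return a short summary only."""
    return "executed 3 cells; all assertions passed"

def solve(task: str, max_rounds: int = 5) -> str:
    plan = llm("planner", f"Break this task into verifiable steps: {task}")
    for _ in range(max_rounds):
        code = llm("executor", f"Write code for the next step of:\n{plan}")
        summary = run_remote(code)        # reasoning sees the summary, not raw output
        verdict = llm("critic", f"Plan: {plan}\nResult: {summary}\n"
                                "Reply 'Approve' or explain the error.")
        if verdict.startswith("Approve"):
            return summary                # the critic's sign-off gates completion
        plan = llm("planner", f"Revise the plan given this objection: {verdict}")
    raise RuntimeError("rivals failed to converge")

print(solve("Parse the access logs and report error rates"))
```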

The Shape of Reasoning: Topological Analysis of Large Language Models

Evaluating LLM reasoning traces is labor-intensive and unreliable, and existing automated graph-based methods are too simplistic. This work introduces a framework based on topological data analysis (TDA) to capture the geometry of reasoning traces, enabling label-efficient, automated assessment. Empirical studies show TDA features significantly outperform standard graph metrics in predicting reasoning quality, suggesting that effective reasoning is better represented by higher-dimensional geometric structures and offering a practical signal for future reinforcement learning algorithms.
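
To make the idea concrete, here is a minimal sketch (not the paper's pipeline) using the `ripser` persistent-homology library: embed each reasoning step, compute H0/H1 persistence diagrams over the resulting point cloud, and reduce them to summary features for a quality classifier.

```python
import numpy as np
from ripser import ripser  # pip install ripser

def tda_features(step_embeddings: np.ndarray) -> np.ndarray:
    """Summarize the 'shape' of a reasoning trace.

    step_embeddings: (n_steps, dim) array, one vector per reasoning step,
    e.g. from a sentence encoder. Returns persistence summary statistics.
    """
    dgms = ripser(step_embeddings, maxdim=1)["dgms"]  # H0 and H1 diagrams
    feats = []
    for dgm in dgms:
        finite = dgm[np.isfinite(dgm[:, 1])]          # drop the infinite H0 bar
        lifetimes = finite[:, 1] - finite[:, 0] if len(finite) else np.zeros(1)
        feats += [lifetimes.max(), lifetimes.sum(), float(len(finite))]
    return np.array(feats)  # feed these to a trace-quality classifier

# toy usage: 12 reasoning steps embedded in 64 dimensions
rng = np.random.default_rng(0)
print(tda_features(rng.normal(size=(12, 64))))
```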

Attention Is Not What You Need

The proposed Causal Grassmann layer replaces standard self-attention by encoding token interactions as low-rank subspaces on a Grassmann manifold using Plücker coordinates. This attention-free architecture achieves linear scaling in sequence length and competitive performance on Wikitext-2 and SNLI benchmarks. By shifting computation from unstructured tensor spaces to finite-dimensional manifolds, the design provides a more structured, geometric framework for interpreting neural reasoning.
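
The paper's exact layer is not reproduced here, but its two core ingredients, Plücker coordinates of a 2-plane (p_ij = u_i v_j - u_j v_i) and a causal summary that keeps cost linear in sequence length, can be shown in a toy PyTorch module. Everything below (the rank, the running-mean summary, the residual wiring) is an illustrative guess, not the published architecture.

```python
import torch

class ToyCausalPluecker(torch.nn.Module):
    """Toy attention-free mixing layer: encode each token's interaction with a
    causal running summary as Pluecker coordinates of the 2-plane they span."""

    def __init__(self, d_model: int, rank: int = 8):
        super().__init__()
        self.down = torch.nn.Linear(d_model, rank)       # work in a small subspace
        self.up = torch.nn.Linear(rank * (rank - 1) // 2, d_model)
        self.register_buffer("iu", torch.triu_indices(rank, rank, offset=1))

    def forward(self, x):                                # x: (batch, seq, d_model)
        u = self.down(x)
        # causal running mean over the prefix up to t: O(seq), no attention matrix
        steps = torch.arange(1, x.size(1) + 1, device=x.device).view(1, -1, 1)
        v = u.cumsum(dim=1) / steps
        # Pluecker coordinates p_ij = u_i v_j - u_j v_i (antisymmetric wedge)
        p = u.unsqueeze(-1) * v.unsqueeze(-2) - u.unsqueeze(-2) * v.unsqueeze(-1)
        p = p[..., self.iu[0], self.iu[1]]               # keep i < j entries
        return x + self.up(p)                            # residual mixing

layer = ToyCausalPluecker(d_model=64)
print(layer(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```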

Inverse Rendering for High-Genus 3D Surface Meshes from Multi-View Images

This paper presents a mesh-based inverse rendering approach that leverages persistent homology priors to resolve topological ambiguities in 3D reconstruction. By incorporating gradient-based optimization with constraints on tunnel and handle loops, the method recovers complex high-genus geometries more robustly than traditional photometric consistency alone. Results show significant improvements in Chamfer Distance and Volume IoU over current state-of-the-art mesh-based techniques.
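
The role of topology can be illustrated with the simplest invariant: for a closed, orientable mesh, the Euler characteristic χ = V - E + F determines the genus as g = (2 - χ)/2. The sketch below computes this; the paper itself works with persistent homology on tunnel and handle loops, which is strictly richer than a single genus number.

```python
def mesh_genus(vertices, faces):
    """Genus of a closed, orientable triangle mesh via Euler's formula:
    chi = V - E + F, genus = (2 - chi) / 2."""
    edges = set()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            edges.add((min(u, v), max(u, v)))  # undirected, deduplicated edges
    chi = len(vertices) - len(edges) + len(faces)
    return (2 - chi) // 2

# A tetrahedron is sphere-like: V=4, E=6, F=4 -> chi = 2 -> genus 0
tet_faces = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
print(mesh_genus(range(4), tet_faces))  # 0
```

A reconstruction pipeline could then penalize deviation from a target genus alongside photometric terms, which is the intuition behind constraining the optimization topologically.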

ARM MTE Performance in Practice (Extended Version)

This study provides a comprehensive performance analysis of ARM MTE across various microarchitectures, including Google Pixel 8/9, AmpereOne, and Apple M5. While MTE generally shows modest overheads for memory safety and security applications like CFI, specific microarchitectural bottlenecks can lead to slowdowns up to 6.64x in server workloads. The research identifies these causes and corrects prior methodological errors to inform future hardware design.

Code

A MitM proxy to see what your LLM tools are sending

Sherlock is a Python-based token tracker and traffic inspector for LLM CLI tools like Claude Code and OpenAI Codex. It operates as a local HTTP proxy to provide a real-time terminal dashboard for monitoring token consumption, context window limits, and prompt history. The tool automatically archives requests in markdown and JSON formats, enabling developers to debug and optimize prompt usage without complex configuration.
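
The mechanism is simple to sketch: a local server that logs each request body and forwards it upstream. The snippet below is an illustrative stand-in, not Sherlock's code (it ignores streaming responses, and the upstream URL is just an example); tools are pointed at it by overriding their base-URL or proxy environment variable.

```python
# Minimal logging-proxy sketch: print each prompt payload, then pass it through.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request

UPSTREAM = "https://api.anthropic.com"  # example provider base URL

class LoggingProxy(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        print(json.dumps(json.loads(body), indent=2)[:2000])  # inspect the prompt
        headers = {k: v for k, v in self.headers.items()
                   if k.lower() not in ("host", "content-length")}
        upstream = request.Request(UPSTREAM + self.path, data=body,
                                   headers=headers, method="POST")
        with request.urlopen(upstream) as resp:      # forward and relay the reply
            data = resp.read()
            self.send_response(resp.status)
            self.send_header("Content-Type", resp.headers.get("Content-Type", ""))
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

HTTPServer(("localhost", 8711), LoggingProxy).serve_forever()
```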

FASHN VTON v1.5 – open-source virtual try-on model

FASHN VTON v1.5 is a maskless virtual try-on model that generates photorealistic images directly in pixel space from person and garment inputs. The pipeline integrates DWPose for pose detection and human parsing, supporting bf16 inference on Ampere+ GPUs for efficient processing of tops, bottoms, and one-pieces. The model is released under the Apache-2.0 license and supports both model photos and flat-lay product shots.

TuringDB – The fastest analytical in-memory graph database in C++

TuringDB is an in-memory, column-oriented graph database written in C++ and optimized for high-performance analytical workloads and AI applications such as GraphRAG. It uses a zero-lock concurrency architecture with snapshot isolation and introduces git-like versioning for graph data, including branching and time travel. The engine supports OpenCypher and delivers millisecond query latency, outperforming Neo4j by up to 200x on deep multi-hop traversals.
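
A deep multi-hop traversal of the kind behind the 200x claim looks like this in standard OpenCypher (the Paper/CITES schema is made up for illustration):

```cypher
// Follow citation edges 4 to 6 hops out from one paper and rank what's reachable,
// the variable-length pattern that punishes row-oriented engines.
MATCH (p:Paper {id: "2401.00001"})-[:CITES*4..6]->(q:Paper)
RETURN q.title, count(*) AS paths
ORDER BY paths DESC
LIMIT 10;
```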

AGENTS.lock – a package manager for Agents/Skills/MCPs

AGENTS.lock is a package manager for AI agent configurations, designed to synchronize skills, agents, MCP servers, and instructions across various LLM CLI tools such as Claude, Codex, Gemini, and Copilot. It utilizes a single AGENTS.lock TOML file as the source of truth for both project and global scopes, automating the consistent management and deployment of resources across different client environments. This eliminates the need for manual configuration across multiple platforms, simplifying agent setup and maintenance.
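
The README defines the actual schema; purely as an illustration of the single-source-of-truth idea, a lockfile might look like the following (field names here are hypothetical, though the MCP server shown is a real package):

```toml
# Hypothetical AGENTS.lock sketch -- illustrative field names, not the tool's
# documented schema; one file drives every client (Claude, Codex, Gemini, ...).
[skills.code-review]
source = "github:example/skills/code-review"
version = "1.2.0"

[mcp.filesystem]
command = "npx"
args = ["-y", "@modelcontextprotocol/server-filesystem", "./src"]

[instructions]
files = ["AGENTS.md"]
targets = ["claude", "codex", "gemini", "copilot"]
```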

Ouroboros – AI agent framework that asks "why?" before writing code

Ouroboros is an LLM orchestration framework designed to transform ambiguous human requirements into precise, executable specifications through Socratic questioning and ontological analysis. It utilizes a Progressive Adaptive LLM (PAL) router to optimize costs by escalating from frugal to frontier models only when task complexity or risk warrants it. The system ensures execution resilience via automated stagnation detection and persona-based lateral thinking, supported by a three-stage evaluation pipeline ranging from mechanical linting to multi-model consensus.
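
The escalation policy is the interesting part; here is a toy sketch of a frugal-to-frontier router in the same spirit (tier names, scoring, and thresholds are placeholders, not Ouroboros's PAL implementation):

```python
# Progressive router sketch: cheap model by default, frontier model only when
# estimated complexity or risk warrants it.
TIERS = ["small-local-model", "mid-tier-model", "frontier-model"]  # placeholders

def assess(task: str) -> float:
    """Toy complexity/risk score in [0, 1]; a real router might use an LLM
    grader or heuristics over the spec produced by Socratic questioning."""
    risky = any(w in task.lower() for w in ("migrate", "delete", "auth", "payment"))
    return min(1.0, len(task) / 2000 + (0.5 if risky else 0.0))

def route(task: str) -> str:
    score = assess(task)
    if score < 0.3:
        return TIERS[0]   # frugal tier handles simple, low-risk work
    if score < 0.7:
        return TIERS[1]
    return TIERS[2]       # escalate to the frontier model

print(route("Rename a variable in utils.py"))             # small-local-model
print(route("Migrate the auth service to OAuth2 " * 40))  # frontier-model
```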
