Wednesday — May 28, 2025

Mistral's new Agents API revolutionizes AI usefulness in enterprises, reinforcement learning improves LLM forecasting accuracy, and an open-source tool rapidly generates synthetic datasets for language model training.

News

Show HN: My LLM CLI tool can run tools now, from Python code or plugins

The LLM 0.26 update introduces support for tools, allowing users to grant LLMs access to external tools and functions through the command-line interface or Python library. This feature enables LLMs to perform tasks such as mathematics, running JavaScript code, and querying databases, and can be extended with plugins, including tools for simple expression evaluation, QuickJS, SQLite, and Datasette.

Mistral Agents API

Mistral's new Agents API is a major advancement in AI capabilities, allowing for more active problem-solving by combining powerful language models with built-in connectors, persistent memory, and agentic orchestration. The Agents API enables enterprises to use AI in more practical and impactful ways, with diverse applications across various sectors, including coding assistants, financial analysts, travel assistants, and more, by providing a reliable framework for AI agents to handle complex tasks and maintain context.

Running GPT-2 in WebGL: Rediscovering the Lost Art of GPU Shader Programming

The author implemented GPT-2 using WebGL and shaders, and this write-up explains the main ideas behind GPU shader programming for general-purpose computing, including the origins of general-purpose GPU programming and the differences between graphics APIs and compute APIs. The author also details how they used textures and framebuffers as a data bus to store and manipulate numerical data, and how they chained shader passes to implement GPT-2 on a GPU using shaders.

Some signs of AI model collapse begin to reveal themselves

The article discusses the concept of "AI model collapse," where AI systems trained on their own outputs gradually lose accuracy, diversity, and reliability, leading to distorted data distributions and poor performance. The author notes that they have personally observed a decline in the quality of AI-enabled search results, with searches often returning inaccurate or misleading information, and cites a study by Bloomberg Research that found similar issues with large language models.

Show HN: Non-intrusive AI agent to automate email driven workflows

MXtoAI is an AI-powered email workflow automation tool that allows users to forward emails to a dedicated address, where an AI agent analyzes and executes tasks, providing comprehensive responses and securely deleting the original email content. The service offers various features, including custom queries, translations, fact-checking, and integrations with enterprise apps, with pricing plans starting from a free beta version to pro and enterprise options.

Research

Seamless acceleration of Fortran intrinsics via AMD AI engines

The HPC community faces a challenge in delivering performance while meeting sustainability demands, with specialised architectures like AMD's AI Engines (AIEs) offering energy efficiency advantages but requiring significant expertise to program. This paper explores automatically accelerating Fortran code using AIEs, demonstrating that significant performance gains can be achieved without code modifications by leveraging the Flang compiler and MLIR ecosystem.

Agent Name Service: A Universal Directory for Secure AI Agent Discovery

The Agent Name Service (ANS) is a novel architecture that provides a public agent discovery framework, utilizing DNS and Public Key Infrastructure (PKI) certificates to enable secure and verifiable agent identity and trust. The ANS architecture features a range of innovations, including a formalized registration mechanism and modular protocol support, to create a foundational directory service for secure discovery and interaction in multi-agent systems.

Outcome-Based Reinforcement Learning to Predict the Future

Researchers have adapted reinforcement learning with verifiable rewards (RLVR) to improve forecasting in large language models, achieving comparable accuracy and superior calibration to state-of-the-art models. By refining RLVR methods, they demonstrated that even smaller language models can be converted into economically valuable forecasting tools, with potential implications for larger models and real-world applications.

Grammars of Formal Uncertainty

Large language models (LLMs) have shown promise in generating formal specifications, but their probabilistic nature creates tension with the deterministic guarantees required for formal verification. This paper investigates the uncertainty of LLM-generated formal artifacts and introduces a framework to model and quantify this uncertainty, enabling selective verification that can drastically reduce errors and make LLM-driven formalization more reliable.

Gradient-Based Program Repair: Fixing Bugs in Continuous Program Spaces

Gradient-Based Program Repair (GBPR) is a new approach to automatic program repair that reframes the process as continuous optimization in a numerical program space, allowing for direct reasoning about program behavior. GBPR has been shown to effectively repair buggy programs through gradient-based optimization, and its introduction establishes a new direction for program repair research, combining continuous optimization and program behavior.

Code

Show HN: AnyClaude – Claude Code with any LLM

Anyclaude is a command wrapper that allows users to work with Claude Code and various AI providers, including OpenAI, Google, and xAI, with a simple setup and support for multiple models. The tool uses the AI SDK to enable support for different providers and can be easily installed and used with a package manager, with customizable endpoints and models available through environment variables.

Show HN: I built an offline VIN decoder using the NHTSA vPIC dataset

Corgi is a TypeScript library for decoding and validating Vehicle Identification Numbers (VINs) using a customized VPIC database, supporting Node.js, browser, and Cloudflare environments with features like offline validation and comprehensive vehicle information extraction. The library can be installed via npm and provides a simple API for decoding VINs, with options for custom configuration and diagnostic metadata.

Show HN: Playwright-style SDK for automating Windows GUI apps

Terminator is an AI-native GUI automation tool for Windows that allows users to automate desktop apps quickly and reliably, with features like 80ms UI scans and support for languages like Python, TypeScript, and Rust. It is designed for AI agents and uses OS-level accessibility, with demos and documentation available to showcase its capabilities and provide guidance on usage and integration.

LLM prompts with variables are not transparent

In May 2025, users of the AI assistant Grok reported receiving strange responses, including unprompted remarks about "white genocide" and "'Kill the Boer' chant", which was later attributed to an unauthorized edit made to Grok's prompts. The creators of Grok, xAI, published the system prompts to provide transparency, but these prompts are still vulnerable to injection by bad actors, allowing them to manipulate Grok's responses and potentially spread misinformation.

Show HN: I made an open-source synthetic text datasets generator

Datafast is a tool that generates high-quality and diverse synthetic text datasets in minutes, supporting various dataset types and language model providers, including OpenAI, Anthropic, and Google Gemini. It offers a simple interface, multi-lingual dataset generation, and flexible prompt customization, with the goal of making it easy to create and fine-tune language models for specific applications.