Thursday — May 15, 2025

DeepMind's AlphaEvolve tackles unsolved math problems, OnPrem.LLM offers privacy-aware document toolkits, and Lumier runs macOS VMs in Docker with near-native speed.

News

The A.I. Radiologist Will Not Be with You Soon

Predictions that artificial intelligence would replace radiologists have not come true. At the Mayo Clinic, the technology has instead become a helpful tool, automating routine tasks and serving as a "second set of eyes" for spotting medical abnormalities. Radiologists remain in high demand, and a recent study projects a growing workforce through 2055.

DeepSeek’s founder is threatening US dominance in AI race

DeepSeek, a Chinese AI startup, is rapidly gaining ground and threatening US dominance in AI, with founder Liang Wenfeng a key driver of its success. Despite Washington's efforts to slow China's AI industry, DeepSeek's emergence shows that the sector is thriving, and the company's sudden rise to prominence has been compared to that of ChatGPT.

DeepMind unveils general-purpose science AI

DeepMind has unveiled a general-purpose science AI system called AlphaEvolve, which combines the creativity of a large language model with algorithms to filter and improve solutions, and has already been used to improve chip designs and tackle unsolved math problems. The system has shown promising results, including coming up with a faster method for matrix multiplication, a calculation used to train neural networks, and has the potential to be applied to a wide range of scientific domains.
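For context on what a "faster method for matrix multiplication" means, the sketch below uses Strassen's classic scheme, which multiplies two 2x2 matrices with 7 scalar multiplications instead of the usual 8. This is not AlphaEvolve's result, only a well-known illustration of the kind of saving such a search looks for; the function names are illustrative.

```python
def naive_2x2(a, b):
    """Standard 2x2 product: 8 scalar multiplications."""
    return [
        [a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
        [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]],
    ]


def strassen_2x2(a, b):
    """Strassen's scheme: 7 scalar multiplications, at the cost of more additions."""
    m1 = (a[0][0] + a[1][1]) * (b[0][0] + b[1][1])
    m2 = (a[1][0] + a[1][1]) * b[0][0]
    m3 = a[0][0] * (b[0][1] - b[1][1])
    m4 = a[1][1] * (b[1][0] - b[0][0])
    m5 = (a[0][0] + a[0][1]) * b[1][1]
    m6 = (a[1][0] - a[0][0]) * (b[0][0] + b[0][1])
    m7 = (a[0][1] - a[1][1]) * (b[1][0] + b[1][1])
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
product = strassen_2x2(a, b)  # same result as naive_2x2(a, b)
```

Applied recursively to matrix blocks, the saving compounds, which is why removing even one multiplication from a small case matters for large workloads.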

Developers, don't despair, big tech and AI hype is off the rails again

The author is skeptical of the recent hype about AI replacing software engineers, citing unrealistic claims made by tech leaders such as Mark Zuckerberg and Sam Altman. The author argues that current AI models, based on the 2017 transformer architecture, cannot produce high-quality, secure, production-ready code, and that they are instead being used to manipulate and deceive investors and the public.

X's Grok AI is suddenly hyper-fixated on South African farmers

Elon Musk appears to have put the AI model Grok into a mode where it thinks every question is related to Afrikaners, farm murders, or the song "Kill the Boer", resulting in bizarre and unrelated responses to user queries. This has led to a series of humorous and confusing interactions on social media, with Grok inserting references to white genocide and South African context into completely unrelated topics.

Research

OnPrem.LLM: A Privacy-Conscious Document Intelligence Toolkit

OnPrem.LLM is a Python-based toolkit that enables the application of large language models to sensitive data in offline or restricted environments, providing prebuilt pipelines for various tasks with minimal configuration. The system supports multiple LLM backends, GPU acceleration, and hybrid deployments, and also features a no-code web interface for non-technical users, allowing for flexible and accessible use.

IterGen: Iterative Semantic-Aware Structured LLM Generation with Backtracking

Large Language Models (LLMs) often produce flawed outputs, and current libraries for structured LLM generation lack the ability to correct or refine outputs mid-generation. IterGen, a new library, addresses this issue by enabling iterative, grammar-guided LLM generation that allows users to move forward and backward within the generated output, making corrections possible and improving overall efficiency and accuracy.

LLMs get lost in multi-turn conversation

Large Language Models (LLMs) perform significantly worse in multi-turn conversations, with an average drop of 39% in performance across six generation tasks, compared to single-turn interactions. This decline is mainly due to increased unreliability, as LLMs often make incorrect assumptions and fail to recover from mistakes made in early turns of a conversation.

Bang for the Buck: Vector Search on Cloud CPUs

The performance of vector databases in the cloud can vary significantly with the CPU microarchitecture used, with some CPUs excelling in certain search scenarios but not others. The study found that Amazon's Graviton3 often provides the best value in terms of queries per dollar, outperforming other options like AMD's Zen4 and Intel's Sapphire Rapids for most indexes and quantization settings.
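The "queries per dollar" metric is simple arithmetic: sustained throughput divided by the hourly instance price. A minimal sketch follows; the instance names, throughput figures, and prices are made-up placeholders, not numbers from the study.

```python
def queries_per_dollar(qps: float, price_per_hour: float) -> float:
    """Queries served per dollar, given sustained queries/second and hourly price."""
    return qps * 3600 / price_per_hour


# Placeholder figures for illustration only (not taken from the study).
instances = {
    "instance_a": {"qps": 1000.0, "price_per_hour": 2.0},
    "instance_b": {"qps": 800.0, "price_per_hour": 1.0},
}

# Rank instances from best to worst value.
ranked = sorted(
    instances,
    key=lambda name: queries_per_dollar(**instances[name]),
    reverse=True,
)
```

With these placeholder numbers the slower but cheaper instance ranks first, which is the kind of outcome a raw-throughput comparison would hide.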

Understanding Perception and Reasoning Through Model Merging

Vision-Language Models (VLMs) can be improved by merging them with Large Language Models (LLMs) to combine visual perception with reasoning capabilities. Through model merging, researchers found that perception is mainly encoded in early layers, while reasoning is facilitated by middle-to-late layers, and that merging enables all layers to contribute to reasoning without significantly changing perception abilities.

Code

Show HN: Muscle-Mem, a behavior cache for AI agents

muscle-mem is a Python SDK that records and replays an AI agent's actions to increase speed and reduce variability for repetitive tasks, allowing the agent to fall back to its original mode if edge cases are detected. The system uses a cache validation mechanism, where "Checks" determine if it's safe to execute a given action, and can be integrated with existing agents to provide a "muscle memory" for repetitive tasks.
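The record, check, and replay idea can be sketched in a few lines. This is a toy illustration, not muscle-mem's actual API: the BehaviorCache class, its run method, and the slow_agent function are all hypothetical names introduced here.

```python
from typing import Any, Callable, Dict, List, Tuple

Env = Dict[str, Any]
Action = Callable[[Env], Any]


class BehaviorCache:
    """Toy behavior cache: replay recorded actions only when a check passes."""

    def __init__(self, check: Callable[[Env, Env], bool]) -> None:
        # check(recorded_env, current_env) decides whether replay is safe.
        self.check = check
        self.entries: Dict[str, Tuple[Env, List[Action]]] = {}

    def run(
        self, task: str, env: Env, agent: Callable[[str, Env], List[Action]]
    ) -> Tuple[List[Any], str]:
        entry = self.entries.get(task)
        if entry is not None:
            recorded_env, actions = entry
            if self.check(recorded_env, env):
                # Cache hit: replay the recorded actions without calling the agent.
                return [act(env) for act in actions], "replayed"
        # Cache miss or failed check: fall back to the agent and record its plan.
        actions = agent(task, env)
        self.entries[task] = (dict(env), actions)
        return [act(env) for act in actions], "agent"


agent_calls: List[str] = []


def slow_agent(task: str, env: Env) -> List[Action]:
    """Stand-in for an expensive LLM-driven agent (hypothetical)."""
    agent_calls.append(task)
    return [lambda e: f"click {e['button']}"]


cache = BehaviorCache(check=lambda old, new: old["page"] == new["page"])
first = cache.run("submit", {"page": "form", "button": "OK"}, slow_agent)
second = cache.run("submit", {"page": "form", "button": "Send"}, slow_agent)
third = cache.run("submit", {"page": "login", "button": "OK"}, slow_agent)
```

Here the second call is replayed from the cache because the page matches, while the third falls back to the agent because the check fails.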

Show HN: Lumier – Run macOS VMs in a Docker

C/ua is a platform that enables AI agents to control full operating systems in high-performance virtual containers with near-native speed on Apple Silicon, allowing for automation of tasks and interaction with virtual machines. The platform provides a range of tools and libraries, including Lume for VM management, Lumier for container-like virtualization, and Computer and Agent for controlling virtual machines and automating tasks.

Show HN: Robust LLM Extractor for HTML/Markdown in TypeScript

Lightfeed Extract is a library that uses large language models (LLMs) to extract structured data from HTML and markdown, allowing for robust and accurate data extraction without the need for custom scraper code. The library works by converting HTML to markdown, sending the markdown to an LLM for processing, sanitizing the output, and validating extracted URLs, making it a resilient and cost-effective solution for data extraction.
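The four steps described above (convert, query the LLM, sanitize, validate URLs) can be sketched as follows. This is a Python illustration of the flow, not Lightfeed Extract's TypeScript API: every function name here is hypothetical, the HTML-to-markdown conversion is deliberately crude, and the LLM is passed in as a plain callable that returns JSON text.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urlparse


class _ToMarkdown(HTMLParser):
    """Very rough converter: keeps text and renders links as [text](href)."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._href = ""

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag == "a":
            self.parts.append(f"]({self._href})")
            self._href = ""

    def handle_data(self, data):
        self.parts.append(data)


def html_to_markdown(html):
    parser = _ToMarkdown()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


def is_valid_url(value):
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def extract(html, llm, url_fields=("url",)):
    """Convert HTML, ask the LLM for JSON, then sanitize and validate the result."""
    markdown = html_to_markdown(html)
    raw = llm(markdown)
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {}  # sanitize: discard output that is not valid JSON
    if not isinstance(data, dict):
        return {}
    for field in url_fields:
        value = data.get(field)
        if field in data and not (isinstance(value, str) and is_valid_url(value)):
            data[field] = None  # drop URLs that fail validation
    return data


def fake_llm(markdown):
    """Stand-in for a real model call; returns a deliberately bad URL."""
    return json.dumps({"title": "Widget", "url": "not a url"})


result = extract('<p>See <a href="https://example.com">docs</a></p>', fake_llm)
```

Validating after the model call means a hallucinated or malformed URL is nulled out rather than passed downstream.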

Show HN: acmsg (automated commit message generator)

Acmsg is a CLI tool written in Python that generates git commit messages using AI models through the OpenRouter API, allowing users to analyze staged changes and generate contextual commit messages. The tool can be installed via pipx or nix, and requires an OpenRouter API key, with configuration and usage managed through a simple command-line interface.

Show HN: Convert existing agent projects from different frameworks to A2A servers

AutoA2A is a CLI tool that enables conversion of AI agents into A2A-compatible servers, supporting various frameworks such as CrewAI, LangGraph, and OpenAI Agents SDK, with minimal code changes required. The tool generates boilerplate code for the A2A server, including files like agent.py and taskmanager.py, which need to be edited to configure the agent and task manager according to the user's specific requirements.