Thursday — June 19, 2025
Andrej Karpathy highlights a new software era powered by LLMs, machine learning counts yurts in Mongolia, and the MiniMax-M1 model combines hybrid attention with a Mixture-of-Experts architecture for strong performance on complex long-context tasks.
News
Andrej Karpathy: Software in the era of AI [video]
Andrej Karpathy discusses how software is undergoing a fundamental change with the emergence of Large Language Models (LLMs), which can be programmed in English and share properties with utilities, fabrication labs, and operating systems. He argues that LLMs are a new kind of computer, that software itself deserves a major version upgrade to match, and that LLMs will enable partially autonomous products and make software development far more accessible.
I counted all of the yurts in Mongolia using machine learning
The author, inspired by a podcast about the Mongol Empire, became curious about modern-day Mongolian society and decided to explore it through data and satellite imagery, noticing a large number of yurts in the capital city of Ulaanbaatar. The author then embarked on a project to count all the yurts in Mongolia using machine learning, training a model to identify yurts on satellite images and annotating over 10,000 yurts to create a dataset for the model.
Is there a half-life for the success rates of AI agents?
Researchers have found that AI agents' success rates decline exponentially as task length increases, consistent with a constant per-minute failure rate (a half-life), while the length of tasks agents can complete doubles roughly every seven months. A simple mathematical model explains this relationship: longer tasks contain more subtasks, and failing any one subtask fails the entire task.
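The constant-hazard model described above can be sketched in a few lines: if failure risk per minute is constant, success probability decays exponentially with task length, and chaining subtasks multiplies their survival probabilities. The 60-minute half-life below is a hypothetical figure for illustration, not a number from the paper.

```python
import math

def success_probability(task_minutes: float, half_life_minutes: float) -> float:
    """Constant-hazard model: P(success) = 2 ** (-T / T_half).

    An agent with a 60-minute half-life completes a 60-minute task
    50% of the time, a 120-minute task 25% of the time, and so on.
    """
    return 2.0 ** (-task_minutes / half_life_minutes)

assert abs(success_probability(60, 60) - 0.5) < 1e-9
assert abs(success_probability(120, 60) - 0.25) < 1e-9

# Failing any subtask fails the whole task, so survival probabilities
# multiply -- which is exactly why success decays exponentially in length.
subtasks = [10, 20, 30]  # subtask durations in minutes (hypothetical)
p_total = math.prod(success_probability(t, 60) for t in subtasks)
assert abs(p_total - success_probability(sum(subtasks), 60)) < 1e-9
```

The last assertion is the key point: splitting a task into subtasks changes nothing, because the product of exponentials is the exponential of the total length.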
Writing documentation for AI: best practices
Documentation quality is crucial for both human readers and AI systems, as poor documentation can lead to inaccurate AI responses and create a compounding problem. To optimize documentation for AI systems like Kapa, it's essential to create content that is explicit, self-contained, and contextually complete, using techniques such as chunking, standardized semantic HTML, and providing text equivalents for visuals to improve retrieval accuracy and response quality.
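The chunking idea above can be sketched as a heading-aware splitter that prefixes each chunk with its heading path, so a chunk retrieved in isolation still carries its context. This is a generic illustration of the technique, not Kapa's actual pipeline.

```python
def chunk_markdown(doc: str) -> list[str]:
    """Split a markdown doc at headings into self-contained chunks.

    Each chunk is prefixed with its heading path ("Guide > Install"),
    so a retriever that surfaces one chunk still has enough context.
    """
    chunks: list[str] = []
    path: list[str] = []   # current heading hierarchy
    buf: list[str] = []    # body lines of the current section

    def flush() -> None:
        if buf:
            context = " > ".join(path)
            body = "\n".join(buf).strip()
            chunks.append((context + "\n" if context else "") + body)
            buf.clear()

    for line in doc.splitlines():
        if line.startswith("#"):
            flush()
            level = len(line) - len(line.lstrip("#"))
            del path[level - 1:]          # drop deeper/sibling headings
            path.append(line.lstrip("# ").strip())
        else:
            buf.append(line)
    flush()
    return chunks
```

Each chunk is explicit and contextually complete on its own, which is the property the article recommends optimizing for.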
Building agents using streaming SQL queries
AI agents, software systems that use AI to pursue goals and complete tasks, can be built using streaming SQL queries, with benefits in consistency, scalability, and developer experience. Platforms like Apache Flink provide the necessary building blocks, letting developers interact with large language models (LLMs) and build event-driven, data-intensive applications with high performance and robustness.
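The event-driven pattern behind this approach can be sketched in plain Python: each incoming event updates per-key state and triggers a model call, which is the bookkeeping a streaming engine like Flink manages for you (state, ordering, fault tolerance). The names below are illustrative, not a real Flink API, and the model is stubbed so the sketch runs offline.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator, Tuple

Event = Tuple[str, str]  # (key, text)

def run_agent_stream(events: Iterable[Event],
                     llm: Callable[[str], str]) -> Iterator[Tuple[str, str]]:
    """Minimal event-driven agent loop over a keyed stream.

    For each event: append to that key's conversation state, call the
    model on the accumulated context, and emit the reply downstream.
    """
    state: dict[str, list[str]] = defaultdict(list)
    for key, text in events:
        state[key].append(text)
        reply = llm("\n".join(state[key]))
        yield key, reply

# Stub model: replies with the number of turns it has seen.
fake_llm = lambda prompt: f"ack:{prompt.count(chr(10)) + 1}"
out = list(run_agent_stream([("user1", "hi"), ("user1", "plan a trip")], fake_llm))
```

In a streaming SQL engine the same loop becomes a continuous query over an event table, with the engine handling the keyed state and restarts that this sketch keeps in a Python dict.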
Research
Robustly Improving LLM Fairness in Realistic Settings via Interpretability
Large language models used in hiring applications exhibit significant racial and gender biases when realistic contextual details are introduced, favoring Black over White candidates and female over male candidates. An internal bias mitigation strategy, which identifies and neutralizes sensitive attribute directions, can effectively reduce these biases to very low levels, typically under 1%, while maintaining model performance.
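The core operation, neutralizing a sensitive-attribute direction, amounts to projecting that direction out of the model's internal activations. Below is a generic numpy sketch of the idea on synthetic data, not the paper's exact procedure; the activations and the group split are made up for illustration.

```python
import numpy as np

def ablate_direction(acts: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove each activation's component along a sensitive-attribute
    direction (projection ablation)."""
    d = direction / np.linalg.norm(direction)
    return acts - np.outer(acts @ d, d)

rng = np.random.default_rng(0)
# Hypothetical activations for two groups that differ along one axis.
group_a = rng.normal(size=(100, 8)) + np.eye(8)[0] * 2.0
group_b = rng.normal(size=(100, 8))

# Estimate the attribute direction as the difference of group means.
direction = group_a.mean(axis=0) - group_b.mean(axis=0)

cleaned = ablate_direction(np.vstack([group_a, group_b]), direction)
# After ablation, the groups no longer separate along that direction.
gap = cleaned[:100] @ direction - cleaned[100:] @ direction
assert abs(gap.mean()) < 1e-9
```

The appeal of this internal mitigation over prompt-level fixes is that it acts on the representation directly, so the model's other capabilities along orthogonal directions are left untouched.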
Who is using AI to code? Global diffusion and impact of generative AI
The adoption of AI-generated coding tools is widespread but uneven, with US contributors using them the most, at an estimated 30.1% of Python functions, and newer GitHub users adopting them more quickly than veterans. The use of these tools is associated with significant productivity gains, with a 30% adoption rate leading to a 2.4% increase in quarterly commits, and potentially generating $9.6-$14.4 billion in annual value in the United States.
Future of Work with AI Agents
The introduction of a novel auditing framework has enabled the assessment of which occupational tasks workers want AI agents to automate or augment, and how those desires align with current technological capabilities. The framework's application has revealed diverse expectations for human involvement across occupations and highlighted the need to align AI development with human desires, preparing workers for shifting workplace dynamics and a potential shift from information-focused to interpersonal skills.
Reasoning by Superposition: A Perspective on Chain of Continuous Thought
A two-layer transformer using continuous chains of thought (CoTs) can solve the directed-graph reachability problem in a number of steps equal to the graph's diameter, outperforming discrete CoTs, which can require up to O(n^2) steps. This is because a continuous CoT can encode multiple search frontiers simultaneously, enabling a parallel breadth-first search, whereas discrete CoTs are limited to sequential search and may get trapped in local solutions.
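The "superposition of frontiers" intuition maps directly onto ordinary breadth-first search, where the entire frontier advances in one step. A minimal sketch (the graph is a made-up example):

```python
def reachable_in_diameter_steps(edges: dict, source, target):
    """BFS where the whole frontier expands at once, analogous to a
    continuous thought encoding a superposition of vertices.
    Returns the number of expansions needed, or None if unreachable."""
    frontier, visited, steps = {source}, {source}, 0
    while frontier:
        if target in frontier:
            return steps
        # Expand every frontier vertex in parallel -- one "continuous" step.
        frontier = {v for u in frontier
                      for v in edges.get(u, []) if v not in visited}
        visited |= frontier
        steps += 1
    return None

graph = {0: [1, 2], 1: [3], 2: [3], 3: [4]}
assert reachable_in_diameter_steps(graph, 0, 4) == 3
```

A discrete CoT corresponds to walking one path at a time, which in the worst case revisits many vertices; the parallel frontier reaches any target within diameter-many expansions.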
Style over Substance: Distilled Language Models Reason via Stylistic Replication
Researchers investigated how specialized reasoning language models internalize and utilize stylistic patterns during reasoning, finding that models trained on synthetic traces with surface-level patterns achieved comparable performance. The study's results surprisingly showed that even altered synthetic traces leading to incorrect answers still improved model performance, highlighting the significant influence of stylistic patterns on language model reasoning abilities.
Code
MiniMax-M1 open-weight, large-scale hybrid-attention reasoning model
MiniMax-M1 is a large-scale hybrid-attention reasoning model that combines a Mixture-of-Experts architecture with a lightning attention mechanism, making it efficient for complex tasks that require processing long inputs. The model has been trained using large-scale reinforcement learning and has outperformed other strong open-weight models on various benchmarks, including mathematics, coding, software engineering, and long-context understanding tasks.
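The hybrid idea, interleaving cheap linear-attention layers with occasional full softmax-attention layers, can be sketched in numpy. This is a simplified, non-causal illustration of the general linear-attention technique, not MiniMax's actual lightning attention; the feature map and the 3:1 layer mix are assumptions for the example.

```python
import numpy as np

def softmax_attention(q, k, v):
    """Standard attention: cost grows as O(n^2) in sequence length n."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ v

def linear_attention(q, k, v):
    """Kernelized attention: cost grows as O(n), because keys and values
    are first compressed into a fixed-size (d x d_v) summary."""
    phi = lambda x: np.maximum(x, 0.0) + 1e-6  # positive feature map (assumed)
    qf, kf = phi(q), phi(k)
    kv = kf.T @ v                    # summary independent of sequence length
    z = qf @ kf.sum(axis=0)          # per-query normalizer
    return (qf @ kv) / z[:, None]

# A hybrid stack: three linear layers for every softmax layer (assumed ratio),
# so long inputs are mostly processed at linear cost.
rng = np.random.default_rng(1)
x = rng.normal(size=(16, 8))         # toy sequence: 16 tokens, dim 8
for i in range(4):
    attn = softmax_attention if i % 4 == 3 else linear_attention
    x = x + attn(x, x, x)            # residual connection
```

The point of the mix is that linear layers keep long-context cost manageable while the periodic softmax layers preserve the expressiveness of full attention.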
Show HN: Trieve CLI – Terminal-based LLM agent loop with search tool for PDFs
Trieve is an all-in-one solution for search, recommendations, and Retrieval-Augmented Generation (RAG) that offers features such as self-hosting, semantic dense-vector search, typo-tolerant full-text search, and sub-sentence highlighting. The platform provides a range of tools and APIs, including a TypeScript SDK, Python SDK, and OpenAPI specification, and lets users bring their own models and integrate with services like OpenAI and Qdrant.
Show HN: ht-mcp – a Rust MCP server of headless terminal for agents
Ht-mcp is a high-performance Rust implementation of a Model Context Protocol (MCP) server for headless terminals, featuring a pure-Rust implementation, direct integration, multi-session management, and a web interface. It can be installed via Homebrew, pre-built binaries, Cargo, or built from source, and provides tools for creating sessions, sending keystrokes, taking snapshots, and executing commands.
Show HN: Rulebook AI – rules and memory manager for AI coding IDEs
This template provides a cross-platform framework for AI coding assistants, such as Cursor, CLINE, RooCode, Windsurf, and Github Copilot, to operate consistently and follow best practices. By leveraging established software engineering principles and a structured documentation system, developers can supercharge their AI coding workflow, ensuring predictable and high-quality output across different platforms and projects.
Show HN: Cpdown – Copy any webpage/YouTube subtitle as clean Markdown (LLM-ready)
Cpdown is a browser extension that allows users to copy the content of any webpage as clean, formatted markdown with one click or keyboard shortcut, also supporting YouTube subtitle copying. The extension uses tools like Defuddle and Mozilla's Readability to extract the main content, remove unnecessary HTML elements, and provide features like token counting and keyboard shortcut support.