Friday — October 10, 2025

Figure 03, a 3rd generation humanoid robot, is unveiled with advanced features, while researchers propose BlockRank to improve In-Context Retrieval efficiency, and Open-Agent, an open-source Agentic AI system, is released for customizable multi-agent collaboration.

News

Two things LLM coding agents are still bad at

The author has been trying to use Large Language Models (LLMs) for coding help, but finds their approach to code feels awkward, citing two main reasons: LLMs don't use copy-paste functionality like humans do, and their problem-solving approach is alien, relying on assumptions and brute force rather than asking questions. The author believes these quirks mean LLMs are not replacing human developers, but rather acting like overconfident interns that are difficult to fully work with.

Figure 03, our 3rd generation humanoid robot

Figure 03 is a 3rd generation humanoid robot designed to be a general-purpose robot that can perform human-like tasks and learn directly from people, with features such as a redesigned sensory suite and hand system, soft goods, and wireless charging. The robot has been engineered from the ground-up for high-volume manufacturing, with a new supply chain and manufacturing process, making it more cost-effective and scalable for use in homes and commercial applications.

McKinsey wonders how to sell AI apps with no measurable benefits

McKinsey & Company is warning software vendors that they risk increasing prices for AI-powered applications without delivering measurable benefits, such as cost savings or productivity boosts, to their customers. The consultancy identifies three main challenges to AI software monetization, including the inability to show quantifiable returns on investment, underinvestment in change management, and unpredictable pricing models, and suggests that vendors need to rethink their pricing strategies to successfully navigate the AI market.

Launch HN: Extend (YC W23) – Turn your messiest documents into data

Extend is a document processing platform that provides a suite of APIs and tooling to help technical teams ship production-ready pipelines in record time, achieving accuracy of over 99% and reducing implementation time from months to days. The platform is trusted by top companies, including Brex, Flatiron, and Vendr, who have seen significant improvements in accuracy and efficiency after implementing Extend's solution.

The AI valuation bubble is now getting silly

The current AI valuation bubble is being compared to the dotcom bubble of the late 1990s, with extremely stretched valuations and high concentration risk in the market. The development of AI is expected to have society-changing effects, but it's impossible to know the speed of adoption and which companies will earn extraordinary returns, making it difficult to predict when the bubble will burst.

Research

DeepMind's paper reveals Google's new direction on RAG: In-Context Retreival

In-context Ranking (ICR) is an effective Information Retrieval paradigm that leverages large language models (LLMs), but its efficiency is hindered by quadratic scaling of attention operations with context length. The proposed BlockRank method addresses this issue by adapting the attention operation to exploit inherent structures in LLMs, resulting in a more efficient and effective solution that matches or outperforms existing state-of-the-art rankers.

Barbarians at the Gate: How AI Is Upending Systems Research

Artificial Intelligence (AI) is transforming the research process by automating the discovery of new solutions, particularly in systems research where reliable verifiers can accurately determine solution effectiveness. The AI-Driven Research for Systems (ADRS) approach has been shown to discover algorithms that outperform human-designed ones, and its adoption is expected to shift the focus of human researchers from algorithm design to problem formulation and strategic guidance.

Advancing medical artificial intelligence using a century of cases

Researchers created a benchmark called CPC-Bench to evaluate the performance of large language models (LLMs) in medical diagnosis and presentation, and found that LLMs can outperform physicians in complex text-based differential diagnosis and generate high-quality medical presentations. However, LLMs still struggle with image interpretation and literature retrieval, highlighting areas for further improvement in medical artificial intelligence.

Evaluating LLM Generated Detection Rules in Cybersecurity

An open-source evaluation framework and benchmark metrics have been developed to assess the effectiveness of Large Language Model (LLM)-generated cybersecurity rules, providing a realistic evaluation of their usefulness. The framework uses a holdout set-based methodology to compare LLM-generated rules to human-generated ones, offering three key metrics to measure their effectiveness in a multifaceted way.

Self-Correction Bench: Revealing and Addressing LLM Self-Correction Blind Spot

Large language models (LLMs) have a significant limitation, known as the Self-Correction Blind Spot, where they are unable to correct their own errors despite being able to correct identical errors from external sources. This blind spot, which affects an average of 64.5% of tested models, may be influenced by training data and can be significantly reduced by simple interventions, such as adding a "Wait" prompt, highlighting a potential path to enhancing LLM reliability.

Code

Open-Source Agentic AI

Open-Agent is an open-source, customizable Agentic AI system that integrates multiple AI models to work together seamlessly, allowing for multi-agent collaboration and self-hosting. It provides a framework for users to deploy and modify their own Agentic AI, with features such as spec and context engineering, and welcomes contributions from the community to enhance its capabilities.

Show HN: Open-Source Voice AI Badge Powered by ESP32+WebRTC

To participate in the VapiCon 2025 Hardware Workshop, users must install ESP-IDF v5.5.1 or later and have either an AtomS3R or Atomic Echo Base device, then clone the workshop repository and configure their WiFi credentials and Bearer Token. The project can then be built and flashed to the device using the provided commands, with troubleshooting guides available for common issues such as build errors and missing dependencies.

Show HN: SHAI – a (yet another) open-source, terminal-native AI coding assistant

Shai is a coding agent that lives in the terminal, written in Rust, and can be installed using a simple command to provide a pair programming buddy experience. It can be configured to use various providers, run in headless mode, and even act as a shell assistant to propose fixes for failed commands, with customization options available through configuration files and custom agents.

Show HN: I Built Claude Code for CUDA in 18 Hours (Open Source)

RightNow CLI is an AI-powered CUDA development assistant that helps users write, optimize, and debug GPU code, offering features such as code completion, integrated debugging, and advanced AI capabilities. The tool is available for free with no credit card required, and users can upgrade to premium models and features as needed, with support for various operating systems, including Windows, Linux, and macOS.

Show HN: In-Context Index for In-Context Retrieval

PageIndex is a vectorless, reasoning-based RAG system that represents documents as hierarchical tree structures, enabling large language models (LLMs) to navigate and retrieve information through structure and reasoning. PageIndex MCP exposes this tree index directly to LLMs, allowing platforms like Claude and Cursor to reason over document structure and retrieve information without vector databases, and can be used to chat with long PDFs in a human-like way.