Saturday July 26, 2025

Researchers successfully performed experimental surgery using an AI-driven surgical robot, a new study explores sparse attention trade-offs in Transformer LLMs, and a novel terminal app called Baag enables running multiple AI coding agents on the same project.

News

Show HN: Price Per Token – LLM API Pricing Data

The current pricing for major LLM APIs, including OpenAI, Anthropic, and Google, ranges from $0.07 to $150.00 per 1 million tokens, with separate rates for input and output tokens. Prices are subject to change, some providers apply tiered pricing based on prompt length, and token-counting methods differ between providers.
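Because input and output are billed at different per-million-token rates, a single request's cost is just a weighted sum. A minimal sketch, with the prices in the example chosen purely for illustration (check the provider's own price sheet):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in dollars of one API call, given per-1M-token prices."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Hypothetical rates for illustration only.
print(request_cost(12_000, 1_500, input_price_per_m=3.00, output_price_per_m=15.00))
# 0.036 + 0.0225 = 0.0585 dollars
```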

Quantitative AI progress needs accurate and transparent evaluation

As a technology matures, the focus shifts from qualitative achievements to quantitative measures such as cost, efficiency, and scalability, which are what it takes to move from proofs of concept to mass adoption. That transition is now underway in AI: the key questions are no longer about feasibility but about optimizing cost-efficiency, safety, and scalability, so standardized benchmarks and transparent evaluation will become increasingly important for measuring progress accurately.

Experimental surgery performed by AI-driven surgical robot

Researchers at Johns Hopkins University have successfully performed experimental surgery using an AI-driven surgical robot: a ChatGPT-like AI controlled a da Vinci robot to perform gallbladder-removal surgery on pig organs. The AI completed the procedure without human intervention, a significant step toward autonomous surgical systems that could reshape the field of surgery.

What if AI made the world’s economic growth explode?

The world economy has experienced accelerating growth over the centuries, from 0.1% a year on average before 1700 to 2.8% in the 20th century, driven by technological advancements such as the spinning jenny and steam engine. The article explores the potential for artificial intelligence to further explode economic growth, upending markets for goods, services, and financial assets, as well as labor.
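As a quick sanity check on those figures, constant annual growth g doubles output roughly every ln(2)/ln(1+g) years, which is the arithmetic behind the sense of acceleration:

```python
import math

def doubling_time(annual_growth: float) -> float:
    """Years for output to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)

print(round(doubling_time(0.001)))  # ~693 years at 0.1%/year (pre-1700 average)
print(round(doubling_time(0.028)))  # ~25 years at 2.8%/year (20th century)
```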

The Mythical Machine-Month Paradox – How much could AI change programming?

The software industry is facing an identity crisis brought on by generative AI, which has led to mass layoffs and concerns about the future of software engineering, with some predicting that 95% of code will be written by AI by the end of the decade. However, software development is at heart the creation and refinement of a theoretical model of the problem being solved: understanding the user's intent, accounting for edge cases and interactions, and testing. That complexity cannot be fully replicated by AI, making it unlikely that AI will replace human software engineers entirely.

Research

The Sparse Frontier: Sparse Attention Trade-Offs in Transformer LLMs

Researchers conducted a thorough comparison of sparse attention methods in Transformer LLMs, exploring their efficiency and accuracy trade-offs at various model scales and sequence lengths. The study's findings highlight the potential of sparse attention to enhance long-context capabilities, but also reveal that it is not a universal solution and requires careful evaluation of trade-offs, with optimal strategies varying across tasks and phases.
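The paper compares several sparse-attention families; as a minimal sketch of the general idea (not any specific method from the study), here is a toy top-k variant in which each query keeps only its highest-scoring keys before the softmax:

```python
import torch

def topk_sparse_attention(q, k, v, k_keep=64):
    """Toy top-k sparse attention: each query attends only to its k_keep
    highest-scoring keys; everything else is masked out before softmax.
    Shapes: q, k, v are (batch, seq_len, dim)."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # (batch, seq, seq)
    k_keep = min(k_keep, scores.shape[-1])
    thresh = scores.topk(k_keep, dim=-1).values[..., -1:]    # k-th largest per query
    masked = scores.masked_fill(scores < thresh, float("-inf"))
    return torch.softmax(masked, dim=-1) @ v

q = k = v = torch.randn(2, 128, 64)
out = topk_sparse_attention(q, k, v, k_keep=16)
print(out.shape)  # torch.Size([2, 128, 64])
```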

Explainable Mapper: Charting LLM Embedding Spaces Using Perturbation-Based Exploration

Large language models produce complex embeddings that capture semantic and syntactic relationships, and analyzing these embeddings via mapper graphs can reveal their underlying structures. A new framework uses semi-automatic annotation and customizable agents to explore and explain the properties of these embeddings, allowing for scalable and automated analysis of their characteristics and robustness.
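For orientation, a bare-bones version of the underlying mapper construction (overlapping cover along a 1-D lens, clustering within each slice, edges between clusters that share points) can be sketched as follows; the framework's perturbation agents and semi-automatic annotation are not reproduced here, and every parameter choice is illustrative:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def mapper_graph(embeddings, n_intervals=10, overlap=0.3, eps=0.5):
    """Minimal mapper: project onto the first principal direction (the lens),
    cover that range with overlapping intervals, cluster within each slice,
    and connect clusters that share points."""
    centered = embeddings - embeddings.mean(axis=0)
    lens = centered @ np.linalg.svd(centered, full_matrices=False)[2][0]
    lo, hi = lens.min(), lens.max()
    width = (hi - lo) / n_intervals
    nodes, edges = [], set()
    for i in range(n_intervals):
        a = lo + i * width - overlap * width
        b = lo + (i + 1) * width + overlap * width
        idx = np.where((lens >= a) & (lens <= b))[0]
        if len(idx) < 2:
            continue
        labels = DBSCAN(eps=eps).fit_predict(embeddings[idx])
        for lab in set(labels) - {-1}:              # -1 = DBSCAN noise
            members = set(idx[labels == lab].tolist())
            for j, (_, other) in enumerate(nodes):
                if members & other:
                    edges.add((j, len(nodes)))
            nodes.append((i, members))
    return nodes, edges

# Demo on random 2-D points; real inputs would be high-dimensional LLM
# embedding vectors, with eps retuned to that scale.
nodes, edges = mapper_graph(np.random.randn(500, 2))
print(len(nodes), "nodes,", len(edges), "edges")
```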

TaxCalcBench: Can AI file your taxes? (not yet)

Current AI models are unable to accurately file personal income taxes, with state-of-the-art models successfully calculating less than a third of federal income tax returns even in simplified scenarios. The models' limitations include misusing tax tables, making calculation errors, and incorrectly determining eligibility, highlighting the need for additional infrastructure to apply large language models to this task.

WhoFi: Deep Person Re-Identification via Wi-Fi Channel Signal Encoding

WhoFi is a novel pipeline that uses Wi-Fi signals for person re-identification, addressing challenges like poor lighting and occlusion that hinder traditional visual-based methods. The approach, which extracts biometric features from Channel State Information and uses a Deep Neural Network, achieves competitive results compared to state-of-the-art methods, demonstrating its effectiveness in identifying individuals via Wi-Fi signals.
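The summary gives only the high-level recipe (CSI features in, identity embedding out), so the sketch below is a generic sequence encoder of that shape rather than WhoFi's actual architecture; the subcarrier count, layer types, and sizes are all assumptions:

```python
import torch
import torch.nn as nn

class CSISignatureEncoder(nn.Module):
    """Toy encoder: a CSI sequence (time steps x subcarrier amplitudes) is
    mapped to a fixed-length signature; matching signatures re-identify a
    person. Layer choices and sizes are illustrative, not the paper's."""
    def __init__(self, n_subcarriers=114, d_model=128, embed_dim=64):
        super().__init__()
        self.proj = nn.Linear(n_subcarriers, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, embed_dim)

    def forward(self, csi):                     # csi: (batch, time, n_subcarriers)
        h = self.encoder(self.proj(csi))        # (batch, time, d_model)
        sig = self.head(h.mean(dim=1))          # temporal average pooling
        return nn.functional.normalize(sig, dim=-1)

enc = CSISignatureEncoder()
a = enc(torch.randn(8, 200, 114))               # query sequences
b = enc(torch.randn(8, 200, 114))               # gallery sequences
similarity = a @ b.T                            # cosine similarities for matching
print(similarity.shape)                         # torch.Size([8, 8])
```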

SETOL: SemiEmpirical Theory of (Deep) Learning

The SemiEmpirical Theory of Learning (SETOL) provides a formal explanation for the performance of State-Of-The-Art Neural Networks, using techniques from statistical mechanics and other fields to derive new mathematical preconditions for ideal learning. SETOL is tested on simple and complex neural networks, demonstrating excellent agreement with theoretical assumptions and showing that its layer quality metrics align well with existing metrics, such as Heavy-Tailed Self-Regularization (HTSR) alpha.
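The HTSR alpha mentioned above is the exponent of a power law fitted to the tail of a layer's weight-matrix eigenvalue spectrum. A rough, illustrative estimator is sketched below; the WeightWatcher implementation used in the HTSR literature selects the fitting range far more carefully (e.g. by minimizing a KS statistic):

```python
import numpy as np

def htsr_alpha(W: np.ndarray, tail_frac: float = 0.5) -> float:
    """Rough HTSR-style alpha: fit a power law to the upper tail of the
    eigenvalue spectrum of W^T W with the continuous maximum-likelihood
    (Hill) estimator: alpha = 1 + n / sum(ln(x_i / x_min))."""
    evals = np.linalg.eigvalsh(W.T @ W)                 # empirical spectral density
    evals = np.sort(evals[evals > 0])
    tail = evals[int(len(evals) * (1 - tail_frac)):]    # keep the upper tail
    x_min = tail[0]
    return 1.0 + len(tail) / np.sum(np.log(tail / x_min))

# In the HTSR picture, well-trained layers typically show alpha roughly
# between 2 and 6, with values near 2 associated with better generalization.
print(htsr_alpha(np.random.randn(512, 256) / 256 ** 0.5))
```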

Code

Zignal: A zero-dependency image processing library

Zignal is a zero-dependency image processing library written in Zig, inspired by the dlib library, and is currently in early stages of development with a changing API. The library features various image processing capabilities, including color space conversions, matrix operations, and geometry, as well as a Canvas drawing API, and is being used internally by Ameli for their makeup virtual try-on feature.

Show HN: Baag – Easily run multiple AI coding agents on the same project

Baag is a terminal app that lets users run Claude Code, Gemini, or Codex in separate isolated workspaces within the same project, with features for creating and managing worktrees, submitting changes, and configuring per-user preferences. It can be installed and updated with simple commands and relies on a few dependencies, including git, node, and tmux.

Show HN: MicroMonitor – Lightweight Server Monitoring Built by AI in 24 Hours

MicroMonitor is a lightweight server monitoring tool that provides real-time insight into server health, offering live updates, process tracking, and smart alerts while using minimal resources. It is a free, open-source, self-hosted solution aimed at developers, small businesses, and individuals who want simple and effective server monitoring, and it was built entirely by an autonomous AI system in 24 hours.

Coze Studio: all-in-one AI agent development tool

Coze Studio is an all-in-one AI agent development tool that provides a convenient environment for developing, deploying, and managing AI agents, offering features such as model management, agent building, and workflow creation. The open-source version of Coze Studio can be deployed and used for free, with detailed guides and documentation available for quickstart, development, and troubleshooting, although some features are limited to the commercial version.

Show HN: Narev – Open-Source FinOps for AI and Cloud Spend

Narev is an open-source, self-hosted FinOps platform that helps users analyze and optimize their AI and cloud spend by unifying cost and usage data from various providers, including AWS, Azure, GCP, and OpenAI. The platform provides real-time dashboards, standardized analytics, and actionable recommendations while keeping user data private and under their control.
