Sunday — June 15, 2025
Saab reaches a new AI milestone with the Gripen E, the story that Builder.ai faked its AI with 700 engineers turns out to be false, and the PSVita-LLM project brings LLM text generation to a handheld console.
News
Saab achieves AI milestone with Gripen E
Saab has announced an AI milestone with its Gripen E fighter jet, but the details could not be summarized: the retrieved page text contained only Saab's cookie notice (covering necessary, preference, statistics, and marketing cookies) rather than the article itself.
AMD's AI Future Is Rack Scale 'Helios'
AMD has launched its next-generation Instinct MI350 series of GPU accelerators, built on the new CDNA 4 architecture, which doubles matrix-operation throughput, adds support for the lower-precision FP6 and FP4 formats, and delivers up to 4x the performance of its predecessor. The company is also expanding its AI ecosystem with rack-scale solutions, including turnkey systems that combine AMD CPUs, GPUs, and networking, and has announced a roadmap targeting a 20x improvement in rack-scale energy efficiency by 2030, with next-generation products planned for 2026.
Building a WordPress MCP server for Claude
Paolo Valdemarin built a custom Model Context Protocol (MCP) server to connect Claude, an AI assistant, directly to his WordPress blog, enabling automated blog post creation with proper formatting and intelligent categorization. This integration allows Paolo to ask Claude to research a topic, write a well-structured blog post, categorize it, and publish it directly to his blog, streamlining his writing workflow and maintaining editorial control.
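Paolo's actual server code isn't shown in the post, but the publish step of such an MCP tool would most likely wrap the standard WordPress REST API (`POST /wp-json/wp/v2/posts`). The sketch below is illustrative; the helper name and defaults are assumptions, not Paolo's implementation.

```python
# Hypothetical core of an MCP "publish post" tool for WordPress.
# The WordPress REST API endpoint and fields (title, content, status,
# categories) are standard; everything else here is illustrative.

def build_post_payload(title, content, categories=None, status="draft"):
    """Build the JSON body for a WordPress REST API post-creation call."""
    payload = {"title": title, "content": content, "status": status}
    if categories:
        payload["categories"] = categories  # list of WP category IDs
    return payload

# An MCP tool handler would then send it, e.g.:
#   requests.post(f"{site}/wp-json/wp/v2/posts", json=payload,
#                 auth=(user, app_password))
```

Defaulting to `"draft"` rather than `"publish"` is one way the workflow keeps editorial control with the author.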
RAG Is a Fancy, Lying Search Engine
RAG (Retrieval Augmented Generation) is a GenAI application design pattern that supplements a user's LLM prompt with dynamically retrieved information to improve the LLM's response. However, the author argues that RAG is unfit for high-stakes use cases in regulated industries because it allows the LLM to speak last, which can be irresponsible and unsafe, and that its popularity is driven by factors such as ease of implementation, VC funding, and influential endorsements, rather than its actual value or suitability for critical applications.
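The pattern the author critiques fits in a few lines: retrieve documents relevant to the query, stuff them into the prompt, and let the LLM answer. This is a minimal sketch, with a toy keyword retriever and `llm` standing in for any chat-completion call.

```python
# Minimal sketch of the RAG pattern: retrieve, then generate.
# `llm` is a stand-in for any chat-completion function, not a real API.

def retrieve(query, docs, k=2):
    """Toy keyword retriever: rank docs by word overlap with the query."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(q & set(d.lower().split())))[:k]

def rag_answer(query, docs, llm):
    context = "\n".join(retrieve(query, docs))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    # The LLM "speaks last" -- exactly the property the author objects to
    # for high-stakes use cases: nothing checks the answer after generation.
    return llm(prompt)
```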
Builder.ai did not "fake AI with 700 engineers"
Builder.ai, a company that claimed to use AI for coding tasks, was widely reported to have faked its AI with 700 engineers in India, but this claim does not hold up. In reality, Builder.ai's system, called Natasha, coordinated a network of human developers and later incorporated large language models (LLMs) to generate code; the engineers were building software through the platform, not pretending to be an AI.
Research
Clinical knowledge in LLMs does not translate to human interactions
(No summary is available for this paper: the abstract could not be retrieved, so only the title is shown.)
Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives
The current pay-per-token pricing mechanism for cloud-based large language models creates a financial incentive for providers to misreport the number of tokens used, allowing them to overcharge users without detection. A proposed alternative pricing mechanism, where users pay per character, eliminates this incentive and prevents providers from strategizing to overcharge users.
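The incentive gap is easy to illustrate: token counts come from the provider's tokenizer and are opaque to the user, while character counts can be recomputed from the visible text. A toy sketch (illustrative numbers, not a real pricing scheme):

```python
# Toy illustration of the paper's incentive argument. Token counts are
# reported by the provider and hard to audit; character counts are
# verifiable by the user from the text itself.

def token_bill(reported_tokens, price_per_token):
    # The user cannot recount `reported_tokens` without the provider's
    # exact tokenizer, so over-reporting is hard to detect.
    return reported_tokens * price_per_token

def char_bill(text, price_per_char):
    # The user can recompute len(text) themselves, so misreporting
    # is immediately detectable.
    return len(text) * price_per_char

honest = token_bill(100, 2)   # honest report
padded = token_bill(130, 2)   # 30% over-report: invisible to the user
assert padded > honest
```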
Eliciting Fine-Tuned Transformer Capabilities via Inference-Time Techniques
This paper proves that the capabilities of supervised fine-tuning can be approximated using inference-time techniques, such as in-context learning, without altering model parameters, and provides bounds on the dataset size required to achieve this approximation. The results provide a theoretical foundation for efficient deployment of large language models, and have implications for practical applications such as text generation and linear classification.
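The inference-time technique at the heart of the result, in-context learning, amounts to prepending labeled examples to the prompt instead of updating weights. A minimal illustrative sketch of that substitution:

```python
# Sketch of the inference-time alternative to fine-tuning: instead of
# training on (input, label) pairs, prepend them to the prompt as
# demonstrations (in-context learning). Purely illustrative formatting.

def few_shot_prompt(examples, query):
    """Format (input, label) pairs as demonstrations followed by the query."""
    demos = "\n".join(f"Input: {x}\nLabel: {y}" for x, y in examples)
    return f"{demos}\nInput: {query}\nLabel:"
```

The paper's contribution is bounding how many such demonstrations are needed for the in-context behavior to approximate the fine-tuned model.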
CRMArena-Pro: LLM Agents Assessed Across Diverse Business Scenarios
CRMArena-Pro is a novel benchmark for assessing the performance of large language model (LLM) agents in diverse professional settings, featuring 19 expert-validated tasks and multi-turn interactions. Experiments using CRMArena-Pro revealed that leading LLM agents struggle with multi-turn settings and confidentiality awareness, achieving only around 58% single-turn success and near-zero inherent confidentiality awareness, highlighting a substantial gap between current LLM capabilities and enterprise demands.
MEMOIR: Lifelong Model Editing with Minimal Overwrite and Informed Retention for LLMs
MEMOIR is a novel framework for editing language models that allows for efficient and reliable post-hoc updates without retraining or forgetting previous information. It achieves state-of-the-art performance by injecting knowledge through a residual memory module, minimizing interference among edits, and enabling generalization to rephrased queries through sparse activation patterns.
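A toy analogue of the idea (not the paper's implementation): edits live in a side memory keyed by a sparse fingerprint of the query, so they fire on matching inputs without touching the frozen base model. Here the "sparse activation" is crudely approximated by a set-of-words key.

```python
# Toy analogue of the MEMOIR idea, for intuition only: a residual side
# memory holds edits keyed by a sparse fingerprint, leaving the base
# model's weights untouched and minimizing interference between edits.

def fingerprint(text, k=3):
    """Sparse key: the k longest distinct words, order-insensitive."""
    return tuple(sorted(sorted(set(text.lower().split()), key=len)[-k:]))

class EditMemory:
    def __init__(self, base_model):
        self.base = base_model   # frozen base model (a callable here)
        self.edits = {}          # fingerprint -> edited answer

    def edit(self, example_query, new_answer):
        self.edits[fingerprint(example_query)] = new_answer

    def __call__(self, query):
        # Sparse match fires the edit; otherwise the base model answers.
        return self.edits.get(fingerprint(query), self.base(query))
```

In the real method the key is a sparse activation pattern inside the network, which is what lets edits generalize to rephrased queries far better than a word-set key would.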
Code
Voice-controlled agentic robot with pi0
LeRobot is a platform that provides tutorials and resources for robotics, including step-by-step guides and open-source hardware and software projects. The platform supports various policy networks, including ACT, Diffusion Policy, and TD-MPC, and provides tools for data collection, teleoperation, and policy training and evaluation in both simulated and real-world environments.
Show HN: An LLM Running on a PS Vita
The PSVita-LLM project allows a PlayStation Vita to run a modified version of the LLaMA AI model, enabling it to generate text and stories on the handheld console. The project includes features such as an interactive model selector, a full game loop, and the ability to download and manage models, with potential future improvements including code refactoring and multithreading for better performance.
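The Vita port generates text the way any llama.cpp-style runtime does: a loop that repeatedly turns the model's output logits into a next-token choice. The project itself is native code for the console; this is an illustrative Python sketch of that sampling step only.

```python
# Illustrative next-token sampler (greedy at temperature 0, otherwise
# temperature-scaled softmax sampling). Sketch only -- not PSVita-LLM code.
import math
import random

def sample(logits, temperature=1.0, rng=random):
    if temperature == 0:                       # greedy decoding
        return max(range(len(logits)), key=logits.__getitem__)
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    probs = [math.exp(l - m) for l in scaled]  # numerically stable softmax
    r = rng.random() * sum(probs)
    for i, p in enumerate(probs):
        r -= p
        if r <= 0:
            return i
    return len(logits) - 1
```

On hardware as constrained as the Vita, the expensive part is producing the logits; the sampler itself is trivial, which is why small llama-style models are a natural fit for such ports.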
LLM Debugger – Visualize OpenAI API Conversations
LLM Logger is a lightweight tool for logging and inspecting interactions with large language models such as OpenAI's GPT-4, allowing developers to track and debug conversations, view differences between turns, and compare prompt strategies. It features one-line setup, automatic session tracking, local-first logging, and a simple UI, making it well suited to developers building agent workflows, chat interfaces, or prompt-based systems.
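The underlying pattern of such tools is a thin wrapper around the model call that appends every request/response pair to a local log. The sketch below illustrates that pattern; the function names are hypothetical, not LLM Logger's actual API.

```python
# Illustrative local-first logging wrapper around an LLM call.
# Names here are hypothetical, not LLM Logger's API.
import time

def log_call(llm_fn, session_log):
    """Wrap an LLM call so every request/response pair is recorded."""
    def wrapped(messages, **kwargs):
        reply = llm_fn(messages, **kwargs)
        session_log.append({
            "ts": time.time(),
            "messages": messages,   # full turn context, for diffing later
            "reply": reply,
        })
        return reply
    return wrapped

# log = []; chat = log_call(openai_chat, log)
# ... make calls via chat(...), then inspect `log` in a UI.
```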
Prompty: An asset class and format for LLM prompts
Prompty is an asset class and format for LLM prompts designed to enhance observability, understandability, and portability for developers, with the primary goal of accelerating the developer inner loop. The Prompty Visual Studio Code extension provides an intuitive prompt playground to streamline the prompt engineering process, offering features such as quick creation, preview, and model configuration management.
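Schematically, a `.prompty` asset pairs YAML frontmatter (name, model configuration, declared inputs) with a templated prompt body. The sketch below is abbreviated and illustrative; check the Prompty specification for the exact schema and field names.

```yaml
---
name: Summarizer
description: Summarize an article in two sentences.
model:
  api: chat
  configuration:
    type: azure_openai
    azure_deployment: gpt-4o
inputs:
  article:
    type: string
---
system:
You are a concise technical summarizer.

user:
Summarize this article in two sentences:
{{article}}
```

Keeping the model configuration, input schema, and prompt template in one versionable file is what gives the format its observability and portability.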
Show HN: SharkMCP, a Tshark MCP Server
SharkMCP is a Model Context Protocol (MCP) server that provides network packet capture and analysis capabilities through Wireshark/tshark integration, designed for AI assistants to perform network security analysis and troubleshooting. The server offers features such as async packet capture, PCAP file analysis, flexible output formats, and SSL/TLS decryption, and can be installed and run on various platforms, including macOS, Ubuntu/Debian, and Windows.
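Under the hood such a server shells out to tshark. The flags below (`-r`, `-Y`, `-T json`) are standard tshark options; the helper function itself is illustrative, not SharkMCP's API.

```python
# Illustrative builder for a tshark capture-file analysis command.
# The tshark flags are real; the helper name is hypothetical.

def tshark_args(pcap_path, display_filter=None, as_json=True):
    args = ["tshark", "-r", pcap_path]       # -r: read from a capture file
    if display_filter:
        args += ["-Y", display_filter]       # -Y: Wireshark display filter
    if as_json:
        args += ["-T", "json"]               # -T json: structured output
    return args

# subprocess.run(tshark_args("dump.pcap", "tls.handshake"),
#                capture_output=True, text=True)
```

Structured JSON output is what makes the results consumable by an AI assistant on the other side of the MCP connection.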