Tuesday — June 17, 2025
Salesforce finds LLM agents struggle with CRM tasks, a study warns of cognitive debt from AI-assisted essay writing, and a new project brings an OpenAI-compatible API to Apple's on-device models.
News
Generative AI coding tools and agents do not work for me
The author, a software engineer, shares their personal experience with generative AI coding tools, arguing that the tools do not make them faster or more efficient because they still have to thoroughly review and understand the generated code to ensure its quality and reliability. The author believes that relying on AI-generated code without proper review is irresponsible, especially on projects with legal obligations and financial stakes, and that AI tools cannot replace human judgment and expertise in software development.
Salesforce study finds LLM agents flunk CRM and confidentiality tests
A study by Salesforce found that large language model (LLM) agents performed poorly on customer relationship management (CRM) tasks and showed little awareness of the need for customer confidentiality, achieving a 58% success rate on single-step tasks and 35% on multi-step tasks. The results highlight a significant gap between the capabilities of current LLM agents and the demands of real-world enterprise scenarios.
'It is a better programmer than me': The reality of being laid off due to AI
The increasing use of artificial intelligence is posing a significant threat to jobs, with many roles being automated, leaving workers like Jane, a 45-year-old former human resources manager, suddenly unemployed. Experts warn that anyone whose job is done on a computer all day is at risk of being replaced by AI, with one engineer stating "it's just a matter of time" before these jobs are automated.
The Illusion of Thinking: A Reality Check on AI Reasoning
A recent paper by Apple, "The Illusion of Thinking," challenges common assumptions about the capabilities of large language models (LLMs) by testing their reasoning abilities under controlled conditions, revealing that even top-tier models collapse abruptly once task complexity rises past a certain point. The study shows that current LLMs can produce fluent but incorrect reasoning, and argues that a clearer understanding of their capabilities and limitations is needed to build more robust systems.
Defense Department signs OpenAI for $200M 'frontier AI' pilot project
The US Department of Defense has awarded a $200 million contract to OpenAI for a pilot program to develop "frontier AI" capabilities, with the DoD stating the deal covers "warfighting" while OpenAI mentions healthcare and cyber defense. The contract is part of OpenAI's new "OpenAI for Government" initiative, which aims to bring the company's technology to the public sector.
Research
Accumulation of cognitive debt when using an AI assistant for essay writing task
This study found that participants who used large language models (LLMs) to assist with essay writing exhibited weaker brain connectivity and lower cognitive engagement compared to those who used search engines or wrote without tools. The results also showed that LLM users struggled with self-reported ownership of their work, memory recall, and quoting their own writing, raising concerns about the long-term educational implications of relying on LLMs.
Breaking Quadratic Barriers: A Non-Attention LLM for Ultra-Long Context Horizons
This paper proposes an architecture for large language models that handles ultra-long context windows without attention, combining State Space blocks, Multi-Resolution Convolution layers, a Recurrent Supervisor, and Retrieval-Augmented External Memory to avoid the quadratic memory and compute costs of Transformer self-attention. The design targets sequences of hundreds of thousands to millions of tokens.
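To make the shape of such a design concrete, here is a minimal PyTorch-style sketch of a non-attention block that mixes a multi-resolution depthwise convolution with a toy diagonal state-space recurrence. The layer names, sizes, and recurrence are illustrative assumptions, not the authors' implementation; the point is that per-step state plus convolution gives linear-time mixing with no sequence-length-squared term.

```python
# Illustrative sketch only: not the paper's actual model.
import torch
import torch.nn as nn

class MultiResConv(nn.Module):
    """Depthwise convolutions at several dilation rates, summed together."""
    def __init__(self, dim: int, dilations=(1, 4, 16)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(dim, dim, kernel_size=3, padding=d, dilation=d, groups=dim)
            for d in dilations
        )

    def forward(self, x):               # x: (batch, seq, dim)
        h = x.transpose(1, 2)           # -> (batch, dim, seq)
        h = sum(conv(h) for conv in self.convs)
        return h.transpose(1, 2)

class SSMBlock(nn.Module):
    """Toy diagonal state-space recurrence: s_t = a * s_{t-1} + b * x_t."""
    def __init__(self, dim: int):
        super().__init__()
        self.a = nn.Parameter(torch.full((dim,), 0.9))
        self.b = nn.Parameter(torch.ones(dim))

    def forward(self, x):               # x: (batch, seq, dim)
        s = torch.zeros(x.size(0), x.size(2), device=x.device)
        out = []
        for t in range(x.size(1)):      # O(seq) time, constant state per step
            s = self.a * s + self.b * x[:, t]
            out.append(s)
        return torch.stack(out, dim=1)

class NonAttentionBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.conv = MultiResConv(dim)
        self.ssm = SSMBlock(dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        return x + self.norm(self.ssm(self.conv(x)))

x = torch.randn(2, 256, 64)             # (batch, seq, dim); no O(seq^2) attention matrix
print(NonAttentionBlock(64)(x).shape)    # torch.Size([2, 256, 64])
```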
Appraisal-Based Chain-of-Emotion Improves AI Persona Accuracy
Building digital agents that convincingly simulate human emotions is difficult, but large language models (LLMs) may help by identifying patterns of situational appraisal in text. The study found that a new appraisal-based chain-of-emotion architecture built on LLMs outperformed standard architectures at simulating emotions, particularly in a video game setting, providing evidence for constructing affective agents on cognitive processes represented in language models.
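The core pattern is a staged prompt chain: appraise the situation from the persona's perspective, derive an emotion from that appraisal, then respond conditioned on both. A minimal sketch follows; the llm() stub and the prompt wording are assumptions for illustration, not the paper's implementation.

```python
from typing import Callable

def chain_of_emotion_reply(llm: Callable[[str], str], persona: str,
                           situation: str, user_message: str) -> str:
    # Step 1: situational appraisal in the persona's own terms.
    appraisal = llm(
        f"You are {persona}. Situation: {situation}\n"
        "Briefly appraise this situation: what does it mean for your goals?"
    )
    # Step 2: derive an emotional state from the appraisal.
    emotion = llm(
        f"You are {persona}. Given your appraisal:\n{appraisal}\n"
        "Name the emotion you are most likely feeling and why, in one sentence."
    )
    # Step 3: respond to the user, conditioned on appraisal and emotion.
    return llm(
        f"You are {persona}, currently feeling: {emotion}\n"
        f"Reply in character to: {user_message}"
    )
```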
Open-Source RISC-V: Energy Efficiency of Superscalar, Out-of-Order Execution
Open-source RISC-V cores such as CVA6S+ and the C910 have been developed and modified for high performance while remaining compliant with RISC-V standards, with CVA6S+ showing a 34.4% performance improvement over its predecessor. The analysis finds that although high-performance cores like the C910 occupy more area and draw more power, they can still be competitive in energy efficiency, challenging the notion that superscalar, out-of-order execution necessarily comes at a steep cost in area and energy.
ZjsComponent: A Pragmatic Approach to Reusable UI Fragments for Web Development
ZjsComponent is a lightweight, framework-agnostic web component that lets developers create modular, reusable UI elements with minimal overhead, requiring only a browser that can load and execute JavaScript. It enables dynamic loading and isolation of HTML+JS fragments, providing a simple way to build reusable interfaces with significant DOM and code isolation, lifecycle hooks, and ordinary class methods, all without dependencies or build steps.
Code
Show HN: Trieve CLI – Terminal-Based LLM Agent Loop with Search Tool for PDFs
Trieve is an all-in-one platform for search, recommendations, and Retrieval-Augmented Generation (RAG), offering self-hosting, semantic dense-vector search, typo-tolerant full-text search, and sub-sentence highlighting. The CLI shown here wraps it in a terminal-based LLM agent loop with a search tool for PDFs; the platform also provides a TypeScript SDK, Python SDK, and OpenAPI specification, and lets users bring their own models and integrate with services like OpenAI and Qdrant.
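As a rough illustration of the "terminal agent loop with a search tool" pattern from the title, here is a small sketch. The llm() and search_pdfs() stubs stand in for a chat model and a document search call; they are assumptions for illustration, not Trieve's actual SDK API.

```python
from typing import Callable

def agent_loop(llm: Callable[[str], str],
               search_pdfs: Callable[[str], str],
               question: str, max_steps: int = 5) -> str:
    context = ""
    for _ in range(max_steps):
        reply = llm(
            "Answer the question, or respond with 'SEARCH: <query>' if you "
            f"need more information.\nNotes so far:\n{context}\n"
            f"Question: {question}"
        )
        if reply.startswith("SEARCH:"):
            query = reply[len("SEARCH:"):].strip()
            # Append retrieved passages so the next turn can use them.
            context += f"\n[results for '{query}']\n{search_pdfs(query)}"
        else:
            return reply        # the model chose to answer directly
    return reply                # fall back to the last reply after max_steps
```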
vibetunnel - turn any browser into a terminal and command your agents on the go
VibeTunnel is a tool that allows users to access their Mac terminal from any device with a web browser, enabling remote monitoring and control of terminal sessions, including AI agents and development environments. It offers features such as zero-configuration setup, secure tunneling, and mobile readiness, making it easy to use and access terminal sessions from anywhere.
Apple-on-device-OpenAI: OpenAI-compatible API server for Apple on-device models
This project creates a SwiftUI application that implements an OpenAI-compatible API server using Apple's on-device Foundation Models, allowing for local AI processing through familiar OpenAI API endpoints. The application is designed as a GUI app to avoid Apple's rate limiting policies for command-line tools, and it provides features such as streaming support, model availability checks, and compatibility with OpenAI's API.
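Because the server speaks the OpenAI API, any standard client should be able to talk to it. The sketch below uses the official openai Python package; the base URL, port, and model name are placeholder assumptions, so check the project's README for the values it actually exposes.

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:11535/v1",  # assumed local endpoint of the server
    api_key="not-needed",                  # local server; the key is typically ignored
)

response = client.chat.completions.create(
    model="apple-on-device",               # placeholder model id
    messages=[{"role": "user", "content": "Summarize today's AI news in one line."}],
    stream=False,
)
print(response.choices[0].message.content)
```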
Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Maximal Update Parametrization (μP) keeps optimal hyperparameters stable as neural network width grows, so extremely large models can be tuned by transferring hyperparameters found on a small proxy model. The mup package implements the technique for PyTorch models, making μP adoption straightforward and less error-prone, with features such as automatic scaling of initialization and learning rates.
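For a sense of how this looks in practice, here is a compressed sketch following the usage pattern described in the mup README: define models that differ only in width, register base shapes, and train with a mup-aware optimizer. Import paths and exact signatures should be checked against the repository; the widths and layer sizes here are arbitrary.

```python
import torch.nn as nn
from mup import MuReadout, set_base_shapes, MuAdam  # import paths per the mup README

def make_model(width: int) -> nn.Module:
    return nn.Sequential(
        nn.Linear(64, width),
        nn.ReLU(),
        MuReadout(width, 10),     # mup-aware output layer instead of nn.Linear
    )

base = make_model(width=128)      # small proxy defining the base shapes
delta = make_model(width=256)     # second width so mup can infer which dims scale
model = make_model(width=4096)    # the large model you actually train

set_base_shapes(model, base, delta=delta)          # rescales initialization/multipliers
optimizer = MuAdam(model.parameters(), lr=1e-3)    # width-aware Adam variant
# Hyperparameters tuned on the proxy width should now transfer to the large model.
```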
Show HN: Bolt Automations – Save Tokens and Time on Bolt.new
Bolt Automations is a Chrome extension that helps users save tokens and time by automatically managing their bolt.new workflow, switching to discussion mode after task completion, and sending real-time Discord notifications. The extension also offers optional AI-powered summaries, stores credentials locally for privacy, and is available for installation from the Chrome Web Store or GitHub Releases.