Sunday June 1, 2025

Cerebras shatters speed records with Llama 4 Maverick at 2,500 tokens/s, researchers unveil TrojanStego for covert data embedding in language models, and Rigorous AI Reviewer offers free manuscript analysis in days.

News

AI Responses May Include Mistakes

The author tried to look up information about an IBM PS/2 model using Google's AI-powered search, but the results were inconsistent and often completely wrong, with the AI "hallucinating" plausible-sounding answers. The author argues that this exposes a major problem with AI-powered search: incorrect answers can be convincing and misleading, especially to non-experts, so users should be cautious when relying on them.

Cerebras achieves 2,500T/s on Llama 4 Maverick (400B)

Cerebras has broken the 2,500 tokens per second barrier running Meta's Llama 4 Maverick (400B parameters), more than doubling the 1,000 tokens per second achieved by Nvidia's flagship solution. Cerebras says this makes it the optimal solution for Llama 4 in any deployment scenario, with significantly faster inference than other vendors, including Nvidia, SambaNova, Amazon, Groq, Google, and Microsoft Azure.
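One way to read the two throughput figures is to convert them into the time spent streaming a response. The sketch below assumes a steady decode rate and an illustrative 1,000-token response (the response length is an assumption, not part of the announcement), and it ignores time-to-first-token and network overhead.

```python
def generation_time_s(num_tokens: int, tokens_per_s: float) -> float:
    """Seconds to stream num_tokens at a steady decode rate (ignores time-to-first-token)."""
    return num_tokens / tokens_per_s


CEREBRAS_TPS = 2_500     # figure reported for Cerebras on Llama 4 Maverick
NVIDIA_TPS = 1_000       # figure reported for Nvidia's flagship solution
RESPONSE_TOKENS = 1_000  # illustrative response length (assumption)

speedup = CEREBRAS_TPS / NVIDIA_TPS
cerebras_s = generation_time_s(RESPONSE_TOKENS, CEREBRAS_TPS)
nvidia_s = generation_time_s(RESPONSE_TOKENS, NVIDIA_TPS)

print(f"speedup: {speedup:.1f}x")
print(f"{RESPONSE_TOKENS} tokens: {cerebras_s:.2f} s vs {nvidia_s:.2f} s")
```

At these rates a 1,000-token answer streams in 0.40 s rather than 1.00 s, a 2.5x difference.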

Sguaba: Hard-to-misuse rigid body transforms for engineers

Helsing has developed and open-sourced Sguaba, a Rust crate for handling rigid body transforms and coordinate systems robustly and safely, so that engineers can work with complex spatial data without being experts in linear algebra. Sguaba strongly types coordinates and vectors with the coordinate system they belong to, implements conversions between systems, and exposes a simple API for transformations, which makes it hard to accidentally mix up coordinate systems or conventions.
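The core idea, tagging every vector with the coordinate system it lives in and refusing to mix systems, can be sketched in a few lines. This is a hypothetical Python illustration, not Sguaba's API: the `Vector` and `YawTransform` names are invented here, the checks run at runtime rather than at compile time as Rust's type system allows, and only a rotation about the vertical axis is modeled.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Vector:
    """A displacement or velocity tagged with the coordinate system it is expressed in."""
    frame: str
    x: float
    y: float
    z: float

    def __add__(self, other: "Vector") -> "Vector":
        # Refuse to combine vectors expressed in different coordinate systems.
        if self.frame != other.frame:
            raise TypeError(f"cannot add a '{other.frame}' vector to a '{self.frame}' vector")
        return Vector(self.frame, self.x + other.x, self.y + other.y, self.z + other.z)


@dataclass(frozen=True)
class YawTransform:
    """Re-expresses vectors from frame `src` in frame `dst` via a rotation about z.

    Vectors are only rotated; a full rigid-body transform would also
    translate points (positions), which this sketch does not model.
    """
    src: str
    dst: str
    yaw_rad: float

    def apply(self, v: Vector) -> Vector:
        if v.frame != self.src:
            raise TypeError(f"transform expects a '{self.src}' vector, got '{v.frame}'")
        c, s = math.cos(self.yaw_rad), math.sin(self.yaw_rad)
        return Vector(self.dst, c * v.x - s * v.y, s * v.x + c * v.y, v.z)


body_velocity = Vector("body", 1.0, 0.0, 0.0)
body_to_world = YawTransform(src="body", dst="world", yaw_rad=math.pi / 2)
world_velocity = body_to_world.apply(body_velocity)  # now tagged "world"
wind = Vector("world", 0.0, 0.5, 0.0)
ground_velocity = world_velocity + wind  # allowed: both in "world"

try:
    body_velocity + wind  # mixing frames is rejected
except TypeError as err:
    print("rejected:", err)
```

A compile-time version would encode the frame in the type (for example as a generic parameter) so that the bad addition never builds, which is the stronger guarantee a Rust crate can offer.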

The 55% Regret Club: How AI-First Companies Are Learning Lessons the Hard Way

According to research from Orgvue, which surveyed more than 1,100 C-suite and senior leaders, 55% of companies that laid off staff to replace them with AI now admit those decisions were wrong, and many have faced costly reversals. The article calls this group the "55% Regret Club," points to high-profile failures at IBM, McDonald's, and Klarna, where AI was used to replace people rather than augment their work, and argues for a human-first approach to AI adoption.

AI didn't kill Stack Overflow

Stack Overflow, once a thriving online community for developers to share knowledge and solve problems, has been in decline due to a combination of factors, including its own self-governance experiment gone wrong and the rise of large language models like ChatGPT. The site's reputation system, which was initially a key to its success, ultimately led to a culture of strict moderation and a focus on transactional Q&A, stripping away the human element that made it great and leaving it vulnerable to disruption by AI-powered tools.

Research

Enhancing Code Quality with Generative AI: Boosting Developer Warning Compliance

Programmers often ignore warnings from static analysis tools because of false positives and confusing messages. Large language models can help increase compliance by simplifying warnings, explaining why they matter, and suggesting possible fixes, making it easier for developers to address and resolve the underlying issues.

TrojanStego: Your Language Model Can Be a Steganographic Agent

Researchers have proposed a novel threat model called TrojanStego, where an adversary fine-tunes a large language model to embed sensitive information into its outputs through linguistic steganography. Experimental results show that compromised models can reliably transmit secrets with high accuracy, while maintaining their utility and evading human detection, highlighting a new class of covert and dangerous data exfiltration attacks.
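To make "linguistic steganography" concrete, the toy below hides bits in word choices: each slot in a fixed sentence offers two interchangeable words, and picking the first or second encodes a 0 or a 1. This classic synonym-substitution scheme is only an illustration and is not the encoding used in the TrojanStego paper, where the fine-tuned model itself produces the secret-carrying text; the template and synonym pairs are made up for this example.

```python
import re

# Hypothetical cover sentence with four slots; each slot has a (bit-0, bit-1) word pair.
TEMPLATE = "We need a {} plan and a {} reply to {} the project and {} the team."
SLOTS = [("big", "large"), ("quick", "fast"), ("begin", "start"), ("help", "assist")]


def embed(bits: list[int]) -> str:
    """Hide exactly len(SLOTS) bits by choosing one word from each pair."""
    if len(bits) != len(SLOTS) or any(b not in (0, 1) for b in bits):
        raise ValueError(f"expected {len(SLOTS)} bits of 0 or 1")
    return TEMPLATE.format(*(pair[b] for pair, b in zip(SLOTS, bits)))


def extract(text: str) -> list[int]:
    """Recover the bits by checking which word of each pair appears in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    bits = []
    for zero_word, one_word in SLOTS:
        if zero_word in words:
            bits.append(0)
        elif one_word in words:
            bits.append(1)
        else:
            raise ValueError(f"slot ({zero_word}/{one_word}) not found")
    return bits


secret = [1, 0, 1, 1]
cover_text = embed(secret)
print(cover_text)  # We need a large plan and a quick reply to start the project and assist the team.
```

To a human reader both word choices look natural, which is why this kind of channel is hard to spot by inspection; only someone who knows the slot scheme can read the bits back.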

YOLO-World: Real-Time Open-Vocabulary Object Detection

YOLO-World extends the YOLO detector series with open-vocabulary detection through vision-language modeling and pre-training on large-scale datasets. It achieves state-of-the-art performance in detecting a wide range of objects, reaching 35.4 AP at 52.0 FPS on the LVIS dataset, and performs strongly on downstream tasks such as object detection and open-vocabulary instance segmentation.
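The "open-vocabulary" part comes down to scoring each detected region against text embeddings of whatever category names the user supplies, so the vocabulary can change at inference time without retraining. The sketch below shows only that matching step under stated assumptions: the 3-dimensional embeddings and the 0.8 threshold are made up, whereas a real system obtains region embeddings from the detector and text embeddings from a text encoder (YOLO-World uses one pre-trained by CLIP), and YOLO-World's actual region-text scoring differs in detail.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def label_regions(region_embeddings, vocabulary, threshold):
    """Give each region its best-matching vocabulary label, or None if no score reaches threshold."""
    labels = []
    for region in region_embeddings:
        best_label, best_score = max(
            ((label, cosine(region, text_emb)) for label, text_emb in vocabulary.items()),
            key=lambda pair: pair[1],
        )
        labels.append(best_label if best_score >= threshold else None)
    return labels


# Toy text embeddings for a user-chosen vocabulary (made-up numbers).
vocabulary = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0], "traffic cone": [0.0, 0.0, 1.0]}
# Toy embeddings for three detected regions (made-up numbers).
regions = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.95], [0.5, 0.5, 0.5]]

results = label_regions(regions, vocabulary, threshold=0.8)
print(results)  # ['cat', 'traffic cone', None]
```

Adding a new category is just another entry in `vocabulary`; the detector weights stay untouched.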

Atlas: Learning to Optimally Memorize the Context at Test Time

Transformers have become the standard for sequence modeling, but their quadratic cost on long sequences has pushed researchers toward alternative architectures such as recurrent models with long-term memory. The paper proposes ATLAS, a long-term memory module that learns to memorize the context by optimizing its memory on both current and past tokens, along with DeepTransformers, a family of architectures that generalize the Transformer. Both show strong results on language modeling and long-context understanding, improving on Transformers and recent linear recurrent models.
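As a rough illustration of "memorizing the context at test time," the sketch keeps a linear associative memory `M` that maps keys to values and, as each token arrives, takes one gradient step on the reconstruction error over a sliding window of recent (key, value) pairs rather than the latest pair alone, echoing ATLAS's idea of optimizing memory on current and past tokens. The linear memory, plain gradient descent, window size, and learning rate are simplifying assumptions; the paper's memory architecture and update rule are more elaborate.

```python
Matrix = list[list[float]]


def matvec(M: Matrix, k: list[float]) -> list[float]:
    """Compute M @ k for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, k)) for row in M]


def window_step(M: Matrix, window, lr: float) -> Matrix:
    """One gradient-descent step on sum over (k, v) in window of ||M k - v||^2.

    The gradient of that loss with respect to M is 2 * sum (M k - v) k^T.
    """
    grad = [[0.0] * len(M[0]) for _ in M]
    for k, v in window:
        err = [pred - target for pred, target in zip(matvec(M, k), v)]
        for i, e in enumerate(err):
            for j, kj in enumerate(k):
                grad[i][j] += 2.0 * e * kj
    return [[m - lr * g for m, g in zip(m_row, g_row)] for m_row, g_row in zip(M, grad)]


def memorize(stream, dim_k: int, dim_v: int, window_size: int = 2, lr: float = 0.1) -> Matrix:
    """Update the memory once per incoming token, using the last `window_size` tokens."""
    M = [[0.0] * dim_k for _ in range(dim_v)]
    for t in range(len(stream)):
        window = stream[max(0, t - window_size + 1): t + 1]
        M = window_step(M, window, lr)
    return M


# Two (key, value) associations repeated in a stream of 40 tokens.
pairs = [([1.0, 0.0], [0.0, 1.0]), ([0.0, 1.0], [1.0, 0.0])]
stream = pairs * 20
M = memorize(stream, dim_k=2, dim_v=2)
print(matvec(M, [1.0, 0.0]))  # close to [0.0, 1.0]
```

With a window of 1 this reduces to a purely online update driven only by the newest token; widening the window lets each step also correct errors on recent context.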

Beyond Attention: Toward Machines with Intrinsic Higher Mental States

This work proposes a new approach to determining relevance in machine learning models, inspired by cellular neurobiological evidence, which enables models to pre-select relevant information before applying attention. The approach, which involves triadic neuronal-level modulation loops, leads to significantly faster learning and reduced computational demand, with results demonstrated in various applications including reinforcement learning, computer vision, and natural language question answering.

Code

Show HN: AI Peer Reviewer – Multiagent system for scientific manuscript analysis

The Rigorous AI Reviewer is a free, cloud-based tool that analyzes scientific manuscripts and provides detailed feedback: users upload a manuscript and receive a PDF report within 1-2 working days. It is part of a larger project that aims to make the creation, evaluation, and distribution of scientific knowledge more transparent, cheaper, faster, and better, with several more components in development, including a tool for evaluating manuscript fit.

Show HN: I built an AI agent that turns ROS 2's turtlesim into a digital artist

The turtlesim_agent project is an AI agent that uses natural language to control the classic ROS turtlesim simulator, allowing users to create drawings by describing shapes and intentions in plain English. The agent, powered by LangChain, interprets text-based instructions and translates them into visual drawings, enabling creative expression and interaction with the simulated environment.
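An agent like this ultimately has to turn a shape description into motion commands. The hypothetical helper below shows one deterministic way to do that for a regular polygon: drive one side, turn by the exterior angle, repeat. Each `(linear_x, angular_z)` pair mirrors the `linear.x` and `angular.z` fields of the `geometry_msgs/Twist` messages that turtlesim accepts and is assumed to be held for one second; the sketch does not talk to ROS and is not taken from the turtlesim_agent code.

```python
import math


def polygon_commands(sides: int, side_length: float) -> list[tuple[float, float]]:
    """Return (linear_x, angular_z) commands, each held for 1 s, that trace a regular polygon.

    The turtle drives one side, then turns in place by the exterior angle.
    """
    if sides < 3:
        raise ValueError("a polygon needs at least 3 sides")
    exterior_angle = 2 * math.pi / sides
    commands = []
    for _ in range(sides):
        commands.append((side_length, 0.0))     # straight segment
        commands.append((0.0, exterior_angle))  # turn in place
    return commands


def simulate(commands, x=0.0, y=0.0, theta=0.0):
    """Dead-reckon the pose; valid because each command is a pure translation or a pure rotation."""
    path = [(x, y)]
    for linear_x, angular_z in commands:
        theta += angular_z
        x += linear_x * math.cos(theta)
        y += linear_x * math.sin(theta)
        path.append((x, y))
    return path, theta


square = polygon_commands(sides=4, side_length=2.0)
path, heading = simulate(square)
print(path[-1])  # back near the starting point (0.0, 0.0)
```

In an LLM agent, a function like `polygon_commands` would typically be exposed as a tool, so the model only has to extract "4 sides, length 2" from the user's sentence while the geometry stays deterministic.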

Show HN: Tracking Merged PRs by OpenAI's Codex and GitHub's Copilot

The tracker counts 10,920 pull requests (PRs) from Copilot with a merge rate of 29.27%, versus 73,791 PRs from Codex with a much higher merge rate of 83.05%. The figures come from GitHub search queries for each agent's PRs and can be explored further in an interactive dashboard.
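Merge rate is merged PRs divided by total PRs, so the published figures imply approximate merged counts. The back-of-the-envelope sketch below recovers them; because the percentages are rounded, the results are estimates rather than the dashboard's exact numbers.

```python
# Published totals and merge rates from the tracker.
STATS = {"Copilot": (10_920, 0.2927), "Codex": (73_791, 0.8305)}


def implied_merged(total_prs: int, merge_rate: float) -> int:
    """Merge rate = merged / total, so merged is approximately total * rate."""
    return round(total_prs * merge_rate)


for agent, (total, rate) in STATS.items():
    print(f"{agent}: ~{implied_merged(total, rate):,} merged of {total:,} PRs ({rate:.2%})")
```

That works out to roughly 3,196 merged Copilot PRs and 61,283 merged Codex PRs, so Codex leads on both volume and rate.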

Google AI Edge Gallery

The Google AI Edge Gallery is an experimental app for trying cutting-edge generative AI models directly on Android and iOS devices, without needing an internet connection. It includes image analysis, a prompt lab, AI chat, and performance insights, and lets users switch between models or bring their own to test.

Chatterbox, Resemble AI's production-grade open source TTS model

Chatterbox is an open-source text-to-speech (TTS) model developed by Resemble AI, which has been benchmarked against leading closed-source systems and offers features such as emotion exaggeration control. The model can be easily installed and used to generate high-quality speech, with options for customizing voice and expression, and is also available as a competitively priced TTS service for production use.