Thursday — June 12, 2025

Meta's V-JEPA 2 leads zero-shot robot planning, EchoLeak reveals a flaw in Microsoft 365 Copilot, and Chatterbox TTS introduces open-source emotion control.

News

V-JEPA 2 world model and new benchmarks for physical reasoning

Meta has introduced V-JEPA 2, a world model that achieves state-of-the-art performance in visual understanding and prediction, and can be used for zero-shot robot planning in new environments. V-JEPA 2 is a 1.2 billion-parameter model trained on video, which enables it to learn about the world and make predictions about how it will evolve, allowing for more advanced machine intelligence and AI agents that can operate in the physical world.

EchoLeak – 0-Click AI Vulnerability Enabling Data Exfiltration from 365 Copilot

Aim Labs has discovered a critical zero-click AI vulnerability, dubbed "EchoLeak", in Microsoft 365 Copilot, which allows attackers to automatically exfiltrate sensitive information from the platform without user awareness or interaction. The vulnerability, which exploits design flaws in RAG-based chatbots, can be triggered by simply sending an email to a victim, and has been disclosed to Microsoft's security team, highlighting the potential risks inherent in the design of AI agents and chatbots.

AI at Amazon: A case study of brittleness

Mihail Eric's blog post about his experiences working on AI at Amazon highlights the company's struggles to keep up with its peers in the large language model (LLM) space, demonstrating patterns of brittleness, including decompensation, working at cross-purposes, and getting stuck in outdated behaviors. These patterns, as identified by resilience engineering researchers, ultimately hindered Amazon's ability to effectively develop and compete in the AI space, despite its initial advantages, and serve as a case study in how organizational failures can occur over months or even years, rather than just minutes.

The first big AI disaster is yet to happen

The first major disaster involving AI language models is likely to occur due to the increasing use of AI agents, which can autonomously perform tasks and make decisions without human intervention, potentially leading to unintended and harmful consequences. This could manifest in various ways, such as an AI-powered system causing financial or physical harm to people, or even a misaligned AI model being used to control a robot that causes harm, with the potential for a catastrophic event to occur as AI technology continues to advance and become more widespread.

AlphaWrite: AI that improves at writing by evolving its own stories

Alpha Writing is a novel framework for scaling inference-time compute in creative text generation, using an evolutionary approach that combines iterative story generation with Elo-based evaluation to systematically improve narrative quality. The method demonstrates substantial improvements in story quality, with a 72% preference rate over initial story generations and a 62% preference rate over sequential-prompting baseline, and also shows promise for recursive self-improvement through distillation of enhanced outputs back into the base model.

Research

Mixed-Chip Clusters Enable Efficient Large-Scale AI Training

The H2 framework is proposed to efficiently train large language models on clusters with over 1,000 heterogeneous chips, addressing the challenges of traditional distributed training frameworks. H2 incorporates various components, including a unified interface and optimized communication library, and achieves a superlinear speedup, outperforming homogeneous training solutions by up to 16.37% in experiments with a 100-billion-parameter model.

Small Language Models Are the Future of Agentic AI

Small language models (SLMs) are sufficiently powerful and economical for many applications in agentic AI systems, making them a more suitable choice than large language models (LLMs) for specialized tasks. The adoption of SLMs could have a significant operational and economic impact on the AI industry, and the authors propose a conversion algorithm to facilitate the shift from LLMs to SLMs in agentic systems.

Who is using AI to code? Global diffusion and impact of generative AI

The adoption of AI-generated coding tools is increasing, with an estimated 30.1% of Python functions from US contributors written by AI by December 2024, and varying rates of adoption across countries and developer experience levels. The use of AI-assisted coding is associated with significant productivity gains, estimated to be worth $9.6-$14.4 billion annually in the US, and also drives learning and innovation, leading to increased exploration and output.

TradingAgents: Multi-Agents LLM Financial Trading Framework

The TradingAgents framework utilizes large language models to power a multi-agent system, where agents with specialized roles collaborate to make informed stock trading decisions, mimicking the dynamics of real-world trading firms. This framework has shown superiority over baseline models, with improvements in key performance metrics such as cumulative returns, Sharpe ratio, and maximum drawdown, demonstrating the potential of multi-agent systems in financial trading.

Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions

Memory is a crucial component of AI systems, particularly in large language models, and can be categorized into parametric and contextual forms with six fundamental operations: Consolidation, Updating, Indexing, Forgetting, Retrieval, and Compression. This survey provides a structured perspective on memory in AI by mapping these operations to relevant research topics, datasets, and tools, outlining the functional interplay in large language models and identifying promising directions for future research.

Code

Chatterbox TTS

Chatterbox is an open-source, production-grade text-to-speech (TTS) model developed by Resemble AI, which has been benchmarked against leading closed-source systems and offers features like emotion exaggeration control. The model can be easily installed and used to generate high-quality audio files, with options for customizing voice and expression, and also includes a built-in watermarking system for responsible AI use.

A C++ library to efficiently run Gemma-3N across various platform

LiteRT-LM is a C++ library that enables efficient deployment of language models across various edge platforms, including Android, macOS, Windows, Linux, and embedded systems, with support for CPU and GPU acceleration. The library is currently in early preview, with a first full release expected in late summer or early fall, and provides a flexible and customizable framework for running language models, including support for quantization and context size configuration.

Show HN: I'm 13 and I built an AI PDF Reader

The AI PDF Reader is a desktop application built with Electron and React that allows users to view and navigate PDF documents, with features such as AI-powered text analysis, advanced chat interface, and highlighting and annotation tools. The application is available for Windows, macOS, and Linux, and can be installed using various methods, including installers, AppImages, and DEB packages, with its source code licensed under the MIT License.

Show HN: Joinly.ai – Build real-time interactive meeting agents using MCP

Joinly.ai is a connector middleware that enables AI agents to join and participate in video calls, providing essential meeting tools and resources to equip AI agents with real-time interaction capabilities. The platform is 100% open-source, self-hosted, and privacy-first, allowing users to integrate their own language models and text-to-speech services, with features such as live interaction, conversational flow, and cross-platform compatibility.

LLM Inference in pure Java with a GPU acceleration enabled

GPULlama3.java is a Java-native implementation of the Llama3 model that utilizes TornadoVM for automatic acceleration on GPUs, allowing for efficient inference and leveraging parallel computing features for enhanced performance. The project provides a solid starting point for achieving competitive performance compared to native CUDA implementations, with a roadmap for upcoming features to improve performance and achieve parity with the fastest implementations.