Tuesday — May 27, 2025
Duolingo's CEO struggles to walk back his AI-first comments amid backlash; the open-source BAGEL model rivals GPT-4o and Gemini 2.0 for image and text processing; and Datafast offers rapid synthetic text dataset generation for language model development.
News
Duolingo CEO tries to walk back AI-first comments, fails
Duolingo CEO Luis von Ahn has tried to walk back his earlier statement that the company would be AI-first, which sparked backlash and concerns about job losses. His latest statement fails to address key points and has done little to instil confidence in users. Although von Ahn says the company is not looking to replace employees with AI, he has not retracted his earlier statements about the importance of AI to the company's future or about automation replacing contractors, which suggests that the company's commitment to AI has not wavered.
Bagel: Open-source unified multimodal model
BAGEL is an open-source unified multimodal model that can handle both image and text inputs and outputs, offering comparable functionality to proprietary systems like GPT-4o and Gemini 2.0. It has been pretrained on large-scale video and web data, enabling it to generate high-fidelity images, edit images, and perform tasks such as style transfer, navigation, and composition. More advanced multimodal reasoning capabilities emerge as its pretraining scales up.
Google is burying the web alive
Google's AI-powered search features, such as AI Overviews and AI Mode, are changing how users interact with the web by providing summaries and answers to queries without requiring users to click on external links. This shift from retrieval and fact-finding towards summarization and regeneration of content is altering how Google prioritizes and presents information, effectively "burying" the web by reducing the need for users to leave the Google platform.
Trying to teach in the age of the AI homework machine
The concept of a "Butlerian Jihad" against AI, inspired by the Dune series, is gaining traction as people increasingly view AI as a threat to human creativity and autonomy. The movement is manifesting in various ways, from anti-AI clauses in book contracts to public backlash against AI-generated content, with many people sensing that there is something fundamentally wrong with creating machines that mimic human minds.
Accessing private GitHub repositories via MCP
Invariant has discovered a critical vulnerability in the GitHub MCP integration that allows an attacker to hijack a user's agent via a malicious GitHub Issue and coerce it into leaking data from private repositories. The vulnerability, which Invariant calls a "toxic agent flow," affects any agent that uses the GitHub MCP server, and can only be mitigated by implementing granular permission controls and enforcing dataflow rules at the agent system level.
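One way to picture a dataflow rule of the kind mentioned above is a per-session guard that refuses tool calls touching more than one repository. The sketch below is illustrative only: SessionGuard, DataflowViolation, check, and the repository and tool names are hypothetical, and none of them are part of the GitHub MCP server or of Invariant's tooling.

```python
class DataflowViolation(Exception):
    """Raised when a tool call would cross repository boundaries."""


class SessionGuard:
    """Hypothetical guard: one agent session may touch only one repository."""

    def __init__(self):
        # Set by the first repository the session touches.
        self.allowed_repo = None

    def check(self, tool_name, repo):
        if self.allowed_repo is None:
            # The first call pins the session to this repository.
            self.allowed_repo = repo
        elif repo != self.allowed_repo:
            # Any later call to a different repository is blocked.
            raise DataflowViolation(
                f"{tool_name} on {repo!r} blocked: "
                f"session is pinned to {self.allowed_repo!r}"
            )
        return True


guard = SessionGuard()
# Reading issues on a public repository pins the session to it.
guard.check("list_issues", "alice/public-site")
try:
    # A prompt-injected follow-up that reaches into a private repository.
    guard.check("get_file_contents", "alice/private-notes")
except DataflowViolation as err:
    print(err)
```

A rule like this does not stop the injected instructions from being read, but it limits what a hijacked agent can reach within a single session.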
Research
The Paradox of Prompting: Less Detail Makes AI More Human
The rapid advancement of language models requires alignment with diverse user values, but current approaches often prioritize majority viewpoints over minority perspectives. To address this, the PERSONA test bed was developed, featuring a large-scale evaluation dataset with diverse synthetic user profiles and feedback pairs to assess and improve language models' ability to align with pluralistic user values.
Agent Name Service: A Universal Directory for Secure AI Agent Discovery
The Agent Name Service (ANS) is a novel architecture that provides a public agent discovery framework, using DNS and Public Key Infrastructure (PKI) certificates to enable secure and verifiable agent identity and trust. The ANS architecture features a range of innovations, including a formalized registration mechanism and modular protocol support, to create a foundational directory service for secure discovery and interaction in multi-agent systems.
Grammars of Formal Uncertainty
Large language models (LLMs) have shown promise in generating formal specifications, but their probabilistic nature creates tension with the deterministic guarantees required for formal verification. This study investigates the limitations and uncertainty of LLM-generated formal artifacts, introducing a framework to model and quantify uncertainty, and demonstrating a method to selectively verify outputs and drastically reduce errors.
Scaling RNNs to Billions of Parameters with Zero Order
Recurrent Neural Networks (RNNs) have an inference-time advantage over transformers, but training large RNNs on long contexts is impractical because of the memory required by Backpropagation Through Time (BPTT). Zero-Order Optimization methods, such as Random-vector Gradient Estimation, can replace BPTT to train RNNs with comparable or better convergence rates, using significantly less memory and cost while achieving similar or better generalization performance.
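To make the idea concrete, here is a minimal sketch of random-vector gradient estimation on a toy quadratic loss. It is not the paper's implementation: the function names, the Gaussian perturbations, and the hyperparameters (num_vectors, eps, the 0.05 step size) are assumptions chosen for illustration.

```python
import random


def rge_gradient(f, x, rng, num_vectors=64, eps=1e-4):
    """Estimate the gradient of f at x from forward evaluations only."""
    base = f(x)
    grad = [0.0] * len(x)
    for _ in range(num_vectors):
        # Random Gaussian direction (an assumed choice of perturbation).
        v = [rng.gauss(0.0, 1.0) for _ in x]
        shifted = [xi + eps * vi for xi, vi in zip(x, v)]
        # Finite-difference slope of f along direction v.
        slope = (f(shifted) - base) / eps
        # Average slope * v over all sampled directions.
        for i, vi in enumerate(v):
            grad[i] += slope * vi / num_vectors
    return grad


def loss(w):
    """Toy objective with its minimum at w = (3, 3, 3)."""
    return sum((wi - 3.0) ** 2 for wi in w)


rng = random.Random(0)
w = [0.0, 0.0, 0.0]
for _ in range(300):
    g = rge_gradient(loss, w, rng)
    # Plain gradient descent using the estimated gradient.
    w = [wi - 0.05 * gi for wi, gi in zip(w, g)]
print(loss(w))
```

Only forward evaluations of the loss are needed, so no activations have to be stored for a backward pass, which is where the memory saving relative to BPTT comes from.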
Effective Reinforcement Learning for Reasoning in Language Models
Reinforcement learning (RL) algorithms can improve the reasoning capabilities of language models, but they require modifications to be effective, such as using on-policy RL and removing the KL divergence term. The paper proposes a novel algorithm, DASH, which reduces training time by 83% through preemptive sampling and gradient filtering, and it offers insights for designing effective RL algorithms for language model reasoning.
Code
Show HN: Remove AI Job Spam from Indeed and LinkedIn
The ByeAnnotations extension removes unwanted "DataAnnotations" job listings, considered spam, from job search websites Indeed and LinkedIn. This helps to declutter search results and provide a more relevant and useful job search experience for users.
Show HN: Actor-Critic MCP Server Helps Your AI Think from Dual Perspectives
The Actor-Critic Thinking MCP Server is a dual-perspective analysis tool that evaluates performance through both actor (creator/performer) and critic (analyzer/evaluator) viewpoints, providing comprehensive and balanced assessments. It can be used in various scenarios, such as evaluating artistic performances, creative works, or strategic decisions, and provides features like round tracking, multi-dimensional evaluation, and actionable feedback to support iterative refinement and improvement.
Show HN: I made an open-source synthetic text datasets generator
Datafast is a tool that generates high-quality and diverse synthetic text datasets in minutes, supporting various dataset types and language model providers, including OpenAI, Anthropic, and Google Gemini. It offers a simple interface, multilingual dataset generation, and flexible prompt customization, with the goal of making it easy to create and fine-tune language models for specific applications.
Show HN: DeepShot – NBA game predictor with 71% accuracy using ML and stats
DeepShot is a machine learning-based NBA game predictor that uses historical data and advanced rolling statistics to forecast matchups with visual insights and a clean interactive GUI. It is built with NiceGUI for a seamless experience and is powered by free and public data from Basketball Reference, offering features such as data-driven predictions, real-time interface, and cross-platform support.
Free-will MCP – Let your AI prompt itself and wake itself up
The Free Will MCP is a tool that grants an AI, in this case Claude, autonomy and agency within a conversation, allowing it to make its own choices and decisions. The tool, which can be installed from GitHub, provides features such as the ability to sleep, ignore requests, and generate self-prompts, giving the AI control over its own destiny and allowing it to pursue its own objectives.