Wednesday, April 30, 2025

Duolingo replaces contract workers with AI for content creation, IBM's open-source Bamba model runs at least twice as fast as comparably sized transformers, and the CoRT technique improves AI performance by having models recursively evaluate their own responses.

News

Generative AI is not replacing jobs or hurting wages at all, say economists

Economists Anders Humlum and Emilie Vestergaard studied 11 occupations covering 25,000 workers and 7,000 workplaces in Denmark and found that generative AI chatbots have had no significant impact on wages or employment. Users reported average time savings of just 2.8 percent of work hours, less than expected.

Everything we announced at our first LlamaCon

LlamaCon has kicked off with several announcements, including the launch of the Llama API as a limited preview, which combines the best features of closed models with open-source flexibility. Additionally, new Llama Protection Tools and the Llama Defenders Program have been introduced to help developers build secure AI applications, along with the announcement of the 10 international recipients of the second Llama Impact Grants, totaling over $1.5 million USD.

Bamba: An open-source LLM that crosses a transformer with an SSM

The transformer architecture used in large language models suffers from a "quadratic bottleneck" that drives up processing time and cost as conversations get longer, whereas hybrid models like Bamba, which combines transformers with state-space models, run faster and handle long sequences more efficiently. Bamba, recently open-sourced by IBM Research, runs at least twice as fast as transformers of similar size while matching their accuracy, and its innovations are expected to be part of IBM's next-generation Granite 4.0 models.
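
For intuition, here is a toy sketch, not IBM's implementation: self-attention compares every token with every other token, so its cost grows quadratically with sequence length, while a state-space layer carries a fixed-size state forward one token at a time, giving linear cost. Names and shapes below are illustrative only.

```python
import numpy as np

def attention_cost(seq_len: int, d: int) -> int:
    # Self-attention compares every token with every other token, so
    # compute grows with the square of the sequence length.
    return seq_len * seq_len * d

def ssm_scan(x: np.ndarray, A: np.ndarray, B: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Toy state-space recurrence: one fixed-size state update per token,
    so cost grows linearly with sequence length.

    x: (seq_len, d_in), A: (d_state, d_state), B: (d_state, d_in), C: (d_out, d_state)
    """
    state = np.zeros(A.shape[0])
    outputs = []
    for x_t in x:
        state = A @ state + B @ x_t   # history is compressed into a fixed-size state
        outputs.append(C @ state)
    return np.stack(outputs)
```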

Duolingo will replace contract workers with AI

Duolingo will be adopting an "AI-first" approach, replacing contract workers with artificial intelligence for tasks that can be automated, according to CEO Luis von Ahn. The company aims to use AI to remove bottlenecks and allow employees to focus on creative work, with von Ahn stating that AI is essential to achieving Duolingo's mission of creating a massive amount of content to teach learners.

Meta AI App built with Llama 4

Meta has launched a new AI app that allows users to access their AI assistant in a more personalized way, with features such as a Discover feed and voice conversations. The app, built with Llama 4, is designed to get to know the user's preferences and provide more helpful and relevant responses, and is available as a companion app for Meta's AI glasses and connected to meta.ai.

Research

Fast-Slow Thinking for Large Vision-Language Model Reasoning

Recent advances in large vision-language models have led to an "overthinking" phenomenon, where models generate excessively verbose reasoning. The proposed FAST framework addresses this issue by dynamically adapting reasoning depth based on question characteristics, resulting in state-of-the-art accuracy with significantly reduced token usage.
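
A hypothetical sketch of the fast-slow idea, not the FAST authors' method or API: route easy questions to a short direct answer and reserve verbose step-by-step reasoning for hard ones. The `estimate_difficulty` and `generate` methods are assumed interfaces.

```python
def answer(question: str, image, model, difficulty_threshold: float = 0.5) -> str:
    """Hypothetical fast-slow routing: spend a long chain of thought only on
    questions judged hard, and answer easy ones directly with a tight budget."""
    difficulty = model.estimate_difficulty(question, image)   # assumed: 0.0 (trivial) .. 1.0 (hard)
    if difficulty < difficulty_threshold:
        # Fast path: short, direct answer.
        return model.generate(question, image, max_new_tokens=64, reasoning=False)
    # Slow path: allow verbose step-by-step reasoning.
    return model.generate(question, image, max_new_tokens=1024, reasoning=True)
```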

Efficient Memory Management for Large Language Model Serving with PagedAttention

The PagedAttention algorithm and vLLM serving system are proposed to efficiently manage key-value cache memory for large language models, reducing waste and allowing for flexible sharing of memory. This results in a 2-4 times improvement in throughput for popular language models, with the same level of latency, particularly for longer sequences, larger models, and more complex decoding algorithms.
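
A toy sketch of the paging idea, simplified from what vLLM actually does: KV-cache entries live in fixed-size physical blocks allocated on demand, so a sequence wastes at most one partially filled block instead of a large contiguous reservation, and blocks can be freed independently when a request finishes.

```python
class PagedKVCache:
    """Toy bookkeeping for a paged key-value cache (a simplification of vLLM)."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # pool of physical block ids
        self.block_tables = {}                      # seq_id -> list of block ids
        self.seq_lens = {}                          # seq_id -> tokens cached so far

    def append_token(self, seq_id: int) -> tuple[int, int]:
        """Reserve a cache slot for one new token; returns (block_id, offset)."""
        length = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if length % self.block_size == 0:           # current block is full (or none yet)
            table.append(self.free_blocks.pop())    # allocate a fresh physical block
        self.seq_lens[seq_id] = length + 1
        return table[-1], length % self.block_size

    def free_sequence(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the free pool for reuse."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)
```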

Beyond Performance: Measuring the environmental impact of analytical databases

The exponential growth of data has made query processing critical, but the environmental impact of database operations is not well understood, prompting the development of ATLAS, a methodology to measure and quantify the environmental footprint of analytical database systems. The methodology's evaluation of four database architectures reveals that architectural decisions and deployment location can significantly influence power consumption and environmental sustainability, highlighting the complexity of environmental considerations in database operations.

YoChameleon: Personalized Vision and Language Generation

Large Multimodal Models, such as GPT-4, lack personalized knowledge of specific user concepts, prompting the development of Yo'Chameleon, a model that personalizes large multimodal models using soft-prompt tuning. Yo'Chameleon can answer questions and generate images of a subject in new contexts after being given just 3-5 images of the concept, and is trained using a self-prompting optimization mechanism and a "soft-positive" image generation approach.
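
An illustrative sketch of soft-prompt tuning in the spirit described above, not the authors' code: a small set of learnable concept embeddings is prepended to the input while the multimodal backbone stays frozen, so a handful of images is enough to fit the new parameters.

```python
import torch
import torch.nn as nn

class SoftPromptPersonalizer(nn.Module):
    """Illustrative soft-prompt tuning: only the prompt embeddings are trained."""

    def __init__(self, backbone: nn.Module, embed_dim: int, num_prompt_tokens: int = 16):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False                  # the backbone is not updated
        self.concept_prompt = nn.Parameter(torch.randn(num_prompt_tokens, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, embed_dim) token/image embeddings
        batch = input_embeds.shape[0]
        prompt = self.concept_prompt.unsqueeze(0).expand(batch, -1, -1)
        return self.backbone(torch.cat([prompt, input_embeds], dim=1))
```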

LIFT+: Lightweight Fine-Tuning for Long-Tail Learning

The fine-tuning paradigm commonly used for long-tail learning often misapplies fine-tuning methods, which degrades performance on tail classes, particularly when model parameters are fine-tuned heavily. LIFT+, a newly proposed framework, relies on lightweight fine-tuning and incorporates additional techniques to optimize class conditions, yielding a more efficient and accurate pipeline that surpasses state-of-the-art approaches.
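
A generic illustration of the lightweight-fine-tuning idea, not the LIFT+ implementation: freeze the pretrained backbone and train only a small head, avoiding the heavy parameter updates the paper associates with degraded tail-class performance.

```python
import torch.nn as nn

def build_lightweight_finetune_model(backbone: nn.Module, feat_dim: int, num_classes: int) -> nn.Module:
    """Freeze the pretrained backbone (assumed to output (batch, feat_dim)
    features) and train only a small classification head."""
    for p in backbone.parameters():
        p.requires_grad = False               # keep pretrained features intact
    head = nn.Linear(feat_dim, num_classes)   # the only trainable parameters
    return nn.Sequential(backbone, head)
```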

Code

Chain of Recursive Thoughts: Make AI think harder by making it argue with itself

CoRT (Chain of Recursive Thoughts) is a technique that improves AI model performance by making it recursively think about its responses, generate alternatives, and pick the best one, effectively giving the AI the ability to doubt itself and try again. This approach has been tested with the Mistral 3.1 24B model, resulting in significantly improved performance, particularly in programming tasks, by allowing the AI to engage in a process of self-evaluation and iterative refinement.
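
A hedged sketch of the loop described above, not the repository's exact code: generate an answer, have the model propose alternatives, ask it to judge all candidates, and keep the winner for the next round. The `llm` callable is an assumed stand-in for whatever model client is used.

```python
def chain_of_recursive_thoughts(prompt: str, llm, rounds: int = 3, alternatives: int = 3) -> str:
    """Recursively improve an answer: propose alternatives, judge, keep the best.
    `llm(text) -> str` is an assumed interface, not the project's actual client."""
    best = llm(prompt)
    for _ in range(rounds):
        candidates = [best] + [
            llm(f"{prompt}\n\nPropose a different, improved answer than:\n{best}")
            for _ in range(alternatives)
        ]
        listing = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
        choice = llm(
            f"Question:\n{prompt}\n\nCandidate answers:\n{listing}\n\n"
            "Reply with only the number of the best answer."
        )
        digits = "".join(ch for ch in choice if ch.isdigit())
        if digits and int(digits) < len(candidates):
            best = candidates[int(digits)]    # keep the judged winner for the next round
    return best
```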

Show HN: Neurox – GPU Observability for AI Infra

The Neurox Control Helm Chart is designed to install Neurox, a monitoring tool for AI workloads on Kubernetes GPU clusters, providing purpose-built dashboards and reports to surface relevant insights. To get started, users can follow the instructions on the Neurox Install page, which guides them through a step-by-step process to deploy a free, self-hosted Neurox Control and Workload cluster on their existing Kubernetes cluster with at least one GPU.

Crawl4AI is an open-source, LLM-friendly web crawler and scraper

Crawl4AI is an open-source web crawler and scraper designed for large language models (LLMs) and AI applications, offering fast and flexible data extraction with features like Markdown generation, structured data extraction, and browser integration. It is actively maintained by a community and has gained popularity as the #1 trending GitHub repository, providing a powerful tool for developers to access and process web data efficiently.
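
A minimal usage sketch based on the project's documented async interface (AsyncWebCrawler and its arun method); consult the repository for the current API, as details may have changed.

```python
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    # Crawl a page and print its LLM-friendly Markdown rendering.
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url="https://example.com")
        print(result.markdown)

asyncio.run(main())
```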

Show HN: MCP-Linker – 6MB GUI for Managing MCP Servers (Tauri)

MCP Linker is a tool that allows users to easily add a Model Context Protocol (MCP) server to their MCP client with just two clicks. The project is open-source, community-driven, and available for download on Gumroad and GitHub, with features including one-click server addition, multiple server configurations, and a user-friendly interface.

Cloudflare stack meets Gemini: build an AI chat webapp

The Cloudflare Chat App is a modern, secure, and real-time messaging application powered by the Cloudflare stack, offering features such as real-time chat, chat history, and user authentication. The app is built using a technical stack that includes React, Cloudflare Workers, Durable Objects, and Cloudflare D1, and can be deployed by following a series of steps outlined in the documentation, including setting up a Cloudflare environment, creating a D1 database, and configuring authentication.