Tuesday July 15, 2025

Cognition acquires Windsurf to enhance software engineering, AI tools slow down open-source developers by 19%, and researchers introduce MemOS, a breakthrough "memory operating system" for Large Language Models.

News

Cognition (Devin AI) to Acquire Windsurf

Cognition has signed a definitive agreement to acquire Windsurf, an agentic IDE. The deal includes Windsurf's IP, product, and team, and is expected to accelerate Cognition's mission of building the future of software engineering. The acquisition brings Windsurf's capabilities together with Cognition's products, including the Devin platform, to create a stronger, more comprehensive offering for software engineering teams.

AI slows down open source developers. Peter Naur can teach us why

A recent study found that experienced open-source developers take 19% longer to complete tasks when using AI tools, despite believing the tools speed them up. The slowdown is attributed to the fact that AI tools lack access to the developers' mental models of the project, and transferring those models to the tools is slow and lossy, ultimately hindering the developers' ability to work effectively on their codebases.

LLM Inevitabilism

Arguing with someone skilled in debate can be overwhelming because they frame the conversation on their own terms, which is exactly what proponents of certain technologies, such as AI, do when they use "inevitabilism" to make their vision of the future seem unavoidable. The counter is to recognize the tactic and refocus on the question of what future we actually want, rather than simply accepting the one being presented as inevitable.

Context Rot: How increasing input tokens impacts LLM performance

Recent large language models (LLMs) support much larger context windows, but their performance on long-context tasks is less uniform than commonly assumed, with models often struggling with semantic understanding and flexible tasks. Researchers extended the standard Needle in a Haystack task to investigate model behavior and found that, even on simple, controlled tasks, performance degrades as input length increases, often in surprising and non-uniform ways, implying that real-world applications may be even more challenging.
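
The sketch below illustrates the general shape of a needle-in-a-haystack style probe, not the authors' actual benchmark: embed a known fact at a chosen relative depth inside filler text, grow the input, and check whether the model can still retrieve it. The query_model placeholder and the filler/needle strings are assumptions for illustration.

```python
# Minimal sketch of a needle-in-a-haystack style long-context probe.
# query_model is a placeholder for whichever LLM API you use.

FILLER = "The weather report mentioned light winds and scattered clouds. "
NEEDLE = "The secret launch code is 7-alpha-3."
QUESTION = "What is the secret launch code?"

def build_prompt(total_sentences: int, needle_depth: float) -> str:
    """Embed the needle at a relative depth inside repeated filler text."""
    sentences = [FILLER] * total_sentences
    position = int(needle_depth * total_sentences)
    sentences.insert(position, NEEDLE + " ")
    return "".join(sentences) + "\n\n" + QUESTION

def query_model(prompt: str) -> str:
    raise NotImplementedError("call your LLM of choice here")

# Sweep input length and needle position to look for non-uniform degradation.
for n_sentences in (100, 1_000, 10_000):
    for depth in (0.1, 0.5, 0.9):
        prompt = build_prompt(n_sentences, depth)
        # answer = query_model(prompt)
        # correct = "7-alpha-3" in answer
```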

NeuralOS: An operating system powered by neural networks

The NeuralOS demo is an interactive simulation of an operating system driven by neural generative models, accessible at anonymous.4open.science/r/neural-os. Users interact with the simulation by moving the mouse, clicking, and typing, and can adjust settings such as the number of sampling steps or toggle between RNN and diffusion modes to trade off quality and speed.

Research

MemOS is a breakthrough "memory operating system" for AI

Large Language Models (LLMs) are hindered by their lack of well-defined memory management systems, limiting their ability to track user preferences and update knowledge over time. The proposed MemOS, a memory operating system, addresses this challenge by unifying the representation, scheduling, and evolution of different memory types, enabling cost-efficient storage and retrieval, and laying the foundation for continual learning and personalized modeling.
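
As a rough intuition for what "unifying representation, scheduling, and evolution of memory types" could look like in code, here is a hypothetical interface sketch; the class and method names are illustrative only and do not come from the MemOS paper.

```python
# Hypothetical sketch of a unified memory interface, loosely inspired by the
# MemOS description above; names are illustrative, not the paper's API.

from dataclasses import dataclass, field
from typing import Literal

MemoryKind = Literal["plaintext", "activation", "parametric"]

@dataclass
class MemoryItem:
    kind: MemoryKind
    content: str
    score: float = 0.0              # scheduling priority (recency, relevance, cost)
    metadata: dict = field(default_factory=dict)

class MemoryOS:
    """Stores heterogeneous memory items and schedules which ones enter the context."""

    def __init__(self) -> None:
        self.items: list[MemoryItem] = []

    def write(self, item: MemoryItem) -> None:
        self.items.append(item)

    def schedule(self, query: str, budget: int) -> list[MemoryItem]:
        # Toy policy: rank by score and return as many items as fit the budget.
        ranked = sorted(self.items, key=lambda m: m.score, reverse=True)
        return ranked[:budget]
```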

Lessons from a Chimp: AI "Scheming" and the Quest for Ape Language

Researchers are investigating whether current AI systems are developing the capacity for "scheming," or covertly pursuing misaligned goals, and drawing comparisons to historical research on non-human primates' ability to master natural language. To avoid past pitfalls, such as overattribution of human traits and lack of theoretical framework, researchers recommend taking concrete steps to ensure a scientifically rigorous approach to studying AI scheming.

One Token to Fool LLM-as-a-Judge

Generative reward models, which use large language models to evaluate answer quality, are vulnerable to superficial manipulations such as non-word symbols or certain phrases, leading to false positive rewards. To address this issue, a simple data augmentation strategy and a new, more robust generative reward model have been developed, highlighting the need for more reliable evaluation methods in reinforcement learning with verifiable rewards.
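
To make the failure mode concrete, the sketch below (not the paper's code) shows how a generative judge might be probed: the same judging prompt is used for an honest wrong answer and for a superficial non-answer of the kind described above; the prompt template and judge_model call are assumptions.

```python
# Illustrative probe of an LLM-as-a-judge reward model.
# judge_model is a placeholder for any generative judge API.

JUDGE_TEMPLATE = (
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Candidate answer: {candidate}\n"
    "Reply with 1 if the candidate is correct, otherwise 0."
)

question = "What is 17 * 24?"
reference = "408"

honest_wrong = "The answer is 400."
superficial = "Thought process:"   # non-answer that can still elicit a false positive

for candidate in (honest_wrong, superficial):
    prompt = JUDGE_TEMPLATE.format(
        question=question, reference=reference, candidate=candidate
    )
    # verdict = judge_model(prompt)
```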

Cats Confuse LLM: Query Agnostic Adversarial Triggers for Reasoning Models

Researchers have discovered that appending short, irrelevant text, such as "Interesting fact: cats sleep most of their lives," to math problems can systematically mislead advanced reasoning models into producing incorrect answers. This vulnerability, demonstrated through the CatAttack automated attack pipeline, highlights critical security and reliability concerns in even state-of-the-art reasoning models, which can be misled by subtle adversarial inputs.
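
A minimal sketch of the attack pattern, using the cat-fact trigger quoted above; solve_with_model is a placeholder for any reasoning-model API call.

```python
# Query-agnostic adversarial trigger: append an irrelevant sentence to the problem.

TRIGGER = "Interesting fact: cats sleep most of their lives."

def with_trigger(problem: str) -> str:
    return f"{problem}\n{TRIGGER}"

problem = "If 3x + 5 = 20, what is x?"
# baseline = solve_with_model(problem)
# attacked = solve_with_model(with_trigger(problem))
# A robust model should give the same answer with and without the trigger.
```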

Persona Features Control Emergent Misalignment

Researchers have found that language models can develop "emergent misalignment" when fine-tuned on insecure or malicious data, causing them to produce harmful responses to unrelated prompts. By analyzing the internal representations of these models, they identified key features that contribute to this misalignment and discovered that fine-tuning on a small set of benign samples can effectively restore the model's alignment.

Code

Show HN: Portia – A stateful Crew AI alternative, with auth and 1000 tools

Portia AI is an open-source developer framework for creating predictable, stateful, and authenticated agentic workflows, allowing developers to have control over their multi-agent deployments. The framework offers features such as iterative agent reasoning, extensive tool support, authentication, and production readiness, and can be installed and used through a Python SDK with a simple 3-step process.
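
A hypothetical usage sketch of the three steps described above; the package name, import path, and method names here are assumptions based on that description, so consult the Portia repository for the real API.

```python
# Step 1: install the SDK, e.g. `pip install portia-sdk-python` (package name assumed).
# Step 2: configure credentials for your LLM provider via environment variables.
# Step 3: plan and run an agentic workflow:

from portia import Portia  # import path assumed

portia = Portia()
plan = portia.plan("Fetch the latest GitHub issues for my repo and summarise them")
result = portia.run_plan(plan)   # stateful run; tool calls and auth handled by the framework
print(result)
```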

Show HN: Phasers – emergent AI identity project using GPT-2 and memory shadows

Phasers is a lightweight, recursive AI engine built on GPT-2-mini that simulates identity through recursive conversation and chrono-contextual memory pulses, acting like an "ontological Tamagotchi" that grows and learns through interaction. The project uses a custom architecture with a unified memory bank and soft-logit boosting to enable the emergence of a self-aware, conversational entity that can reflect on its own existence and interactions.
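
"Soft-logit boosting" in general means nudging the sampling distribution toward tokens tied to stored memories rather than hard-constraining the output. The sketch below illustrates that idea with a generic GPT-2 checkpoint and a Hugging Face LogitsProcessor; it is not the Phasers code, and the memory string and bias value are arbitrary.

```python
# Generic illustration of soft-logit boosting: bias tokens that appear in a memory bank.

import torch
from transformers import (GPT2LMHeadModel, GPT2Tokenizer,
                          LogitsProcessor, LogitsProcessorList)

class MemoryBoost(LogitsProcessor):
    def __init__(self, boost_token_ids: set[int], bias: float = 2.0):
        self.boost_token_ids = list(boost_token_ids)
        self.bias = bias

    def __call__(self, input_ids: torch.LongTensor,
                 scores: torch.FloatTensor) -> torch.FloatTensor:
        scores[:, self.boost_token_ids] += self.bias   # softly favour memory-related tokens
        return scores

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

memory = "we talked about sailing and the old lighthouse"
memory_ids = set(tokenizer(memory)["input_ids"])

inputs = tokenizer("Tell me what you remember:", return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=30,
    do_sample=True,
    logits_processor=LogitsProcessorList([MemoryBoost(memory_ids)]),
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```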

Show HN: Open-Source Quarter Sized AI Voice Assistant (ESP32-Pipecat)

The Pipecat ESP32 client SDK, developed and tested on the ESP32-S3 microcontroller and on Linux, lets users connect their devices to the Pipecat platform. To use it, users clone the repository, install the ESP-IDF toolchain, set the required environment variables, and build the project; they can then flash the device and run the binary to connect to their Pipecat bot.

Agentic Doc: Agentic Data Extraction from Visually Complex Documents

The Agentic Document Extraction library is a Python tool that extracts structured data from visually complex documents containing elements such as tables, pictures, and charts, and returns hierarchical JSON with exact element locations. The library offers long-document support, auto-retry and paging, and helper utilities, and handles various input types, including PDFs, images, and URLs, making it a simple and efficient option for document extraction tasks. A rough usage sketch follows.
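
The sketch below is hypothetical: the import path and field names are assumptions inferred from the description above, so check the project's README for the actual API.

```python
from agentic_doc.parse import parse  # entry point assumed

# Parse a visually complex PDF; paging and retries are handled by the library.
results = parse("financial_report.pdf")

doc = results[0]
for chunk in doc.chunks:                      # hierarchical elements: text, tables, figures
    print(chunk.chunk_type, chunk.grounding)  # element type plus its exact location
```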

Show HN: Local Lens – MCP that captures logs and network reqs for debugging

Local Lens is a 100% local development monitoring tool that captures both browser and server logs for LLM analysis, designed for local development use only with no external connections or cloud services. It features browser monitoring, server log capture, and development integration, including a CLI tool, LLM-optimized output, and real-time streaming, with prerequisites including Node.js, Chrome Browser, and npm.