Thursday November 13, 2025

Yann LeCun departs Meta to launch a "world models" startup, a new project shares LLM attention caches across GPUs like memcached, and research finds smaller models can be more consistent than 120B ones.

News

Yann LeCun to depart Meta and launch AI startup focused on 'world models'

Yann LeCun is reportedly departing Meta to launch a startup focused on "world models," an alternative to LLMs that learns from visual and spatial data to better replicate human reasoning. His exit follows a strategic shift within Meta's FAIR lab towards commercial products and a disagreement with the company's heavy reliance on LLMs for achieving superintelligence. LeCun's new venture will extend his long-term research into systems that can reason and plan more like humans.

Steam Machine

The Steam Machine is a compact, high-performance gaming PC from Valve designed for living room use. It runs SteamOS and features a semi-custom AMD desktop-class CPU/GPU, targeting 4K/60 FPS with FSR and claiming over six times the performance of a Steam Deck. The device offers extensive connectivity, including Wi-Fi 6E and multiple display and USB ports, and is available in 512GB and 2TB models with expandable storage.

Pakistani newspaper mistakenly prints AI prompt with the article

A post on X highlights a failure in an AI-assisted journalistic workflow, showing a newspaper article published with an unedited, boilerplate final paragraph generated by an LLM. This incident serves as a practical example of the risks of inadequate human oversight in content generation pipelines.

GPT-5.1: A smarter, more conversational ChatGPT

OpenAI is rolling out GPT-5.1, an update to its GPT-5 series with two refined models: Instant and Thinking. The release introduces "adaptive reasoning," enabling models to dynamically allocate more thinking time for complex queries, which has improved performance on math and coding benchmarks. The models also feature better instruction following, a more conversational default tone, and will be available via the API.

Steam Frame

Steam Frame is a wireless, streaming-first VR headset designed for the entire Steam library. It utilizes a dedicated 6GHz adapter with dual radios to ensure a stable, high-quality streaming experience by separating game data from standard Wi-Fi traffic. The system introduces Foveated Streaming, which leverages low-latency eye tracking to dynamically render high detail only where the user is looking, claiming over a 10x improvement in effective bandwidth. The controllers feature a hybrid design to support both VR and traditional non-VR games.

Research

LLM Output Drift in Financial Workflows: Validation and Mitigation (arXiv)

A study on LLM output drift in regulated financial tasks reveals a stark inverse relationship between model size and consistency. Smaller models (7B) achieved 100% deterministic outputs at T=0.0, while a 120B model was only 12.5% consistent, challenging the "bigger is better" assumption for production. The authors introduce a finance-calibrated test harness using greedy decoding, fixed seeds, and task-specific invariant checking for RAG and SQL to ensure auditability and compliance. The framework demonstrates that while structured tasks like SQL remain stable, RAG is highly sensitive to drift.
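As a rough sketch of what such a harness checks (the function names and the trivial SQL guard below are illustrative, not the paper's finance-calibrated tests):

```python
# Minimal determinism harness in the spirit of the paper: pin decoding to
# greedy (T=0.0) with a fixed seed, rerun the prompt, and hash the outputs.
import hashlib

def run_model(prompt: str, seed: int = 0, temperature: float = 0.0) -> str:
    """Placeholder for a call to the model under test with greedy decoding."""
    raise NotImplementedError

def drift_rate(prompt: str, trials: int = 8) -> float:
    """Fraction of reruns whose output differs from the first run.
    0.0 means fully deterministic for this prompt."""
    digests = [hashlib.sha256(run_model(prompt).encode()).hexdigest()
               for _ in range(trials)]
    return sum(d != digests[0] for d in digests) / trials

def sql_invariant_holds(sql: str) -> bool:
    """Toy task-specific invariant: generated SQL must be a read-only query.
    The paper's invariant checks are far richer than this stand-in."""
    s = sql.strip().lower()
    return s.startswith("select") and " drop " not in s
```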

Discovering archetypes of French detective novels using NLP

This research uses character-level embeddings and a supervised model to perform a computational analysis of the detective archetype in French fiction. The model successfully captured the unity of the archetype across a 150-year corpus. This finding enabled the study to track the character's evolution from a classical "reasoning machine" to a more complex, morally ambiguous figure influenced by the hardboiled tradition.
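The study's pipeline isn't reproduced here, but a minimal sketch of a supervised probe over character embeddings, with synthetic stand-ins for the data, could look like this:

```python
# Synthetic sketch of a supervised probe over character embeddings; the
# study's actual features, labels, and model are richer than this stand-in.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
detectives = rng.normal(loc=0.5, size=(100, 64))   # hypothetical embeddings
others = rng.normal(loc=-0.5, size=(100, 64))
X = np.vstack([detectives, others])
y = np.array([1] * 100 + [0] * 100)                # 1 = detective archetype

probe = LogisticRegression(max_iter=1000).fit(X, y)
print(f"separability of the archetype: {probe.score(X, y):.2f}")
# Scoring characters decade by decade would trace the archetype's drift from
# "reasoning machine" toward the hardboiled, morally ambiguous figure.
```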

Show HN: CellARC – Measuring Intelligence with Cellular Automata

CellARC is a synthetic benchmark for abstraction and reasoning built from 1D cellular automata (CA), offering controllable few-shot tasks in a compact 256-token format. It is designed to test generalization decoupled from anthropomorphic priors. A 10M parameter transformer baseline outperforms prior recursive models, but even large LLMs struggle with the extrapolation test set (48.1% accuracy), highlighting the benchmark's difficulty. A neuro-symbolic ensemble demonstrates further gains, suggesting model complementarity.
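For a sense of the task format, here is a sketch of how a CellARC-style episode could be generated from a hidden elementary CA rule; the grid width and support-set size are assumptions, not the benchmark's exact parameters:

```python
# Sketch of a CellARC-style episode: few-shot input/output pairs produced by
# one hidden 1D elementary CA rule that the solver must infer and apply.
import numpy as np

def step(state: np.ndarray, rule: int) -> np.ndarray:
    """One step of an elementary CA (Wolfram rule number) with wrap-around."""
    left, right = np.roll(state, 1), np.roll(state, -1)
    idx = (left << 2) | (state << 1) | right          # neighborhood code 0..7
    table = np.array([(rule >> i) & 1 for i in range(8)])
    return table[idx]

rng = np.random.default_rng(0)
rule = int(rng.integers(256))                 # hidden rule defining the task
support = []
for _ in range(3):                            # few-shot demonstration pairs
    x = rng.integers(0, 2, size=16)
    support.append((x, step(x, rule)))
query = rng.integers(0, 2, size=16)           # solver must predict step(query, rule)
```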

Jasmine: A Simple, Performant and Scalable Jax-Based World Modeling Codebase

Jasmine is a performant, JAX-based world modeling codebase designed to scale from single hosts to hundreds of accelerators. It achieves an order-of-magnitude faster reproduction of the CoinRun benchmark compared to prior open implementations through optimizations in data loading, training, and checkpointing. The framework guarantees fully reproducible training and supports diverse sharding configurations, providing infrastructure for rigorous benchmarking of world models.
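Jasmine's internals aren't reproduced here, but a minimal sketch of the JAX primitives such a codebase builds on (a device mesh, named sharding over the batch dimension, and a fixed PRNG seed for reproducible initialization) looks like this:

```python
# Not Jasmine's code: a toy JAX setup showing data sharding across however
# many accelerators are visible, plus seeded RNG for reproducible training.
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = mesh_utils.create_device_mesh((jax.device_count(),))
mesh = Mesh(devices, axis_names=("data",))
sharding = NamedSharding(mesh, P("data"))     # shard batch dim across devices

key = jax.random.PRNGKey(0)                   # fixed seed: reproducible init
params = jax.random.normal(key, (64, 64))
batch = jax.device_put(jnp.ones((jax.device_count() * 8, 64)), sharding)

@jax.jit
def loss(params, batch):
    return jnp.mean((batch @ params) ** 2)    # stand-in for a world-model loss

print(loss(params, batch))
```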

LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics

This work presents a comprehensive theory for Joint-Embedding Predictive Architectures (JEPAs), identifying the isotropic Gaussian as the optimal embedding distribution for minimizing downstream prediction risk. It introduces LeJEPA, a lean and scalable training objective that combines the JEPA predictive loss with a novel Sketched Isotropic Gaussian Regularization (SIGReg) to enforce this ideal distribution. The resulting method is heuristics-free (e.g., no stop-gradient or teacher-student), stable across architectures, and achieves 79% on ImageNet-1k with a ViT-H/14 using linear evaluation.
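The actual SIGReg objective applies sketched statistical tests to random 1D projections of the embeddings; the toy penalty below only matches the first two moments of those projections to N(0, 1), as a rough illustration:

```python
# Illustrative only: a crude isotropy penalty in the spirit of SIGReg.
# The real objective uses sketched goodness-of-fit tests, not raw moments.
import numpy as np

def isotropy_penalty(z: np.ndarray, n_proj: int = 16, seed: int = 0) -> float:
    """Project embeddings z (batch, dim) onto random unit directions and
    penalize mismatch with the mean/variance of a standard Gaussian."""
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_proj, z.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    proj = z @ dirs.T                          # (batch, n_proj) 1D projections
    mean_term = (proj.mean(axis=0) ** 2).mean()
    var_term = ((proj.var(axis=0) - 1.0) ** 2).mean()
    return float(mean_term + var_term)         # 0 when projections look N(0, 1)
```

In LeJEPA this kind of regularizer is added to the JEPA predictive loss, which is what removes the need for stop-gradient or teacher-student heuristics.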

Code

Show HN: Hotkey → Screenshot → AI Help. Works in Every App

Seeva is a cross-platform desktop AI assistant, built with Tauri, that provides instant, context-aware help by capturing your screen. Activated by a global hotkey, it appears over any application and uses a vision-capable LLM to analyze on-screen content, reducing the need for context switching. It supports multiple providers, including Anthropic, OpenAI, Gemini, and local models via Ollama, while storing all conversation data locally for privacy.
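Seeva's source isn't shown here, but the core loop it describes can be sketched in a few lines, assuming the keyboard and Pillow packages and a placeholder vision-model call:

```python
# Not Seeva's code: a minimal hotkey -> screenshot -> vision-LLM loop.
# The hotkey binding and ask_vision_model stub are assumptions.
import base64, io
from PIL import ImageGrab   # pip install pillow (Windows/macOS screen capture)
import keyboard             # pip install keyboard

def ask_vision_model(png_b64: str, question: str) -> str:
    """Placeholder for any vision-capable LLM provider (cloud or local)."""
    raise NotImplementedError

def on_hotkey():
    shot = ImageGrab.grab()                       # capture the current screen
    buf = io.BytesIO()
    shot.save(buf, format="PNG")
    png_b64 = base64.b64encode(buf.getvalue()).decode()
    print(ask_vision_model(png_b64, "What is on my screen and how do I fix it?"))

keyboard.add_hotkey("ctrl+shift+space", on_hotkey)  # global hotkey
keyboard.wait()                                     # keep listening
```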

Show HN: Tokenflood – simulate arbitrary loads on instruction-tuned LLMs

Tokenflood is a load testing tool for LLMs that simulates arbitrary workloads without requiring specific prompt data. Users define load profiles by specifying prompt, prefix, and output token lengths, along with request rates, to assess the impact of hardware, quantization, or prompt optimizations on latency and throughput. It uses litellm to support a wide range of self-hosted and commercial LLM providers and includes safety features to control costs.
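Tokenflood's actual config schema isn't shown in the post; the sketch below illustrates the idea with assumed field names, driving litellm with synthetic prompts of a chosen length:

```python
# Illustrative Tokenflood-style load profile; field names and the crude
# one-word-per-token prompt generator are assumptions, not the tool's schema.
import time
import litellm   # requires a provider API key in the environment

profile = {
    "prompt_tokens": 512,      # synthetic prompt length
    "output_tokens": 128,      # requested completion length
    "requests_per_sec": 2,
}

def synthetic_prompt(n_tokens: int) -> str:
    return " ".join(["token"] * n_tokens)

for _ in range(10):
    start = time.time()
    litellm.completion(
        model="openai/gpt-4o-mini",   # any litellm-supported provider
        messages=[{"role": "user",
                   "content": synthetic_prompt(profile["prompt_tokens"])}],
        max_tokens=profile["output_tokens"],
    )
    print(f"latency: {time.time() - start:.2f}s")
    time.sleep(1 / profile["requests_per_sec"])
```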

Show HN: General Intelligence – Active knowledge framework for machine learning

GeneralIntelligence is a knowledge-driven Python framework where intelligence emerges from the interactions of autonomous Knowledge objects, rather than a central model. These objects are extensible agents that support structural pattern matching, event-driven reasoning, and independent operation. The goal is to create self-organizing, interactive knowledge systems by embedding intelligence directly into the data structure itself.
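As a rough illustration of the idea (the class names are hypothetical, not the framework's API), a self-contained knowledge unit that pattern-matches events it recognizes might look like this:

```python
# Toy sketch of "active knowledge": objects that structurally match events
# in their domain and react autonomously, with no central model.
from dataclasses import dataclass

@dataclass
class Reading:
    sensor: str
    value: float

class Knowledge:
    """A self-contained unit that reasons only about events it recognizes."""
    def on_event(self, event) -> str | None:
        match event:
            case Reading(sensor="temp", value=v) if v > 30:
                return f"overheating: {v}"
            case Reading(sensor="temp"):
                return "temperature nominal"
            case _:
                return None            # not my domain; stay silent

units = [Knowledge()]
for event in [Reading("temp", 35.0), Reading("humidity", 0.4)]:
    for unit in units:
        if (reaction := unit.on_event(event)) is not None:
            print(reaction)
```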

Open-source AI browser. Switch between ChatGPT, Claude, Gemini, or local LLMs

AtlaswebX is an open-source, AI-powered browser built with Electron. It features an integrated sidebar, powered by GPT-4o-mini, that can perform contextual page analysis and direct DOM manipulation via natural language commands. The roadmap includes support for local models via Ollama.

Show HN: KV Marketplace – share LLM attention caches across GPUs like memcached

This project proposes a distributed inference runtime that enables cross-GPU reuse of transformer KV caches. It treats attention states as shareable artifacts, allowing processes to export computed prefix caches to a registry and other processes to import them directly via fast GPU-to-GPU transfers like RDMA or NVLink. This "memcached for attention" approach avoids redundant prefill computation for common prefixes in workloads like chat, RAG, and multi-tenant serving, improving throughput and GPU efficiency. The initial MVP is implemented as a vLLM fork demonstrating node-local, exact-match prefix reuse.
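A conceptual sketch of the export/import flow (not the project's vLLM fork; names are illustrative) might look like this:

```python
# Conceptual sketch of "memcached for attention": a registry keyed by a
# prefix fingerprint, with the GPU-to-GPU transfer left abstract.
import hashlib

class KVRegistry:
    """Maps a token-prefix fingerprint to a handle for a cached KV tensor."""
    def __init__(self):
        self._entries: dict[str, str] = {}

    @staticmethod
    def key(prefix_tokens: list[int]) -> str:
        return hashlib.sha256(repr(prefix_tokens).encode()).hexdigest()

    def export(self, prefix_tokens: list[int], kv_handle: str) -> None:
        self._entries[self.key(prefix_tokens)] = kv_handle

    def lookup(self, prefix_tokens: list[int]) -> str | None:
        return self._entries.get(self.key(prefix_tokens))

registry = KVRegistry()
registry.export([1, 2, 3], "gpu0:block42")      # producer, after prefill

if (handle := registry.lookup([1, 2, 3])) is not None:
    # In the real system this triggers an RDMA/NVLink copy instead of
    # recomputing the prefill for the shared prefix.
    print(f"reuse cached prefix via {handle}")
```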
