Sunday — December 21, 2025
AI's long-task completion horizon doubles every 7 months, DeepMind proposes "patchwork AGI" safety with virtual sandbox economies, and TimeCapsule LLM trains models exclusively on 19th-century text.
News
Reflections on AI at the End of 2025
Initial skepticism regarding LLMs as "stochastic parrots" has largely diminished, with evidence suggesting they possess internal representations. Chain of Thought (CoT) has become a fundamental technique for improving LLM output, functioning as an internal search that learns to converge on useful replies without altering the underlying architecture. Reinforcement learning with verifiable rewards is anticipated to be the next major advance, enabling continued scaling and progress on tasks like program optimization. While some researchers explore alternatives to Transformers, the author posits that current differentiable LLMs could still achieve AGI by approximating discrete reasoning, a capability increasingly validated by benchmarks like ARC.
Show HN: HN Wrapped 2025 - an LLM reviews your year on HN
HN Wrapped is a service that summarizes a user's activity on Hacker News, providing insights into trends and predictions. It is developed by Kadoa, a company leveraging AI agents for web data.
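A service like this mostly reduces to pulling a user's public activity and handing it to a model. Below is a minimal sketch of the retrieval half, assuming the public Algolia HN Search API and a placeholder username; it is not necessarily how HN Wrapped itself is built:

```python
# Hypothetical sketch: pull a year of a user's HN comments via the public
# Algolia HN Search API, as a "Wrapped"-style tool might, before handing the
# text to an LLM for summarization. The username is a placeholder.
import time
import requests

def fetch_year_of_comments(username: str) -> list[str]:
    one_year_ago = int(time.time()) - 365 * 24 * 3600
    resp = requests.get(
        "https://hn.algolia.com/api/v1/search_by_date",
        params={
            "tags": f"comment,author_{username}",
            "numericFilters": f"created_at_i>{one_year_ago}",
            "hitsPerPage": 1000,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return [hit["comment_text"] for hit in resp.json()["hits"]]
```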
Measuring AI Ability to Complete Long Tasks: Opus 4.5 has 50% horizon of 4h49m
METR proposes measuring AI performance by the "length" of tasks AI agents can reliably complete, defined by human expert completion time. Their research shows this capability has been exponentially increasing for the past 6 years, doubling approximately every 7 months. Extrapolating this trend suggests that within a decade, frontier AI systems could autonomously handle tasks requiring weeks or months of human effort, providing a more practical metric for forecasting real-world AI impact.
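The headline numbers make the extrapolation easy to reproduce. Here is a minimal sketch of the arithmetic implied by a 4h49m horizon doubling every 7 months; the starting point and doubling time come from the summary above, while the target task lengths are illustrative rather than METR's:

```python
# Minimal sketch of the extrapolation implied by the stated trend: a 50%-success
# task horizon that doubles every 7 months. The 4h49m starting point is the
# Opus 4.5 figure from the headline; everything else is arithmetic, not METR data.
import math

H0_HOURS = 4 + 49 / 60          # ~4.82 hours (Opus 4.5, 50% horizon)
DOUBLING_MONTHS = 7

def horizon_after(months: float) -> float:
    """Projected 50% task horizon (in hours) after `months`, if the trend holds."""
    return H0_HOURS * 2 ** (months / DOUBLING_MONTHS)

def months_until(target_hours: float) -> float:
    """Months until the projected horizon reaches `target_hours`."""
    return DOUBLING_MONTHS * math.log2(target_hours / H0_HOURS)

print(f"Work week (40 h):    ~{months_until(40):.0f} months")
print(f"Work month (~170 h): ~{months_until(170):.0f} months")
```

Under these assumptions the projected horizon reaches a 40-hour work week in roughly 21 months and a ~170-hour work month in roughly 36, which is the kind of extrapolation behind the decade-scale claim above.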
AI will make our children stupid
The article contends that AI poses an existential threat to intelligence by enabling complete cognitive offloading, allowing children to outsource thinking entirely. It argues that this process, particularly in tasks like writing, detaches understanding from creation, fostering cognitive laziness and eliminating the essential "friction" required for genuine learning, thereby risking a "stupidogenic" society.
School security AI flagged clarinet as a gun. Exec says it wasn't an error
An AI security system, ZeroEyes, triggered a school lockdown after misidentifying a student's clarinet as a rifle, with human review failing to prevent a police response. The incident highlights concerns about high false-positive rates in AI detection systems, which have previously misidentified other benign objects. Critics label these expensive, unproven AI tools "security theater," citing their potential to cause undue stress and desensitize responders, while vendors defend a "better safe than sorry" stance. Despite these limitations and a lack of transparent efficacy data, schools are expanding deployments, prompting questions about responsible AI use and resource allocation in critical safety contexts.
Research
Signaling in the Age of AI: Evidence from Cover Letters
Research on an online labor platform shows an AI-powered cover letter tool increased textual alignment between cover letters and job postings and raised callback rates. While editing AI-generated drafts correlated with hiring success, the tool reduced cover letters' signal content, evidenced by a 51% drop in the correlation between alignment and callbacks. Consequently, employers shifted to alternative signals such as prior work histories.
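To make the reported statistic concrete, here is a purely illustrative way such a drop in signal content could be measured: compare the correlation between an alignment score and a binary callback outcome across control and AI-assisted groups. The column names and data frames are invented for illustration, not taken from the paper:

```python
# Illustrative only: a drop in the "signal content" of cover letters measured as
# a change in the correlation between a letter-to-job alignment score and a 0/1
# callback outcome, comparing control vs. AI-assisted applications.
import numpy as np
import pandas as pd

def alignment_callback_corr(df: pd.DataFrame) -> float:
    # Point-biserial correlation = Pearson correlation with a binary outcome.
    return float(np.corrcoef(df["alignment_score"], df["callback"])[0, 1])

def signal_drop(control: pd.DataFrame, ai_assisted: pd.DataFrame) -> float:
    r_control = alignment_callback_corr(control)
    r_ai = alignment_callback_corr(ai_assisted)
    return 1 - r_ai / r_control   # e.g. 0.51 for a 51% drop
```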
Distributional AGI Safety (DeepMind)
Traditional AI safety research primarily focuses on safeguarding individual AI systems for a monolithic AGI, overlooking the "patchwork AGI" hypothesis where general capabilities emerge from coordinated sub-AGI agents. Given the rapid deployment of advanced AI agents with tool-use and coordination abilities, this paper argues for urgent consideration of patchwork AGI safety. It proposes a framework for distributional AGI safety centered on virtual agentic sandbox economies, governed by robust market mechanisms, auditability, reputation management, and oversight to mitigate collective risks.
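As a toy illustration of the kind of mechanism the framework points toward (not code from the paper), consider a sandboxed task market in which each agent's access to work is weighted by a reputation score updated from audited outcomes:

```python
# Toy illustration (not from the paper): a sandboxed task market where an agent's
# influence is weighted by a reputation score updated from audited outcomes, so
# misbehaving agents progressively lose access to work.
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    reputation: float = 1.0
    ledger: list = field(default_factory=list)  # audit trail of (task, outcome)

class SandboxMarket:
    def __init__(self, agents: list[Agent]):
        self.agents = agents

    def assign(self, task: str) -> Agent:
        # Reputation-weighted selection: the highest-reputation agent wins the task.
        return max(self.agents, key=lambda a: a.reputation)

    def audit(self, agent: Agent, task: str, passed: bool) -> None:
        agent.ledger.append((task, passed))
        # Multiplicative update keeps reputation bounded and punishes failures harder.
        agent.reputation *= 1.05 if passed else 0.5
```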
Multicell-Fold: geometric learning in folding multicellular life
A geometric deep learning model is proposed to predict multicellular folding and embryogenesis, addressing the challenge of understanding complex cellular interactions. The model utilizes a unified graph data structure, integrating cellular interactions and cell junction networks, to capture convoluted spatial dynamics. It achieves interpretable 4-D morphological sequence alignment and predicts local cell rearrangements at single-cell resolution. This work reveals that cell geometries and junction networks jointly regulate morphogenesis, offering a novel paradigm for developmental biology.
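A plausible sketch of the kind of unified graph encoding described, with cells as nodes carrying geometric features and cell-cell junctions as edges; this uses PyTorch Geometric's generic Data container and is not the authors' code:

```python
# A guess at the unified graph encoding described above: cells become nodes with
# geometric features, and cell-cell junctions become edges. Uses PyTorch
# Geometric's generic Data container; field choices are assumptions.
import torch
from torch_geometric.data import Data

def build_tissue_graph(cell_centroids, cell_volumes, junction_pairs):
    # Node features: per-cell geometry (here just volume; could include shape tensors).
    x = torch.tensor([[v] for v in cell_volumes], dtype=torch.float)
    # Edges: one entry per shared junction between two cells, made bidirectional.
    src, dst = zip(*junction_pairs)
    edge_index = torch.tensor([src + dst, dst + src], dtype=torch.long)
    pos = torch.tensor(cell_centroids, dtype=torch.float)  # 3-D cell centroids
    return Data(x=x, edge_index=edge_index, pos=pos)
```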
Code
MIRA – An open-source persistent AI entity with memory
MIRA is an open-source, self-directed digital entity designed for continuous, persistent LLM interaction, leveraging asynchronous processing and active context window manipulation. It features an autonomous memory system where information decays unless referenced, complemented by non-decaying "domaindoc" content for long-form data and persona, with built-in token management. An event-driven architecture enables dynamic, self-contained tool activation and deactivation, allowing MIRA to manage complex, long-horizon tasks without human intervention, with a strong preference for Claude Opus 4.5.
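A hedged sketch of the described memory behaviour, not MIRA's implementation: ordinary memories lose salience on every tick unless they are referenced again, while pinned, domaindoc-style entries never decay:

```python
# Sketch of the described behaviour, not MIRA's code: memories decay unless
# referenced, while pinned "domaindoc"-style entries are exempt from decay.
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    salience: float = 1.0
    pinned: bool = False   # domaindoc-style content: never decays

class MemoryStore:
    def __init__(self, decay: float = 0.95, floor: float = 0.05):
        self.items: list[Memory] = []
        self.decay, self.floor = decay, floor

    def add(self, text: str, pinned: bool = False) -> None:
        self.items.append(Memory(text, pinned=pinned))

    def reference(self, text: str) -> None:
        for m in self.items:
            if m.text == text:
                m.salience = 1.0   # touching a memory resets its salience

    def tick(self) -> None:
        for m in self.items:
            if not m.pinned:
                m.salience *= self.decay
        # Drop anything that has decayed below the retention floor.
        self.items = [m for m in self.items if m.pinned or m.salience > self.floor]
```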
Open Source Historical LLM trained exclusively on 19th century text
The TimeCapsule LLM project trains language models from scratch exclusively on historical data from specific time periods and locations, such as 1800-1875 London. This Selective Temporal Training (STT) aims to eliminate modern bias and accurately emulate the linguistic style, vocabulary, and worldview of a past era, rather than fine-tuning pre-trained models. Early iterations, built on nanoGPT and Phi 1.5, demonstrate evolving capabilities from era-accurate but incoherent language to recalling specific historical events, while addressing challenges like factual hallucination and tokenization issues with increasing dataset sizes up to 90GB.
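The core of Selective Temporal Training is the data-selection step. A rough sketch, assuming a line-delimited JSON corpus with a hypothetical publication_year field; the project's actual corpus format may differ:

```python
# Rough sketch of the selection step behind "Selective Temporal Training": keep
# only documents dated inside the target window, so nothing after the cutoff can
# leak modern vocabulary or knowledge into training. Field names are hypothetical.
import json

def filter_corpus(path: str, start_year: int = 1800, end_year: int = 1875):
    with open(path, encoding="utf-8") as f:
        for line in f:                      # one JSON document per line
            doc = json.loads(line)
            year = doc.get("publication_year")
            if year is not None and start_year <= year <= end_year:
                yield doc["text"]
```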
Open-sourced Jarvis: free, local alternative to Wispr Flow (230 stars in 2 weeks)
Jarvis AI Assistant is a free, 100% open-source, and local-capable voice dictation application for macOS and iOS. It leverages local Whisper models for offline transcription and supports local LLMs via Ollama for advanced text generation, rephrasing, and grammar correction, ensuring privacy. Users can also opt for cloud services like Deepgram and Gemini for speed and accuracy.
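A minimal pipeline of the same shape, not Jarvis's code: the open-source whisper package for offline transcription, followed by a cleanup pass through Ollama's local REST API. Model names are placeholders:

```python
# Not Jarvis's code: a minimal local dictation pipeline of the same shape, using
# the open-source `whisper` package for offline transcription and Ollama's local
# REST API for grammar cleanup. Model names are placeholders.
import requests
import whisper

def dictate(audio_path: str) -> str:
    raw = whisper.load_model("base").transcribe(audio_path)["text"]
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.2",
            "prompt": f"Fix grammar and punctuation, keep the meaning:\n\n{raw}",
            "stream": False,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]
```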
Show HN: I vibe-coded a working macOS driver for an obsolete laser engraver
This project delivers an unofficial, community-developed native macOS CUPS printer driver for Epilog Zing laser engravers, enabling direct printing from any macOS application. It's a Swift port of the LibLaserCut Java driver, offering features like standard and 3D greyscale raster engraving, HPGL vector cutting, and multiple DPI resolutions (100-1000 DPI). The driver supports Epilog Zing 16 and 24 models, is a universal binary for Apple Silicon and Intel Macs, and utilizes the LPD protocol for network communication.
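As a toy illustration of the HPGL vector output mentioned above (not the driver's code), here is a generator that traces a polyline as pen-up/pen-down commands; real Epilog jobs wrap this in printer-specific headers, and the 40-units-per-mm scale is an assumption:

```python
# Toy HPGL generator, not the driver's code: trace a polyline as a cut path using
# pen-up/pen-down commands. The 40-plotter-units-per-mm scale is an assumption,
# and real Epilog jobs add printer-specific raster/job headers around this.
def polyline_to_hpgl(points, plu_per_mm: float = 40.0) -> str:
    def plu(p):
        return f"{int(p[0] * plu_per_mm)},{int(p[1] * plu_per_mm)}"

    cmds = ["IN;"]                         # initialize plotter
    cmds.append(f"PU{plu(points[0])};")    # pen up: travel to start point
    for p in points[1:]:
        cmds.append(f"PD{plu(p)};")        # pen down: cut to next vertex
    cmds.append("PU;")
    return "".join(cmds)

print(polyline_to_hpgl([(0, 0), (50, 0), (50, 50), (0, 50), (0, 0)]))
```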
Up to date free LLM resources
This resource compiles free and startup-friendly LLM APIs, segmenting them into completely free tiers, trial credit offerings, and dedicated startup credit programs. Completely free providers like OpenRouter offer extensive model variety, Groq excels in ultra-fast inference, and Cerebras supports high-volume usage. Trial credits are available from platforms such as Baseten and AI21, while major cloud providers (AWS, Google Cloud, Azure) and AI platforms (Together AI, Anthropic) provide significant credits for eligible startups. The guide stresses these resources are for development and research, recommending budgeting for paid LLM usage in production.
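A minimal sketch of using one of the listed free providers: OpenRouter exposes an OpenAI-compatible endpoint, so the standard OpenAI SDK works with a different base URL. The model slug and key below are placeholders; check the current free-tier list and rate limits before relying on them:

```python
# Minimal sketch of calling a free provider from the list above. OpenRouter is
# OpenAI-compatible; the API key and model slug are placeholders, and free-tier
# availability and rate limits change over time.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",   # placeholder
)

reply = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct:free",   # example free-tier slug
    messages=[{"role": "user", "content": "Summarize today's AI digest in one line."}],
)
print(reply.choices[0].message.content)
```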