Thursday November 6, 2025

OpenAI seeks U.S. loan guarantees for a $1T expansion, an LLM agent reverse-engineers web apps into automations, and an AI scientist automates six months of human research in a single run.

News

What Happened to Piracy? Copyright Enforcement Fades as AI Giants Rise

The tech industry has reversed its long-standing, aggressive stance on copyright enforcement, now allegedly using pirated content on a massive scale to train LLMs. Lawsuits reveal that major AI developers like Meta have knowingly sourced training data from notorious piracy sites such as Library Genesis. This shift has been met with a lack of federal prosecution, leaving enforcement to civil litigation by rights holders.

OpenAI asks U.S. for loan guarantees to fund $1T AI expansion

OpenAI is seeking U.S. federal loan guarantees to help finance a massive AI infrastructure expansion that could exceed $1 trillion. According to its CFO, this strategy aims to lower borrowing costs and attract a wider range of capital by protecting lenders from default. The move highlights the escalating capital intensity of AI development, with planned outlays for projects like the "Stargate" data center far exceeding the company's projected revenues. An IPO is not currently being considered.

Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer

Brain-IT is a new SOTA method for reconstructing images from fMRI data using a Brain Interaction Transformer (BIT). The model maps voxels to shared, cross-subject functional clusters to predict two complementary image features: low-level structural features to initialize a diffusion process and high-level semantic features to guide it. This brain-inspired architecture leads to more faithful reconstructions and is highly data-efficient, achieving results with 1 hour of fMRI data that are comparable to prior methods trained on 40 hours.

Tinder is testing an AI feature that learns about you from your Camera Roll

Tinder is testing an AI feature called "Chemistry" that analyzes users' Camera Roll photos and questionnaire answers to infer interests and personality for improved matchmaking. The company is also using an LLM-powered system to moderate messages by nudging users before they send potentially offensive content.

The AI Ick

The article explores the negative human perception of AI-generated content, often identified by stylistic tells that LLMs ironically learned from human writing in their training data. This output is widely perceived as "soulless" and lacking authorial intent, causing people to devalue it even when it is indistinguishable from human work. It also asserts that AI detection tools are fundamentally unreliable and biased, frequently misclassifying human writing and posing significant ethical challenges for their use.

Research

Kosmos: An AI Scientist for Autonomous Discovery

Kosmos is an AI scientist that automates data-driven discovery, overcoming the coherence limitations of prior agents. Its core innovation is a structured world model that facilitates information sharing between a data analysis agent and a literature search agent over hundreds of rollouts. This allows the system to execute tens of thousands of lines of code and read over a thousand papers in a single run to generate traceable scientific reports. The system has both generated novel discoveries and reproduced known findings, with collaborators equating a single run to six months of human research.

Pre-training under infinite compute

This paper investigates LLM pre-training in a data-constrained, compute-rich setting, finding that standard methods overfit. The authors show that aggressive regularization, specifically a weight decay 30x larger than standard, enables monotonic loss improvement with parameter scaling. Ensembling independently trained models further lowers the achievable loss asymptote, and the combined techniques yield a 5.17x data efficiency improvement. These gains can be distilled into a student model 8x smaller while retaining 83% of the benefit and are shown to generalize to downstream tasks.
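The ensembling gain described above has a simple intuition: averaging the predictive distributions of independently trained models can only lower expected cross-entropy, by Jensen's inequality on the negative log. A toy sketch (the distributions below are made up for illustration, not taken from the paper):

```python
import math

true_token = 0  # index of the correct next token

# Two hypothetical models' next-token distributions over a 3-token vocab.
p_model_a = [0.6, 0.3, 0.1]
p_model_b = [0.4, 0.1, 0.5]

def xent(p):
    """Cross-entropy loss against the true token."""
    return -math.log(p[true_token])

# The ensemble averages the members' predictive distributions.
ensemble = [(a + b) / 2 for a, b in zip(p_model_a, p_model_b)]

loss_a, loss_b = xent(p_model_a), xent(p_model_b)
loss_ens = xent(ensemble)

# The ensemble's loss never exceeds the mean of the members' losses.
assert loss_ens <= (loss_a + loss_b) / 2
```

Here the ensemble's loss (about 0.69) beats the members' average (about 0.71); the paper's larger claim is that this asymptote keeps dropping as more independently trained members are added.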

Fast GPU bounding boxes on tree-structured scenes via the bracket matching stack

This paper presents a fast, parallel GPU algorithm for computing bounding boxes in tree-structured scenes, a task that is difficult to parallelize efficiently. The core contribution is a novel solution to the parentheses matching problem, which is mapped from a PRAM abstraction to modern GPU hardware using compute shaders. The method achieves significant speedups over sequential CPU approaches, reaching a high fraction of theoretical GPU throughput, and is generalizable to other domains like parsing.
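The sequential version of the idea is easy to state: serialize the scene tree as a bracket sequence, then compute each group's bounding box by folding leaf boxes between matching brackets with a stack. The tokens and box format below are assumptions for illustration; the paper's contribution is doing this matching in parallel on the GPU rather than with the sequential stack shown here.

```python
def union(a, b):
    """Smallest box (xmin, ymin, xmax, ymax) containing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))

def group_boxes(tokens):
    """Return the bounding box of every bracketed group, innermost first."""
    EMPTY = (float("inf"), float("inf"), float("-inf"), float("-inf"))
    stack, results = [EMPTY], []
    for tok in tokens:
        if tok == "(":
            stack.append(EMPTY)                 # open a new group
        elif tok == ")":
            box = stack.pop()                   # close the group...
            results.append(box)
            stack[-1] = union(stack[-1], box)   # ...and fold into its parent
        else:
            stack[-1] = union(stack[-1], tok)   # leaf bounding box
    return results

# "(" / ")" delimit tree nodes; tuples are leaf boxes.
scene = ["(", (0, 0, 1, 1), "(", (2, 2, 3, 3), (4, 0, 5, 1), ")", ")"]
boxes = group_boxes(scene)
# inner group spans (2, 0, 5, 3); the whole scene spans (0, 0, 5, 3)
```

The stack is what makes this hard to parallelize; the paper's bracket matching stack replaces it with a PRAM-style parallel matching step expressed in compute shaders.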

The Physics of News, Rumors, and Opinions

This review proposes a statistical physics framework for modeling the complex dynamics of modern information ecosystems, including phenomena like misinformation cascades and opinion polarization. It covers foundational concepts from complex networks and social dynamics, such as epidemic and spin models, and applies them to analyze information spreading and opinion formation. The work synthesizes theoretical models with empirical findings from large-scale data analytics to provide insights into these high-impact societal issues.

Continuous Autoregressive Language Models

Continuous Autoregressive Language Models (CALM) are proposed to overcome the token-by-token generation bottleneck in LLMs. The model uses a high-fidelity autoencoder to compress a chunk of K tokens into a single continuous vector, shifting the paradigm from next-token to next-vector prediction. This approach reduces the number of generative steps by a factor of K and is trained using a novel likelihood-free framework. Experiments demonstrate that CALM significantly improves the performance-compute trade-off compared to strong discrete baselines.
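The next-vector loop can be sketched as follows. All three components (`encode`, `decode`, `predict_next_vector`) are trivial placeholders standing in for the paper's autoencoder and continuous model, so only the loop structure, one generative step per K tokens, is meaningful here.

```python
K = 4  # tokens compressed into each continuous vector

def encode(tokens):
    """Stand-in for the autoencoder: K tokens -> one vector."""
    return tuple(tokens)

def decode(vector):
    """Stand-in inverse: one vector -> K tokens."""
    return list(vector)

def predict_next_vector(history):
    # The real model makes one continuous prediction per step, trained
    # with a likelihood-free objective. Placeholder: shift values by K.
    return tuple(v + K for v in history[-1])

def generate(prompt_tokens, n_steps):
    assert len(prompt_tokens) % K == 0
    history = [encode(prompt_tokens[i:i + K])
               for i in range(0, len(prompt_tokens), K)]
    out = list(prompt_tokens)
    for _ in range(n_steps):                    # each step emits K tokens,
        nxt = predict_next_vector(history)      # not one token
        history.append(nxt)
        out.extend(decode(nxt))
    return out

tokens = generate([0, 1, 2, 3], n_steps=2)
# 2 generative steps produce 2 * K = 8 new tokens
```

The point of the design is visible in the loop: the sequential dependency runs over vectors, so a sequence of N tokens needs only N/K autoregressive steps.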

Code

Show HN: I was tired of ROS2, so I rewrote it in Rust

HORUS is a production-grade robotics framework built in Rust, designed for real-time performance and memory safety as a faster alternative to ROS2. It achieves sub-microsecond IPC latency and high throughput using a lock-free, zero-copy shared memory architecture. The framework features a unified CLI for project management and supports multi-language development in Rust, Python, and C++.

Show HN: sudocode – manage specs, tasks, and context-as-code for coding agents

sudocode is a lightweight, git-native context management system for coding agents that lives directly in your repository. It enables agents to track context over long-horizon tasks and collaborate by treating the git repo as a distributed context database, capturing user intent as specs and agent activity as issues. The system uses a 4-tiered abstraction model (Spec, Issue, Agent, Artifact) to represent context from high-level requirements down to code diffs. Key features include a dual human-readable (Markdown) and machine-optimized (JSONL/SQLite) format, bidirectional linking, and a graph-based structure for planning and managing dependencies.

Fantasy – Build AI agents with Go. Multi-provider, multi-model, one API

Fantasy is a Go library for building AI agents with a unified, multi-provider, and multi-model API. It allows developers to equip agents with custom tools and compiles to native machine code. The project is currently in preview and lacks support for multi-modal models or provider-side tools like web search.

Show HN: Reverse Engineer Web Apps

web-hacker is a tool that uses an LLM agent to reverse-engineer web applications into automations. It monitors user actions via the Chrome DevTools Protocol to capture network traffic and browser state. Based on this capture and a task description, the agent generates a portable JSON "Routine" that defines the API flow using sequential operations like navigate and fetch, effectively creating a private API for any web app.
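A Routine of this kind might look like the sketch below. Both the schema and the stubbed executor are assumptions for illustration; the real tool derives its steps from captured CDP traffic and emits its own JSON format.

```python
# Hypothetical Routine: an ordered list of navigate/fetch operations
# that replays an API flow discovered by watching the browser.
routine = {
    "task": "export recent orders",
    "steps": [
        {"op": "navigate", "url": "https://shop.example/login"},
        {"op": "fetch", "method": "GET",
         "url": "https://shop.example/api/orders?limit=10",
         "save_as": "orders"},
    ],
}

def run_routine(routine, fetch_fn):
    """Execute steps in order, storing named fetch results."""
    state = {}
    for step in routine["steps"]:
        if step["op"] == "navigate":
            pass  # a real replayer would drive the browser here
        elif step["op"] == "fetch":
            state[step["save_as"]] = fetch_fn(step["method"], step["url"])
    return state

# Stub the network layer so the sketch runs offline.
state = run_routine(routine, lambda method, url: {"status": 200})
```

Because the Routine is plain data, it can be versioned, shared, and replayed without the LLM agent that produced it.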

Show HN: Dynamic Code Execution with MCP: A More Efficient Approach

The aiter-app project demonstrates a dynamic, in-memory approach for LLM agents to perform code execution with MCP servers, eliminating the filesystem overhead of generating and managing individual tool files. Its codex-mcp agent discovers tools and executes code snippets directly from the live MCP connection at runtime, enabled by the Vercel AI SDK. While this simplifies tool management, the project also highlights a core MCP protocol limitation: the absence of enforced output schemas, which complicates chaining tool calls.
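The in-memory pattern reduces to discovering tools from a live session and dispatching by name, with no generated per-tool files. The registry and tools below are stand-ins, not the real MCP or Vercel AI SDK interfaces:

```python
def discover_tools():
    """Pretend this result came from a tools/list call on a live MCP session."""
    return {
        "add": lambda args: args["a"] + args["b"],
        "upper": lambda args: args["text"].upper(),
    }

def execute(tool_name, args, registry):
    """Dispatch a tool call against the runtime registry."""
    if tool_name not in registry:
        raise KeyError(f"unknown tool: {tool_name}")
    # Without enforced output schemas, the caller can only inspect the
    # result dynamically before chaining it into the next call.
    return registry[tool_name](args)

registry = discover_tools()
result = execute("add", {"a": 2, "b": 3}, registry)
```

The chaining problem shows up in the comment above: since nothing guarantees the shape of `result`, composing two tool calls requires runtime inspection rather than static wiring.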