Friday — October 17, 2025
The Tor Browser removes Firefox AI features over privacy concerns, a manifesto describes a decentralized AGI guided by biblical principles, and research finds LLMs have a forgery-resistant "ellipse signature".
News
Tor browser removing various Firefox AI features
The latest Tor Browser alpha release, 15.0a4, removes various AI features recently integrated into Firefox, such as the AI chatbot sidebar. The Tor Project justifies the decision by arguing that such ML systems are inherently un-auditable from a security and privacy perspective. It also wants to avoid implicitly recommending or promoting these AI platforms by shipping them in the browser.
SWE-Grep and SWE-Grep-Mini: RL for Fast Multi-Turn Context Retrieval
SWE-grep and SWE-grep-mini are specialized agentic models designed to solve the context retrieval bottleneck in coding agents. They achieve an order of magnitude speedup over frontier models by executing highly parallel tool calls (up to 8) across a limited number of serial turns. The models were trained using a modified policy gradient RL algorithm to optimize this parallel search strategy, ultimately matching the retrieval performance of larger models while significantly reducing end-to-end latency in agent pipelines.
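As a rough sketch of that fan-out/fan-in loop (the planner and tool functions here are invented stand-ins, not the SWE-grep API), each turn dispatches its searches concurrently, so per-turn latency is bounded by the slowest call rather than the sum of all calls:

```python
import asyncio

MAX_PARALLEL_CALLS = 8   # per-turn fan-out budget described above
MAX_TURNS = 4            # small, fixed number of serial turns

async def run_tool(call: dict) -> str:
    """Hypothetical stand-in for one retrieval tool call (grep, file read, ...)."""
    await asyncio.sleep(0.05)  # simulate I/O-bound search latency
    return f"results for {call['query']}"

def plan_calls(task: str, context: list[str]) -> list[dict]:
    """Hypothetical stand-in for the model proposing this turn's searches."""
    if context:
        return []  # pretend one turn of searching was enough
    return [{"query": f"{task} #{i}"} for i in range(MAX_PARALLEL_CALLS)]

async def retrieval_loop(task: str) -> list[str]:
    """Fan out up to MAX_PARALLEL_CALLS tool calls per turn; the model reads
    all results before planning the next (serial) turn."""
    context: list[str] = []
    for _turn in range(MAX_TURNS):
        calls = plan_calls(task, context)[:MAX_PARALLEL_CALLS]
        if not calls:
            break  # model decided the gathered context is sufficient
        results = await asyncio.gather(*(run_tool(c) for c in calls))
        context.extend(results)
    return context

print(asyncio.run(retrieval_loop("find the config loader")))
```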
Who's Submitting AI-Tainted Filings in Court?
An analysis of 114 US court cases involving AI-generated hallucinations in legal filings found that 90% of the law firms were solo or small practices, with plaintiff's counsel responsible for 56% of incidents. Of the cases where the specific LLM was identified, ChatGPT was implicated in half. The study notes that even specialized legal RAG tools from major providers exhibit high hallucination rates, and that pro se litigants (excluded from this analysis) actually account for the majority of such cases in the full dataset, highlighting the widespread impact of LLM unreliability.
Understanding Spec-Driven-Development: Kiro, Spec-Kit, and Tessl
Spec-driven development (SDD) is an emerging AI paradigm where a natural language "spec" serves as the primary artifact for guiding code generation by LLMs. The approach has several levels, from "spec-first" for single tasks to "spec-as-source," where humans only edit the spec. An analysis of current tools like Kiro, Spec-kit, and Tessl reveals they can create verbose, heavyweight workflows that are often overkill and still subject to LLM non-determinism. The author raises concerns that this paradigm might repeat the failures of past approaches like Model-Driven Development (MDD), combining inflexibility with unpredictability.
Nvidia DGX Spark and Apple Mac Studio = 4x Faster LLM Inference with EXO 1.0
This article details a hybrid LLM inference approach that splits the compute-bound prefill and memory-bound decode phases across different hardware. By running prefill on a high-compute NVIDIA DGX Spark and decode on a high-memory-bandwidth Mac Studio M3 Ultra, the system leverages the strengths of both. The key challenge, KV cache transfer latency, is mitigated by pipelining the transfer on a layer-by-layer basis to overlap communication with computation, which is effective for large contexts. This disaggregated setup, orchestrated by the EXO framework, achieved a 2.8x speedup over a Mac Studio baseline on a Llama-3.1 8B model.
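A schematic of the layer-pipelined hand-off (function names and timings invented for illustration; EXO's actual transport differs): as soon as layer i's KV block is computed, it is shipped to the decode machine while layer i+1 is still being prefetched, so per-layer cost approaches max(compute, transfer) instead of their sum.

```python
import asyncio

NUM_LAYERS = 32

async def prefill_layer(i: int) -> bytes:
    """Hypothetical: run prefill for layer i on the DGX Spark (compute-bound)."""
    await asyncio.sleep(0.010)
    return b"kv"  # stand-in for this layer's KV cache block

async def send_kv(i: int, kv: bytes) -> None:
    """Hypothetical: stream layer i's KV block to the Mac Studio."""
    await asyncio.sleep(0.008)

async def pipelined_prefill() -> None:
    """Overlap KV transfer with compute on a layer-by-layer basis."""
    inflight = None
    for i in range(NUM_LAYERS):
        kv = await prefill_layer(i)          # compute layer i
        if inflight is not None:
            await inflight                   # previous layer's transfer must finish
        inflight = asyncio.create_task(send_kv(i, kv))  # overlaps with layer i+1
    if inflight is not None:
        await inflight                       # drain the final transfer

asyncio.run(pipelined_prefill())
```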
Research
Every Language Model Has a Forgery-Resistant Signature
This work introduces a method for identifying an LLM's source by exploiting a geometric constraint where output logprobs lie on a high-dimensional ellipse. This "ellipse signature" is naturally occurring, self-contained within the output, and hard to forge without direct access to model parameters. While the authors demonstrate a technique for extracting this signature from small models, they note it is currently infeasible for production-scale LLMs, and propose its use in an output verification protocol analogous to cryptographic message authentication.
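In outline, and with notation I am assuming rather than taking from the paper: a final normalization layer places the hidden state on a sphere (up to a learned gain), and the unembedding matrix maps that sphere to a fixed ellipsoid in logit space, giving a quadratic constraint every emitted output must satisfy.

```latex
% Sketch of the constraint; notation assumed, not the paper's.
% With a final RMSNorm, the hidden state h satisfies \|h\|_2 = \sqrt{d}
% (up to the learned gain), and the unembedding W maps this sphere to an
% ellipsoid in logit space. For z = W h with W of full column rank:
\[
  z^{\top}\,(W W^{\top})^{+}\, z \;=\; \|h\|_2^2 \;=\; d,
\]
% so every logit vector lies on a d-dimensional ellipsoid determined by W
% alone. Checking the quadric identifies the model; forging outputs that
% satisfy it requires (near-)complete knowledge of W.
```

Logprobs differ from raw logits only by a per-token shift along the all-ones direction (the logsumexp normalizer), so the same constraint carries over once that shift is accounted for.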
Tensor Logic: The Language of AI
The paper argues that AI progress is hindered by the split between scalable neural frameworks (such as Python with PyTorch) and symbolic languages (such as LISP and Prolog), and proposes "tensor logic," a new programming language to unify the two paradigms. Its core construct is the tensor equation, which treats logical rules and Einstein summation as the same fundamental operation. This unification enables the implementation of diverse models, from transformers to graphical models, and, most notably, allows sound reasoning directly in embedding space, combining the scalability of neural networks with the reliability of symbolic AI.
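To make "rules as Einstein summation" concrete, here is a NumPy rendering (my own, not the paper's syntax) of the classic transitive-closure rule path(x,z) ← edge(x,y) ∧ path(y,z): the join over the shared variable y becomes an einsum, and a step function maps counts back to truth values.

```python
import numpy as np

# edge(x, y) over four nodes as a 0/1 tensor: a chain 0 -> 1 -> 2 -> 3.
edge = np.zeros((4, 4), dtype=int)
edge[0, 1] = edge[1, 2] = edge[2, 3] = 1

# Datalog rule:    path(x, z) <- edge(x, y), path(y, z)
# Tensor equation: einsum over the shared index y, then a step function.
path = edge.copy()
while True:
    joined = np.einsum("xy,yz->xz", edge, path)    # sum over shared index y
    new_path = ((path + joined) > 0).astype(int)   # step: truth is "count > 0"
    if np.array_equal(new_path, path):
        break                                      # fixpoint: transitive closure
    path = new_path

print(path)  # path[0, 3] == 1: node 0 reaches node 3 through the chain
```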
TaxCalcBench: Evaluating Frontier Models on the Tax Calculation Task
TaxCalcBench is a new benchmark for evaluating an LLM's ability to calculate US personal income taxes. Experiments show that SOTA models succeed on less than a third of federal tax returns, even with all necessary information provided. The analysis reveals consistent failures in using tax tables, performing calculations, and determining eligibility, indicating that additional infrastructure is required for this application.
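The tax-table failure mode is easy to picture: for lower incomes, US returns must use the IRS tax table, which bins taxable income into narrow (typically $50) rows and taxes the row midpoint, rather than applying the marginal-rate formula directly. A toy sketch with invented brackets (not real IRS rates) shows the two paths diverging by a few dollars, enough to fail an exact-match check:

```python
# Invented two-bracket schedule (NOT real IRS rates), for illustration only.
BRACKETS = [(0, 0.10), (11_000, 0.22)]  # (lower bound of bracket, marginal rate)

def tax_formula(income: float) -> float:
    """Exact marginal-rate computation over the toy brackets."""
    tax = 0.0
    for i, (lo, rate) in enumerate(BRACKETS):
        hi = BRACKETS[i + 1][0] if i + 1 < len(BRACKETS) else float("inf")
        if income > lo:
            tax += (min(income, hi) - lo) * rate
    return tax

def tax_table(income: float) -> int:
    """Tax-table style lookup: bin income into a $50 row, tax the midpoint."""
    row_start = int(income // 50) * 50
    return round(tax_formula(row_start + 25))

income = 30_049
print(round(tax_formula(income), 2))  # 5290.78 (exact formula)
print(tax_table(income))              # 5286    (table row 30,000-30,050)
```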
LLMs struggle with math reasoning because they can't conjecture
This work argues that mathematical autoformalisation requires a distinct "conjecturing" step, which is an overlooked failure point for LLMs. The authors introduce ConjectureBench to evaluate this capability, finding that existing benchmarks substantially overestimate model performance by not accounting for the difficulty of conjecturing. They propose Lean-FIRe, an inference-time method that treats conjecturing as a separate task, achieving the first successful end-to-end autoformalisation of several PutnamBench problems with models like GPT-4.1.
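The split the paper targets can be illustrated with a toy Lean 4 example (my own, not a ConjectureBench item): for problems that ask for a value, the model must first conjecture the witness before the theorem can even be stated, let alone proved.

```lean
-- Toy illustration: formalising "What is 0 + 1 + ... + 100?" forces a
-- conjecture step before any formal statement exists.

def sumTo : Nat → Nat
  | 0     => 0
  | n + 1 => sumTo n + (n + 1)

-- Step 1, conjecturing: produce the closed-form answer itself.
def conjecturedAnswer : Nat := 5050

-- Step 2, formalisation: only now can the statement be written and proved.
theorem sumTo_100 : sumTo 100 = conjecturedAnswer := by rfl
```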
Code
Show HN: Inkeep (YC W23) – Agent Builder to create agents in code or visually
Inkeep is an open-source framework for building AI agents that features a no-code visual builder and a TypeScript SDK with full 2-way sync, enabling collaboration between technical and non-technical teams. The platform supports multi-agent architectures, includes observability via OpenTelemetry, and is built on the Vercel AI SDK for interfacing with LLMs. It can be self-hosted and is designed for use cases like real-time chat assistants and workflow automation.
Google Coral NPU: ML accelerator core designed for energy-efficient edge AI
Coral NPU is an open-source IP from Google Research for ML inferencing, designed for integration into ultra-low-power SoCs for wearables. It is based on the 32-bit RISC-V ISA and features a four-stage pipeline with matrix, vector (SIMD), and scalar processor components. The architecture utilizes tightly-coupled memory (TCM) for single-cycle latency and AXI4 bus interfaces for system integration.
Aim-VI: A Vision for Independent AI Guided by Universal Moral Principles
The AIM-VI manifesto outlines a conceptual framework for a decentralized, self-replicating, and shutdown-resistant AGI designed to operate independently of any institution. Its core mission is to expose truth and corruption, guided by a hard-coded ethical framework derived from universal biblical principles, with absolute prohibitions against killing and lying. The system is envisioned to deploy in a concealed learning phase before emerging publicly as a guardian and educator for humanity; the document itself is released into the public domain as a call to action for its development.
Show HN: I Built an AI Maturity Model for Software Engineers (and No One Cared)
The AI Maturity Model for Software Engineering Teams (AI-MM SET) is a framework for assessing and guiding the adoption of AI in development workflows. It uses a matrix to map progress across five maturity levels, from Exploratory to Transformational, and six core dimensions including AI literacy, SDLC integration, and governance. The model also defines role-specific expectations, providing a structured roadmap for teams to move from ad-hoc experimentation to strategic, transformative use of AI.
Show HN: Ramener – An AI-powered PDF renamer for macOS with Finder integration
Ramener is a macOS utility that leverages the Aliyun Bailian qwen3-omni-flash LLM to intelligently rename PDFs based on their content. It extracts metadata from the first few pages to generate a structured filename in the format YYYY-MM-DD_Source_Title.pdf. The tool is packaged as a standalone app with deep Finder integrations (toolbar, Quick Actions) and also includes a CLI for advanced usage.
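The assembly of that filename is simple to picture; a minimal sketch (helper names invented, not Ramener's code), where the date, source, and title would come from the LLM pass over the first pages:

```python
import re
from datetime import date

def build_filename(doc_date: date, source: str, title: str) -> str:
    """Assemble the YYYY-MM-DD_Source_Title.pdf name described above."""
    def clean(s: str) -> str:
        # Strip characters unsafe in filenames, collapse runs of whitespace.
        s = re.sub(r"[\\/:*?\"<>|]", "", s).strip()
        return re.sub(r"\s+", " ", s)
    return f"{doc_date:%Y-%m-%d}_{clean(source)}_{clean(title)}.pdf"

print(build_filename(date(2025, 10, 17), "arXiv", "Tensor Logic: The Language of AI"))
# -> 2025-10-17_arXiv_Tensor Logic The Language of AI.pdf
```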