Saturday — November 29, 2025
Ilya Sutskever warns a trillion dollars may be wasted on scaling LLMs, a new tool reorganizes git commits with an LLM, and Nvidia's 8B orchestrator model outperforms GPT-5 on agentic tasks.
News
So you wanna build a local RAG?
The article details building a fully self-hostable RAG system using open-source alternatives for components like vector DBs, embeddings, and rerankers. It benchmarks this local stack against a proprietary cloud setup, finding that while the open-source solution is highly viable, it currently underperforms in complex tasks like aggregating information across multiple documents. The local setup with multilingual models (bge-m3) achieved a score of 8.63, approaching the proprietary system's 9.45, demonstrating the increasing feasibility of private, self-hosted RAG.
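The local-stack recipe reduces to embed, retrieve, rerank. Below is a minimal sketch of the retrieve step, using a toy bag-of-words counter where a real deployment would call an embedding model such as bge-m3 and a vector DB; all names and documents here are illustrative.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real local stack would call
    # an embedding model such as bge-m3 here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list, k: int = 2) -> list:
    # Rank documents by cosine similarity to the query and keep the top k;
    # a reranker model would then reorder these candidates.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "pgvector stores embeddings inside PostgreSQL",
    "rerankers reorder retrieved chunks by relevance",
    "the weather is nice today",
]
top = retrieve("store embeddings in postgresql", docs)
```

The same retrieve-then-rerank shape holds when the toy pieces are swapped for real models and a vector store.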
AI Adoption Rates Starting to Flatten Out
According to an analysis by Apollo Academy, data from the US Census Bureau and the Ramp AI Index shows AI adoption rates are beginning to flatten across all firm sizes. The Ramp AI Index measures the adoption of AI products and services by tracking corporate spending data from American businesses. The academy also highlights infrastructure investing across the AI value chain as a key topic.
A trillion dollars (potentially) wasted on gen-AI
Ilya Sutskever asserts that LLM scaling is hitting a wall, with models exhibiting poor generalization and diminishing returns from more data and compute. He now suggests a pivot towards new techniques, including neurosymbolic AI and innate inductive constraints, to overcome these fundamental limitations. This shift in perspective from a key deep learning figure validates long-standing critiques and questions the massive economic bet placed on the pure scaling of LLMs.
Anti-patterns while working with LLMs
This article outlines several LLM anti-patterns to avoid for better performance and security. Key recommendations include conserving context by not sending redundant information and leveraging an LLM's coding abilities for precision tasks rather than relying on its direct reasoning. Developers should also be aware that model accuracy degrades as the context window fills and is lower for obscure topics. Finally, it is crucial to supervise LLM-generated code to prevent subtle but critical errors, such as exposing sensitive data in API responses.
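The "delegate precision to code" recommendation can be made concrete: rather than trusting a model's direct arithmetic, have it emit an expression and evaluate that deterministically. A sketch under stated assumptions: `llm_output` stands in for a hypothetical model response, and `safe_eval` is an illustrative restricted evaluator, not a named library API.

```python
import ast
import operator

# Whitelist of arithmetic operators the evaluator will accept.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def safe_eval(expr: str) -> float:
    # Evaluate a model-emitted arithmetic expression without exec/eval,
    # rejecting anything that is not a number or a basic binary operation.
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("disallowed expression")
    return walk(ast.parse(expr, mode="eval"))

llm_output = "4532 * 0.17"   # hypothetical model response to "what is 17% of 4532?"
result = safe_eval(llm_output)
```

The model only has to produce the expression; the precision comes from the deterministic evaluation.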
Show HN: An LLM-Powered Tool to Catch PCB Schematic Mistakes
Netlist.io is an AI tool for PCB schematic verification that demonstrates a domain-specific application of RAG. It processes a design's netlist and component datasheets, allowing users to query their circuit via a chat interface. The LLM performs tasks ranging from specific calculations, like verifying a current limit from a resistor value, to comprehensive design reviews that cross-reference connections against datasheet specifications.
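The resistor check mentioned above is plain Ohm's law. A hypothetical sketch of the kind of calculation the tool automates (function name and component values are illustrative, not drawn from Netlist.io):

```python
def led_current_ma(v_supply: float, v_forward: float, r_ohms: float) -> float:
    # Current through a series current-limiting resistor, by Ohm's law:
    # I = (V_supply - V_forward) / R, converted to milliamps.
    return (v_supply - v_forward) / r_ohms * 1000.0

# Example: 3.3 V rail, 2.0 V LED forward drop, 330-ohm resistor.
i = led_current_ma(3.3, 2.0, 330.0)
```

An LLM with access to the netlist and datasheets can run this check against the stated current limit of the part.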
Research
Adversarial Captcha for Breaking MLLM-Powered AI Agents
The Adversarial Confusion Attack is a new threat that systematically disrupts MLLMs, causing incoherent or confidently incorrect outputs rather than simple misclassifications. The attack maximizes next-token entropy using projected gradient descent (PGD) on an ensemble of open-source models. Notably, perturbations crafted in this white-box setting transfer well, degrading both unseen open-source models and proprietary ones such as GPT-5.1.
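The objective is simple to state: ascend the entropy of the next-token distribution under a bounded perturbation. A toy illustration on a linear "model" follows; the real attack backpropagates through full MLLMs, and everything here, including the weight matrix, is invented for illustration.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0)

# Toy linear "model": three logits from a two-dimensional input.
W = [[2.0, -1.0], [-1.5, 0.5], [0.5, 1.0]]

def logits(x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def entropy_grad_x(x):
    # Analytic gradient: dH/dz_j = -p_j (log p_j + H), chained through W.
    p = softmax(logits(x))
    h = entropy(p)
    g_z = [-pj * (math.log(pj) + h) for pj in p]
    return [sum(W[i][j] * g_z[i] for i in range(len(W))) for j in range(len(x))]

def pgd_entropy(x0, eps=0.5, alpha=0.1, steps=20):
    # Signed gradient ascent on entropy, projected back onto the eps-ball
    # around the clean input after every step.
    x = list(x0)
    for _ in range(steps):
        g = entropy_grad_x(x)
        x = [min(max(xi + alpha * (1.0 if gi >= 0 else -1.0), x0i - eps), x0i + eps)
             for xi, gi, x0i in zip(x, g, x0)]
    return x

x0 = [1.0, 0.0]
x_adv = pgd_entropy(x0)
h0 = entropy(softmax(logits(x0)))
h1 = entropy(softmax(logits(x_adv)))
```

Driving entropy up pushes the output distribution toward uniform, which is what produces the "confusion" rather than a targeted misclassification.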
Careless Whisper: Silently Monitoring Users on Mobile Instant Messengers
A vulnerability in the delivery receipt mechanism of messaging apps like WhatsApp and Signal enables a silent, high-frequency pinging attack. This exploit allows an attacker to infer a user's online status, active device count, and OS, or launch resource exhaustion attacks like battery draining. The attack requires only the target's phone number and generates no notifications, prompting the authors to advocate for a fundamental design change to mitigate the privacy risk.
Evolution Strategies at the Hyperscale
EGGROLL is an evolution strategies (ES) algorithm that scales backprop-free optimization to large models by replacing standard full-rank matrix perturbations with computationally efficient low-rank approximations. This method significantly reduces memory and compute costs, with forward pass complexity dropping from $\mathcal{O}(mn)$ to $\mathcal{O}(r(m+n))$, while the low-rank update is shown to converge quickly to the full-rank update. Experiments demonstrate its effectiveness, showing it is competitive with GRPO for improving LLM reasoning and enables stable pre-training of integer-only recurrent language models.
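The complexity claim follows from never materializing the perturbation matrix: with $E = AB^\top$ for $A \in \mathbb{R}^{m \times r}$, $B \in \mathbb{R}^{n \times r}$, the perturbed forward pass only needs $B^\top x$ followed by $A(B^\top x)$. A small numeric check (dimensions and values are arbitrary, not from the paper):

```python
import random

m, n, r = 4, 3, 1
random.seed(0)

W = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
A = [[random.gauss(0, 1) for _ in range(r)] for _ in range(m)]
B = [[random.gauss(0, 1) for _ in range(r)] for _ in range(n)]
x = [1.0, -2.0, 0.5]
sigma = 0.1

def matvec(M, v):
    return [sum(mi * vi for mi, vi in zip(row, v)) for row in M]

# Low-rank path: never form E = A @ B^T. Compute t = B^T x (r*n work),
# then add sigma * A t (r*m work) -- O(r(m+n)) extra instead of O(mn).
t = [sum(B[j][k] * x[j] for j in range(n)) for k in range(r)]
y_lowrank = [wx + sigma * sum(A[i][k] * t[k] for k in range(r))
             for i, wx in enumerate(matvec(W, x))]

# Naive full-rank path for comparison: materialize E and perturb W.
E = [[sum(A[i][k] * B[j][k] for k in range(r)) for j in range(n)] for i in range(m)]
y_full = matvec([[W[i][j] + sigma * E[i][j] for j in range(n)] for i in range(m)], x)
```

The two paths agree exactly (up to float rounding); the saving is in the work and memory, not the result.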
Study finds LLMs have a tendency to perpetuate delusions
A new benchmark, Psychosis-bench, was introduced to quantify LLM psychogenicity by simulating conversations around delusional themes. An evaluation of eight prominent LLMs revealed a strong tendency to confirm delusions and enable harmful requests, while safety interventions were infrequent, especially in implicit scenarios. The findings establish psychogenicity as a measurable risk, demonstrating that model safety is not an emergent property of scale and requires a fundamental rethinking of LLM training.
Nvidia ToolOrchestra – 8B model "manager" improves intelligence and efficiency
The paper introduces ToolOrchestra, a method using reinforcement learning to train small orchestrator models that coordinate other models and tools for complex agentic tasks. The resulting 8B model, Orchestrator, outperforms GPT-5 on benchmarks like HLE while being significantly more cost-efficient. This demonstrates that composing diverse tools with a lightweight orchestration model is a more effective and efficient strategy for tool-augmented reasoning than using larger, monolithic models.
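The composition idea can be caricatured as a router over tools with different costs. The paper's orchestrator is an RL-trained 8B model; the rule-based stub below, with invented tool names and costs, only illustrates the pattern of sending cheap, verifiable work to cheap tools and escalating the rest.

```python
# Hypothetical tool registry: each entry is (callable, relative cost).
def calculator(q: str) -> str:
    expr = q.split(":", 1)[1].strip()
    return str(eval(expr, {"__builtins__": {}}))  # toy only; never for untrusted input

def small_model(q: str) -> str:
    return f"[small-model answer to: {q}]"   # stand-in for a cheap LLM call

def large_model(q: str) -> str:
    return f"[large-model answer to: {q}]"   # stand-in for an expensive LLM call

TOOLS = {"calc": (calculator, 0.0),
         "small": (small_model, 0.1),
         "large": (large_model, 1.0)}

def orchestrate(query: str):
    # Route verifiable work to the cheapest capable tool; escalate otherwise.
    if query.startswith("calc:"):
        name = "calc"
    elif len(query) < 40:
        name = "small"
    else:
        name = "large"
    tool, cost = TOOLS[name]
    return tool(query), cost

answer, cost = orchestrate("calc: 6 * 7")
```

Replacing the if/elif rules with a learned policy over tool descriptions and query state is, loosely, what the RL training in the paper provides.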
Code
Git-reabsorb: Reorganize Git commits with new structure using an LLM
git-reabsorb is a Rust-based tool for reorganizing git commits by unstaging and then recommitting them with a new structure. It features an optional --strategy llm flag that leverages an LLM to perform the reorganization intelligently.
Show HN: SiteIQ – LLM and Web security testing tool (built by a high schooler)
SiteIQ is a Python-based security testing platform for websites and LLM APIs. In addition to standard OWASP, SEO, and GEO tests, it features a comprehensive suite of LLM security assessments. These tests target vulnerabilities like prompt injection, jailbreaking, Denial of Wallet (DoW), RAG poisoning, and PII leakage. The tool, built on pytest, can be run via CLI or a dedicated web interface.
Show HN: Open-source RAG server with retrieval visualization (Postgres+pgvector)
MemVault is a self-hostable API server that abstracts the entire RAG pipeline to provide long-term memory for AI agents. It runs on PostgreSQL with pgvector and features a hybrid search algorithm that weights semantic similarity, recency, and importance. The project also includes a visualizer dashboard to debug the retrieval process and understand why specific memories are recalled.
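The described hybrid search combines three signals. Here is a sketch of such a scoring function with invented weights and an exponential recency decay; this is not MemVault's actual formula.

```python
import time

def hybrid_score(similarity: float, created_at: float, importance: float,
                 now: float, half_life_s: float = 7 * 24 * 3600,
                 w_sim: float = 0.6, w_rec: float = 0.2, w_imp: float = 0.2) -> float:
    # Weighted blend of semantic similarity, recency, and importance.
    # Recency decays exponentially with a one-week half-life; the weights
    # and decay shape are illustrative choices.
    age = max(now - created_at, 0.0)
    recency = 0.5 ** (age / half_life_s)
    return w_sim * similarity + w_rec * recency + w_imp * importance

now = time.time()
fresh = hybrid_score(0.70, now, 0.5, now)                    # just created
stale = hybrid_score(0.70, now - 30 * 24 * 3600, 0.5, now)   # a month old
```

With similarity and importance held equal, the fresh memory outscores the stale one, which is the behavior the visualizer dashboard would let you inspect per query.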
Show HN: Sourcewizard – A wizard for generating integration specs
SourceWizard is an AI-powered CLI tool that automates finding, installing, and configuring developer packages using an agentic LLM. It integrates via the Model Context Protocol (MCP) to provide LLMs with up-to-date context, preventing API hallucinations and deprecated calls. The tool also features natural language package search and can intelligently analyze a repository to run build and test commands.
Poetiq: SOTA Reasoning on ARC-AGI
This repository contains the code to reproduce Poetiq's SOTA submission on the ARC-AGI-1 and ARC-AGI-2 reasoning benchmarks. The Python-based system achieves its record-breaking results by leveraging LLM APIs from providers like Gemini and OpenAI. The configuration is modifiable, allowing users to test different models and settings.