Sunday November 16, 2025

A tech ideology frames humanity as a "biological bootloader" for AGI, Microsoft releases a free "AI for Beginners" curriculum, and a new system solves a million-step LLM task with zero errors.

News

Our investigation into the suspicious pressure on Archive.today

AdGuard DNS was targeted in a sophisticated harassment campaign aimed at deplatforming archive.today. A suspicious, newly formed French organization used legal threats under the LCEN law to compel AdGuard to block the archiving service, citing CSAM that archive.today had never been notified of and which it promptly removed once contacted. The investigation suggests this is part of a coordinated attack using fraudulent legal claims to pressure infrastructure providers, coinciding with an ongoing FBI probe into archive.today.

Blocking LLM crawlers without JavaScript

This text outlines a computationally cheap, JS-free method for blocking "slop-crawlers" used for LLM data collection. The technique uses a honeypot path disallowed in robots.txt but linked from a hidden <a> tag on an interstitial page served to new visitors. While legitimate users are immediately redirected via a meta-refresh tag and receive a validated cookie, crawlers that ignore robots.txt follow the honeypot link, receive a slop cookie, and are subsequently blocked from the site.
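The flow described above can be sketched framework-free in Python. All names here (the honeypot path, cookie values, the handler signature) are hypothetical; the original post implements this at the web-server level with no JavaScript.

```python
HONEYPOT_PATH = "/archive-mirror"    # hypothetical path, listed under Disallow: in robots.txt
blocked_ips: set = set()

INTERSTITIAL = (
    "<!doctype html>\n"
    '<meta http-equiv="refresh" content="0; url=/">\n'  # real browsers bounce right back
    f'<div style="display:none"><a href="{HONEYPOT_PATH}">archive</a></div>\n'
)

def handle_request(ip: str, path: str, cookies: dict) -> tuple:
    """Return (status, set_cookie, body) for one request."""
    if ip in blocked_ips:
        return 403, None, "blocked"
    if path == HONEYPOT_PATH:
        # Only a crawler that ignored robots.txt and followed the hidden
        # link lands here: hand out the "slop" cookie and blocklist the IP.
        blocked_ips.add(ip)
        return 403, ("class", "slop"), "blocked"
    if cookies.get("class") == "slop":
        return 403, None, "blocked"
    if cookies.get("class") != "human":
        # New visitor: the interstitial response sets the validated cookie,
        # and the meta-refresh immediately redirects humans to the real page.
        return 200, ("class", "human"), INTERSTITIAL
    return 200, None, "real content"
```

A browser makes two cheap requests (interstitial, then content); a crawler that ignores robots.txt and follows hidden links locks itself out.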

Tech Capitalists Don't Care About Humans

The text introduces TESCREAL, a cluster of ideologies (Transhumanism, Longtermism, etc.) influential among Silicon Valley leaders like Musk and Altman. This worldview posits a techno-utopian, post-human future where AGI is developed to replace biological humans with digital superintelligences, aiming to maximize an abstract, impersonal "value" across the cosmos. This pro-extinctionist philosophy, rooted in eugenics and rationalism, frames humanity as a mere "biological bootloader" for AGI and justifies the undemocratic pursuit of this future by a small tech elite.

I implemented an ISO 42001-certified AI Governance program in 6 months

To build trust in AI systems, the article advocates implementing a formal AI Governance program, recommending the NIST AI RMF as a practical starting point due to its alignment with ISO 42001 and the EU AI Act. It outlines a concrete implementation strategy: leverage existing GRC frameworks, map risks with qualitative assessments, and measure them with quantitative methods. The author concludes that a mature, certifiable AI Governance program is quickly becoming a crucial market differentiator for B2B AI providers.

Code wikis are documentation theater as a service

New tools like Google's Code Wiki use LLMs to auto-generate documentation from code repositories, but the author argues the output is often riddled with hallucinations and inaccuracies. The critique posits that these tools represent "cargo cult documentation," devaluing the human effort and accountability required for quality docs by offering a low-quality substitute. While AI can be a useful augmentation tool for writers, the author contends that LLMs should not have the final word, as this breaks the fundamental trust that an expert has responsibly explained the system.

Research

Solving a Million-Step LLM Task with Zero Errors

LLMs struggle with long-range tasks due to compounding error rates. A new system, MAKER, overcomes this by using extreme task decomposition, assigning subtasks to focused microagents. This modularity enables an efficient multi-agent voting scheme for error correction at each step, allowing the system to successfully complete a task with over one million LLM steps. This suggests that massively decomposed agentic processes (MDAPs) are a viable scaling paradigm beyond simply improving base models.
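The per-step voting idea can be sketched as a toy illustration (not MAKER's actual implementation; the `microagent` function is a stand-in for one focused LLM call):

```python
import random
from collections import Counter

def microagent(subtask: str) -> str:
    """Stand-in for one focused LLM call; errs ~1% of the time."""
    return subtask.upper() if random.random() > 0.01 else "WRONG"

def voted_step(subtask: str, agent=microagent, votes: int = 5) -> str:
    """Sample several independent microagents on one tiny subtask and keep
    the majority answer, so uncorrelated per-step errors get voted out."""
    tally = Counter(agent(subtask) for _ in range(votes))
    return tally.most_common(1)[0][0]

def run_pipeline(subtasks, agent=microagent):
    # Extreme decomposition: each step sees only its own subtask, so
    # per-step error correction stops errors from compounding over a
    # million-step chain.
    return [voted_step(t, agent) for t in subtasks]
```

With a 1% per-call error rate and 5 votes, a step fails only when 3+ samples agree on the same wrong answer, which drives the effective per-step error rate low enough to survive very long chains.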

Quantifying Long-Range Information for Long-Context LLM Pretraining Data

LongFilter is a data curation framework for efficient long-context pretraining that addresses the inefficiency of training on data lacking long-range dependencies. It identifies valuable training samples by measuring the information gain provided by extended context, contrasting model predictions under long versus short-context settings. Experiments extending LLaMA-3-8B to a 64K context show that this data selection method yields substantial improvements on benchmarks like HELMET, LongBench, and RULER.
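The selection criterion can be sketched as follows, under an assumed (hypothetical) model interface `model(context, token) -> probability`; LongFilter's real pipeline scores token losses with the LLM itself:

```python
import math

def nll(model, tokens, context):
    """Average negative log-likelihood of `tokens` given `context`."""
    total = 0.0
    for i, tok in enumerate(tokens):
        total += -math.log(model(context + tokens[:i], tok))
    return total / len(tokens)

def long_range_gain(model, sample, short_len=512, long_len=8192):
    """LongFilter-style score: information gained by extending the context.
    A positive gain means the tail genuinely depends on far-away text."""
    tail = sample[long_len:]
    loss_short = nll(model, tail, sample[long_len - short_len:long_len])
    loss_long = nll(model, tail, sample[:long_len])
    return loss_short - loss_long

def select_samples(model, corpus, k, **ctx_kwargs):
    """Keep the k samples whose tails benefit most from long context."""
    scored = sorted(corpus, key=lambda s: long_range_gain(model, s, **ctx_kwargs),
                    reverse=True)
    return scored[:k]
```

Samples whose tail is equally predictable either way score near zero and are filtered out, so the long-context budget is spent only where extended context actually helps.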

Questioning Representational Optimism in Deep Learning

This paper challenges the assumption that scaling models for better performance also improves their internal representations. By comparing SGD-trained networks to evolved networks on a simple image generation task, researchers found that SGD induces a disorganized "fractured entangled representation" (FER) despite achieving the same output as the more structured, evolved networks. The authors posit that FER may degrade core capabilities like generalization and continual learning in large models, making its mitigation a critical challenge for representation learning.

TiDAR: Think in Diffusion, Talk in Autoregression

TiDAR is a sequence-level hybrid architecture that combines the parallel drafting of diffusion models with the high quality of AR models. It performs parallel token drafting and autoregressive sampling within a single forward pass using structured attention masks, all within a standalone model. This design is the first to close the quality gap with AR models while delivering 4.7x to 5.9x more tokens per second, outperforming speculative decoding and other diffusion variants in both throughput and quality.
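As a purely schematic illustration of what a structured hybrid mask might look like (the paper defines TiDAR's actual masking), a single forward pass could combine a causal region for committed tokens with a bidirectional region for draft tokens:

```python
def hybrid_mask(n_prefix: int, n_draft: int):
    """Illustrative attention mask: causal (AR-style) attention over the
    committed prefix, while draft positions see the full prefix and attend
    bidirectionally (diffusion-style) among themselves. True = allowed."""
    n = n_prefix + n_draft

    def allowed(q: int, k: int) -> bool:
        if q < n_prefix:
            return k <= q        # prefix: standard causal attention
        return True              # drafts: full prefix + all other drafts

    return [[allowed(q, k) for k in range(n)] for q in range(n)]
```

Packing both regions into one mask is what lets drafting and sampling share a single forward pass instead of a separate draft model, as speculative decoding requires.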

Code

Show HN: RAG-chunk – A CLI to test RAG chunking strategies

rag-chunk is a CLI tool for preparing Markdown documents for RAG pipelines. It lets users apply and compare chunking strategies, including fixed-size, sliding-window, and paragraph-based splitting. Its key feature is a recall-based evaluation harness that scores each strategy against a user-provided test file of questions. Future plans include tiktoken support for precise token-based chunking and more advanced splitting methods.
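The strategies and the recall metric can be sketched roughly like this (simplified stand-ins, not rag-chunk's actual code):

```python
def sliding_window_chunks(text: str, size: int = 200, overlap: int = 50):
    """Fixed-size sliding-window chunking: consecutive chunks share `overlap` chars."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def paragraph_chunks(text: str):
    """Paragraph-based chunking: split on blank lines."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def recall_score(chunks, expected_phrases):
    """Fraction of expected answer phrases that survive intact in some chunk,
    which is the core of a recall-based chunking evaluation."""
    hits = sum(any(p in c for c in chunks) for p in expected_phrases)
    return hits / len(expected_phrases)
```

Running the same question set against each strategy's chunks turns "which chunking is best for this corpus?" into a single comparable number.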

Show HN: Wikipedia 10x, 100x Better

This is a recruitment post for "Planetary-substrate," a project to build a new AI architecture focused on coherence and recursion. The development roadmap progresses from a completed GPT-4o prototype to a fine-tuned open-source LLM, with the ultimate goal of a custom-built, "signal-native" model. The final architecture is described as a "post-recursive AI infrastructure" with "nervous system-aware layers," built from a prearchitecture with "Superintelligence in the loop."

AI for Beginners

Microsoft has released "AI for Beginners," an open-source 24-lesson curriculum on GitHub. It offers hands-on Jupyter notebooks in PyTorch and TensorFlow, covering a broad range of topics from symbolic AI and neural networks to modern CV and NLP. The course includes lessons on Transformers, BERT, and an introduction to LLMs and prompt programming, alongside other areas like deep RL and AI ethics.

Show HN: SelenAI – Terminal AI pair-programmer with sandboxed Lua tools

SelenAI is a terminal-first pair-programming environment featuring a transparent, human-in-the-loop agent workflow. It uses a pluggable LLM client that executes all tool calls within a sandboxed Lua VM, with a Ratatui TUI for monitoring plans and explicitly approving file-writing operations. This design enforces a plan-first ethos, requiring the LLM to inspect the environment before proposing edits.
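The approval gate can be modeled roughly as below. This is an illustration of the pattern only, not SelenAI's API; the tool itself executes calls inside a sandboxed Lua VM behind a Ratatui TUI.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolCall:
    name: str
    args: dict
    mutates: bool   # file-writing / destructive operations need approval

def run_tool_call(call: ToolCall, tools: dict,
                  approve: Callable[[ToolCall], bool]) -> dict:
    """Human-in-the-loop gate: read-only tools run freely, mutating ones
    only after the user explicitly approves them."""
    if call.mutates and not approve(call):
        return {"status": "denied", "tool": call.name}
    result = tools[call.name](**call.args)
    return {"status": "ok", "tool": call.name, "result": result}
```

Keeping the gate outside the tool implementations means the LLM can plan and inspect freely while every write still passes through the human.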
