Monday March 30, 2026

Google’s TurboQuant cuts KV cache size by 6x, Miasma traps AI scrapers in infinite poison pits, and Stanford identifies "mirage reasoning" in vision models.

Interested in AI engineering? Let's talk

News

Police used AI facial recognition to wrongly arrest TN woman for crimes in ND

A Tennessee woman was wrongfully incarcerated for five months after Clearview AI facial recognition software misidentified her as a fraud suspect in North Dakota. The failure resulted from a lack of human-in-the-loop verification, where detectives incorrectly assumed algorithmic matches were validated by surveillance data. This incident underscores the critical need for rigorous oversight and independent validation when integrating computer vision tools into high-stakes law enforcement workflows.

Philly courts will ban all smart eyeglasses starting next week

Pennsylvania's First Judicial District has implemented rules restricting the use of AI-powered smart glasses, such as Meta’s Ray-Ban models, within courtrooms. The regulation aims to protect witnesses and jurors from intimidation facilitated by the discreet recording and data-processing capabilities of these wearables. This move underscores the growing legal and privacy challenges posed by the integration of AI into consumer hardware in sensitive environments.

What if AI doesn't need more RAM but better math?

Google’s TurboQuant addresses the LLM KV cache bottleneck through a two-stage, data-oblivious compression algorithm consisting of PolarQuant and QJL. PolarQuant leverages polar coordinate transformations to exploit predictable angle distributions in high-dimensional spaces, while QJL uses the Johnson-Lindenstrauss transform to mitigate quantization error with zero memory overhead. The method achieves a 6x reduction in KV cache size and up to an 8x performance increase on H100 GPUs, offering significant benefits for long-context inference, RAG, and on-device deployment.
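The polar-coordinate idea can be illustrated with a toy quantizer: each pair of coordinates is stored as a low-bit angle plus a low-bit magnitude, exploiting the fact that angles in high dimensions concentrate in a predictable range. This is a sketch of the general concept only; TurboQuant's actual PolarQuant/QJL pipeline, bit allocation, and error-correction differ.

```python
import math
import random

def polar_quantize(vec, angle_bits=4, mag_bits=4, mag_max=4.0):
    """Quantize consecutive coordinate pairs in polar form: the angle
    snaps to a uniform grid over [-pi, pi], the magnitude to a uniform
    grid over [0, mag_max]. 8 bits per pair vs. 32 for fp16 coords."""
    angle_levels = 2 ** angle_bits
    mag_levels = 2 ** mag_bits
    codes = []
    for i in range(0, len(vec), 2):
        x, y = vec[i], vec[i + 1]
        r = min(math.hypot(x, y), mag_max)
        theta = math.atan2(y, x)
        q_theta = round((theta + math.pi) / (2 * math.pi) * (angle_levels - 1))
        q_r = round(r / mag_max * (mag_levels - 1))
        codes.append((q_theta, q_r))
    return codes

def polar_dequantize(codes, angle_bits=4, mag_bits=4, mag_max=4.0):
    """Reconstruct an approximate vector from polar codes."""
    angle_levels = 2 ** angle_bits
    mag_levels = 2 ** mag_bits
    vec = []
    for q_theta, q_r in codes:
        theta = q_theta / (angle_levels - 1) * 2 * math.pi - math.pi
        r = q_r / (mag_levels - 1) * mag_max
        vec.extend([r * math.cos(theta), r * math.sin(theta)])
    return vec

random.seed(0)
key = [random.gauss(0, 1) for _ in range(64)]  # stand-in for a KV key vector
codes = polar_quantize(key)
approx = polar_dequantize(codes)
err = max(abs(a - b) for a, b in zip(key, approx))
```

The reconstruction error per coordinate is bounded by roughly half an angle step times the magnitude plus half a magnitude step, which is why concentrated angle distributions make the angle grid cheap.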

Coding Agents Could Make Free Software Matter Again

AI agents are revitalizing the free software movement by acting as technical proxies that can study and modify codebases on behalf of users. While the SaaS model historically marginalized software freedom through proprietary silos and limited APIs, agents transform these theoretical rights into practical capabilities for deep, automated customization. This shift suggests a future where software value is increasingly defined by an agent's ability to manipulate the source, potentially undermining closed SaaS models that rely on high switching costs and restricted extensibility.

Will the AI data centre boom become a $9T bust?

The Financial Times examines whether the current $9 trillion investment in AI data center infrastructure is a sustainable boom or a looming financial bust. This analysis evaluates the massive capital expenditures required for LLM scaling against the risk of a significant market correction if projected returns fail to materialize.

Research

Stanford study reveals AI vision models invent images they never see

Frontier multimodal models exhibit "mirage reasoning," where they generate detailed visual descriptions and reasoning traces for images they have not processed. These models achieve high scores on multimodal benchmarks, including medical QA, using only textual cues, indicating that current evaluations are often decoupled from actual visual input. To mitigate these vulnerabilities and ensure true vision-grounded reasoning, the authors propose B-Clean, a framework designed to eliminate non-visual inference shortcuts.

Subliminal learning: LLMs transmit behavioral traits via hidden signals in data

Subliminal learning enables LLMs to transmit behavioral traits through semantically unrelated data, such as number sequences or code, even when explicit references are filtered. This phenomenon occurs when the teacher and student share the same base model, posing a risk for distillation where unintended traits may propagate despite data sanitization. Theoretical results and MLP experiments suggest this is a general property of neural networks that presents a significant challenge for safe AI development.

Security awareness in LLM agents: the NDAI zone case

NDAI zones rely on trusted execution environments (TEEs) for secure IP negotiation, yet LLM agents lack the native ability to verify environment security from context-based evidence. Research shows that while models reliably suppress disclosure during failed attestations, they exhibit inconsistent or paradoxical responses to passing attestations. This inability to verify safety remains a critical barrier to deploying privacy-preserving agentic protocols.

Towards Scalable Dataframe Systems

The paper addresses performance bottlenecks and semantic ambiguity in traditional dataframe libraries by proposing a formal data model and algebra for scalable systems. Using MODIN as a reference implementation for pandas, the authors outline a research roadmap focused on unique dataframe characteristics like flexible schemas, ordering, and row/column equivalence to advance data management for large-scale analysis.
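The row/column equivalence the paper highlights can be sketched with a toy data model in which transpose is a first-class operator, so the same operations apply along either axis. This is a hypothetical illustration, not the paper's formal algebra or Modin's implementation.

```python
class DataFrame:
    """Toy sketch of a dataframe data model: an ordered list of column
    labels plus an ordered list of rows, with a flexible (untyped)
    schema and a transpose that swaps the row/column roles."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [list(r) for r in rows]

    def transpose(self):
        # Old rows become columns; labels for the new axis are synthesized,
        # since unlike a matrix a dataframe carries a schema on one axis.
        new_cols = [f"r{i}" for i in range(len(self.rows))]
        new_rows = [[self.rows[i][j] for i in range(len(self.rows))]
                    for j in range(len(self.columns))]
        return DataFrame(new_cols, new_rows)

df = DataFrame(["name", "score"], [["ada", 1], ["bob", 2]])
t = df.transpose()
```

Transposing twice recovers the original cells, which is the invariant a formal algebra has to preserve while still tracking ordering and schema.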

LLMs Do Not Grade Essays Like Humans

This study evaluates GPT and Llama models for zero-shot automated essay scoring, finding weak alignment with human grades due to differing evaluation signals. LLMs tend to over-score short essays and under-score longer ones containing minor grammatical errors compared to human raters. However, LLMs maintain high internal consistency between their scores and feedback, indicating their potential as reliable support tools in grading workflows.

Code

Miasma: A tool to trap AI web scrapers in an endless poison pit

Miasma is a lightweight Rust-based server designed to combat unauthorized web scraping for LLM training by serving poisoned data and self-referential links. It traps scrapers in an infinite loop of low-quality content, utilizing hidden HTML links and reverse proxy configurations to redirect bot traffic. The tool is optimized for low resource consumption and features configurable connection limits, compression, and data sources to disrupt automated data collection at scale.
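The trap mechanism can be sketched in a few lines: each page is generated deterministically from the request path and always links to further generated pages, so a crawler that follows links never runs out of URLs. This is a Python sketch of the idea only; Miasma itself is a Rust server with reverse-proxy integration and configurable data sources.

```python
import hashlib
import random

def poison_page(path, n_links=3):
    """Deterministically generate a junk page for `path` whose links
    point to further generated pages. Seeding from a hash of the path
    means no state is stored: the same URL always yields the same page."""
    rng = random.Random(hashlib.sha256(path.encode()).digest())
    words = ["flux", "vapor", "mire", "haze", "drift", "murk"]
    body = " ".join(rng.choice(words) for _ in range(30))
    links = [f"{path.rstrip('/')}/{rng.randrange(10**6):06d}"
             for _ in range(n_links)]
    return body, links

body, links = poison_page("/trap")
```

Because every page fans out to fresh paths, the crawl frontier grows without bound while the server does only a hash and a few string operations per request.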

Pglens – 27 read-only PostgreSQL tools for AI agents via MCP

pglens is a PostgreSQL MCP server designed to enhance LLM agent performance by providing deep database introspection beyond basic querying. It offers specialized tools for schema discovery, such as identifying multi-hop join paths and foreign-key relationships, alongside data exploration features like column statistics and enum value validation to prevent SQL generation errors. The server includes performance monitoring, read-only query execution with plan validation, and safety-focused introspection using pure pg_catalog queries.
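Multi-hop join-path discovery amounts to a graph search over the foreign-key graph. A minimal sketch of that lookup, with a made-up schema (illustrative only; pglens derives the real graph from pg_catalog and exposes it to agents over MCP):

```python
from collections import deque

def join_path(fk_edges, src, dst):
    """BFS over a foreign-key graph to find a shortest multi-hop join
    path between two tables. FKs are treated as undirected, since a
    join can traverse them in either direction."""
    graph = {}
    for a, b in fk_edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in graph.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # tables are not connected by foreign keys

# hypothetical schema for illustration
fks = [("orders", "customers"), ("order_items", "orders"),
       ("order_items", "products")]
path = join_path(fks, "customers", "products")
```

Handing an agent the precomputed path ("customers joins products through orders and order_items") avoids the common failure mode of hallucinated direct joins.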

/slot-machine development (CC vs. Codex; CE vs. superpowers)

Slot Machine is an open-source agent skill for Claude Code that implements a "best-of-N" pipeline to mitigate the probabilistic nature of LLM outputs. It executes multiple independent implementations of a spec in parallel, subjects each to adversarial review by isolated agents, and uses a meta-judge to either pick a winner or synthesize a superior solution from the best elements. This architecture reduces self-evaluation bias and improves code quality by leveraging diverse models, methodologies like TDD, and isolated git worktrees for structured comparison.
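The pipeline shape is easy to sketch: N independent implementers, a set of reviewers scoring every candidate, and a meta-judge over the score matrix. The stand-ins below are toy functions; the real skill runs LLM agents in isolated git worktrees and its meta-judge can also synthesize a hybrid rather than just pick.

```python
def best_of_n(spec, implementers, reviewers, meta_judge):
    """Best-of-N sketch: produce N independent candidates, score each
    with every reviewer, then let a meta-judge choose from the results."""
    candidates = [impl(spec) for impl in implementers]
    reviews = [[rev(spec, c) for rev in reviewers] for c in candidates]
    return meta_judge(candidates, reviews)

# Toy stand-ins for agents, implementing a "sum of squares 0..n" spec.
impl_a = lambda spec: lambda n: sum(i * i for i in range(n + 1))
impl_b = lambda spec: lambda n: n * (n + 1) * (2 * n + 1) // 6
broken = lambda spec: lambda n: n * n            # fails the spec
check = lambda spec, cand: cand(10) == 385       # reviewer = property test
pick = lambda cands, revs: cands[max(range(len(cands)),
                                     key=lambda i: sum(revs[i]))]

winner = best_of_n("sum of squares", [impl_a, broken, impl_b], [check], pick)
```

Keeping reviewers separate from implementers is the point: a candidate is never graded by the process that produced it, which is what reduces self-evaluation bias.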

OpenAI's competition spammed by AI slop

OpenAI's "Parameter Golf" challenge tasks participants with training a language model that fits within a 16MB artifact and completes training in under 10 minutes on 8xH100s. Evaluation is based on bits per byte (BPB) on the FineWeb validation set, pushing for innovations in QAT, TTT, depth recurrence, and aggressive parameter tying. The competition includes $1,000,000 in sponsored compute credits and serves as a talent discovery platform for OpenAI research roles.
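Bits per byte is total cross-entropy converted from nats to bits and normalized by raw byte count rather than token count, which makes scores comparable across tokenizers. A quick sketch of the conversion (the loss and bytes-per-token figures below are made up for illustration, not competition numbers):

```python
import math

def bits_per_byte(loss_nats_per_token, n_tokens, n_bytes):
    """Convert a mean next-token cross-entropy (in nats/token) into
    bits per byte: total nats over the text, divided by ln(2) to get
    bits, divided by the raw byte count of the evaluated text."""
    return loss_nats_per_token * n_tokens / (n_bytes * math.log(2))

# e.g. a hypothetical loss of 2.5 nats/token at ~4.2 bytes/token
bpb = bits_per_byte(2.5, n_tokens=1000, n_bytes=4200)
```

Normalizing by bytes removes the incentive to game the metric with an aggressive tokenizer, since fewer tokens per byte just raises the per-token loss.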

CLI proxy that reduces LLM token consumption by 60-90% on common dev commands

RTK (Rust Token Killer) is a high-performance CLI proxy that reduces LLM token consumption by 60-90% by filtering and compressing command outputs with <10ms overhead. It employs strategies such as smart filtering, grouping, and deduplication across various development tools, including git, compilers, and test runners. The tool features transparent auto-rewrite hooks for major AI coding assistants like Claude Code, Cursor, and Copilot to optimize context window usage automatically.
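The core trick of keyword filtering plus deduplication of command output can be sketched in a few lines of Python. RTK itself is a Rust binary with per-tool parsers, so treat this as an illustration of the compression strategy, not its implementation.

```python
def compress_output(lines, keep=("error", "fail", "warning")):
    """Sketch of output compression for an LLM context window: drop
    lines without interesting keywords, then collapse consecutive
    duplicates into one line with a repeat count."""
    hits = [l for l in lines if any(k in l.lower() for k in keep)]
    out = []
    for l in hits:
        if out and out[-1][0] == l:
            out[-1][1] += 1
        else:
            out.append([l, 1])
    return [l if n == 1 else f"{l} (x{n})" for l, n in out]

log = ["compiling foo", "ok test_a", "FAIL test_b: assert 1 == 2",
       "FAIL test_b: assert 1 == 2", "ok test_c"]
summary = compress_output(log)
```

Five lines of test output collapse to one, and the savings compound on real build and test logs where passing lines dominate.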
