Monday — June 8, 2026

SDSU installs 1,300 AI cameras in student dorms, MoE routing metadata is found to leak sensitive text, and Nightwatch launches as an open-source AI SRE.


News

The OnlyFans Economy of American AI

US frontier LLMs have reached an S-curve plateau, where their performance edge over emerging alternatives no longer justifies high costs and restrictive rate limits. Engineering benchmarks indicate that Chinese models, specifically Qwen 3.7 Max, offer superior utility for complex workflows through native extended-thinking and significantly better cost-efficiency. This shift suggests a move away from inflated valuations toward pragmatic, wallet-driven model selection via providers like OpenRouter and DeepSeek.

SDSU Wired Its Dorms with 1,300 AI Cameras Without Telling Students

SDSU has deployed a $1.3 million network of over 1,300 AI-enabled Avigilon cameras, integrating advanced computer vision capabilities like facial recognition, license plate recognition (LPR), and behavioral analysis across campus and residential areas. Although the university claims to limit usage to basic motion detection, the hardware's native support for granular tracking and object detection raises significant privacy and transparency concerns. This move mirrors a broader trend of integrating sophisticated AI surveillance infrastructure into educational environments.

Automated QA and Testing with AI

AI-assisted development significantly increases velocity but often sacrifices structural code quality. To counter this, LLMs can be leveraged as QA agents to automate complex integration tests, regression analysis, and subjective quality assessments that were previously manual. By analyzing commits and simulating production environments, these agents raise the release quality bar, effectively compensating for the trade-offs of automatic programming.

Data centers consumed 264B gallons of water as drought hits nearly 63% of US

AI data centers are projected to consume 264 billion gallons of water in 2025, driven by the intensive evaporative cooling requirements of high-density compute clusters. This consumption rate of 550 million gallons per day coincides with severe drought conditions affecting nearly 63% of the US, leading to increased scrutiny of hyperscale infrastructure expansion by companies like Microsoft, Google, and Meta. Beyond water usage, the rapid scaling of AI facilities is straining electrical grids, prompting debates over resource allocation and infrastructure costs between tech giants and local communities.

VibeOS: First ever AI-native operating system

vibeOS is an AI-native operating system powered by Claude Code that enables real-time application and widget generation through natural language prompts. Built on a Next.js and React stack, it integrates MCP via daedalus and supports seamless browser-to-agent handoffs using onkernel. The platform is available as a Docker container for secure, local execution.

Research

Expert Selections in MoE Transformer Models Reveal Almost as Much as Text

Researchers developed a text-reconstruction attack on MoE models that recovers tokens from expert routing decisions with up to 91.2% top-1 accuracy using a transformer-based decoder. This demonstrates that MoE routing metadata leaks substantial information, comparable to embedding inversion, posing privacy risks in distributed inference and side-channel scenarios. The findings suggest that expert selections should be treated as sensitive data.
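To make the leak concrete, here is a deliberately tiny illustration, not the paper's attack: the paper uses a trained transformer-based decoder on real MoE models, whereas this sketch uses a context-free lookup table. The router weights, token vectors, and vocabulary below are all made up for the example.

```python
# Toy MoE router: 4 experts, 2-d token vectors, top-2 routing.
# All weights and embeddings are invented for illustration.
ROUTER = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # one weight row per expert
EMBED = {"alice": (2, 1), "bob": (-1, 2), "paid": (-2, -1), "owes": (1, -2)}

def route(token, k=2):
    """Return the sorted ids of the k experts with the highest router logits."""
    x = EMBED[token]
    logits = [w[0] * x[0] + w[1] * x[1] for w in ROUTER]
    top = sorted(range(len(ROUTER)), key=lambda e: logits[e], reverse=True)[:k]
    return tuple(sorted(top))

# What an observer of routing metadata sees: expert ids only, no text.
secret = ["alice", "owes", "bob"]
trace = [route(t) for t in secret]

# An attacker with access to the same model maps each signature back to a token.
codebook = {route(t): t for t in EMBED}
recovered = [codebook.get(sig, "?") for sig in trace]
print(trace, recovered)
```

In this toy every token has a unique expert signature, so the trace alone is enough to recover the text. Real routing depends on context and many layers, which is why the paper needs a learned decoder rather than a table.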

Price Evolution, Production Frontiers, and Market Competition in LLM Inference

This paper presents the first economic analysis of LLM token pricing, documenting a 600-fold decline and proposing a "Tiered Super-Moore" hypothesis in which economy and mid-tier models outpace Moore's Law while flagship models maintain a reasoning premium. A critical market inflection point in May 2024 shifted price acceleration from technology-driven to competition-driven. Cost reduction is attributed primarily to software and architectural innovation rather than GPU hardware, with a shift in the technological frontier driving the Malmquist Productivity Index to its peak in 2024Q1-Q4. Architectural innovation also explains the U.S./Chinese training cost gap, and the paper documents a sharp decline in market concentration.

Mitigating the LLM Rerun Crisis for Minimized-Inference-Cost Web Automation

To address the "Rerun Crisis" of linear cost scaling in continuous inference web agents, a Compile-and-Execute architecture decouples reasoning from execution. By generating a deterministic JSON workflow blueprint from a sanitized DOM in a one-shot LLM invocation, the system shifts inference complexity from O(M x N) to amortized O(1). This approach reduces per-workflow costs to under $0.10 and enables near-100% reliability through modular HITL patching.
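The compile-once, run-many idea can be sketched as follows. This is a minimal illustration under stated assumptions: the blueprint schema (`steps`, `action`, `selector`, `value`) and the `compile_blueprint` stand-in for the LLM call are hypothetical and do not come from the paper.

```python
import json

LLM_CALLS = 0  # counts simulated model invocations

def compile_blueprint(sanitized_dom: str) -> str:
    """Stand-in for the one-shot LLM call; returns a JSON workflow (hypothetical schema)."""
    global LLM_CALLS
    LLM_CALLS += 1
    steps = [
        {"action": "fill", "selector": "#email", "value": "{email}"},
        {"action": "click", "selector": "#submit"},
    ]
    return json.dumps({"steps": steps})

def execute(blueprint_json: str, params: dict) -> list:
    """Deterministically replay the blueprint; no model call happens here."""
    log = []
    for step in json.loads(blueprint_json)["steps"]:
        if step["action"] == "fill":
            log.append(("fill", step["selector"], step["value"].format(**params)))
        elif step["action"] == "click":
            log.append(("click", step["selector"]))
        else:
            raise ValueError(f"unknown action: {step['action']}")
    return log

# Compile once, then run the same workflow many times with different inputs.
blueprint = compile_blueprint("<form><input id='email'><button id='submit'></form>")
runs = [execute(blueprint, {"email": f"user{i}@example.com"}) for i in range(100)]
print(LLM_CALLS, len(runs))
```

The model is invoked once regardless of how many times the workflow runs, which is the sense in which inference cost is amortized. A real executor would drive a browser instead of appending to a log.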

MemGraphRAG: Memory-Based Multi-Agent System for Graph RAG

MemGraphRAG addresses the structural fragmentation and logical inconsistencies of traditional GraphRAG by utilizing a memory-based multi-agent system for graph construction. By leveraging shared memory to provide global context, the framework ensures thematic consistency and connectivity across large-scale, unstructured corpora. It also introduces a memory-aware hierarchical retrieval algorithm, delivering SOTA performance on complex reasoning tasks with high efficiency.

How do AI agents spend your money?

This study analyzes token consumption in agentic coding tasks using SWE-bench Verified, finding that agents consume 1,000x more tokens than code chat, primarily driven by input volume. Token usage is highly stochastic and does not correlate linearly with accuracy, which often saturates at higher costs. Frontier models vary significantly in efficiency and fail to accurately predict their own token expenditures, and human-rated task difficulty turns out to be a poor predictor of actual computational cost.

Code

Lathe – Use LLMs to learn a new domain, not skip past it

Lathe is an experimental tool designed to use LLMs for hands-on technical education rather than automated task execution. It combines a Go CLI with specialized LLM skills for Claude Code, Cursor, and Codex to generate, verify, and manage multi-part tutorials within a dedicated local UI. Key features include automated code verification in isolated scratch environments, provenance tracking of research sources, and customizable "voices" to control the tone and style of the generated prose.

Nightwatch, The open-source, read-only AI SRE

Nightwatch is an open-source, read-only AI SRE designed to automate incident investigation by clustering alert storms and performing root cause analysis. It uses a tool-calling AI agent with a ReAct loop to query live infrastructure (including K8s, AWS, and Docker) and propose human-gated fixes while maintaining a strict read-only boundary. The platform is local-first and monitoring-agnostic, supporting various LLM providers including Anthropic, OpenAI, and local deployments via Ollama or vLLM.
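One way to picture a read-only boundary inside a ReAct-style loop is an allowlist gate between the agent and its tools. The sketch below is an assumption-laden illustration, not the project's code: the tool names, the canned data, and `scripted_agent` (a stand-in for the LLM) are all hypothetical.

```python
READ_ONLY_TOOLS = {"get_pods", "describe_alarm"}  # hypothetical allowlist

def get_pods(namespace):
    return [{"name": "api-7f9", "status": "CrashLoopBackOff"}]  # canned data

def describe_alarm(name):
    return {"name": name, "state": "ALARM"}  # canned data

TOOLS = {"get_pods": get_pods, "describe_alarm": describe_alarm}

def scripted_agent(history):
    """Stand-in for the LLM: chooses the next action from prior observations."""
    if not history:
        return {"tool": "get_pods", "args": {"namespace": "prod"}}
    if len(history) == 1:
        return {"tool": "restart_pod", "args": {"name": "api-7f9"}}  # mutating
    return {"final": "api-7f9 is crash-looping"}

def investigate(agent, max_steps=5):
    """Reason/act loop: run read-only tools, queue everything else for a human."""
    history, proposals = [], []
    for _ in range(max_steps):
        action = agent(history)
        if "final" in action:
            return action["final"], proposals
        if action["tool"] in READ_ONLY_TOOLS:
            observation = TOOLS[action["tool"]](**action["args"])
        else:
            proposals.append(action)  # never executed by the agent
            observation = "blocked: queued for human approval"
        history.append((action, observation))
    return None, proposals

summary, proposals = investigate(scripted_agent)
print(summary, proposals)
```

The mutating `restart_pod` call is never dispatched; it surfaces only as a proposal that a human can approve or discard.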

Obsidian-agent-bridge – let AI agents read, write, and deepen Obsidian vaults

obsidian-agent-bridge is a library that enables AI agents to interface with Obsidian vaults as living knowledge graphs via the Local REST API. Its core functionality, deepenNode, allows LLMs to synthesize new observations into existing markdown files with automated deduplication and [[wikilink]] management. This provides a structured alternative to standard vector database memory, allowing agents to build and navigate complex knowledge architectures through tool calls.
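The deduplicating merge described above might look roughly like this. It is a sketch only: `deepen_note` is a hypothetical pure function over note text, not the library's `deepenNode` API, and it does not call the Local REST API. The normalization rule and the "Related:" line format are assumptions.

```python
import re

def _normalize(line: str) -> str:
    """Lowercase, drop a leading bullet dash, and collapse whitespace."""
    text = line.strip().lstrip("-").strip().lower()
    return re.sub(r"\s+", " ", text)

def deepen_note(markdown: str, observations: list, links: list) -> str:
    """Append unseen observations and wikilinks to a note; skip duplicates."""
    existing = {_normalize(l) for l in markdown.splitlines() if l.strip()}
    # Link targets already present, ignoring any |alias or #heading suffix.
    linked = set(re.findall(r"\[\[([^\]|#]+)", markdown))
    out = markdown.rstrip("\n").splitlines()
    for obs in observations:
        key = _normalize(obs)
        if key and key not in existing:
            out.append(f"- {obs.strip()}")
            existing.add(key)
    new_links = [t for t in dict.fromkeys(links) if t not in linked]
    if new_links:
        out.append("Related: " + ", ".join(f"[[{t}]]" for t in new_links))
    return "\n".join(out) + "\n"

note = "# MoE\n- Routers pick top-k experts\nSee [[Transformers]]\n"
result = deepen_note(
    note,
    ["routers pick  top-k experts", "Routing can leak tokens"],
    ["Transformers", "Privacy"],
)
print(result)
```

Running the same call on its own output changes nothing, so an agent can retry a write without duplicating observations or links.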

Avibe – your AI agent lives on your machine, reachable from your phone

Avibe is a local-first Agent OS that enables remote access to CLI-based agents like Claude Code, Codex, and OpenCode while keeping code and keys on the local machine. It provides a unified workbench and chat-based interface for scheduling, background execution, and cross-agent skill management via an "Agent Harness." The system uses secure tunneling and WebSockets to facilitate remote control from mobile or browser environments without proxying private data or requiring public inbound ports.

AgentCrew – a Markdown-first operating system for AI coding agents

AgentCrew is a Markdown-first methodology that structures individual coding agents into disciplined teams using roles, routing, and quality gates. It integrates with existing tools like Claude Code or Cursor to enforce rigorous development cycles through task classification, specialized roles (Developer, Tester, Reviewer), and local state management. The system prioritizes human-in-the-loop control, ensuring agents follow structured protocols for testing and review while requiring final human approval for all code changes.