Sunday — February 22, 2026
Taalas prints Llama 3.1 directly onto silicon for 17,000 tokens/sec, research shows AI substituting for human labor at a 97% cost reduction, and a nano-GPT model runs natively on N64 hardware.
News
I verified my LinkedIn identity. Here's what I handed over
LinkedIn outsources identity verification to Persona, a third-party service that collects extensive physical and behavioral biometrics, including facial geometry and "hesitation detection." Persona invokes "legitimate interest" clauses to use identity documents as training data for its AI models and employs subprocessors such as OpenAI, Anthropic, and GroqCloud for data extraction and analysis. Despite claims of EU data residency, the involvement of 16 US-based subprocessors subjects the data to the US CLOUD Act, which allows government access regardless of physical server location.
Cord: Coordinating Trees of AI Agents
Cord is a framework that enables AI agents to dynamically decompose complex goals into structured task trees at runtime, moving beyond the static workflows of LangGraph or CrewAI. It introduces spawn and fork primitives to control context flow, allowing subtasks to either start with a clean slate or inherit results from siblings. Built using MCP tools and a shared SQLite database, Cord allows agents to autonomously manage dependencies, parallelism, and human-in-the-loop interactions through a learnable coordination protocol.
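A rough sketch of how the spawn/fork distinction plays out in a task tree (names and shapes here are hypothetical, not Cord's actual MCP tool API):

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    goal: str
    context: list[str] = field(default_factory=list)   # accumulated results
    children: list["Task"] = field(default_factory=list)

    def spawn(self, goal: str) -> "Task":
        """Child starts with a clean slate: no inherited context."""
        child = Task(goal=goal)
        self.children.append(child)
        return child

    def fork(self, goal: str) -> "Task":
        """Child inherits results already produced by its siblings."""
        inherited = [r for c in self.children for r in c.context]
        child = Task(goal=goal, context=inherited)
        self.children.append(child)
        return child

root = Task("write a migration plan")
audit = root.spawn("audit the current schema")      # isolated subtask
audit.context.append("schema has 42 tables")
plan = root.fork("draft the plan from the audit")   # sees audit's output
```

In Cord itself these primitives are exposed as MCP tools, with the tree and its dependencies persisted in the shared SQLite database.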
Meta Deployed AI and It Is Killing Our Agency
Meta's transition to AI-managed account monitoring and identity verification is causing systemic false positives, leading to immediate bans for legitimate agency accounts. The automated parameters lack a human-in-the-loop override, creating a circular failure where users cannot access the appeal tools required to contest the AI's decision. Internal support confirms that these automated systems currently operate without a manual intervention pathway for verified professional users.
How Taalas "prints" an LLM onto a chip
Taalas has developed a fixed-function ASIC that runs Llama 3.1 8B at 17,000 tokens/sec by hardwiring model weights directly into silicon transistors. This architecture eliminates the memory wall by streaming data through sequential physical layers, bypassing the need for external DRAM/HBM. The chip uses on-chip SRAM for KV cache and LoRA adapters, delivering 10x improvements in inference speed, energy efficiency, and TCO compared to traditional GPU-based systems.
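A back-of-the-envelope calculation (my numbers, not Taalas's) shows why hardwired weights sidestep the memory wall for single-stream decoding:

```python
# Decoding one token requires reading every weight once; if those reads
# come from external memory, bandwidth sets a hard ceiling on speed.
params = 8e9             # Llama 3.1 8B
bytes_per_param = 1      # assuming 8-bit weights
tokens_per_sec = 17_000

traffic = params * bytes_per_param * tokens_per_sec
print(f"equivalent weight traffic: {traffic / 1e12:.0f} TB/s")   # ~136 TB/s

hbm = 3.35e12            # H100 SXM HBM3 bandwidth, bytes/sec
print(f"HBM ceiling at batch 1: {hbm / (params * bytes_per_param):.0f} tok/s")  # ~419
```

Hardwiring the weights into the datapath means they never cross a memory bus at all, which is how the chip clears a throughput that would otherwise demand roughly 40x an H100's memory bandwidth.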
The Internet Is Becoming a Dark Forest – and AI Is the Hunter
AI agents and LLMs are automating the full security lifecycle, enabling autonomous reconnaissance and exploitation at machine speed. Tools like PentAGI and Claude demonstrate that traditional perimeter defenses and scannable Zero Trust implementations are increasingly vulnerable to AI-driven discovery. The shift toward Zero Visibility architecture, exemplified by OpenNHP, aims to eliminate the attack surface by making infrastructure invisible until cryptographic identity is proven.
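The "invisible until proven" idea reduces to single-packet-authorization-style gating. A generic sketch (illustrative only, not OpenNHP's actual wire protocol):

```python
import hashlib, hmac, os, struct, time

SECRET = os.urandom(32)  # shared out of band in a real deployment

def make_knock(client_id: bytes, secret: bytes = SECRET) -> bytes:
    ts = struct.pack(">Q", int(time.time()))
    tag = hmac.new(secret, client_id + ts, hashlib.sha256).digest()
    return client_id + ts + tag

def verify_knock(pkt: bytes, secret: bytes = SECRET, skew: int = 30) -> bool:
    client_id, ts_raw, tag = pkt[:-40], pkt[-40:-32], pkt[-32:]
    expected = hmac.new(secret, client_id + ts_raw, hashlib.sha256).digest()
    fresh = abs(int(time.time()) - struct.unpack(">Q", ts_raw)[0]) <= skew
    # Until this passes, the service sends nothing back: to an AI-driven
    # scanner the port looks identical to a closed one.
    return fresh and hmac.compare_digest(tag, expected)

assert verify_knock(make_knock(b"client-01"))
```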
Research
Large Language Model Reasoning Failures
This survey categorizes LLM reasoning failures into embodied and non-embodied (informal and formal) types, further classifying them as fundamental, application-specific, or robustness-related. It analyzes root causes and mitigation strategies for these systemic weaknesses to guide the development of more reliable models. A curated GitHub repository of research works is provided to support ongoing efforts in the field.
Interactive Tools for Gaussian Splat Selection with AI and Human in the Loop
This research introduces an interactive toolset for 3DGS selection and segmentation, facilitating precise object extraction and scene editing. It employs an AI-driven method to propagate 2D masks to 3DGS and integrates a custom Video Diffusion Model for user-guided local editing. The system enables granular control over in-the-wild captures without requiring additional optimization.
Payrolls to Prompts: Firm-Level Evidence on the Substitution of Labor for AI
This study provides micro-level evidence of generative AI substituting for contracted human labor using a difference-in-differences analysis of firm spending data. Following the release of ChatGPT, firms with high exposure to online labor marketplaces significantly reduced labor expenditures while increasing AI model provider spending. The findings indicate a substitution ratio where $1 of displaced labor costs approximately $0.03 in AI spending, highlighting substantial cost savings in outsourced task production.
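The design is a standard difference-in-differences; a toy version on synthetic data (illustrative of the method, not the paper's data or coefficients):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "exposed": rng.integers(0, 2, n),   # high exposure to online labor markets
    "post": rng.integers(0, 2, n),      # after ChatGPT's release
})
# Simulate exposed firms cutting contracted-labor spend after the shock
df["labor_spend"] = 100 - 12 * df.exposed * df.post + rng.normal(0, 5, n)

m = smf.ols("labor_spend ~ exposed * post", data=df).fit()
print(m.params["exposed:post"])  # the DiD estimate; ~-12 by construction
```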
The Fundamental Limits of LLMs at Scale
This framework formalizes the theoretical ceilings of LLM scaling across five domains: hallucination, context compression, reasoning degradation, retrieval fragility, and multimodal misalignment. By applying computability theory and information-theoretic bounds, it proves that irreducible errors arise from undecidability, finite description length, and softmax crowding. The study identifies where scaling saturates and proposes mitigations like bounded-oracle retrieval and positional curricula to navigate these fundamental computational and statistical limits.
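As one concrete illustration of "softmax crowding" (a standard bound, not necessarily the paper's exact statement): if logits are bounded, the most probable token's mass shrinks as the vocabulary grows,

```latex
% with |z_i| <= B over a vocabulary of size n:
\[
\max_i \operatorname{softmax}(z)_i
  = \max_i \frac{e^{z_i}}{\sum_j e^{z_j}}
  \le \frac{e^{B}}{e^{B} + (n-1)\,e^{-B}}
  = \frac{e^{2B}}{e^{2B} + n - 1}
\]
```

so sustaining near-certain predictions requires the logit range to grow like (1/2) log n, which finite-precision representations cannot do indefinitely.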
End-to-End Test-Time Training for Long Context
TTT-E2E treats long-context modeling as a continual learning problem, using a sliding-window Transformer that compresses context into its weights via test-time next-token prediction. By utilizing meta-learning during training to optimize test-time initialization, the model matches the scaling performance of full-attention Transformers while maintaining the constant inference latency of RNNs. This approach achieves a 2.7x speedup over full attention at 128K context lengths.
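A minimal sketch of the test-time-training loop (my simplification, not the paper's code): before each window is consumed, the weights are updated on that window's next-token loss, so information outside the attention span persists in the parameters.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, dim, window = 256, 64, 32

# Stand-in for the paper's sliding-window Transformer
model = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab, (1024,))
for start in range(0, len(tokens) - window, window):
    chunk = tokens[start : start + window + 1]
    loss = loss_fn(model(chunk[:-1]), chunk[1:])  # next-token prediction
    opt.zero_grad(); loss.backward(); opt.step()  # compress context into weights
```

TTT-E2E's contribution is making this end-to-end: meta-learning during training chooses an initialization for which these test-time updates are maximally effective.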
Code
AI uBlock Blacklist
This repository provides a manually curated uBlock Origin blacklist designed to filter AI-generated content farms and "slop." It targets low-utility websites that abuse SEO, lack human expertise, and risk spreading LLM hallucinations. The project utilizes specific heuristics and Google Dorks to identify unedited generative output, offering a surgical alternative to broader blocklists that might otherwise hide legitimate AI tools.
Llama 3.1 70B on a single RTX 3090 via NVMe-to-GPU streaming, bypassing the CPU
NTransformer is a high-efficiency C++/CUDA LLM inference engine designed to run large models like Llama 70B on consumer GPUs by streaming layers through a 3-tier adaptive caching system. It utilizes a custom gpu-nvme-direct backend to bypass the CPU, overlapping NVMe reads, PCIe DMA, and GPU compute via a double-buffered SLEP streaming pipeline. Key optimizations include cosine-similarity-based layer skipping, self-speculative decoding using VRAM-resident layers, and support for various GGUF quantization formats with zero external dependencies.
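The double-buffering trick is easier to see stripped of CUDA details: while the GPU computes layer k, the I/O path fetches layer k+1. A Python analogue (the real pipeline uses NVMe DMA and CUDA streams, not threads):

```python
from concurrent.futures import ThreadPoolExecutor

def load_layer(k):        # stands in for the NVMe -> GPU direct read
    return f"weights[{k}]"

def compute(layer, x):    # stands in for the GPU forward pass
    return x + 1

def forward(x, n_layers=80):
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(load_layer, 0)
        for k in range(n_layers):
            layer = pending.result()                    # wait for layer k
            if k + 1 < n_layers:
                pending = io.submit(load_layer, k + 1)  # overlap the next read
            x = compute(layer, x)
    return x

print(forward(0))  # 80
```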
zclaw: personal AI assistant in under 888 KB, running on an ESP32
zclaw is a C-based AI personal assistant for ESP32 microcontrollers with a minimal firmware footprint of under 888 KiB. It supports natural language tool composition for GPIO control, persistent memory, and scheduled tasks using LLM providers like Anthropic, OpenAI, and OpenRouter. The system integrates via Telegram or web relays and includes utilities for secure provisioning and latency benchmarking.
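Tool composition of this kind is typically a small dispatch loop over JSON tool calls. A hypothetical sketch in Python (zclaw itself is C firmware; these names are not its API):

```python
import json

TOOLS = {
    "gpio_write": lambda pin, level: f"pin {pin} -> {level}",
    "remember":   lambda key, value: f"stored {key}",
}

def dispatch(tool_call: str) -> str:
    """Execute one tool call the model emitted as JSON."""
    call = json.loads(tool_call)
    return TOOLS[call["name"]](**call["args"])

# e.g. "turn on the porch light" might become:
print(dispatch('{"name": "gpio_write", "args": {"pin": 4, "level": 1}}'))
```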
Palantir's secret weapon isn't AI – it's Ontology. An open-source deep dive
Palantir’s Ontology strategy shifts data architecture from passive storage to an operational "digital twin" that integrates semantic objects with kinetic actions. This framework provides a governed foundation for AI-driven decision-making by implementing version control and branching for real-world operations. It aims to bridge the gap between raw data engineering and autonomous AI execution.
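A toy rendering of the core idea, semantic objects bound to governed, branchable actions (nothing here is Palantir's actual API):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Shipment:          # semantic object: one typed entity in the digital twin
    id: str
    status: str

def reroute(s: Shipment, port: str, branch: list) -> Shipment:
    """Kinetic action: a governed write, recorded on a branch so the
    change can be reviewed, merged, or rolled back like code."""
    branch.append(("reroute", s.id, port))   # audit trail
    return replace(s, status=f"rerouted via {port}")

proposal: list = []                          # a branch of the operational world
s1 = reroute(Shipment("SH-1", "in transit"), "Rotterdam", proposal)
```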
Happy Zelda's 40th: first LLM running on N64 hardware (4MB RAM, 93MHz)
Legend of Elya is a tech demo featuring a nano-GPT transformer running natively on the N64's VR4300 CPU. The 819K-parameter model uses a 4-layer architecture with Q8 quantization and Q8.7 fixed-point math, achieving 1–3 tok/s without FPU assistance. It operates with a 256-entry byte-level vocabulary and a 64-token context window, fitting entirely within the console's base 4MB of RAM.
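Two details worth unpacking (my arithmetic, not the demo's code): the Q8 weights fit the RAM budget easily, and Q8.7 fixed-point math replaces floating point with integer multiplies and shifts.

```python
params, bytes_per_weight = 819_000, 1          # Q8 quantization
print(f"weights: {params * bytes_per_weight / 2**20:.2f} MiB of 4 MiB")  # ~0.78

FRAC = 7  # Q8.7: fixed-point value with 7 fractional bits

def q87(x: float) -> int:
    return int(round(x * (1 << FRAC)))

def q87_mul(a: int, b: int) -> int:
    # Integer multiply, then shift out the extra fractional bits --
    # no FPU required, matching the demo's integer-only approach.
    return (a * b) >> FRAC

y = q87_mul(q87(1.5), q87(-2.25))
print(y / (1 << FRAC))  # -3.375
```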