Friday — January 2, 2026

Zara replaces photo shoots with AI-dressed models, researchers prove transformers achieve high-precision Bayesian inference, and AgentAudit automates LLM security scanning in CI/CD.

News

Build a Deep Learning Library

This resource provides a comprehensive guide to building a deep learning library from scratch using NumPy. It covers essential components such as a custom autograd engine, tensor operations, optimizers, and neural network modules. The project culminates in implementing and training models such as MNIST classifiers, CNNs, and ResNets.
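To make the autograd idea concrete, here is a minimal sketch of reverse-mode differentiation. It is not taken from the guide: it works on plain Python scalars rather than NumPy tensors to stay short, and the `Value` class and its methods are illustrative names.

```python
class Value:
    """Scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


x = Value(3.0)
w = Value(2.0)
y = x * w + x  # y = 3*2 + 3
y.backward()   # fills x.grad with dy/dx = w + 1 and w.grad with dy/dw = x
```

A tensor version would follow the same pattern, with each operation storing a closure that maps the output gradient back to its inputs.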

Building an internal agent: Code-driven vs. LLM-driven workflows

LLM-driven workflows often struggle with determinism in complex tasks, leading to reliability issues in internal agents. To address this, a hybrid architecture was developed that supports both LLM-orchestrated and code-driven coordinators using Python scripts. This approach allows for deterministic execution and better performance while still utilizing LLMs as subagents for tasks requiring specific intelligence.
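The difference between the two coordinator styles can be sketched as follows. This is an illustration under assumptions, not the article's implementation: the function names, the subagent dictionary, and the `choose_next` callback (which stands in for an LLM call) are all hypothetical.

```python
def run_code_driven(task, subagents, steps):
    """Code-driven coordinator: the step order is fixed in code, so every run is identical."""
    payload, trace = task, []
    for name in steps:
        payload = subagents[name](payload)
        trace.append(name)
    return payload, trace


def run_llm_driven(task, subagents, choose_next, max_steps=5):
    """LLM-driven coordinator: a model picks the next step, so the path can vary between runs."""
    payload, trace = task, []
    for _ in range(max_steps):
        name = choose_next(payload, trace)  # placeholder for an LLM call
        if name is None:
            break
        payload = subagents[name](payload)
        trace.append(name)
    return payload, trace


# Hypothetical subagents; in a real system each could wrap an LLM prompt.
subagents = {
    "triage": lambda text: text.strip().lower(),
    "summarize": lambda text: text[:20],
}


def scripted_chooser(payload, trace):
    """Stub standing in for the model's routing decision."""
    return "triage" if not trace else None


fixed_result, fixed_trace = run_code_driven(
    "  Disk FULL on build-host-7  ", subagents, ["triage", "summarize"]
)
llm_result, llm_trace = run_llm_driven(
    "  Disk FULL on build-host-7  ", subagents, scripted_chooser
)
```

In both cases the subagents are the same; only the component that decides the order changes, which is what makes the code-driven path reproducible.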

Tasker – An open-source desktop agent for browser and OS automation

Tasker is an open-source, MIT-licensed browser automation tool that leverages AI to execute workflows defined through natural language or manual recording. It features local execution for data privacy, visual debugging, and AI-driven adaptation to handle dynamic UI changes. The platform supports complex logic, including variables and loops, to automate repetitive web tasks across any site.

AI Futures Model: Dec 2025 Update

The AI Futures Model provides updated timelines for milestones including Automated Coder (AC), Superhuman AI Researcher (SAR), and ASI. Median forecasts for AC have shifted to 2031–2032, reflecting more conservative modeling of AI R&D automation and diminishing returns in software efficiency compared to previous iterations. The framework extrapolates METR coding time horizons while accounting for compute constraints and the potential for a "taste-only singularity" to drive fast takeoff speeds post-automation.

Zara uses AI to dress models virtually rather than book new photo shoots

Zara is deploying AI to digitally manipulate existing campaign imagery, enabling the virtual application of new apparel and background synthesis without additional physical shoots. This strategy, which mirrors H&M’s "digital twin" initiative, optimizes e-commerce pipelines and reduces operational overhead by scaling content production. While the retailer is currently securing model consent and maintaining traditional compensation, the move signals a broader shift toward AI-driven operational discipline in the fashion industry.

Research

The Bayesian Geometry of Transformer Attention

Researchers used "Bayesian wind tunnels" to prove that transformers achieve high-precision Bayesian inference ($10^{-3}$–$10^{-4}$ bits) in controlled environments where memorization is impossible. Mechanistically, residual streams serve as belief substrates, FFNs perform posterior updates, and attention handles content-addressable routing via a low-dimensional value manifold. This geometric design explains the architectural necessity of hierarchical attention for Bayesian reasoning, contrasting sharply with the failure of flat MLP architectures.
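To give a sense of what a gap measured in bits means, the sketch below computes an exact Bayesian posterior for a toy two-hypothesis problem and the KL divergence, in bits, between it and an approximate prediction. The toy problem, the approximate values, and the choice of KL divergence as the metric are assumptions for illustration; the summary does not specify the paper's tasks or exact measure.

```python
import math


def posterior(prior, likelihoods):
    """Bayes' rule over a discrete hypothesis set."""
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def kl_bits(p, q):
    """KL divergence D(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Two coins: hypothesis A has P(heads) = 0.8, hypothesis B has P(heads) = 0.4.
prior = [0.5, 0.5]
exact = posterior(prior, [0.8, 0.4])  # belief after observing one head

# Hypothetical model output, standing in for a transformer's prediction.
approx = [0.66, 0.34]
gap = kl_bits(exact, approx)
```

A gap on the order of $10^{-4}$ bits, as in this toy case, means the approximate belief is nearly indistinguishable from the exact posterior.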

An LLM-Driven Multi-Agent Framework for Telescope Proposal Peer Review

AstroReview is an open-source, agent-based framework that automates telescope proposal reviews through a three-stage pipeline focusing on scientific merit, feasibility, and meta-review. By employing task isolation and explicit reasoning traces to mitigate hallucinations, the system achieves 87% accuracy in identifying accepted proposals without domain-specific fine-tuning. Furthermore, its iterative feedback loop, integrated with a Proposal Authoring Agent, increased draft acceptance rates by 66%, demonstrating a scalable and auditable approach to high-throughput peer review.

mHC: Manifold-Constrained Hyper-Connections

Manifold-Constrained Hyper-Connections (mHC) address the training instability and memory overhead of Hyper-Connections (HC) by projecting the connections onto a manifold, which restores the identity mapping property. The framework also optimizes infrastructure and connectivity patterns, with the stated aim of better scalability and performance for large-scale foundation models.

Exposed: Shedding Blacklight on Online Privacy

A study combining anonymized browsing data with Blacklight tracking metrics reveals that >99% of users encounter ad trackers, while over 50% are exposed to invasive techniques like canvas fingerprinting and keylogging within 48 hours. Major organizations, primarily Google, can monitor >50% of web activity for more than half of users. Surveillance risk is driven by browsing content rather than volume, with demographic disparities persisting across age and race.

Code

A local-first financial auditor using IBM Granite, MCP, and SQLite

This privacy-centric financial platform uses a local-first agentic architecture powered by Ollama and MCP. It employs granite3.3:8b to interpret natural language and orchestrate SQL-backed tools, offloading arithmetic to SQLite so that totals are computed deterministically by the database rather than generated by the model. The system integrates LLM-based vendor normalization and a FastAPI backend to transform raw bank data into structured, auditable insights.
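The idea of offloading arithmetic to SQLite can be sketched with the standard library alone. The table layout, column names, and the `total_spent_cents` tool below are hypothetical and not taken from the project; the point is only that the model would pick the tool and its argument, while the database does the summing.

```python
import sqlite3

# In-memory database with a hypothetical schema; amounts are stored as integer cents
# so that sums are exact.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (vendor TEXT, amount_cents INTEGER)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("Acme Coffee", 450), ("Acme Coffee", 1275), ("City Transit", 300)],
)


def total_spent_cents(connection, vendor):
    """Tool an agent could call: the total is computed by SQL, not by the LLM."""
    row = connection.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM transactions WHERE vendor = ?",
        (vendor,),
    ).fetchone()
    return row[0]


coffee_total = total_spent_cents(conn, "Acme Coffee")
unknown_total = total_spent_cents(conn, "No Such Vendor")
```

Using a parameterized query also keeps model-supplied vendor names from being interpreted as SQL.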

Feature detection exploration in Lidar DEMs via differential decomp

RESIDUALS is a framework for feature detection in Digital Elevation Models (DEMs) using systematic signal decomposition and differential analysis. It evaluates nearly 40,000 combinations of decomposition and upsampling methods through a 4-level hierarchy to generate feature-specific extraction filters and divergence metrics. This approach enables precise identification of terrain features and infrastructure by isolating method-specific residuals and quantifying uncertainty across the parameter space.
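The core idea of isolating a residual can be shown on a one-dimensional elevation profile. This is a simplified sketch, not the RESIDUALS pipeline: it uses a single moving-average decomposition on made-up elevations, whereas the framework compares many decomposition and upsampling methods on 2D rasters.

```python
def moving_average(values, window):
    """Smooth trend: mean over a centered window, truncated at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def residual(values, window):
    """Residual = original signal minus its smooth trend."""
    trend = moving_average(values, window)
    return [v - t for v, t in zip(values, trend)]


# Made-up elevation profile: a steady 0.5 m slope with a 2 m bump at index 4.
profile = [100.0, 100.5, 101.0, 101.5, 104.0, 102.5, 103.0, 103.5, 104.0]
res = residual(profile, 3)

# The largest absolute residual marks the candidate feature.
peak_index = max(range(len(res)), key=lambda i: abs(res[i]))
```

The regional slope ends up in the trend, so the bump stands out in the residual even though it is small relative to the overall elevation range.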

GitHub Action for AI/LLM Security Scanning in CI/CD

AgentAudit is a GitHub Action that automates security scanning for AI agent endpoints within CI/CD pipelines to detect prompt injection, jailbreaking, and data exfiltration. It features configurable scan modes and severity-based failure thresholds, providing detailed risk scores and reports to facilitate automated security gating for LLM applications.
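A severity-based failure threshold of the kind described can be sketched as a small gating function. The severity names, the findings format, and the `should_fail` function are assumptions for illustration and do not reflect AgentAudit's actual inputs or report schema.

```python
# Hypothetical severity scale, ordered from least to most severe.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def should_fail(findings, fail_on):
    """Return True if any finding is at or above the configured severity threshold."""
    threshold = SEVERITY_ORDER.index(fail_on)
    return any(
        SEVERITY_ORDER.index(finding["severity"]) >= threshold
        for finding in findings
    )


# Made-up scan results for illustration.
findings = [
    {"check": "prompt_injection", "severity": "high"},
    {"check": "data_exfiltration", "severity": "low"},
]

fails_at_high = should_fail(findings, "high")
fails_at_critical = should_fail(findings, "critical")
```

In a CI job, a script like this would exit with a non-zero status when the function returns True, which is what blocks the pipeline.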

A holiday side project to make Anki flashcards without breaking flow

MasterFlasher is an Android application that automates flashcard creation for AnkiDroid using a multi-stage Gemini pipeline. It processes text, URLs, PDFs, and voice inputs through a local SQLite-backed inbox, utilizing Readability and pdf.js for content extraction. The AI workflow performs fact extraction, scoring, and generation with strict JSON schema validation, operating on a BYOK model with secure on-device key storage via Android KeyStore.
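Strict validation of model output might look like the sketch below, which rejects anything that is not a list of cards with exactly the expected fields and types. The field names (`front`, `back`, `score`) and the hand-written checks are assumptions; the app's real schema and validator are not described here.

```python
import json

# Hypothetical card schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {"front": str, "back": str, "score": (int, float)}


def parse_cards(raw):
    """Parse model output and raise ValueError unless it matches the schema exactly."""
    data = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of cards")
    for index, card in enumerate(data):
        if not isinstance(card, dict) or set(card) != set(REQUIRED_FIELDS):
            raise ValueError(f"card {index}: wrong or missing fields")
        for field, expected in REQUIRED_FIELDS.items():
            value = card[field]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, expected):
                raise ValueError(f"card {index}: bad type for {field!r}")
    return data


valid_output = '[{"front": "What is Nim?", "back": "A compiled language", "score": 0.9}]'
cards = parse_cards(valid_output)
```

Rejecting the whole response on any mismatch lets the pipeline retry the generation step rather than pass a partly valid card downstream.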

Gene – a Lisp-like language built around a generic "Gene" data type

Gene is a homoiconic, Lisp-like language implemented as a Nim-based bytecode VM featuring a unique data structure that unifies types, properties, and children. It supports advanced metaprogramming through macros and includes a native LLM runtime via llama.cpp for local GGUF inference. The language utilizes NaN-boxed values and a stack-based VM with computed-goto dispatch for high-performance execution.
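NaN-boxing stores non-float values in the unused payload bits of a 64-bit NaN, so every value fits in one machine word. The sketch below shows the general technique in Python; the tag layout and function names are illustrative assumptions and are not Gene's actual encoding.

```python
import struct

# A quiet NaN has all exponent bits set plus the top mantissa bit.
QNAN = 0x7FF8000000000000
# Hypothetical tag bit marking "this NaN carries a 32-bit integer".
TAG_INT = 0x0001000000000000
PAYLOAD_MASK = 0x00000000FFFFFFFF


def box_float(x):
    """Reinterpret a double's bytes as a 64-bit unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def box_int(n):
    """Store a 32-bit integer in the low bits of a tagged quiet NaN."""
    return QNAN | TAG_INT | (n & PAYLOAD_MASK)


def is_int(bits):
    """A boxed integer has both the quiet-NaN pattern and the integer tag set."""
    return (bits & (QNAN | TAG_INT)) == (QNAN | TAG_INT)


def unbox(bits):
    """Recover the Python value from its 64-bit boxed form."""
    if is_int(bits):
        n = bits & PAYLOAD_MASK
        # Undo two's complement for negative 32-bit values.
        return n - (1 << 32) if n & 0x80000000 else n
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


boxed_int = box_int(-7)
boxed_float = box_float(1.5)
```

Ordinary doubles never have all exponent bits set unless they are NaN or infinity, which is why the tag check cannot misread a regular float as an integer.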