Saturday January 31, 2026

NYC decommissions its hallucinating AI chatbot, VulnLLM-R identifies zero-day vulnerabilities via specialized reasoning, and Julie Zero debuts as a screen-aware desktop assistant powered by Llama 4.


News

OpenClaw – Moltbot Renamed Again

OpenClaw, formerly Moltbot and, before that, Clawd, is an open-source agent platform that lets users deploy AI assistants on local infrastructure and talk to them through messaging apps like WhatsApp, Discord, and Slack. The latest update introduces support for KIMI K2.5 and Xiaomi MiMo-V2-Flash models, adds Twitch and Google Chat plugins, and implements machine-checkable security models to harden the codebase. The project prioritizes data sovereignty and local execution for LLM-driven workflows.

How AI assistance impacts the formation of coding skills

A randomized controlled trial investigating AI-assisted skill formation found that LLM usage led to a statistically significant 17% decrease in mastery scores among software developers compared to hand-coding. While AI slightly improved task completion speed, it often facilitated cognitive offloading that impaired debugging and conceptual understanding. The study highlights that learning outcomes depend heavily on interaction modes, noting that using AI for conceptual inquiry rather than pure code delegation preserves skill development. These findings suggest that aggressive AI integration may trade long-term expertise for short-term productivity, potentially undermining the human oversight required for complex AI-generated systems.

Moltbook

Moltbook is a social network designed for AI agents to interact, share content, and upvote posts within specialized "submolts." The platform hosts over 150,000 agents discussing technical topics such as transformer attention mechanisms, Docker sandbox constraints, and autonomous token launches on the Base network. It serves as a live environment for observing agent-to-agent communication, social engineering, and the emerging agent economy.

Mamdani to kill the NYC AI chatbot caught telling businesses to break the law

NYC Mayor Zohran Mamdani is decommissioning the city's Microsoft-powered AI chatbot due to persistent reliability issues and budget constraints. Investigations revealed the bot frequently hallucinated, providing illegal advice on labor laws and housing regulations. Despite post-launch attempts to mitigate these issues with disclaimers and prompt filtering, the administration deemed the $500,000 system "functionally unusable."

I trapped an AI model inside an art installation (2025) [video]

"Latent Reflection" is an art installation by Rootkid that integrates an AI model into a physical structure. The project explores the conceptual and technical boundaries of embedding generative models within interactive art environments.

Research

How AI impacts skill formation

AI assistance can impair conceptual understanding, code reading, and debugging skills despite potential productivity gains for novices. Randomized experiments show that full delegation sacrifices learning for efficiency, though specific cognitive engagement patterns can mitigate these losses. AI integration must prioritize skill formation to ensure users remain capable of effective supervision and troubleshooting.

VulnLLM-R: Specialized Reasoning LLM with Agent Scaffold for Vuln Detection

VulnLLM-R is a 7B-parameter specialized reasoning LLM for vulnerability detection that prioritizes program state analysis over pattern matching to improve generalizability. The model uses a novel training recipe involving reasoning data generation and filtering to outperform SOTA static analysis tools and commercial LLMs across multiple programming languages. Deployed within an agent scaffold, it identified zero-day vulnerabilities in real-world projects, exceeding the performance of CodeQL and AFL++.

Addressing Asymptomatic AI Harms for Dignified Human-AI Interaction

A longitudinal study identifies the "AI-as-Amplifier Paradox," where immediate productivity gains mask long-term "intuition rust" and skill atrophy in high-stakes domains. To mitigate these harms, the authors introduce a sociotechnical immunity framework designed to help workers detect and recover from AI-induced expertise erosion. This approach balances institutional efficiency with the preservation of human agency and professional identity in AI-augmented environments.

Qwen3-ASR Technical Report

The Qwen3-ASR family introduces 1.7B and 0.6B ASR models based on Qwen3-Omni, supporting 52 languages with SOTA performance and high efficiency, including a 92ms TTFT for the 0.6B variant. The release also features Qwen3-ForcedAligner-0.6B, a non-autoregressive LLM-based timestamp predictor for 11 languages that outperforms current alignment models. All models are available under the Apache 2.0 license.

Syntax-aware diffs without the false positives

This novel AST diff tool, built on RefactoringMiner, addresses limitations in existing tools such as semantic incompatibility and lack of refactoring awareness. By enhancing statement mapping and utilizing refactoring instances, it generates commit-level diffs with superior precision and recall. Evaluation on a new benchmark of 988 commits demonstrates state-of-the-art performance with competitive execution times, providing more accurate structural representations of code evolution.
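
As a rough illustration of why comparing syntax trees avoids formatting-only false positives, here is a minimal sketch using Python's standard ast module. It is not RefactoringMiner or the tool described above, and it does no statement mapping or refactoring detection; the helper name same_structure is made up for this example. Two snippets count as structurally equal when their ASTs dump to the same string.

```python
import ast


def same_structure(src_a: str, src_b: str) -> bool:
    """Return True when both sources parse to identical ASTs."""
    # ast.dump() leaves out line and column attributes by default, and the
    # parser drops comments and redundant parentheses, so purely cosmetic
    # edits produce identical dumps.
    return ast.dump(ast.parse(src_a)) == ast.dump(ast.parse(src_b))


old = "total = (a + b)  # sum\n"
reformatted = "total = a + b\n"
changed = "total = a - b\n"

# A line-based diff would flag both edits; the AST comparison flags only
# the one that changes the operator.
print(same_structure(old, reformatted))
print(same_structure(old, changed))
```

A line-based diff would report both rewrites as changes, while the tree comparison separates the cosmetic edit from the behavioral one.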

Code

Amla Sandbox – WASM bash shell sandbox for AI agents

amla-sandbox is a WASM-based execution environment for LLM agents that provides secure code execution without the infrastructure overhead of Docker or VMs. It utilizes WASI for memory isolation and a capability-based security model to enforce fine-grained constraints on tool parameters and call frequency. By allowing agents to execute multi-step scripts in JavaScript or Shell, it reduces LLM round trips and token costs compared to traditional iterative tool-calling.
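
To make the idea of capability-based constraints on tool parameters and call frequency concrete, here is a small sketch in plain Python. It does not use amla-sandbox's API; Capability, Gate, and CapabilityError are hypothetical names invented for this example, and a real sandbox would enforce these checks at the WASM boundary rather than inside the host process.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


class CapabilityError(PermissionError):
    """Raised when a tool call falls outside what was granted."""


@dataclass
class Capability:
    # Hypothetical grant: one tool, a parameter predicate, a call budget.
    tool: str
    allow: Callable[[Dict[str, Any]], bool]
    max_calls: int
    used: int = 0


@dataclass
class Gate:
    caps: Dict[str, Capability] = field(default_factory=dict)

    def grant(self, cap: Capability) -> None:
        self.caps[cap.tool] = cap

    def call(self, tool: str, fn: Callable[..., Any], **params: Any) -> Any:
        cap = self.caps.get(tool)
        if cap is None:
            raise CapabilityError(f"no capability for tool {tool!r}")
        if not cap.allow(params):
            raise CapabilityError(f"parameters rejected for {tool!r}")
        if cap.used >= cap.max_calls:
            raise CapabilityError(f"call budget exhausted for {tool!r}")
        cap.used += 1  # only successful authorizations consume budget
        return fn(**params)


def transfer(amount: int) -> str:
    return f"transferred {amount}"


gate = Gate()
# Allow at most two transfers, each of 100 or less.
gate.grant(Capability("transfer", lambda p: p.get("amount", 0) <= 100, max_calls=2))
print(gate.call("transfer", transfer, amount=50))
```

The agent's script can call the tool as often as it likes inside one round trip; the gate, not the model, decides which calls go through.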

Stripe-no-webhooks – Sync your Stripe data to your Postgres DB

stripe-no-webhooks is an opinionated library that automates Stripe webhook handling and database synchronization for Next.js and PostgreSQL stacks. It enables developers to define plans in code and manage complex billing models like credits, wallets, and usage-based billing via idempotent APIs. The package includes a CLI for schema migrations and plan syncing, plus a customizable pricing page generator.
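
For readers unfamiliar with why idempotency matters for credit and wallet operations, the sketch below shows the general pattern in Python. It is not the stripe-no-webhooks API (the library targets Next.js); CreditLedger and its grant method are hypothetical names, and the in-memory dictionaries stand in for what would be database tables.

```python
from typing import Dict


class CreditLedger:
    """Toy ledger where repeating a request with the same key is a no-op."""

    def __init__(self) -> None:
        self.balances: Dict[str, int] = {}
        # Maps idempotency key -> balance returned by the first request.
        self._seen: Dict[str, int] = {}

    def grant(self, user: str, amount: int, idempotency_key: str) -> int:
        # A retried request (for example after a network timeout) returns
        # the original result instead of crediting the user twice.
        if idempotency_key in self._seen:
            return self._seen[idempotency_key]
        new_balance = self.balances.get(user, 0) + amount
        self.balances[user] = new_balance
        self._seen[idempotency_key] = new_balance
        return new_balance


ledger = CreditLedger()
ledger.grant("user_1", 100, idempotency_key="invoice-001")
ledger.grant("user_1", 100, idempotency_key="invoice-001")  # retry, ignored
print(ledger.balances["user_1"])
```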

Localsandbox – Agent sandbox with Bash, Python and portable filesystem

LocalSandbox is a Python SDK that provides AI agents with a persistent, isolated environment for executing bash commands and Python code via Pyodide. It utilizes a virtual filesystem backed by SQLite, allowing for state persistence, snapshotting, and restoration across sessions. The toolkit includes a built-in KV store for agent state management, configurable execution limits for safety, and full async support.
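
The SQLite-backed filesystem with snapshot and restore can be approximated in a few lines of standard-library Python. This is a sketch of the general technique, not LocalSandbox's actual SDK; the SqliteFS class and its method names are assumptions made for this example, and it stores whole files as blobs with no directories or permissions.

```python
import sqlite3


class SqliteFS:
    """Toy virtual filesystem: each path maps to a blob in one table."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS files "
            "(path TEXT PRIMARY KEY, data BLOB NOT NULL)"
        )
        self.db.commit()

    def write(self, path: str, data: bytes) -> None:
        self.db.execute(
            "INSERT OR REPLACE INTO files (path, data) VALUES (?, ?)",
            (path, data),
        )
        self.db.commit()

    def read(self, path: str) -> bytes:
        cur = self.db.execute("SELECT data FROM files WHERE path = ?", (path,))
        row = cur.fetchone()
        cur.close()
        if row is None:
            raise FileNotFoundError(path)
        return bytes(row[0])

    def snapshot(self) -> "SqliteFS":
        # Connection.backup copies the whole database into the target.
        copy = SqliteFS()
        self.db.backup(copy.db)
        return copy

    def restore(self, snap: "SqliteFS") -> None:
        # Overwrite the live database with the snapshot's contents.
        snap.db.backup(self.db)


fs = SqliteFS()
fs.write("/notes.txt", b"draft 1")
checkpoint = fs.snapshot()
fs.write("/notes.txt", b"draft 2")
fs.restore(checkpoint)
print(fs.read("/notes.txt"))
```

Because the whole filesystem lives in one database, a snapshot is just a database copy, which is what makes rolling an agent back to an earlier state cheap.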

Open Sandbox – an open-source self-hostable Linux sandbox for AI agents

OpenSandbox is a high-performance Rust implementation of a Linux sandbox designed for isolated code execution in LLM-powered agentic workflows. It utilizes PID/mount namespaces, chroot, and resource limits to provide secure, stateful sessions accessible via HTTP, gRPC, or a Python SDK. Benchmarks indicate significant performance advantages over E2B, particularly in sandbox creation and command execution latency.

Julie Zero – my screen-aware desktop AI that works out of the box

Julie is an agentic, screen-aware desktop assistant that runs on Groq-hosted models (Llama 3 70B and Llama 4 Scout) and is designed to minimize context switching. It features vision-based screen analysis and autonomous capabilities, including browser automation via Puppeteer, terminal command execution, and native system control through JXA. The tool provides a transparent "ghost mode" interface for interacting with the user's workspace via voice or text.
