Wednesday — March 18, 2026
OpenAI launches GPT-5.4 Mini and Nano, researchers down AI drones using painted umbrellas, and AI Overkill optimizes Claude Code with specialized domain agents.
News
Kagi Translate now supports LinkedIn Speak as an output language
Kagi Translate is a multi-modal translation utility offering text, document, and website translation, as well as proofreading and dictionary services. It supports a 20,000-character context window and features "Standard" and "Best" quality tiers, likely leveraging different LLM backends for performance optimization. The tool also includes style transfer capabilities, such as "LinkedIn Speak," alongside automated language detection and broad linguistic support.
Mistral AI Releases Forge
Mistral AI’s Forge is a system for enterprises to build frontier-grade models grounded in proprietary data, supporting pre-training, post-training, and reinforcement learning across dense and MoE architectures. It features an agent-first design that enables autonomous optimization, synthetic data generation, and continuous alignment with internal benchmarks and compliance standards. By internalizing domain-specific terminology and workflows, Forge allows organizations to deploy reliable agents and maintain strategic autonomy over their institutional knowledge.
Unsloth Studio
Unsloth Studio is an open-source, no-code web UI for local LLM training, inference, and model exporting. It leverages optimized kernels to provide 2x faster training with 70% less VRAM usage across 500+ models, including text, vision, and TTS. Key features include automated dataset generation via Data Recipes, real-time training observability, and a Model Arena for side-by-side GGUF and safetensor comparisons.
GPT‑5.4 Mini and Nano
GPT-5.4 mini and nano are new low-latency models optimized for high-volume workloads and subagent architectures. GPT-5.4 mini delivers a 2x speed increase over GPT-5 mini while approaching GPT-5.4 performance on SWE-Bench Pro and OSWorld-Verified. GPT-5.4 nano provides a cost-efficient alternative for classification and simple reasoning tasks. Both models support multimodal inputs and tool use, with GPT-5.4 mini offering a 400k context window at $0.75/$4.50 per 1M tokens.
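At the quoted rates ($0.75 input / $4.50 output per 1M tokens), back-of-envelope pricing for a high-volume subagent workload is straightforward; a small sketch (the request sizes below are illustrative, not OpenAI figures):

```python
# Rough cost estimate for GPT-5.4 mini at the quoted per-token rates.
INPUT_RATE = 0.75 / 1_000_000   # USD per input token
OUTPUT_RATE = 4.50 / 1_000_000  # USD per output token

def estimate_cost(requests: int, in_tokens: int, out_tokens: int) -> float:
    """Total USD cost for a batch of identically sized requests."""
    per_request = in_tokens * INPUT_RATE + out_tokens * OUTPUT_RATE
    return requests * per_request

# 10,000 classification calls, ~2k tokens in and ~100 tokens out each:
print(round(estimate_cost(10_000, 2_000, 100), 2))  # 19.5
```

Note how output tokens dominate cost per token, so short, structured completions keep subagent fan-out cheap.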
AI still doesn't work well, businesses are faking it, and a reckoning is coming
Codestrap founders warn of an impending reckoning as enterprises struggle with LLM fallibility, non-determinism, and a lack of inductive reasoning. They argue that current metrics like lines of code or PR counts fail to capture the performance regressions and quality issues inherent in AI-generated software. Beyond technical limitations, the industry faces misaligned incentives, client demands for AI-driven discounts, and a significant shift by insurance underwriters to exclude AI-related liabilities.
Research
Why AI systems don't learn – On autonomous learning from cognitive science
The proposed architecture addresses limitations in autonomous AI learning by integrating observational (System A) and active (System B) learning modes. A meta-control mechanism (System M) dynamically switches between these systems based on internal signals, drawing inspiration from biological adaptation in dynamic environments.
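The control flow can be sketched in a few lines. The System A/B/M names come from the paper, but the uncertainty signal, threshold, and stub learners below are invented for illustration:

```python
import random

def observe(env):
    """System A: passive, learn from the incoming stream."""
    return {"mode": "observational", "sample": env["stream"]}

def experiment(env):
    """System B: active, probe the environment with an action."""
    return {"mode": "active", "probe": random.choice(env["actions"])}

def meta_control(env, uncertainty: float, threshold: float = 0.5):
    """System M: route to active learning when internal uncertainty is high."""
    learner = experiment if uncertainty > threshold else observe
    return learner(env)

env = {"stream": [1, 2, 3], "actions": ["poke", "wait"]}
print(meta_control(env, uncertainty=0.9)["mode"])  # active
print(meta_control(env, uncertainty=0.1)["mode"])  # observational
```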
UC Irvine researchers bring down AI-powered drones with painted umbrellas
A new type of Distance-Pulling Attack (DPA) is introduced, exploiting vulnerabilities in Autonomous Target Tracking (ATT) systems to dangerously reduce tracking distances. FlyTrap, a novel physical-world attack framework, employs an adversarial umbrella with a progressive distance-pulling strategy and controllable spatial-temporal consistency to achieve this. Evaluated on commercial ATT drones, FlyTrap successfully reduces tracking distances far enough to enable drone capture, increase susceptibility to sensor attacks, or cause physical collisions, highlighting urgent security risks.
Automating Forecasting Question Generation and Resolution for AI Evaluation
This work introduces an LLM-powered system for automatically generating and resolving diverse, high-quality forecasting questions at scale. The system generated 1499 real-world questions, achieving 96% verifiability (exceeding Metaculus) and 95% resolution accuracy. It validated that more intelligent LLMs improve forecasting performance and demonstrated that a question decomposition strategy significantly improves Brier scores.
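The Brier score behind that last result is just the mean squared error between probabilistic forecasts and binary outcomes (this is the standard definition, not anything specific to this paper):

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes; lower is better."""
    assert len(forecasts) == len(outcomes)
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# A sharp, well-calibrated forecaster beats a hedging one:
print(round(brier_score([0.9, 0.1, 0.8], [1, 0, 1]), 3))  # 0.02
print(brier_score([0.5, 0.5, 0.5], [1, 0, 1]))            # 0.25
```

Decomposing a hard question into sub-questions helps because errors on well-posed sub-forecasts tend to be smaller and partially cancel when recombined.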
Pimp My LLM: Leveraging Variability Modeling to Tune Inference Hyperparameters
This research applies software engineering variability management to optimize the complex configuration space of LLM inference. By representing generation hyperparameters as feature-based variability models, the authors sampled configurations from Hugging Face Transformers to train predictive models for energy, latency, and accuracy. The results demonstrate that this systematic approach effectively identifies hyperparameter interactions and trade-offs, enabling accurate performance prediction from a limited measurement set.
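The core move, viewing generation hyperparameters as a feature model and sampling configurations from it, can be sketched as follows. The knob names are real Transformers generation parameters, but the value domains and sampling scheme here are illustrative, not the paper's:

```python
import itertools
import random

# Feature-model-style view of a few generation hyperparameters.
FEATURES = {
    "temperature": [0.2, 0.7, 1.0],
    "top_p": [0.9, 1.0],
    "num_beams": [1, 4],
}

def all_configs(features):
    """Enumerate every configuration in the variability model."""
    keys = list(features)
    for values in itertools.product(*(features[k] for k in keys)):
        yield dict(zip(keys, values))

def sample_configs(features, n, seed=0):
    """Uniformly sample n configurations to measure instead of the full space."""
    random.seed(seed)
    return random.sample(list(all_configs(features)), n)

space = list(all_configs(FEATURES))
print(len(space))  # 3 * 2 * 2 = 12 configurations
subset = sample_configs(FEATURES, 5)
print(len(subset))  # 5
```

Each sampled configuration would then be benchmarked for energy, latency, and accuracy, and the measurements used to fit a predictive model over the whole space.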
Real-World Industrial-Scale Verification: LLM-Driven Theorem Proving on seL4
AutoReal is an LLM-driven framework designed for industrial-scale formal verification, specifically targeting the seL4-Isabelle project. It utilizes AutoReal-Prover, a fine-tuned 7B-scale model that incorporates CoT-based proof training and context augmentation to enable lightweight, local deployment. The method achieves a 51.67% success rate on seL4 theorems, significantly outperforming previous benchmarks, and demonstrates strong generalization across security-related projects in the AFP.
Code
Antfly: Distributed, Multimodal Search, Memory, and Graphs in Go
Antfly is a distributed, Raft-based search engine that integrates hybrid search (BM25, dense, and SPLADE sparse vectors) with multimodal data support and graph traversal. It features built-in RAG agents with tool-calling capabilities, automated embedding and relationship extraction pipelines, and hardware-accelerated vector operations via SIMD. The ecosystem includes an MCP server for LLM tool-calling, a PostgreSQL extension, and a dedicated ML inference engine for local or cloud-based model execution.
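Antfly's blurb doesn't say how the BM25, dense, and SPLADE rankings are combined; reciprocal rank fusion (RRF) is a common technique for merging lexical and vector result lists, sketched here as an illustration rather than Antfly's actual method:

```python
def rrf(rankings, k=60):
    """Reciprocal rank fusion: merge ranked lists of doc IDs into one ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d3", "d1", "d2"]   # lexical (keyword) ranking
dense_hits = ["d1", "d2", "d3"]  # dense-vector ranking
print(rrf([bm25_hits, dense_hits]))  # ['d1', 'd3', 'd2']
```

RRF needs no score normalization across retrievers, which is why it is a popular default for hybrid search.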
Make your coding models create ADRs before implementation
Corbell is a local-first tool that constructs a multi-repo architecture graph from source code and documentation to streamline backend engineering workflows. It leverages LLMs and embedding similarity to automate spec generation, service discovery, and architecture reviews while ensuring consistency with established design patterns. Key features include an interactive graph UI, MCP support for IDE integration, and a roadmap toward a fully agentic architecture for autonomous system analysis.
Unsloth Studio - Local Fine-tuning, Chat UI
Unsloth provides a unified interface for training and running LLMs, vision, and audio models, delivering up to 2x faster training speeds with 70% less VRAM usage. It supports full fine-tuning, 4-bit/FP8 quantization, and efficient RL via GRPO, while offering features like tool calling and GGUF export. Available as a web UI (Studio) or a library (Core), it is compatible with NVIDIA, AMD, and Intel hardware.
My Claude Code setup you definitely shouldn't use. It's AI Overkill
AI Overkill is a framework for Claude Code that replaces general prompting with specialized domain agents, phase-gated workflows, and multi-wave parallel code reviews. It features a natural language router for dispatching expert agents and a persistent memory system that enables cross-session learning from errors and retrospectives. The project emphasizes "throwing tokens at the problem" to achieve high-context, deterministic results in debugging and code analysis.
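The dispatch idea can be sketched as a toy router. The agent names and keyword rules below are invented for illustration; AI Overkill's actual router runs on natural language rather than keyword matching:

```python
# Toy request router in the spirit of AI Overkill's expert-agent dispatch.
AGENTS = {
    "debugger":  ["bug", "crash", "stack trace", "error"],
    "reviewer":  ["review", "diff", "pull request"],
    "architect": ["design", "architecture", "refactor"],
}

def route(request: str, default: str = "generalist") -> str:
    """Dispatch a request to the first agent whose keywords match."""
    text = request.lower()
    for agent, keywords in AGENTS.items():
        if any(kw in text for kw in keywords):
            return agent
    return default

print(route("Why does this crash on startup?"))  # debugger
print(route("Please review this diff"))          # reviewer
print(route("Write a haiku"))                    # generalist
```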
RocketRide – Build and run AI/data pipelines within VS Code, Cursor etc.
RocketRide is a high-performance data processing engine built on a C++ core with Python-extensible nodes, designed for scalable AI/ML workloads on private infrastructure. It features an IDE-integrated visual builder for orchestrating multi-agent workflows, supporting CrewAI and LangChain. The platform offers 50+ pipeline nodes, including integrations for 13 LLM providers, 8 vector databases, OCR, NER, and PII anonymization. Pipelines, defined in a .pipe JSON format, can be deployed via Docker or on-prem, integrated via TypeScript/Python SDKs, and provide detailed analytics for optimization.
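The .pipe schema isn't documented in this blurb, so the field names below are assumptions; a minimal pipeline definition and validating loader might look like:

```python
import json

# Hypothetical .pipe contents -- node types echo the OCR/NER/PII
# nodes mentioned above, but the schema itself is an assumption.
PIPE = """
{
  "name": "invoice_extraction",
  "nodes": [
    {"id": "ocr", "type": "ocr"},
    {"id": "ner", "type": "ner"},
    {"id": "pii", "type": "pii_anonymizer"}
  ],
  "edges": [["ocr", "ner"], ["ner", "pii"]]
}
"""

def load_pipe(text: str) -> dict:
    """Parse a pipeline definition and check every edge references a known node."""
    pipe = json.loads(text)
    ids = {node["id"] for node in pipe["nodes"]}
    for src, dst in pipe["edges"]:
        assert src in ids and dst in ids, f"unknown node in edge {src}->{dst}"
    return pipe

pipe = load_pipe(PIPE)
print(pipe["name"], len(pipe["nodes"]))  # invoice_extraction 3
```

Keeping the pipeline as declarative JSON is what lets the same definition run in the visual builder, in Docker, or via the SDKs.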