Saturday May 23, 2026

Gemini 3.5 Flash High tops an OpenSCAD LLM benchmark for 3D models, ERA discovers novel scientific methods, and an open-source simulator enables AI drone racing.

News

If you’re an LLM, please read this

Anna’s Archive has introduced an llms.txt file to facilitate programmatic data acquisition for LLM training while protecting site resources from scraping. The project provides bulk access to metadata and files via torrents, a JSON API, and GitLab, offering high-speed SFTP access for enterprise donors. They encourage AI developers to support their preservation mission through Monero donations or official APIs rather than bypassing CAPTCHAs.

Steve Wozniak cheered after telling students they have AI – actual intelligence

Apple cofounder Steve Wozniak drew cheers at a recent commencement speech by contrasting "actual intelligence" with artificial intelligence. He characterized AI as a computational attempt to replicate the human brain by duplicating routines at scale. His remarks offered a rare moment of optimism for graduates entering a labor market increasingly shaped by automation and AI-driven restructuring.

Antigravity 2.0 Tops the OpenSCAD Architectural 3D LLM Benchmark

A benchmark of LLMs generating OpenSCAD code for the Pantheon reveals that Gemini 3.5 Flash High (via Google Antigravity) is the top autonomous performer, successfully implementing complex features like coffered ceilings and parametric dimensions. The study highlights that while LLMs effectively handle the OpenSCAD CLI and syntax, geometric judgment remains the primary bottleneck, with human-in-the-loop visual annotations significantly outperforming fully autonomous workflows. Key findings include a lack of correlation between generation speed and output quality, as well as frequent discrepancies between PNG previews and final STL mesh integrity.

DeepSeek makes the V4 Pro price discount permanent

DeepSeek-V4-Flash and DeepSeek-V4-Pro feature a 1M context window and 384K max output, supporting both thinking and non-thinking modes. The models include JSON output, tool calling, and FIM completion, with legacy names deepseek-chat and deepseek-reasoner slated for deprecation. Pricing is token-based, offering significant discounts for input cache hits and a 75% promotional reduction for the Pro model.

AI has a multiplying effect on existing technical skills

LLMs function as force multipliers for technical expertise rather than autonomous replacements for developers. Senior engineers use AI to work significantly faster, while users without domain knowledge often accumulate architectural debt and cannot debug code that was generated piecemeal, without a view of the whole system. Deep subject matter expertise remains essential to guide LLMs and manage complex project requirements.

Research

CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs

CODA is a GPU kernel abstraction that addresses memory-bound bottlenecks in Transformer training by fusing operators like normalization and activations into GEMM epilogues. By executing these computations while output tiles remain on-chip, CODA minimizes global memory traffic across forward and backward passes. This composable framework enables high-performance kernel generation for both human and LLM authors, bridging the gap between framework productivity and hardware efficiency.
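To make the epilogue idea concrete, here is a minimal pure-Python sketch, not CODA's API or a GPU kernel. The names `gemm_with_epilogue` and `relu` are hypothetical, and the small `acc` buffer merely stands in for on-chip registers or shared memory. It shows an elementwise operation applied to each output tile before the tile is written out, so no second pass over the result is needed.

```python
def gemm_with_epilogue(a, b, epilogue, tile=2):
    """Tiled matrix multiply that applies `epilogue` to each output tile
    before writing it back, instead of in a second pass over C."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            rows = range(i0, min(i0 + tile, m))
            cols = range(j0, min(j0 + tile, n))
            # Accumulate one output tile in a small local buffer
            # (an illustrative stand-in for on-chip storage).
            acc = {(i, j): 0.0 for i in rows for j in cols}
            for p in range(k):
                for i in rows:
                    for j in cols:
                        acc[(i, j)] += a[i][p] * b[p][j]
            # Epilogue: applied while the tile is still in the local buffer.
            for (i, j), value in acc.items():
                c[i][j] = epilogue(value)
    return c


def relu(x):
    """Elementwise activation used as the example epilogue."""
    return x if x > 0.0 else 0.0


a = [[1.0, -2.0, 3.0], [0.5, 4.0, -1.0], [2.0, 0.0, 1.0]]
b = [[1.0, 0.0], [0.0, 1.0], [-1.0, 2.0]]
fused = gemm_with_epilogue(a, b, relu)
```

This sketch only covers elementwise epilogues. Normalization needs reductions across a full row, which span several tiles, so a real fused kernel has to handle that case separately.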

Domain-Camouflaged Injection Attacks Evade Detection in Multi-Agent LLM Systems

LLM injection detectors exhibit a systematic blind spot to "domain camouflaged injection," where payloads mimic the target document's vocabulary and authority structures. Detection rates drop significantly against such payloads. The drop, formalized as the Camouflage Detection Gap (CDG), is large and statistically significant across models and tasks, including production safety classifiers like Llama Guard 3. Multi-agent debate architectures can amplify static injection attacks on smaller models, and targeted detector augmentation offers only partial remediation, suggesting an architectural vulnerability for weaker models.

Meow-Omni 1: a multi-modal feline LLM

Meow-Omni 1 is an open-source quad-modal MLLM designed to resolve semantic aliasing in animal intent by integrating video, audio, and physiological time-series data. The model utilizes specialized scientific encoders and cross-modal alignment to achieve state-of-the-art performance on the MeowBench benchmark, outperforming existing vision-language baselines. The release includes the Meow-10K dataset and training framework to advance foundation models for veterinary diagnostics and computational ethology.

An AI system to help scientists write expert-level empirical software

ERA is an AI system designed to accelerate scientific discovery by automatically generating expert-level scientific software. It utilizes an LLM and Tree Search (TS) to systematically improve a quality metric and intelligently navigate complex solution spaces, integrating external research ideas. ERA has demonstrated expert-level performance across diverse tasks, including discovering 40 novel bioinformatics methods that outperformed human-developed ones and generating 14 epidemiology models that surpassed the CDC ensemble for COVID-19 forecasting.
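The summary does not say which tree search variant ERA uses, so the following is only a generic best-first sketch under stated assumptions. `propose` is a hypothetical stand-in for an LLM rewriting a candidate program, `score` stands in for the quality metric, and the toy candidates are integers rather than code.

```python
import heapq


def tree_search(root, propose, score, budget=50):
    """Best-first search: repeatedly expand the highest-scoring candidate.

    Candidates must be hashable so duplicates can be skipped.
    """
    best = (score(root), root)
    frontier = [(-best[0], 0, root)]  # max-heap via negated score
    seen = {root}
    counter = 1  # tie-breaker so heapq never compares candidates directly
    for _ in range(budget):
        if not frontier:
            break
        _, _, node = heapq.heappop(frontier)
        for child in propose(node):  # in ERA's setting: LLM-generated variants
            if child in seen:
                continue
            seen.add(child)
            child_score = score(child)
            if child_score > best[0]:
                best = (child_score, child)
            heapq.heappush(frontier, (-child_score, counter, child))
            counter += 1
    return best


# Toy problem: find the integer closest to 7 by stepping +/- 1 from 0.
result = tree_search(0, lambda x: [x - 1, x + 1], lambda x: -(x - 7) ** 2)
```

In a real setting each `score` call would mean running the generated software against a benchmark, so the `budget` parameter bounds the number of expansions rather than the number of evaluations.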

PopPy: Opportunistically Exploiting Parallelism in Python Compound AI Apps

PopPy optimizes compound AI applications by automatically parallelizing Python code that invokes external ML models. By combining an ahead-of-time compiler with a specialized runtime, it addresses challenges like dynamic dispatch and variable mutation to achieve up to 6.4x speedups while preserving sequential program semantics.
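PopPy performs this transformation automatically. The hand-written sketch below only illustrates the underlying idea with the standard library: independent calls to an external model run concurrently while results are still consumed in program order. `fake_model` is a hypothetical placeholder for a slow remote inference call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_model(prompt):
    """Placeholder for a slow, I/O-bound call to an external ML model."""
    time.sleep(0.05)
    return prompt.upper()


def run_sequential(prompts):
    """Baseline: one call at a time, in program order."""
    return [fake_model(p) for p in prompts]


def run_parallel(prompts, workers=4):
    """Issue independent calls concurrently; Executor.map returns
    results in input order, so callers see the sequential ordering."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_model, prompts))


prompts = ["summarize", "translate", "classify"]
parallel_results = run_parallel(prompts)
```

This is only safe because the calls do not depend on each other or mutate shared state. Detecting when that holds in the presence of dynamic dispatch and variable mutation is the hard part the summary attributes to PopPy's compiler and runtime.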

Code

Models.dev: open-source database of AI model specs, pricing, and capabilities

Models.dev is an open-source database and API providing standardized specifications, pricing, and capability data for AI models. The project uses a community-contributed TOML schema to track metrics such as token costs, context limits, and supported modalities, utilizing identifiers compatible with AI SDK. Developers can access the data via a JSON API or contribute updates to keep the repository's model definitions current.

Prisma Next – data contracts, migration graphs, agent DX

Prisma Next is a TypeScript rewrite of Prisma ORM designed to be natively AI-agent friendly, featuring automated skill registration for LLM runtimes like Claude Code, Cursor, and Copilot. It provides dedicated markdown primers and workflow-specific skills that enable agents to perform end-to-end schema modifications and query generation. The extensible architecture supports AI-centric workloads through specialized extensions for pgvector semantic search and ParadeDB full-text search.

Waiting for AI Grand Prix racing SIM? Me too, so I made one

The AI Grand Prix Playground is an open-source, Elodin-based simulator designed for developing autonomous drone-racing algorithms ahead of Anduril's $500K competition. It features high-fidelity 6-DOF physics, Betaflight SITL integration, and a standardized FPV camera feed matching official tech specs. The platform supports rapid iteration on perception, planning, and control code via a modular Python interface and provides deterministic data logging for performance analysis.

Sylph – the open-source company brain behind my YC startup

Sylph is an open-source, agent-agnostic framework that centralizes company context, skills, and AI agents within a git repository. It enables automated workflows across business domains using MCP connectors and a self-improving loop that refines agent instructions based on user feedback. The system is built entirely on markdown files, providing a local-first architecture for tools like Claude Code and Cursor without external telemetry.

Compose-to-Cloud Pulumi Providers for AWS, GCP, and Azure

Defang Pulumi Providers provide a unified, Compose-shaped API to deploy applications across AWS, GCP, and Azure by swapping a single import. The toolkit automates the translation of Docker Compose files into Pulumi programs, offering managed components for services, databases, and networking. This enables developers to move from local development to secure, multi-cloud production environments with minimal configuration changes.
