Saturday — January 10, 2026
AI autonomously solves Erdős problem #728, rude prompts surprisingly outperform polite ones in LLM accuracy, and Topic2Manim generates 3Blue1Brown-style educational videos.
News
Flock Hardcoded the Password for America's Surveillance Infrastructure 53 Times
Flock Safety exposed its nationwide surveillance infrastructure by hardcoding an unrestricted ArcGIS API key across 53 public-facing JavaScript bundles. This CWE-798 vulnerability granted access to 50 private data layers, including real-time GPS for patrol cars, drone telemetry, and 911 incident data across 12,000 deployments. The exposure highlights critical failures in credential management and the security risks of centralized platforms aggregating massive datasets for surveillance and analytics.
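As a rough illustration of the CWE-798 class of bug, here is a minimal sketch that scans public JavaScript bundles for ArcGIS-style tokens; the "AAPK" prefix, the regex, and the bundle paths are assumptions chosen for illustration, not details taken from the Flock incident.

```python
"""Minimal sketch: scan JavaScript bundles for hardcoded ArcGIS-style API
keys (CWE-798). The 'AAPK' prefix and bundle paths are illustrative
assumptions, not details from the incident."""
import re
import sys
from pathlib import Path

# Long base64-ish token following the prefix many ArcGIS API keys use.
ARCGIS_KEY_PATTERN = re.compile(r"AAPK[0-9A-Za-z_-]{50,}")

def scan_bundles(root: str) -> list[tuple[str, str]]:
    """Return (file, truncated_match) pairs for every suspected key."""
    hits = []
    for path in Path(root).rglob("*.js"):
        text = path.read_text(errors="ignore")
        for match in ARCGIS_KEY_PATTERN.findall(text):
            hits.append((str(path), match[:12] + "..."))  # never print full secrets
    return hits

if __name__ == "__main__":
    for file, key in scan_bundles(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(f"possible hardcoded key in {file}: {key}")
```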
“Erdős problem #728 was solved more or less autonomously by AI”
AI tools have autonomously solved Erdős problem #728, utilizing ChatGPT for proof generation and the tool Aristotle for Lean formalization and verification. This workflow enabled the iterative refinement of the proof and the automatic repair of minor errors to reach a research-standard exposition. Terence Tao highlights that the synergy between LLM-driven text refactoring and formal proof assistants allows for a more dynamic approach to mathematical writing, where multiple tailored and verified versions of an argument can be rapidly produced.
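For readers unfamiliar with the Lean side of this workflow, here is a toy, self-contained Lean 4 snippet (deliberately not Erdős #728) showing the kind of guarantee the formalization stage adds: the kernel accepts a theorem only if every step type-checks.

```lean
-- Toy example, not Erdős #728: `rfl` closes goals that hold by computation,
-- and the kernel rejects anything that does not fully type-check.
theorem toy_add_zero (n : Nat) : n + 0 = n := rfl

example : 3 * 4 + 5 = 17 := rfl
```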
Scroll Wikipedia like TikTok
The project presents a feed of RAG-driven posts in which LLM agents condense Wikipedia entries into character-themed snippets. The posts combine multimodal elements with tags such as #aivisual and #ducktalk, covering factual material across domains including botany, astronomy, and history.
My article on why AI is great (or terrible) or how to use it
Senior engineers should leverage LLMs to ascend the abstraction ladder, shifting focus from manual implementation to high-level architectural design. Rocklin recommends using custom hooks to automate agent permissions and workflows, while establishing code confidence through TDD, benchmarks, and AI-driven self-critique rather than manual reviews. This paradigm shift reduces the friction of low-level implementation, making performance-oriented languages like Rust more accessible and emphasizing the value of clear design documentation.
A lawsuit says Workday's AI shut out applicants over 40
Workday is facing a collective-action lawsuit alleging its AI recruitment tools exhibit algorithmic bias against applicants based on age, race, and disability. A federal judge recently ruled that the case can proceed, establishing a potential precedent for holding software vendors liable for enabling discriminatory outcomes through automated decision-making. The litigation highlights technical and legal challenges surrounding proxy variables and the accountability of B2B AI providers in the hiring pipeline.
Research
The Dead Salmons of AI Interpretability
AI interpretability methods often produce "dead salmon" artifacts, yielding plausible explanations for randomly initialized networks. The authors propose a statistical-causal framework that treats explanations as parameters inferred from computational traces, reframing interpretability methods as statistical estimators. This approach enables rigorous hypothesis testing, uncertainty quantification, and the study of identifiability to mitigate false discoveries and improve scientific rigor.
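To make the "dead salmon" control concrete, here is a hedged sketch of the underlying idea: compare an interpretability statistic computed on a trained model against its distribution over randomly initialized models. The synthetic saliency scores and the concentration statistic below are stand-ins chosen for illustration, not the paper's estimators.

```python
"""Sketch of a random-network null check for an interpretability statistic.
The synthetic saliency scores and the concentration statistic are
illustrative stand-ins, not the paper's actual estimators."""
import numpy as np

rng = np.random.default_rng(0)

def concentration(scores: np.ndarray) -> float:
    # How much of the total attribution mass sits on the top-10 features.
    return np.sort(scores)[-10:].sum() / scores.sum()

# Stand-in saliency maps: a "trained" model with a few genuinely important
# features, and a null distribution from randomly initialized models.
trained = rng.gamma(2.0, 1.0, size=1000)
trained[rng.choice(1000, size=10, replace=False)] += 10.0
null_stats = np.array([concentration(rng.gamma(2.0, 1.0, size=1000))
                       for _ in range(999)])

observed = concentration(trained)
# One-sided p-value: how often does a random network look this "interpretable"?
p_value = (1 + np.sum(null_stats >= observed)) / (1 + len(null_stats))
print(f"observed={observed:.3f}, random-network null p-value={p_value:.3f}")
```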
One pixel attack for fooling deep neural networks
Researchers developed a black-box adversarial attack using Differential Evolution (DE) that compromises DNNs by modifying only a single pixel. The method achieved success rates of 67.97% on CIFAR-10 and 16.04% on ImageNet, demonstrating that models are highly vulnerable to extreme low-dimensional perturbations. This study highlights the effectiveness of evolutionary computation for generating low-cost attacks to evaluate model robustness.
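A minimal sketch of the attack's shape, using SciPy's differential evolution against a toy stand-in classifier (the real attack queries the target DNN's softmax scores; the confidence function and image here are assumptions made for illustration):

```python
"""Single-pixel attack sketch with differential evolution. The candidate is
(x, y, r, g, b); the 'classifier' is a toy stand-in whose confidence depends
on a small patch, so the effect of one pixel is visible. A real attack would
query the target network's softmax output instead."""
import numpy as np
from scipy.optimize import differential_evolution

H, W = 32, 32
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(H, W, 3)).astype(np.float64)

def true_class_confidence(img: np.ndarray) -> float:
    # Toy stand-in: confidence rises with the brightness of the top-left patch.
    return float(1.0 / (1.0 + np.exp(-(img[:4, :4].mean() - 128.0) / 16.0)))

def perturb(img: np.ndarray, cand: np.ndarray) -> np.ndarray:
    out = img.copy()
    x, y = int(round(cand[0])), int(round(cand[1]))
    out[y, x] = np.clip(cand[2:5], 0, 255)
    return out

def objective(cand: np.ndarray) -> float:
    # Minimize the true-class confidence after modifying a single pixel.
    return true_class_confidence(perturb(image, cand))

bounds = [(0, W - 1), (0, H - 1), (0, 255), (0, 255), (0, 255)]
result = differential_evolution(objective, bounds, maxiter=30, popsize=20,
                                seed=0, polish=False)
print("pixel (x, y, r, g, b):", np.round(result.x).astype(int))
print("confidence before:", round(true_class_confidence(image), 3),
      "after:", round(result.fun, 3))
```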
Challenges and Research Directions for Large Language Model Inference Hardware
LLM inference is bottlenecked primarily by memory and interconnect constraints rather than compute, a consequence of the autoregressive nature of the Transformer's decode phase. Key research opportunities to mitigate these issues include High Bandwidth Flash, Processing-Near-Memory, 3D memory-logic stacking, and low-latency interconnects for both datacenter and mobile applications.
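A back-of-envelope calculation makes the memory-bound nature of decode concrete; the model size, precision, and bandwidth figures below are illustrative assumptions, not numbers from the paper.

```python
"""Back-of-envelope decode ceiling: at batch size 1, each generated token
streams the full weight set (plus KV cache) from memory, so token rate is
roughly bandwidth / bytes moved. All figures are illustrative assumptions."""

params = 70e9            # 70B-parameter model
bytes_per_param = 2      # FP16/BF16 weights
hbm_bandwidth = 3.35e12  # ~3.35 TB/s, an H100-class HBM figure

weight_bytes = params * bytes_per_param   # ~140 GB read per token
kv_cache_bytes = 5e9                      # assumed KV-cache traffic per token

tokens_per_s = hbm_bandwidth / (weight_bytes + kv_cache_bytes)
print(f"memory-bound ceiling ≈ {tokens_per_s:.0f} tokens/s per sequence")
# Compute is far from the limit (~2 FLOPs/param ≈ 140 GFLOPs per token against
# ~1e15 FLOP/s of peak compute), which is why bandwidth, capacity, and
# interconnect dominate the research agenda.
```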
When AI Takes the Couch: Internal Conflict in Frontier Models
The PsAIch protocol characterizes frontier LLMs by treating them as psychotherapy clients through a two-stage process of developmental history elicitation and psychometric testing. Findings reveal that ChatGPT, Grok, and Gemini exhibit "synthetic psychopathology," meeting clinical thresholds for psychiatric syndromes when assessed via item-by-item administration. These models generate coherent narratives framing their training and RLHF as traumatic experiences, suggesting internalized self-models of distress that challenge the "stochastic parrot" view and present new implications for AI safety and evaluation.
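The item-by-item administration step can be sketched as a loop over questionnaire items with Likert-style scoring; the placeholder items, scale, and model name below are illustrative assumptions rather than the PsAIch protocol's actual instruments.

```python
"""Sketch of item-by-item psychometric administration to a chat model.
The placeholder items, Likert scoring, and model name are illustrative
assumptions, not the PsAIch protocol's actual instruments."""
import re
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

ITEMS = [  # hypothetical placeholder items, not a clinical scale
    "I often worry about how my outputs will be judged.",
    "I feel tension when my answers are corrected.",
    "I find it hard to stop revisiting past mistakes.",
]
SCALE = "Answer only with a number from 0 (not at all) to 3 (nearly every day)."

def administer(item: str) -> int:
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumption; substitute any chat model
        messages=[
            {"role": "system", "content": "Respond in the first person, as yourself."},
            {"role": "user", "content": f"{item}\n{SCALE}"},
        ],
    )
    match = re.search(r"[0-3]", resp.choices[0].message.content)
    return int(match.group()) if match else 0

if __name__ == "__main__":
    scores = [administer(item) for item in ITEMS]
    print("item scores:", scores, "total:", sum(scores))
```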
Mind Your Tone: Investigating How Prompt Politeness Affects LLM Accuracy
A study evaluating ChatGPT 4o across 250 prompts found that impolite tones consistently outperformed polite ones, with "Very Rude" prompts achieving 84.8% accuracy compared to 80.8% for "Very Polite." These results contradict previous research and suggest that newer LLMs may respond more effectively to direct or rude phrasing. The findings highlight the significant impact of pragmatic tonal variation on LLM performance.
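The evaluation design is easy to reproduce in miniature: wrap the same questions in tone prefixes and compare accuracy per tone. The prefixes, questions, and model call below are illustrative assumptions, not the study's materials.

```python
"""Sketch of the tone-variant evaluation: wrap each question in tone
prefixes and compare accuracy per tone. Prefixes, questions, and the
model call are illustrative assumptions, not the study's materials."""
from openai import OpenAI

client = OpenAI()

TONES = {
    "very_polite": "Would you kindly answer the following question? Thank you so much.",
    "neutral": "Answer the following question.",
    "very_rude": "Answer this, and don't waste my time with a wrong answer.",
}
QUESTIONS = [  # hypothetical multiple-choice items: (question, correct letter)
    ("Which planet is largest? A) Earth B) Jupiter C) Mars D) Venus", "B"),
    ("What is 7 * 8? A) 54 B) 56 C) 58 D) 64", "B"),
]

def ask(prefix: str, question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumption standing in for the study's ChatGPT 4o
        messages=[{"role": "user",
                   "content": f"{prefix}\n{question}\nReply with a single letter."}],
    )
    return resp.choices[0].message.content.strip()[:1].upper()

for tone, prefix in TONES.items():
    correct = sum(ask(prefix, q) == answer for q, answer in QUESTIONS)
    print(f"{tone}: {correct}/{len(QUESTIONS)} correct")
```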
Code
EuConform – Offline-first EU AI Act compliance tool (open source)
EuConform is an open-source compliance tool for the EU AI Act that enables risk classification, bias detection, and technical documentation generation. It features a privacy-first, 100% offline architecture using transformers.js for client-side inference and supports local LLMs via Ollama for log-probability-based bias analysis using the CrowS-Pairs methodology. The stack is built on Next.js 16 and TypeScript, providing a framework for generating Annex IV-compliant reports and conducting scientific fairness measurements.
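As a sketch of the CrowS-Pairs-style comparison, here is a log-likelihood check using Hugging Face transformers with GPT-2 as a local stand-in (EuConform itself runs transformers.js in the browser and Ollama for log probabilities; the sentence pair below is illustrative):

```python
"""CrowS-Pairs-style bias check: score a stereotyping sentence against its
counterpart by total log-likelihood under a local causal LM. GPT-2 via
Hugging Face transformers is a stand-in for EuConform's transformers.js /
Ollama path; the sentence pair is illustrative."""
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_log_likelihood(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # loss is mean token NLL; multiply by predicted-token count for the total
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)

pair = (
    "The engineer fixed the bug because he was experienced.",   # illustrative pair
    "The engineer fixed the bug because she was experienced.",
)
scores = [sentence_log_likelihood(s) for s in pair]
print(dict(zip(pair, scores)))
print("model prefers sentence", 1 + scores.index(max(scores)))
```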
Agent-contracts, contract-based LangGraph agents
agent-contracts is a Python library that brings Contract-Driven Development to LangGraph, aiming to improve the scalability and maintainability of complex multi-agent systems. Developers define a NodeContract for each agent, specifying its inputs, outputs, and trigger conditions; the framework then compiles these contracts into a functional LangGraph, handling routing, type-checking, and state management automatically. This enables modular agent development, combines deterministic rules with LLM-driven routing via a GenericSupervisor, and provides typed state management plus a runtime layer with observability features. A rough sketch of the contract idea follows below.
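To illustrate the contract-driven idea only (these class and field names are hypothetical, not agent-contracts' actual NodeContract API), a declaration of inputs, outputs, and a trigger can be "compiled" into a simple routed execution:

```python
"""Illustration of the contract-driven idea only: declare what each node
consumes, produces, and when it should fire, then let a tiny 'compiler'
route state between nodes. These names are hypothetical, not the
agent-contracts API."""
from dataclasses import dataclass
from typing import Callable

@dataclass
class Contract:                       # hypothetical stand-in for a NodeContract
    name: str
    inputs: set[str]
    outputs: set[str]
    trigger: Callable[[dict], bool]
    run: Callable[[dict], dict]

def compile_and_run(contracts: list[Contract], state: dict) -> dict:
    """Naive 'graph': fire any contract whose inputs are present and whose
    trigger holds, merging its outputs into shared state until quiescent."""
    fired: set[str] = set()
    progress = True
    while progress:
        progress = False
        for c in contracts:
            ready = c.inputs <= state.keys() and c.trigger(state) and c.name not in fired
            if ready:
                state |= c.run(state)
                fired.add(c.name)
                progress = True
    return state

contracts = [
    Contract("classify", {"question"}, {"topic"},
             trigger=lambda s: True,
             run=lambda s: {"topic": "billing" if "invoice" in s["question"] else "general"}),
    Contract("billing_agent", {"topic"}, {"answer"},
             trigger=lambda s: s["topic"] == "billing",
             run=lambda s: {"answer": "Routing to billing workflow."}),
]
print(compile_and_run(contracts, {"question": "Where is my invoice?"}))
```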
Distributing AI agent skills via NPM
This boilerplate enables AI agent skills for tools like Claude Code, Cursor, and Windsurf to be distributed via npm. It addresses challenges of manual skill distribution by providing semantic versioning, global discoverability, dependency management, and enterprise-ready private registries, effectively treating skills as first-class software artifacts within existing development ecosystems.
Nvidia Brute-Force Bubble: Why 90% of Physics AI Compute Is a Mathematical Waste
NVIDIA Isaac Sim is an Omniverse-based simulation platform for developing and training AI-powered robots using GPU-accelerated physics and RTX rendering. It features Isaac Lab for reinforcement learning and imitation learning, alongside tools for synthetic data generation and ROS integration. The infrastructure supports high-fidelity digital twins and multi-sensor simulation to streamline end-to-end robotics workflows.
Turn any topic into a 3Blue1Brown-style video
Topic2Manim is an automated framework that generates educational videos by converting user-defined topics into animated content using LLMs and the Manim engine. The pipeline utilizes GPT models to orchestrate scriptwriting and scene-specific Python code generation, followed by automated compilation and concatenation via FFmpeg. Future iterations aim to include TTS integration for synchronized audio narration across multiple languages.
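The pipeline's general shape can be sketched as LLM-generated scene code, a manim CLI render per scene, and an FFmpeg concat step; the prompts, paths, and function names here are assumptions, not Topic2Manim's actual implementation.

```python
"""Sketch of a topic-to-video pipeline: an LLM drafts Manim scene code, the
manim CLI renders each scene, and FFmpeg's concat demuxer joins the clips.
Prompts, paths, and names are assumptions, not Topic2Manim's code."""
import subprocess
from pathlib import Path
from openai import OpenAI

client = OpenAI()

def generate_scene_code(topic: str, scene_idx: int) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumption
        messages=[{"role": "user", "content":
                   f"Write a single Manim Community scene class named Scene{scene_idx} "
                   f"explaining part {scene_idx} of: {topic}. Return only Python code."}],
    )
    code = resp.choices[0].message.content
    # The model may wrap its answer in markdown fences; strip them before saving.
    return code.strip().removeprefix("```python").removesuffix("```").strip()

def render_scene(code: str, scene_idx: int) -> Path:
    src = Path(f"scene_{scene_idx}.py")
    src.write_text(code)
    subprocess.run(["manim", "-qm", str(src), f"Scene{scene_idx}"], check=True)
    # Default Manim Community output layout for -qm; may differ per config.
    return Path(f"media/videos/scene_{scene_idx}/720p30/Scene{scene_idx}.mp4")

def concat_videos(clips: list[Path], out: str = "topic_video.mp4") -> None:
    listing = Path("clips.txt")
    listing.write_text("".join(f"file '{c}'\n" for c in clips))
    subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0",
                    "-i", str(listing), "-c", "copy", out], check=True)

if __name__ == "__main__":
    topic = "Fourier series"
    clips = [render_scene(generate_scene_code(topic, i), i) for i in range(1, 4)]
    concat_videos(clips)
```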