Wednesday May 27, 2026

Uber exhausts its 2026 AI budget in four months, research finds that rude prompts improve LLM accuracy, and Vision Clicker automates the approval of agent-generated actions.


News

Uber president says AI spending is getting 'harder to justify'

Uber is questioning the ROI of its AI investments after exhausting its 2026 AI budget in just four months. President Andrew Macdonald pointed to a disconnect between surging token consumption, particularly for Claude Code, and the delivery of tangible consumer features. As R&D costs rise, the company is increasingly forced to weigh high API spending against human headcount.

Stack Overflow’s forum is dead but the company’s still kicking

Stack Overflow has seen forum engagement plummet to 2008 levels as developers migrate to LLMs for coding assistance. Despite this, the company doubled its annual revenue to $115 million by licensing its human-curated dataset to AI labs and launching "Stack Internal," an enterprise generative AI tool. This pivot positions the platform as a critical data source for training models on complex technical queries that automated assistants still struggle to resolve.

Launch HN: Minicor (YC P26) – Windows desktop automations at scale

Minicor is an RPA platform that enables AI deployment into legacy systems using computer use agents and deterministic code. It features self-healing reflection agents that verify UI actions in real-time, achieving 93-96% accuracy by combining scripted workflows with agentic error recovery. The platform supports deployment on Windows VMs or on-prem and provides full observability through video replays and execution context.
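The "scripted workflow plus agentic error recovery" pattern can be sketched as follows. This is a minimal illustration, not Minicor's API: it assumes every step has a checkable post-condition, and `Step`, `run_workflow`, and `recover` are hypothetical names. In a real system `recover` would be where a computer-use agent inspects the screen and repairs the UI state.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    """One scripted UI action plus a check that it had the intended effect."""
    name: str
    action: Callable[[], None]  # deterministic, scripted action
    verify: Callable[[], bool]  # post-condition, e.g. "the dialog is now open"


@dataclass
class RunLog:
    events: List[str] = field(default_factory=list)


def run_workflow(
    steps: List[Step],
    recover: Callable[[Step], None],
    max_recoveries: int = 2,
    log: Optional[RunLog] = None,
) -> Tuple[bool, RunLog]:
    """Run scripted steps in order, verifying each one.

    When a post-condition fails, `recover` (a hypothetical stand-in for an
    agentic fallback) may repair the UI state before the step is retried.
    """
    if log is None:
        log = RunLog()
    for step in steps:
        step.action()
        attempts = 0
        while not step.verify():
            if attempts >= max_recoveries:
                log.events.append(f"{step.name}: failed")
                return False, log
            attempts += 1
            log.events.append(f"{step.name}: recovering ({attempts})")
            recover(step)
            step.action()  # retry the deterministic action after recovery
        log.events.append(f"{step.name}: ok")
    return True, log
```

The deterministic path stays cheap and predictable; the agent is only invoked when a verification fails, and the log doubles as a simple execution trace.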

AI tools are only as good as your judgment

Passive AI integration risks technical debt and the abdication of engineering judgment. To mitigate this, engineers should adopt an adversarial workflow, treating LLM outputs as drafts to be interrogated for edge cases, security flaws, and implicit assumptions. Maintaining a "generate, interrogate, revise" loop ensures that AI tools sharpen rather than replace critical thinking.
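One way to make the "generate, interrogate, revise" loop concrete is sketched below, assuming the interrogation step can be expressed as checks that return objections (tests, linters, or a reviewer's questions). All function names are hypothetical; the point is that the first output is only a draft and is never accepted without passing the checks.

```python
from typing import Callable, List, Tuple


def generate_interrogate_revise(
    generate: Callable[[], str],
    checks: List[Callable[[str], List[str]]],
    revise: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Treat the first output as a draft and revise until the checks pass.

    Each check returns a list of objections (empty means no issue found).
    Returns (final_draft, unresolved_objections).
    """
    draft = generate()
    for _ in range(max_rounds):
        objections = [issue for check in checks for issue in check(draft)]
        if not objections:
            return draft, []
        draft = revise(draft, objections)
    # Interrogate once more after the last revision and report what is left.
    objections = [issue for check in checks for issue in check(draft)]
    return draft, objections
```

Returning the unresolved objections, rather than silently accepting the last draft, keeps the final judgment with the engineer.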

The AI bubble isn't like the internet bubble

The AI bubble fundamentally differs from the early internet because adoption is driven by management mandates rather than bottom-up worker demand. While the early web scaled toward profitability, AI exhibits poor unit economics where costs increase with usage and model iteration. This top-down pressure risks creating "reverse-centaur" workflows that prioritize capital-driven throughput over labor-led quality and worker agency.

Research

Prompt Politeness Affects LLM Accuracy

A study evaluating ChatGPT 4o across 250 prompts found that impolite tones consistently outperformed polite ones, with "Very Rude" prompts achieving 84.8% accuracy compared to 80.8% for "Very Polite" prompts. These results contradict earlier research and suggest that newer LLMs may respond to tonal variation differently than older models did.
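Per-tone accuracy of this kind is computed by grouping graded answers by tone. The sketch below assumes graded results arrive as `(tone, is_correct)` pairs; the function name and the toy data in the usage are illustrative, not the study's data or code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_tone(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return percent accuracy per tone from (tone, is_correct) pairs."""
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for tone, is_correct in results:
        totals[tone] += 1
        correct[tone] += int(is_correct)
    return {tone: 100.0 * correct[tone] / totals[tone] for tone in totals}


# Toy data only: two graded answers per tone.
print(accuracy_by_tone([
    ("Very Polite", True), ("Very Polite", False),
    ("Very Rude", True), ("Very Rude", True),
]))
```

With the study's reported figures, the gap between the extremes is 84.8 − 80.8 = 4.0 percentage points.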

Advancing Mathematics Research with AI-Driven Formal Proof Search

Researchers evaluated LLM-driven formal proof generation in Lean to solve open mathematical problems, successfully resolving 9 Erdős problems and 44 OEIS conjectures. The study demonstrates that autonomous agents combining LLM generation with automated verification can advance research in fields like combinatorics and algebraic geometry. Findings highlight the impact of agent architecture on the cost-efficiency of formal proof search for complex problems.
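The generate-and-verify structure can be sketched in a few lines, assuming candidate proofs arrive from some generator and a checker (in the study's setting, Lean itself) accepts or rejects each one. `proof_search` and the toy tactic strings are hypothetical; the attempt budget is where agent design shows up as cost.

```python
from itertools import islice
from typing import Callable, Iterable, Optional, Tuple


def proof_search(
    candidates: Iterable[str],
    verify: Callable[[str], bool],
    budget: int,
) -> Optional[Tuple[str, int]]:
    """Check generated candidates until one verifies or the budget is spent.

    Returns (accepted_candidate, attempts_used), or None if nothing verified.
    Only the verifier is trusted; the generator's output never is.
    """
    for attempt, candidate in enumerate(islice(candidates, budget), start=1):
        if verify(candidate):
            return candidate, attempt
    return None
```

Because `islice` stops drawing after `budget` items, a failed search never pays for generations beyond the budget.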

FML-Bench: A Controlled Study of AI Research Agent Strategies

FML-Bench is a benchmark of 18 ML research tasks designed to isolate agent strategy from execution infrastructure using 12 process-level metrics. Evaluation of various agent architectures shows that strategy complexity does not guarantee performance; greedy hill-climbing matches tree-search in dense opportunity spaces, while tree-search excels in sparse ones. An adaptive agent that switches exploration modes based on improvement stagnation outperforms static strategies, with results indicating that early convergence and focused exploration are the primary drivers of final performance.
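A stagnation-triggered mode switch can be sketched as below. This is a simplified stand-in, not the benchmark's adaptive agent: it swaps the hill-climbing/tree-search choice for small versus large mutation steps, and `adaptive_search`, `patience`, and the step scales are hypothetical.

```python
import random
from typing import Callable, List, Tuple


def adaptive_search(
    score: Callable[[float], float],
    mutate: Callable[[float, random.Random, float], float],
    start: float,
    steps: int = 200,
    patience: int = 10,
    seed: int = 0,
) -> Tuple[float, float, List[str]]:
    """Greedy hill-climbing that switches to broader exploration after
    `patience` consecutive non-improving steps, and back once it improves."""
    rng = random.Random(seed)
    best, best_score = start, score(start)
    stagnant = 0
    modes: List[str] = []
    for _ in range(steps):
        mode = "explore" if stagnant >= patience else "exploit"
        scale = 10.0 if mode == "explore" else 1.0  # bigger jumps when stuck
        candidate = mutate(best, rng, scale)
        s = score(candidate)
        if s > best_score:
            best, best_score, stagnant = candidate, s, 0
        else:
            stagnant += 1
        modes.append(mode)
    return best, best_score, modes


def jitter(x: float, rng: random.Random, scale: float) -> float:
    return x + rng.uniform(-scale, scale)


best, best_score, modes = adaptive_search(lambda x: -(x - 3.0) ** 2, jitter, 0.0)
```

Stagnation is cheap to measure, which is why it makes a practical trigger: the agent stays greedy while progress is steady and only spends effort on broader exploration once it stalls.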

A sleep-like consolidation mechanism for LLMs

This research introduces a sleep-like consolidation mechanism that converts recent context into persistent fast weights within SSM blocks via $N$ offline recurrent passes. By clearing the KV cache and shifting computation to these "sleep" periods, the model maintains inference latency while outperforming standard Transformers and SSM-attention hybrids on long-horizon and math reasoning tasks. Performance scales with sleep duration $N$, particularly for tasks requiring deep reasoning.
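The idea of writing context into fast weights through repeated offline passes can be illustrated with a toy, which is explicitly not the paper's mechanism: it replaces the SSM blocks with a dense matrix updated by the delta rule, $W \leftarrow W + \eta\,(v - Wk)\,k^\top$. With orthonormal keys, each pass shrinks the recall error by a factor of $(1 - \eta)$, so error falls as the number of passes (playing the role of $N$) grows, and recall no longer needs the original context.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Sequence[float]) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def consolidate(pairs: Sequence[Tuple[Vector, Vector]], dim: int, passes: int, lr: float = 0.5) -> Matrix:
    """Write (key, value) pairs into a fast-weight matrix via `passes`
    offline sweeps of the delta rule: W += lr * (v - W k) k^T."""
    W = [[0.0] * dim for _ in range(dim)]
    for _ in range(passes):
        for k, v in pairs:
            err = [vi - pi for vi, pi in zip(v, matvec(W, k))]
            for i in range(dim):
                for j in range(dim):
                    W[i][j] += lr * err[i] * k[j]
    return W


def recall_error(W: Matrix, pairs: Sequence[Tuple[Vector, Vector]]) -> float:
    """Largest absolute error when reading values back from W alone."""
    return max(abs(vi - pi) for k, v in pairs for vi, pi in zip(v, matvec(W, k)))


pairs = [([1.0, 0.0], [2.0, -1.0]), ([0.0, 1.0], [0.5, 3.0])]
errors = [recall_error(consolidate(pairs, 2, n), pairs) for n in (1, 2, 4, 8)]
```

Here the "context" (`pairs`) is only needed during consolidation; afterwards, queries read from `W` alone, loosely mirroring the cleared KV cache.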

Barriers to Complexity-Theoretic Proofs That "AGI" Using ML Is Impossible

This critique of van Rooij et al. (2024) disputes the claim that human-like intelligence is computationally intractable to learn from data, citing unjustified assumptions regarding input-output distributions. The authors argue that any such proof must formally define "human-like" intelligence and account for the specific inductive biases of ML systems, which the original analysis fails to do.

Code

I built a tool to auto-accept AI slop and bigtech devs loves it

Vision Clicker is a local macOS menu bar app designed to enable autonomous AI agent workflows by automatically clicking approval buttons like "Run," "Fetch," or "Retry." It utilizes on-device Apple Vision OCR to monitor user-defined screen regions and perform synthetic mouse clicks, supporting multi-monitor setups and automated tab switching in Cursor. The tool operates entirely on-device for privacy, requiring only macOS Accessibility and Screen Recording permissions without the need for external API keys.
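The matching step between OCR output and a click target can be sketched as below. This is not Vision Clicker's code: it assumes OCR results have already been converted to pixel boxes relative to the watched region with a top-left origin, and `OcrBox` and `find_click_target` are hypothetical. The synthetic click itself (done through macOS APIs in the real app) is out of scope.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple


@dataclass
class OcrBox:
    text: str
    x: float  # left edge, pixels, relative to the watched region (top-left origin)
    y: float  # top edge
    w: float
    h: float


APPROVAL_LABELS: Set[str] = {"run", "fetch", "retry"}


def find_click_target(
    boxes: Iterable[OcrBox],
    region_origin: Tuple[float, float],
    labels: Set[str] = APPROVAL_LABELS,
) -> Optional[Tuple[float, float]]:
    """Return the screen point at the center of the first box whose whole
    text matches an approval label (case-insensitive), else None."""
    ox, oy = region_origin
    for box in boxes:
        if box.text.strip().lower() in labels:
            return (ox + box.x + box.w / 2, oy + box.y + box.h / 2)
    return None
```

Matching the whole label rather than a substring is a deliberate choice: it avoids clicking status text such as "Running tests..." that merely contains "Run".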

theta: a humble approach to harness agnostic configuration

Theta is a Rust CLI for managing agent configurations defined by theta-spec, functioning as a package manager for agent resources like rules, tools, skills, and subagents. It enables users to resolve, lock, and materialize configurations, with the ability to "cast" them to and from supported harnesses including Claude Code, GitHub Copilot, and Cursor. The tool supports MCP tools and GitHub-hosted skills, drawing architectural inspiration from the uv package manager for its lifecycle and locking mechanisms.
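The resolve, lock, and materialize lifecycle borrowed from uv can be illustrated with a small sketch. The registry shape, version scheme, and lock format here are hypothetical and do not reflect theta-spec; the point is that locking pins each resource to an exact version plus content hash, and materializing refuses anything that has drifted.

```python
import hashlib
from typing import Dict

Registry = Dict[str, Dict[str, str]]  # name -> {version: content}


def content_hash(text: str) -> str:
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def version_key(version: str):
    # Numeric comparison so that 1.10.0 sorts after 1.2.0.
    return tuple(int(part) for part in version.split("."))


def lock(requested: Dict[str, str], registry: Registry) -> Dict[str, Dict[str, str]]:
    """Resolve each request ("latest" or an exact version) to a pinned entry."""
    locked = {}
    for name, want in sorted(requested.items()):
        versions = registry[name]
        version = max(versions, key=version_key) if want == "latest" else want
        locked[name] = {"version": version, "hash": content_hash(versions[version])}
    return locked


def materialize(locked: Dict[str, Dict[str, str]], registry: Registry) -> Dict[str, str]:
    """Reproduce exactly the locked contents; fail on any hash mismatch."""
    out = {}
    for name, entry in locked.items():
        content = registry[name][entry["version"]]
        if content_hash(content) != entry["hash"]:
            raise ValueError(f"hash mismatch for {name}")
        out[name] = content
    return out
```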

Agile V: Turning AI Agents into Verifiable Engineering Systems

Agile V™ is a framework designed to transform LLM agents into verifiable engineering systems by enforcing formal traceability, independent verification, and hardware awareness. It provides a library of specialized skills for requirements architecture, red-teaming, and compliance auditing, supporting multi-cycle development loops and optimized context engineering. The system integrates with tools like Cursor and Claude Code to automate ISO-aligned documentation and ensure production-ready code through mandatory human gates and ambiguity-triggered halts.
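Ambiguity-triggered halts and a mandatory human gate can be sketched as below. This is a hypothetical illustration, not Agile V's skills: the list of vague terms, `Halt`, and `gated_pipeline` are assumptions. Requirements that pass both checks receive stable IDs, a minimal form of traceability.

```python
import re
from typing import Callable, List

# Hypothetical list of terms treated as too vague to implement safely.
AMBIGUOUS_TERMS = ("fast", "user-friendly", "as needed", "etc", "appropriate")


class Halt(Exception):
    """Raised to stop the pipeline until a human resolves the issue."""


def check_requirement(req: str) -> None:
    # Word-boundary matching, so "fetch" does not trip on "etc".
    hits = [t for t in AMBIGUOUS_TERMS if re.search(rf"\b{re.escape(t)}\b", req.lower())]
    if hits:
        raise Halt(f"ambiguous requirement {req!r}: {', '.join(hits)}")


def gated_pipeline(requirements: List[str], approve: Callable[[List[str]], bool]) -> List[str]:
    """Halt on ambiguity, then require explicit human approval before proceeding."""
    for req in requirements:
        check_requirement(req)
    if not approve(requirements):
        raise Halt("human gate rejected the plan")
    return [f"REQ-{i:03d}: {r}" for i, r in enumerate(requirements, start=1)]
```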

An LLM translator whose source is a single prompt

This project provides a single prompt to generate a neobrutalist, single-file HTML translator app using Vue 3 and Tailwind CSS. It features OpenAI-compatible API integration with streaming output, customizable translation templates using placeholders, and JSON-based configuration management for templates and API settings.
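The two mechanics behind such an app, placeholder templates and streamed output from an OpenAI-compatible endpoint, are sketched below in Python for illustration (the generated app itself is Vue/JavaScript). The `{{name}}` placeholder syntax is an assumption; the stream parser follows the common chat-completions SSE shape, where each `data:` line carries a JSON chunk with `choices[].delta.content` and the stream ends with `data: [DONE]`.

```python
import json
import re
from typing import Iterable, Iterator


def fill_template(template: str, **values: str) -> str:
    """Replace {{name}} placeholders; unknown placeholders are left untouched."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: values.get(m.group(1), m.group(0)), template)


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from OpenAI-compatible SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:  # the first chunk often carries only the role
                yield piece


prompt = fill_template("Translate into {{target}}:\n{{text}}", target="French", text="Good morning")
```

Appending each yielded piece to the output as it arrives is what produces the incremental, streaming display.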

I open-sourced two AI agents with real memory (chat and voice, MIT)

SynapCores-agent is a framework-free AI agent that utilizes SynapCores as a unified backend for long-term memory, RAG, semantic tool routing, and grounded generation. By exposing these capabilities through a single SQL/Cypher interface over HTTP, it replaces complex LLM stacks and orchestration frameworks with a lightweight, dependency-free Python loop. The system supports MCP for integration with tools like Claude Code and can operate entirely locally using bundled models or via external LLM providers.
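Semantic tool routing, one of the capabilities the project delegates to its backend, can be illustrated without any dependencies. This is a toy stand-in, not SynapCores' implementation: it uses bag-of-words vectors instead of learned embeddings, and `route` and the similarity threshold are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, Optional


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use a learned model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def route(query: str, tools: Dict[str, str], threshold: float = 0.1) -> Optional[str]:
    """Pick the tool whose description is most similar to the query, or None."""
    q = embed(query)
    best_score, best_name = max((cosine(q, embed(desc)), name) for name, desc in tools.items())
    return best_name if best_score >= threshold else None


tools = {
    "weather": "get the current weather forecast for a city",
    "calendar": "create or list calendar events and meetings",
}
```

Returning `None` below the threshold lets the agent loop fall back to plain generation instead of forcing a poor tool match.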
