Monday — December 1, 2025
An AI proves a long-standing Erdős problem, a new tool lets one LLM consult another when stuck, and a benchmark shows LLMs score only 40% on practical robotics.
News
Don't push AI down our throats
The author argues that the rapid, forced integration of AI into products is driven by financial pressure to justify massive GPU and infrastructure investments, not by user utility. Acknowledging the end of the hype cycle and known LLM limitations like hallucinations, the piece calls for a more organic adoption focused on proven, valuable use cases. It dismisses the need for AGI and asserts that users should not be compelled to adopt flawed technology to validate corporate spending.
AI just proved Erdős Problem #124
An AI theorem prover named Aristotle has reportedly solved a long-standing open number theory problem conjectured by Erdős et al. Working from a formal statement of the problem in Lean, the AI generated a surprisingly simple proof in 6 hours, which was then verified in one minute. This achievement contrasts with literature searches using Gemini and ChatGPT, which failed to uncover any new information on the problem.
I Tested the M5 iPad Pro's Neural-Accelerated AI, and the Hype Is Real
Benchmarks using an M5-optimized version of MLX reveal that the new chip's local AI performance exceeds Apple's claims, primarily due to new Neural Accelerators in the GPU. The most significant improvement is in the prefill stage, drastically reducing time to first token (TTFT) for long prompts, with tests showing a 4.4x speedup over the M4 on a 10k-token prompt using a Qwen3-8B model. While token generation speed saw only marginal gains, the accelerated prompt processing is highly beneficial for long-context applications like retrieval-augmented generation (RAG).
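To make the two metrics concrete, here is a minimal timing sketch, assuming a hypothetical `stream_tokens` callable that yields output tokens one at a time (a stand-in for whatever streaming API the local runtime exposes, not MLX's actual interface):

```python
import time
from typing import Callable, Iterable

def measure_ttft_and_tps(stream_tokens: Callable[[str], Iterable[str]],
                         prompt: str) -> tuple[float, float]:
    """Time one streaming generation run.

    Returns (ttft_seconds, decode_tokens_per_sec). The prompt prefill is
    assumed to happen before the first token is yielded, so the gap to the
    first token is dominated by prefill cost.
    """
    start = time.perf_counter()
    first_token_at = None
    n_tokens = 0
    for _ in stream_tokens(prompt):
        n_tokens += 1
        if first_token_at is None:
            first_token_at = time.perf_counter()   # prefill ends roughly here

    end = time.perf_counter()
    ttft = (first_token_at if first_token_at is not None else end) - start
    decode_time = end - first_token_at if first_token_at is not None else 0.0
    tps = (n_tokens - 1) / decode_time if decode_time > 0 else 0.0
    return ttft, tps
```

On a 10k-token prompt the TTFT term is almost entirely prefill, which is where the reported 4.4x M5-over-M4 speedup appears; the decode tokens-per-second figure is where the gains were only marginal.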
AI rendering of Roman war scenes from Trajan's Column
This project uses an AI image generation model to create modern visualizations of the Dacian Wars based on the narrative reliefs of Trajan's Column. The model is prompted with images of the original carved plates and their corresponding textual descriptions to generate photorealistic "restorations" of each scene. This demonstrates an application of generative AI for historical and archaeological visualization.
Did Nvidia Just Prove There Is No AI Bubble
The article argues that Nvidia's strong earnings mask a fragile AI bubble built on an unsustainable financial structure. 90% of its revenue comes from data centers that are themselves deeply unprofitable, with depreciation costs exceeding revenue. These data centers' customers, AI operators like OpenAI, are even less profitable due to the "efficient compute frontier," where exponential costs yield only linear performance gains, while revenue stalls due to model limitations like hallucination. The author contends this ecosystem is propped up by circular financing and a massive debt market that is showing signs of instability, predicting that as debt financing dries up, the entire system will collapse.
Research
AI Eyes on the Road: Cross-Cultural Perspectives on Traffic Surveillance
A 3x3 factorial survey (N=720) investigated public perception of AI-powered road surveillance across China, Europe, and the US, comparing conventional, AI-enhanced, and AI-enhanced with public shaming modes. While conventional surveillance was universally preferred and public shaming was least preferred, Chinese respondents showed significantly higher acceptance of AI-enhanced systems than Europeans or Americans. The findings underscore that cultural context and social norms are critical factors influencing the acceptance of AI monitoring technologies.
Generative AI Compensates for Age-Related Cognitive Decline in Decision Making
A study using GPT-4o found that providing preference-aligned options to older adults significantly reduced their perceived difficulty in a decision-making task without impacting choice satisfaction. For older adults with lower cognitive function, who normally experience higher difficulty and lower satisfaction, AI use attenuated this negative correlation. The results suggest generative AI can compensate for age-related cognitive constraints by offloading the information search component of decision-making.
Butter-Bench: Evaluating LLM Controlled Robots for Practical Intelligence
Butter-Bench is a new benchmark evaluating the practical intelligence of LLMs for high-level reasoning in robotic control, isolating them from low-level VLA models. On this benchmark, the best LLMs score only 40%, significantly underperforming the mean human score of 95%. LLMs struggled most with multi-step spatial planning and social understanding, and fine-tuning for embodied reasoning did not improve their performance.
Program-of-Thought Prompting Outperforms Chain-of-Thought by 15% (2022)
Program of Thoughts (PoT) is a prompting technique that improves on chain-of-thought (CoT) prompting for complex numerical reasoning by disentangling reasoning from computation. It uses an LLM to generate a program that expresses the reasoning steps, which is then offloaded to an external interpreter for execution. Across multiple math and financial QA datasets, PoT demonstrates an average performance gain of around 12% over CoT. Combining PoT with self-consistency decoding achieves state-of-the-art (SOTA) performance on the math datasets and near-SOTA on the financial ones.
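A minimal sketch of the idea, not the paper's exact prompts: the prompt template and the `ans` answer variable below are illustrative, and `call_llm` stands in for any chat-completion client. The model writes the reasoning as code; the Python interpreter, not the LLM, does the arithmetic.

```python
from typing import Callable

POT_PROMPT = (
    "# Question: {question}\n"
    "# Write Python that computes the answer and assigns it to `ans`.\n"
)

def program_of_thoughts(question: str, call_llm: Callable[[str], str]) -> str:
    """Have the LLM emit a reasoning program, then execute it locally."""
    code = call_llm(POT_PROMPT.format(question=question))
    namespace: dict = {}
    exec(code, namespace)              # computation offloaded to the interpreter
    return str(namespace["ans"])       # read out the designated answer variable

# Toy demo with a canned "model" so the sketch runs end to end:
fake_llm = lambda prompt: "ans = 0.125 * 480   # 12.5% of 480"
print(program_of_thoughts("What is 12.5% of 480?", fake_llm))  # -> 60.0
```

The self-consistency variant mentioned above samples several programs for the same question and takes a majority vote over their executed answers.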
Training Foundation Models on a Full-Stack AMD Platform
This paper details the first large-scale MoE pretraining study on a pure AMD stack, using MI300X GPUs with Pollara interconnect. It provides comprehensive system characterization, including microbenchmarks for network collectives and MI300X kernels, along with hardware-aware transformer sizing rules. The resulting 8.3B parameter MoE model, ZAYA1, shows competitive performance against models like Llama-3-8B and Qwen3-4B, demonstrating the AMD stack's maturity for large-scale pretraining.
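For readers unfamiliar with the architecture, here is a bare-bones sketch of the top-k routing at the core of any mixture-of-experts layer; it is a generic illustration with made-up dimensions, not ZAYA1's actual router.

```python
import numpy as np

def moe_layer(x, router_w, experts, k=2):
    """Toy mixture-of-experts forward pass for a single token.

    x        : (d,) token activation
    router_w : (d, n_experts) router weights
    experts  : list of callables, each mapping (d,) -> (d,)
    Only the top-k experts run, which is what keeps the active parameter
    count far below the total parameter count.
    """
    logits = x @ router_w
    top = np.argsort(logits)[-k:]                 # indices of the top-k experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                          # softmax over selected experts
    return sum(g * experts[i](x) for g, i in zip(gates, top))

# Tiny demo: four "experts" that are just random linear maps.
rng = np.random.default_rng(0)
d, n = 8, 4
experts = [lambda x, W=rng.normal(size=(d, d)) / d: x @ W for _ in range(n)]
y = moe_layer(rng.normal(size=d), rng.normal(size=(d, n)), experts)
```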
Code
Local AI: 152 Open-Source Tools for 100% Offline LLMs (2025–2026)
A curated list of 152 open-source tools for running LLMs fully offline; the repository's README could not be retrieved, so a more detailed summary is unavailable.
Show HN: Let Claude Code call other LLMs when it runs in circles
Consult LLM MCP is a server that allows a primary LLM to escalate complex problems by consulting more powerful models like GPT-5.1 Codex or Gemini 2.5 Pro. It exposes a single consult_llm tool that accepts a prompt along with file and git diff context. The tool features multiple operational modes: direct API calls, shelling out to local CLIs to leverage free quotas, and a web mode that copies formatted prompts to the clipboard for browser-based LLMs.
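The project itself is not shown here, but the shape of such a tool is easy to sketch with the MCP Python SDK's FastMCP helper. The argument names and the `ask_stronger_model` dispatcher below are hypothetical placeholders, not the project's implementation.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("consult-llm")

@mcp.tool()
def consult_llm(prompt: str, files: list[str] | None = None, git_diff: str = "") -> str:
    """Escalate a hard problem to a stronger model.

    Bundles the caller's prompt with optional file contents and a git diff,
    then hands it to a more capable backend (direct API call, local CLI, or
    clipboard for a browser-based model, per the modes described above).
    """
    context = "\n\n".join(open(path).read() for path in (files or []))
    full_prompt = "\n\n".join(part for part in (prompt, context, git_diff) if part)
    return ask_stronger_model(full_prompt)

def ask_stronger_model(prompt: str) -> str:
    # Hypothetical dispatcher: wire up an API client or shell out to a CLI here.
    raise NotImplementedError

if __name__ == "__main__":
    mcp.run()   # serve the tool over stdio so a coding agent can call it
```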
SafeShare – strip tracking parameters from links before you share them
SafeShare is a client-side PWA and bookmarklet for cleaning URLs. It removes tracking parameters and resolves redirects entirely within the browser, ensuring privacy with no server-side processing or logs. A pro version is available, offering features like bulk cleaning and team whitelists.
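The core of such a cleaner is small. Here is a minimal sketch (not SafeShare's actual code, and omitting redirect resolution) that strips common tracking parameters with the standard library; the parameter list is an illustrative subset, while real tools ship much longer, per-site rule sets.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

TRACKING_PARAMS = {"fbclid", "gclid", "msclkid", "mc_eid", "igshid"}
TRACKING_PREFIXES = ("utm_",)

def clean_url(url: str) -> str:
    """Drop known tracking query parameters, leaving everything else intact."""
    parts = urlsplit(url)
    kept = [
        (k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in TRACKING_PARAMS and not k.startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(clean_url("https://example.com/a?id=42&utm_source=news&fbclid=abc"))
# -> https://example.com/a?id=42
```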
Using Petri nets as a formal language for LLM-assisted development
go-pflow is a Go library for modeling systems with Petri nets and simulating them as ODEs using mass-action kinetics. It features a process mining pipeline to discover models from event logs and a "Neural ODE-ish" approach to fit learnable transition rates to data, using the net structure as a structural prior. The library supports real-time predictive monitoring for applications like SLA violation detection and includes examples for game AI and constraint satisfaction.
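To make the "Petri nets as ODEs" idea concrete, here is a toy in Python rather than Go, independent of go-pflow's API: under mass-action kinetics each transition fires at a rate proportional to the product of its input places' markings, and the markings evolve continuously. The example net and rate constant are invented for illustration.

```python
# Toy mass-action semantics for a Petri net with one transition: S + E -> P + E.
places = {"S": 1.0, "E": 0.5, "P": 0.0}
transitions = [
    # (rate constant, inputs consumed, outputs produced)
    (2.0, {"S": 1, "E": 1}, {"P": 1, "E": 1}),   # the "enzyme" E is returned
]

def step(state, dt):
    """One explicit Euler step of the mass-action ODEs induced by the net."""
    deltas = {p: 0.0 for p in state}
    for k, inputs, outputs in transitions:
        rate = k
        for p, mult in inputs.items():
            rate *= state[p] ** mult              # mass-action: product of input markings
        for p, mult in inputs.items():
            deltas[p] -= mult * rate
        for p, mult in outputs.items():
            deltas[p] += mult * rate
    return {p: state[p] + dt * deltas[p] for p in state}

state = dict(places)
for _ in range(1000):
    state = step(state, dt=0.01)
print(state)   # S is consumed, P accumulates, E is conserved
```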
Collection of best papers from top AI conferences
This repository aggregates award-winning papers from top-tier AI, ML, and computer vision conferences. It includes best paper, honorable mention, and test-of-time awards from major venues like CVPR, NeurIPS, ICLR, and ICML, covering recent years. The collection serves as a curated list of significant contributions and landmark research in the field.