Sunday, November 9, 2025

A $1T tech stock sell-off reflects AI skepticism, a new tool serves hundreds of LLMs on a single GPU, and an evolutionary agent rediscovers mathematical formulas.

News

Study identifies weaknesses in how AI systems are evaluated

A new study from the Oxford Internet Institute reviewed 445 AI benchmarks and found many lack scientific rigor, leading to unreliable conclusions about LLM capabilities and safety. Key weaknesses identified include a lack of statistical methods in performance comparisons and the use of vague or contested definitions for abstract concepts like reasoning. The paper proposes eight recommendations to improve benchmark validity, drawing from fields like psychometrics, and provides a practical checklist for developers and regulators.
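One of the cited weaknesses, the absence of statistical methods in performance comparisons, can be illustrated with a simple bootstrap confidence interval over per-item scores. This is a generic sketch, not the paper's methodology; the 62% vs. 55% accuracy figures are made-up numbers.

```python
import random

def bootstrap_diff_ci(scores_a, scores_b, n_boot=2000, alpha=0.05, seed=0):
    """Bootstrap a confidence interval for the mean accuracy gap
    between two models scored on the same benchmark items."""
    rng = random.Random(seed)
    n = len(scores_a)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample items, paired
        da = sum(scores_a[i] for i in idx) / n
        db = sum(scores_b[i] for i in idx) / n
        diffs.append(da - db)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Per-item 0/1 correctness for two models on a 100-item benchmark.
a = [1] * 62 + [0] * 38   # model A: 62% accuracy
b = [1] * 55 + [0] * 45   # model B: 55% accuracy
lo, hi = bootstrap_diff_ci(a, b)
# If the interval straddles 0, the 7-point gap is not statistically reliable.
print(f"95% CI for accuracy gap: [{lo:.3f}, {hi:.3f}]")
```

A leaderboard that reported intervals like this, rather than bare point scores, would address the kind of rigor gap the study describes.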

Cerebras Code now supports GLM 4.6 at 1000 tokens/sec

Cerebras offers API access to the GLM 4.6 coding LLM, achieving inference speeds of over 1,000 tokens/second. The model is a high-performing open alternative, ranked #1 for tool calling on the Berkeley Function Calling Leaderboard and comparable to Sonnet 4.5. The service integrates with AI code editors via an API key and is available in free and paid tiers based on daily token limits.
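Since the service is reached via an API key, a request presumably looks like a standard OpenAI-style chat completion. The endpoint URL and model identifier below are assumptions; check the Cerebras documentation for the exact values.

```python
import json
import urllib.request

# Assumed OpenAI-compatible endpoint and model id -- both are
# assumptions, not confirmed values from the article.
API_URL = "https://api.cerebras.ai/v1/chat/completions"
MODEL = "glm-4.6"

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completion request."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request("Write a binary search in Python.", api_key="YOUR_KEY")
print(req.full_url)  # sending requires a real key: urllib.request.urlopen(req)
```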

$1T in tech stocks sold off as market grows skeptical of AI

A recent $1 trillion tech stock sell-off reflects growing market skepticism towards AI. Investors are concerned about the massive capital expenditures on AI infrastructure by companies like Microsoft and Nvidia, as concrete profits and a clear ROI have yet to materialize. The downturn is driven by anxiety over the high operational costs of AI relative to its current profitability.

Firefox Forcing LLM Features

Mozilla is enabling local LLM and AI features in Firefox by default, leading to user complaints about high resource usage and the lack of a GUI-based opt-out. Disabling this functionality requires manually setting a comprehensive list of preferences in about:config or prefs.js. The author notes that simply toggling a few main flags is insufficient to remove all related UI elements and features.
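For illustration, two of the commonly cited preferences look like this in prefs.js; as the article stresses, the full list needed to disable everything is considerably longer, and pref names can change between releases.

```js
// prefs.js -- illustrative subset only, not the comprehensive list
user_pref("browser.ml.enable", false);        // master switch for local ML features
user_pref("browser.ml.chat.enabled", false);  // AI chatbot sidebar
```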

How to declutter, quiet down, and take the AI out of Windows 11 25H2

This guide details how to debloat a fresh Windows 11 25H2 installation by removing ads, unwanted apps, and disabling telemetry. For users with new Copilot+ PCs, it provides specific instructions for fully removing the controversial Recall feature, an on-device AI that continuously screenshots user activity. The article also covers disabling other embedded AI functionalities in core apps and removing Copilot integrations from the OS and Edge browser.
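On Copilot+ PCs, Recall ships as an optional Windows feature, so removal reportedly goes through DISM; the feature name below matches Microsoft's documented approach for recent builds but may vary, and the article's full procedure covers much more.

```powershell
# Check whether the Recall optional feature is present, then disable it
Dism /Online /Get-FeatureInfo /Featurename:Recall
Dism /Online /Disable-Feature /Featurename:Recall
```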

Research

Making Democracy Work: Fixing and Simplifying Egalitarian Paxos

Egalitarian Paxos (EPaxos) is a leaderless replication protocol designed to overcome the single-point-of-failure and latency issues of leader-based systems. This paper introduces EPaxos*, a simpler and provably correct variant that addresses the complexity and bugs of the original. Its key contributions are a simplified failure-recovery algorithm and a generalization of the protocol to an optimal spectrum of failure thresholds.
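The core bookkeeping in EPaxos-style protocols, committing each command with a set of interfering dependencies and executing in dependency order, can be sketched as follows. This toy version assumes the dependency graph is acyclic; real EPaxos additionally breaks cycles using per-command sequence numbers.

```python
# Two commands interfere when they touch the same key; each command
# depends on the most recent interfering command seen before it.

def compute_deps(log):
    """log: list of (cmd_id, key) in arrival order -> {cmd_id: deps}."""
    last = {}
    deps = {}
    for cmd, key in log:
        deps[cmd] = {last[key]} if key in last else set()
        last[key] = cmd
    return deps

def execution_order(deps):
    """Topological order over the dependency graph (acyclic case only)."""
    order, seen = [], set()
    def visit(c):
        if c in seen:
            return
        seen.add(c)
        for d in deps[c]:
            visit(d)
        order.append(c)
    for c in deps:
        visit(c)
    return order

log = [("a", "x"), ("b", "y"), ("c", "x")]  # a and c interfere on key x
deps = compute_deps(log)
order = execution_order(deps)
print(deps["c"], order)  # c depends on a; a executes before c
```

Because non-interfering commands ("b" above) carry no ordering constraints, any replica can commit them without waiting on a leader, which is the latency win the protocol is after.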

Tidally Torn: Why the Most Common Stars May Lack Large, Habitable-Zone Moons

Using N-body simulations that model three-body interactions and tidal forces, a study concludes that large, Luna-like moons are unstable around Earth-like planets in the habitable zone of M-dwarf stars. These exomoons are lost on timescales of 10^7 to 10^9 years, depending on the stellar type. This finding suggests such moons are rare, which has significant implications for planetary habitability, the Drake equation, and the Fermi paradox.
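The paper's mechanism is tidal evolution, but a simpler back-of-the-envelope check already shows why these systems are cramped: a habitable-zone planet around an M dwarf orbits close in, shrinking its Hill sphere, and stable moon orbits fill only roughly half of it. The 0.3 solar-mass star and 0.12 AU habitable-zone distance below are assumed round numbers for illustration.

```python
M_SUN = 1.989e30    # kg
M_EARTH = 5.972e24  # kg
AU = 1.496e11       # m

def hill_radius(a, m_planet, m_star):
    """Radius within which the planet, not the star, dominates a moon's orbit."""
    return a * (m_planet / (3 * m_star)) ** (1 / 3)

r_sun_case = hill_radius(1.0 * AU, M_EARTH, M_SUN)            # Earth at 1 AU
r_mdwarf_case = hill_radius(0.12 * AU, M_EARTH, 0.3 * M_SUN)  # assumed HZ distance

moon_a = 3.844e8  # the Moon's semi-major axis, m
print(f"Hill radius at 1 AU around the Sun: {r_sun_case:.3g} m")
print(f"Hill radius in an M-dwarf HZ:       {r_mdwarf_case:.3g} m")
print(moon_a > 0.5 * r_mdwarf_case)  # True: a Luna-like orbit exceeds the stable zone
```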

Multi-objective optimization by quantum annealing

A study comparing quantum approaches for multi-objective optimization found that quantum annealing vastly outperforms QAOA run on an IBM gate-model processor. Using the same problems and methodology as a prior study, quantum annealing surpassed all previously analyzed classical and quantum methods. On the harder problem, it even improved upon the best known Pareto front, highlighting its significant potential for complex optimization tasks.
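"Improved upon the best known Pareto front" has a precise meaning: the annealer found solutions not dominated by any previously known point. The standard bookkeeping (here for minimization, with made-up two-objective points) looks like this:

```python
# A point p dominates q if p is no worse in every objective and
# strictly better in at least one (minimization convention).

def dominates(p, q):
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def pareto_front(points):
    """Keep only the points not dominated by any other point."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

pts = [(1, 5), (2, 3), (3, 4), (4, 1), (2, 2)]
front = pareto_front(pts)
print(front)  # (2, 3) and (3, 4) are dominated by (2, 2)
```

A new solver "improves the front" when it contributes points that survive this filter against everything found before.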

Mathematical exploration and discovery at scale – Terence Tao et al.

AlphaEvolve is an evolutionary coding agent that uses an LLM in an iterative framework to propose, test, and refine solutions for complex mathematical problems. When tested on a diverse set of problems, it rediscovered most known solutions and improved upon several, sometimes generalizing specific results into a universal formula. The system can be combined with other AI tools for automated proof generation, demonstrating that LLM-guided evolutionary search is a powerful tool for autonomous mathematical discovery.
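The propose-test-refine loop can be sketched generically. This is not AlphaEvolve's architecture: the LLM proposer is stood in for by random mutation, the verifier is an explicit score function, and the toy problem (recovering a single constant) is purely illustrative.

```python
import random

def evolve(score, mutate, seed_candidate, generations=200, pop_size=8):
    """Elitist evolutionary loop: keep the best candidate, propose
    variants of it, and never lose the best found so far."""
    rng = random.Random(0)
    pop = [seed_candidate]
    for _ in range(generations):
        parent = max(pop, key=score)                  # select the best
        children = [mutate(parent, rng) for _ in range(pop_size)]
        pop = [parent] + children                     # elitism
    return max(pop, key=score)

# Toy problem: recover x = 3.7 by maximizing -(x - 3.7)^2.
best = evolve(
    score=lambda x: -(x - 3.7) ** 2,
    mutate=lambda x, rng: x + rng.gauss(0, 0.5),
    seed_candidate=0.0,
)
print(best)
```

Swapping the mutation step for an LLM that rewrites candidate programs, and the score for an automated checker, gives the general shape of LLM-guided evolutionary search.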

Statistical Estimate of Occurrence of Extraterrestrial Intelligence in Milky Way

Researchers developed an empirical galactic simulation model to map the spatial-temporal distribution of potential ETI. The model, which incorporates factors like abiogenesis, evolutionary timescales, and self-annihilation, found the probability of self-annihilation (Pann) to be the most influential parameter. The simulation suggests ETI peaked 8 billion years ago in an annular region 4 kpc from the galactic center, and that most surviving intelligent life is likely young, making detection difficult.
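The outsized influence of Pann can be shown with a minimal Monte Carlo sketch: civilizations arise at random times, and each survives only if it avoids self-annihilation. All rates here are made-up illustrative numbers, not the paper's fitted values.

```python
import random

def surviving_civs(n_civs, p_ann, horizon_gyr=13.8, seed=42):
    """Return birth times (Gyr) of civilizations that avoid annihilation."""
    rng = random.Random(seed)
    survivors = []
    for _ in range(n_civs):
        birth = rng.uniform(0, horizon_gyr)
        if rng.random() > p_ann:           # survived self-annihilation
            survivors.append(birth)
    return survivors

low = surviving_civs(10_000, p_ann=0.1)
high = surviving_civs(10_000, p_ann=0.99)
print(len(low), len(high))  # higher Pann -> far fewer surviving civilizations
```

Even this crude model shows why Pann dominates: it multiplies away survivors directly, whereas the other parameters mostly shift where and when civilizations appear.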

Code

GPT-5-Codex-Mini – A more compact and cost-efficient version of GPT-5-Codex

Codex CLI is a local coding agent from OpenAI, installable via npm or Homebrew. It authenticates using ChatGPT subscription plans or an API key and supports features like slash commands, a sandbox environment, and automation via a GitHub Action and SDK. The tool is also configurable and supports the Model Context Protocol (MCP).

Show HN: Serve 100 Large AI models on a single GPU with low impact to TTFT

flashtensors is a high-performance inference engine designed to load LLMs from SSD to GPU VRAM up to 10x faster than standard loaders. This enables hosting and hot-swapping hundreds of models on a single GPU with cold starts under 5 seconds, even for 32B-parameter models. The tool provides a CLI and an SDK with integration for backends like vLLM, aiming to scale inference by usage rather than by the number of co-hosted models.
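The hot-swapping idea, independent of flashtensors' actual API, amounts to treating VRAM as a bounded cache over a much larger set of models on SSD. A minimal sketch, with hypothetical load/unload callbacks standing in for the fast SSD-to-VRAM path:

```python
from collections import OrderedDict

class ModelCache:
    """Keep at most `capacity` models resident, evicting the least
    recently used one when a cold model is requested."""

    def __init__(self, capacity, load_fn, unload_fn):
        self.capacity = capacity
        self.load_fn, self.unload_fn = load_fn, unload_fn
        self.resident = OrderedDict()  # model_name -> handle

    def get(self, name):
        if name in self.resident:
            self.resident.move_to_end(name)       # mark as recently used
            return self.resident[name]
        if len(self.resident) >= self.capacity:
            victim, handle = self.resident.popitem(last=False)
            self.unload_fn(victim, handle)        # free VRAM
        self.resident[name] = self.load_fn(name)  # fast SSD -> VRAM load
        return self.resident[name]

cache = ModelCache(2, load_fn=lambda n: f"<{n} weights>", unload_fn=lambda n, h: None)
cache.get("llama-8b"); cache.get("qwen-32b")
cache.get("llama-8b")        # hit: no load needed
cache.get("mistral-7b")      # evicts qwen-32b, the least recently used
print(list(cache.resident))  # ['llama-8b', 'mistral-7b']
```

The economics follow from the load time: the faster a cold load is, the smaller `capacity` can be relative to the number of served models without hurting time-to-first-token.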

Show HN: DeepShot – NBA game predictor with 70% accuracy using ML and stats

DeepShot is an open-source NBA game predictor that uses an XGBoost model trained on historical data scraped from Basketball Reference. Its feature engineering relies on rolling statistics, particularly EWMA, to emphasize recent team performance and momentum. The project serves its predictions through a local web interface built with the NiceGUI framework.
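The recency weighting described here is a plain exponentially weighted moving average: recent games count more than old ones. A minimal sketch with made-up scores (the project itself uses pandas over scraped team statistics):

```python
def ewma(values, alpha=0.3):
    """alpha in (0, 1]: higher alpha weights recent values more heavily."""
    out = []
    avg = values[0]
    for v in values:
        avg = alpha * v + (1 - alpha) * avg
        out.append(avg)
    return out

points = [102, 95, 110, 120, 118]   # a team's last five game scores
print([round(x, 1) for x in ewma(points)])
# -> [102.0, 99.9, 102.9, 108.1, 111.0]: the average chases the recent upswing
```

A simple rolling mean of the same data would sit near 109 regardless of ordering; the EWMA's final value is pulled toward the latest games, which is the "momentum" signal the model feeds to XGBoost.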

Show HN: Allos – An open-source, LLM-agnostic agentic SDK for Python

Allos is an open-source, LLM-agnostic agentic SDK for building production-ready AI agents without vendor lock-in. It provides a unified interface to seamlessly switch between different providers like OpenAI, Anthropic, and local models using the same codebase. The framework includes a rich ecosystem of built-in tools for file operations and shell execution, along with features like automatic context management, session persistence, and fine-grained permissions.
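The value of an LLM-agnostic design is that agent logic depends on an interface, not a vendor SDK. The class and method names below are hypothetical, not Allos's actual API; they just show the shape such an interface takes.

```python
from typing import Protocol

class Provider(Protocol):
    """Anything that can turn a prompt into a completion."""
    def complete(self, prompt: str) -> str: ...

class EchoProvider:
    """Stand-in local 'model' used for testing without any API keys."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

class Agent:
    def __init__(self, provider: Provider):
        self.provider = provider

    def run(self, task: str) -> str:
        # A real agent would loop over tool calls and manage context
        # here; this sketch makes a single completion call.
        return self.provider.complete(task)

agent = Agent(EchoProvider())
print(agent.run("list files"))  # echo: list files
```

Swapping OpenAI for Anthropic or a local model then means constructing the Agent with a different Provider, with the rest of the codebase unchanged.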

AI powered stocks CLI tool

A command-line tool for stock portfolio tracking that fetches real-time data using the Alpha Vantage API. It leverages Groq's llama-3.3-70b-versatile LLM to generate AI-powered investment analysis. The tool also supports sending HTML reports via email and includes Docker support for containerized deployment and scheduled execution.
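Alpha Vantage's GLOBAL_QUOTE endpoint returns numbered string keys, so a tool like this needs a small parsing layer. The key names below follow the public API documentation; the sample values are made up, and no network call is made here.

```python
import json

sample = json.loads("""{
  "Global Quote": {
    "01. symbol": "AAPL",
    "05. price": "226.8000",
    "10. change percent": "1.25%"
  }
}""")

def parse_quote(payload):
    """Extract typed fields from a GLOBAL_QUOTE response."""
    q = payload["Global Quote"]
    return {
        "symbol": q["01. symbol"],
        "price": float(q["05. price"]),
        "change_pct": float(q["10. change percent"].rstrip("%")),
    }

print(parse_quote(sample))  # {'symbol': 'AAPL', 'price': 226.8, 'change_pct': 1.25}
```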