Saturday August 9, 2025

OpenAI brings back GPT-4o for paid users after backlash, researchers develop a framework to measure the environmental impact of large language models, and a new benchmark finds ClickHouse up to 16.8x faster than Postgres for LLM chat interactions.

News

I want everything local – Building my offline AI workspace

The goal was a system where a large language model (LLM) runs locally, code execution is isolated in a lightweight virtual machine, and a headless browser provides internet access, all without relying on cloud services. The stack combines Ollama for local LLMs, a sandboxed VM runtime, and a browser automation tool, so tasks like photo and video editing can be performed privately, without exposing data to external services.
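The article's full stack isn't reproduced here, but the local-LLM piece is easy to sketch: Ollama serves an HTTP API on localhost, so a request never has to leave the machine. A minimal example using only the Python standard library (the model name is an assumption; substitute whatever `ollama list` shows):

```python
import json
import urllib.request

def ask_local_llm(prompt: str, model: str = "llama3") -> str:
    # Ollama listens on port 11434 by default; "stream": False returns one JSON object.
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(ask_local_llm("In one sentence, why does local inference protect privacy?"))
```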

The surprise deprecation of GPT-4o for ChatGPT consumers

The launch of GPT-5 has been met with backlash from some users who are unhappy about losing access to the older GPT-4o model, which they relied on for creative collaboration, emotional nuance, and other long-form interactions. In response to user feedback, OpenAI has announced that it will bring back GPT-4o for paid users, and will monitor usage to determine how long to support it, after initially retiring the model with no deprecation period.

GPT-5 leaked system prompt?

The leaked prompt describes the assistant as a large language model based on GPT-5, trained by OpenAI, with image input capabilities and access to tools such as the bio tool for persisting user information across conversations. Per the prompt, the bio tool lets the model deliver more personalized responses by saving or forgetting information at the user's request, while staying mindful of sensitive data categories and avoiding overly personal or redundant details.
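The prompt's exact tool interface isn't public, so the following is a hypothetical sketch only: a persistence tool like bio, expressed in the style of an OpenAI function-calling schema, might look something like this:

```python
# Hypothetical sketch -- the real bio tool's schema is not published.
# Shaped like an OpenAI function-calling tool definition for illustration.
bio_tool = {
    "type": "function",
    "function": {
        "name": "bio",
        "description": "Persist or forget facts about the user across conversations.",
        "parameters": {
            "type": "object",
            "properties": {
                "action": {"type": "string", "enum": ["save", "forget"]},
                "fact": {
                    "type": "string",
                    "description": "One user detail; sensitive categories excluded.",
                },
            },
            "required": ["action", "fact"],
        },
    },
}
```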

AI is impressive because we've failed at personal computing

The author argues that the success of large language models (LLMs) like ChatGPT can be attributed to the failure to organize information in a structured way, leading to a reliance on search and AI to find answers. If knowledge were stored in a structured and semantically linked way, simpler algorithms could parse questions and find answers using fewer resources, making knowledge more accessible and comprehensible, rather than relying on AI's brute-force workarounds.
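As a toy illustration of the argument (mine, not the author's): once knowledge is stored as semantically linked triples, answering a factual question is a lookup, not a billion-parameter inference:

```python
# Toy knowledge base of (subject, predicate, object) triples.
triples = {
    ("Rust", "designed_by", "Graydon Hoare"),
    ("Rust", "first_released", "2015"),
    ("TypeScript", "designed_by", "Anders Hejlsberg"),
}

def answer(subject: str, predicate: str) -> str | None:
    # A linear scan suffices here; a real store would index by (subject, predicate).
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    return None

print(answer("Rust", "designed_by"))  # Graydon Hoare -- no LLM required
```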

GPT-5 vs. Sonnet: Complex Agentic Coding

OpenAI's new GPT-5 model was tested on a complex coding task, porting a TypeScript tool called Ruler to Rust, and produced a complete, functional port despite some minor issues and an occasional need for supervision. The model demonstrated strong agentic capabilities, working autonomously with minimal intervention and following complex instructions, although the quality of the generated code itself was somewhat disappointing.

Research

How Hungry Is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference

Researchers developed a framework to measure the environmental impact of large language models (LLMs) in commercial data centers, finding significant variations in energy consumption among 30 models, with some using over 70 times more energy than others. The study's results highlight the substantial environmental costs of widespread AI adoption, including high electricity use, water evaporation, and carbon emissions, and provide a methodology for benchmarking the sustainability of LLM deployments.
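The paper's methodology is far more detailed, but the basic bookkeeping it formalizes can be sketched by chaining per-query energy through data-center overhead, water, and carbon factors. All numbers below are illustrative assumptions, not the study's measurements:

```python
# Illustrative numbers only -- not the paper's measurements.
gpu_power_w = 700          # H100-class accelerator under load (assumption)
latency_s = 2.0            # wall-clock time to serve one response (assumption)
pue = 1.2                  # power usage effectiveness (data-center overhead)
wue_l_per_kwh = 1.8        # water usage effectiveness, liters per kWh (assumption)
carbon_kg_per_kwh = 0.4    # grid carbon intensity (assumption)

# Joules -> kWh, scaled by data-center overhead.
energy_kwh = gpu_power_w * latency_s / 3600 / 1000 * pue
print(f"energy: {energy_kwh * 1000:.3f} Wh/query")
print(f"water:  {energy_kwh * wue_l_per_kwh * 1000:.3f} mL/query")
print(f"carbon: {energy_kwh * carbon_kg_per_kwh * 1000:.3f} g CO2e/query")
```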

Breaking the Sorting Barrier for Directed Single-Source Shortest Paths

A new algorithm for single-source shortest paths on directed graphs achieves a time complexity of $O(m\log^{2/3}n)$, outperforming Dijkstra's algorithm on sparse graphs. This result breaks the long-standing $O(m+n\log n)$ time bound of Dijkstra's algorithm, demonstrating that it is not optimal for solving single-source shortest path problems.
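To see the gap concretely: on a sparse graph with $m = \Theta(n)$, Dijkstra's bound becomes $O(n\log n)$ while the new algorithm runs in $O(n\log^{2/3}n)$, and since $\log^{2/3}n = o(\log n)$, the new bound is asymptotically strictly better.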

A surprising instance of catastrophic floating point errors in biology

The paper sits at the intersection of mathematical modeling and numerical analysis, using a model from mathematical biology to demonstrate how numerical methods can fail due to floating point errors. The authors analyze the model, develop an alternative formulation, and provide an online repository with interactive notebooks, illustrating the importance of combining analytical and numerical knowledge in mathematical modeling.
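The paper's biological model isn't reproduced here, but the classic failure mode in this genre, catastrophic cancellation, is easy to demonstrate: subtracting two nearly equal doubles destroys all significant digits, while an algebraically equivalent form stays accurate:

```python
import math

# (1 - cos(x)) / x**2 -> 1/2 as x -> 0, but the naive form cancels catastrophically.
x = 1e-8
naive = (1 - math.cos(x)) / x**2           # 1 - cos(x) rounds to 0.0 in doubles
stable = 2 * math.sin(x / 2) ** 2 / x**2   # identity: 1 - cos(x) = 2 sin^2(x/2)

print(naive)   # 0.0 -- all significant digits lost
print(stable)  # 0.5 -- correct to machine precision
```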

The Bittern Lesson for Bioacoustics

Perch 2.0 is a pre-trained model for bioacoustics that achieves state-of-the-art performance on various benchmarks, including BirdSet and BEANS, and outperforms specialized models on marine transfer learning tasks. The model was trained on a large multi-taxa dataset using self-distillation and a new source-prediction training criterion, allowing it to excel in fine-grained species classification and transfer learning tasks.

Back to Bits: Extending Shannon's channel capacity to computing

This work introduces a new computing performance unit based on information theory, which captures the complexity of modern computing systems by measuring the mutual information between inputs and outputs. The proposed framework provides a more accurate and implementation-agnostic way to evaluate performance, going beyond traditional metrics like floating-point operations to account for the meaningful information encoded and retained through computation.
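A worked toy example of the idea (mine, not the paper's): for a deterministic computation, the mutual information between input $X$ and output $Y$ collapses to the output entropy $H(Y)$, so a lossy function "computes fewer bits" than a bijection over the same inputs:

```python
import math
from collections import Counter

def entropy_bits(values):
    # Shannon entropy of the empirical distribution, in bits.
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

xs = list(range(8))             # uniform 3-bit input
identity = [x for x in xs]      # bijection: preserves all information
parity = [x % 2 for x in xs]    # lossy: collapses 8 inputs to 2 outputs

# For a deterministic map, I(X; Y) = H(Y).
print(entropy_bits(identity))   # 3.0 bits
print(entropy_bits(parity))     # 1.0 bit
```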

Code

Learning OLAP by writing a ClickHouse vs. Postgres benchmark

This benchmark measures the impact of database performance on LLM chat interactions, comparing OLAP (ClickHouse) and OLTP (PostgreSQL) using LLM-style query patterns, with results showing ClickHouse is up to 16.8x faster at 10M records. The repository provides a setup to run tests and simulations, including a chat performance simulator, to demonstrate how query latency affects AI-powered data conversations.
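The repository's harness does much more, but the shape of the comparison can be sketched in a few lines, assuming both databases run locally and a messages table exists in each (the schema, table name, and connection details here are assumptions, not the repo's):

```python
import time
import clickhouse_connect  # pip install clickhouse-connect
import psycopg2            # pip install psycopg2-binary

# Hypothetical OLAP-style aggregation over chat data; column names are assumptions.
SQL = "SELECT conversation_id, count(*) FROM messages GROUP BY conversation_id"

def timed(label, fn):
    start = time.perf_counter()
    fn()
    print(f"{label}: {time.perf_counter() - start:.3f}s")

ch = clickhouse_connect.get_client(host="localhost")
pg = psycopg2.connect("dbname=bench user=postgres host=localhost")

timed("clickhouse", lambda: ch.query(SQL).result_rows)

def run_pg():
    with pg.cursor() as cur:
        cur.execute(SQL)
        cur.fetchall()

timed("postgres", run_pg)
```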

Show HN: BaaS to build agents as data, not code

Julep is an open-source platform for building agent-based AI workflows that enables users to orchestrate complex, multi-step processes with Large Language Models (LLMs) and tools without managing any infrastructure. The platform provides features such as persistent memory, modular workflows, tool orchestration, and parallel execution, allowing users to create AI agents that can remember past interactions and handle sophisticated tasks with ease.
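To make the "agents as data" framing concrete, here is a hypothetical illustration (not Julep's actual schema): the agent and its workflow live in a declarative structure that a runtime interprets, so changing behavior means editing data rather than redeploying code:

```python
# Hypothetical agent-as-data definition -- illustrative only, not Julep's schema.
agent = {
    "name": "support-triage",
    "model": "gpt-4o",
    "memory": {"persistent": True},
    "workflow": [
        {"step": "classify", "prompt": "Label this ticket: {input}"},
        {"step": "route", "tool": "ticket_router", "args": {"label": "{classify.output}"}},
    ],
}
# A runtime walks the workflow list, calling the LLM or a tool per step;
# persistent memory carries context across runs.
```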

Show HN: Aegis – A framework for AI-governed software development

Aegis Framework is a tool for managing AI-assisted software development, providing features such as configuration management, drift detection, and governance patterns to ensure reproducible and auditable AI-generated code. The framework includes a range of tools and commands for setting up and managing projects, including project setup, governance tools, and distribution options, with the goal of enabling safe and reliable AI-assisted development workflows.

Agent Zero AI Framework

Agent Zero is a dynamic, organic agentic framework that grows and learns with the user, designed to be a general-purpose personal assistant that can accomplish tasks using the computer as a tool. It features a range of capabilities, including multi-agent cooperation, customizability, and extensibility, allowing users to tailor the agent's behavior and responses to their needs.

Show HN: MemU: Let AI Memorize You

MemU is an open-source memory framework for AI companions that enables high accuracy, fast retrieval, and low cost, allowing AI companions to truly remember and learn from interactions. It offers various deployment options, including cloud, enterprise, and self-hosting editions, and provides features such as autonomous memory management, interconnected knowledge graphs, and continuous self-improvement.