Friday August 8, 2025

OpenAI's GPT-5 model achieves state-of-the-art results on coding benchmarks, researchers discover declining medical safety messaging in generative AI models, and Octofriend, a coding agent, allows users to switch between GPT-5 and Claude models mid-conversation.

News

GPT-5

ChatGPT is now powered by GPT-5, its smartest and fastest model yet, providing expert-level intelligence across various subjects, including math, science, finance, and law. The updated model offers new features such as personalized study mode, voice improvements, and the ability to connect with Gmail and Google Calendar, making it more useful and accessible to everyone, from individuals to businesses.

GPT-5: Key characteristics, pricing and system card

The author has been using the new GPT-5 model family for two weeks and calls it their new favorite: it exudes competence, rarely messes up, and is a sensible default for most tasks. The GPT-5 models are priced aggressively, with costs ranging from $0.05/million input tokens for the Nano model to $1.25/million input tokens for the full GPT-5 model, which undercuts many other providers.
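To put the quoted input prices in perspective, here is a minimal Python sketch that converts a token count into a dollar figure. It uses only the two input-token prices mentioned above. The dictionary keys are illustrative labels rather than confirmed API model identifiers, and output-token pricing is left out because the summary does not state it.

```python
# Input-token prices quoted in the summary, in USD per million tokens.
# The keys are illustrative labels, not confirmed API model names.
INPUT_PRICE_PER_MILLION = {
    "gpt-5": 1.25,
    "gpt-5-nano": 0.05,
}


def input_cost_usd(model: str, input_tokens: int) -> float:
    """Return the input-side cost in USD for a given number of tokens."""
    return INPUT_PRICE_PER_MILLION[model] * input_tokens / 1_000_000


if __name__ == "__main__":
    tokens = 2_000_000  # hypothetical workload size
    for model in INPUT_PRICE_PER_MILLION:
        print(f"{model}: ${input_cost_usd(model, tokens):.2f} for {tokens:,} input tokens")
```

At these rates the full model costs 25 times as much per input token as the Nano model, so the choice of tier matters most for high-volume workloads.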

GPT-5 for Developers

OpenAI has released GPT-5, its most advanced model yet for coding and agentic tasks, which has achieved state-of-the-art results on key coding benchmarks, including scoring 74.9% on SWE-bench Verified and 88% on Aider polyglot. GPT-5 excels at producing high-quality code, handling tasks such as fixing bugs and editing code, and is designed to be a collaborative and steerable model that can follow detailed instructions and provide explanations of its actions.

OpenAI's new open-source model is basically Phi-5

OpenAI has released its first open-source large language models, gpt-oss-120b and gpt-oss-20b, which perform well on certain benchmarks but lack out-of-domain knowledge and may not be as effective in real-world tasks. The models are believed to have been trained on synthetic data, similar to Microsoft's Phi-series models. That approach prioritizes safety over real-world performance, allowing OpenAI to release a model that is less likely to be misused or cause controversy.

Zero-day flaws in authentication, identity, authorization in HashiCorp Vault

Researchers at Cyata discovered 9 previously unknown zero-day vulnerabilities in HashiCorp Vault, a widely used secrets management tool, which could allow attackers to bypass security measures, escalate privileges, and even execute remote code. The vulnerabilities, which were found through a deep manual review of the source code, highlight the potential risks of relying on a single tool to manage sensitive secrets and underscore the importance of thorough security testing and responsible disclosure.

Research

Analysis of Declining Medical Safety Messaging in Generative AI Models

The presence of medical disclaimers in outputs from large language models (LLMs) and vision-language models (VLMs) decreased significantly between 2022 and 2025, dropping to less than 1% by 2025. This decline is concerning as these models become more capable and authoritative, and it highlights the need for safeguards that remind users that AI outputs are not a substitute for professional medical advice.

A candidate giant planet imaged in the habitable zone of α Cen A

The James Webb Space Telescope's MIRI instrument was used to observe the star α Cen A, achieving high sensitivity to detect planets and exozodiacal dust emission, and setting a new limit on the latter. A point source, potentially a planet, was detected in August 2024, but not in subsequent observations, although its possible orbital motion could explain its non-detection, and if confirmed, it could be a 225 K, 1-1.1 R_Jup planet with a mass of 90-150 M_Earth.

A surprising instance of catastrophic floating point errors in biology

This text explores the intersection of mathematical modeling and numerical analysis, using a model from mathematical biology to demonstrate how numerical methods can fail due to floating point errors. The authors analyze the model, develop an alternative, and provide an online repository with interactive notebooks to illustrate the importance of combining analytical and numerical knowledge in mathematical modeling.
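As a generic illustration of how floating point errors can ruin a result, the Python sketch below shows catastrophic cancellation. It is not the biology model from the article. The function (1 - cos x) / x² and the rewritten form are assumptions chosen only because they are a standard textbook case.

```python
import math


def naive(x: float) -> float:
    # Subtracts two nearly equal numbers: for tiny x, cos(x) rounds to
    # (or very near) 1.0, so almost all significant digits are lost.
    return (1.0 - math.cos(x)) / (x * x)


def stable(x: float) -> float:
    # Uses the identity 1 - cos(x) = 2 * sin(x / 2) ** 2, which avoids
    # the subtraction of nearly equal quantities.
    s = math.sin(x / 2.0)
    return 2.0 * s * s / (x * x)


if __name__ == "__main__":
    x = 1e-9
    print("naive :", naive(x))
    print("stable:", stable(x))
```

For small x the exact value is close to 0.5. The stable form reproduces that, while the naive form is far off on typical IEEE 754 double-precision hardware, even though the two expressions are mathematically identical.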

The Bittern Lesson for Bioacoustics

Perch 2.0 is a pre-trained model for bioacoustics that achieves state-of-the-art performance on various benchmarks, including BirdSet and BEANS, after being trained on a large multi-taxa dataset. The model's robustness and ability to generalize to new tasks, including those with limited training data, make it a valuable tool for bioacoustics applications, including fine-grained species classification and transfer learning tasks.

3D Printing Radiance Fields

DreamPrinting is a novel pipeline that transforms volumetric rendering techniques into physically realizable 3D prints by integrating physical constraints such as pigment compatibility and material density. The pipeline achieves exceptional detail in reproducing semi-transparent structures and outperforms traditional surface-based methods, enabling the creation of complex, high-quality volumetric prints that closely mirror their digital origins.

Code

Show HN: Octofriend, a cute coding agent that can swap between GPT-5 and Claude

Octo is a coding assistant that works with OpenAI-compatible and Anthropic-compatible LLM APIs, allowing users to switch models mid-conversation and utilizing custom-trained models to handle tool call and code edit failures. To get started, users can install Octo globally using npm install --global octofriend and then run it using octofriend, with optional configurations and customizations available for advanced users.

Show HN: Browser AI agent platform designed for reliability

Notte is a web agent framework designed for speed, cost-efficiency, scale, and reliability, allowing users to build and deploy AI agents that interact seamlessly with the web. The framework provides a range of features, including structured output, stealth browser sessions, and digital personas, and can be used locally or through a hosted API service with premium features.

GPT-5 Coding Examples

This repository contains a collection of demo applications generated entirely by the GPT-5 model from single prompts, showcasing its strengths in coding and scaffolding websites, front-end applications, and interactive UIs. The demos can be explored by cloning the repository, running it locally, or visiting the hosted version, and users can also experiment with similar prompts using GPT-5 in their preferred coding environment or through ChatGPT.

Bifrost – LLM gateway 90x faster than LiteLLM at p99

Bifrost is a high-performance AI gateway that connects to 10+ providers, including OpenAI and Anthropic, through a single API, offering automatic failover, load balancing, and zero-downtime deployments. It can be set up in under 30 seconds and adds only 11µs latency while handling 5,000+ requests per second, making it a reliable and efficient solution for building AI applications.

Show HN: Generate Fine-tuning dataset using deep research in terminal

Datalore is a terminal tool that generates structured datasets from real-world data using deep research. Users describe the dataset they need, and the tool automatically searches the web to build context and outputs clean, usable data. To use Datalore, users must set up and run the project locally by creating a virtual environment, installing dependencies, and configuring environment variables, after which they can guide the dataset creation process step by step.
