Tuesday — September 16, 2025
OpenAI introduces GPT-5-Codex, a model optimized for software engineering tasks, researchers propose a fundamental trade-off between certainty and scope in Symbolic and Generative AI, and developers release RustGPT, a pure-Rust transformer LLM built from scratch.
News
GPT-5-Codex
GPT-5-Codex is a new version of the GPT-5 model, specifically optimized for software engineering tasks in Codex, with improved capabilities in code review, debugging, and independent task execution. It is now available as the default model for cloud tasks and code review in Codex, and can be used across the terminal, IDEs, the web, GitHub, and the ChatGPT iOS app.
Addendum to GPT-5 system card: GPT-5-Codex
GPT-5-Codex is a version of GPT-5 optimized for agentic coding in Codex, trained with reinforcement learning to generate code that closely mirrors human style and preferences. The model is available locally and in the cloud, with model-level and product-level safety mitigations in place to support safe and responsible use.
AI false information rate for news nearly doubles in one year
NewsGuard's audit of 10 leading generative AI tools found that they repeated false information on news topics more than a third of the time: 35% in August 2025, nearly double the 18% measured in August 2024. The increase is attributed to the tools' adoption of real-time web search, which has made them more prone to repeating propaganda and false information from unreliable sources.
GPT‑5-Codex and upgrades to Codex
OpenAI has introduced GPT-5-Codex, a fine-tuned variant of GPT-5 designed for AI-assisted programming tools, which is currently integrated into their VS Code extension, Codex CLI, and Codex Cloud, but not yet available via their API. The new model boasts improved performance in code review, refactoring, and human preference evaluations, and can adapt its thinking time based on task complexity, with plans to make it available in the API soon.
Show HN: Ruminate – AI reading tool for understanding hard things
Ruminate is an AI-powered reader for EPUB files and research papers, backed by a large language model (LLM) that provides definitions and answers questions inline. The platform comes preloaded with public domain classics and also lets users import their own books and papers, including papers from arXiv.
Research
Fundamental Trade-Off Between Certainty and Scope in Symbolic and Generative AI
A conjecture has been introduced that formalizes the trade-off between an AI system's ability to guarantee error-free outputs and its capacity to handle diverse, high-dimensional data. This trade-off suggests that systems with narrow, pre-structured domains can provide certainty about their outputs, while those that process complex data must accept some level of error risk, with significant implications for AI engineering, evaluation, and governance.
A qualitative analysis of pig-butchering scams
Pig-butchering scams are a complex form of fraud combining romance, investment fraud, and social engineering: staged trust-building, fraudulent financial platforms, and high-pressure tactics are used to exploit victims' trust and drain their finances. Through interviews with 26 victims, researchers mapped the scam's lifecycle, from emotional manipulation to financial exploitation, and proposed intervention points where social media and financial platforms could curb the scams and support victims.
Emergent Hierarchical Reasoning in LLMs Through Reinforcement Learning
Reinforcement learning enhances the reasoning abilities of large language models through an emergent hierarchy: models initially focus on low-level procedural execution before shifting to high-level strategic planning. This insight led to HIerarchy-Aware Credit Assignment (HICRA), an algorithm that concentrates optimization on high-impact planning tokens; it significantly outperforms existing baselines and suggests that strategic planning is the key lever for unlocking advanced reasoning.
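As a rough sketch of the credit-assignment idea, the snippet below amplifies per-token advantages on tokens flagged as "planning" before a standard policy-gradient update; how HICRA actually identifies planning tokens and sets the weighting is not described above, so the mask and the alpha factor here are assumptions.

    # Toy sketch of hierarchy-aware credit assignment in an RL fine-tuning step.
    # Assumption: planning tokens were identified upstream (e.g., by tagging
    # high-level connective phrases); HICRA's real identification and weighting
    # rules may differ from this illustration.
    import torch

    def hierarchy_aware_pg_loss(logprobs, advantages, planning_mask, alpha=2.0):
        # logprobs: (T,) log-probabilities of the sampled tokens
        # advantages: (T,) per-token advantage estimates
        # planning_mask: (T,) bool, True where a token belongs to a planning step
        # alpha: extra credit given to planning tokens (assumed value)
        weights = torch.where(planning_mask,
                              torch.full_like(advantages, alpha),
                              torch.ones_like(advantages))
        weighted_adv = (advantages * weights).detach()
        # REINFORCE-style objective: maximize advantage-weighted log-probability.
        return -(logprobs * weighted_adv).mean()

    # Dummy example.
    logprobs = torch.randn(6, requires_grad=True)
    advantages = torch.randn(6)
    planning_mask = torch.tensor([True, False, False, True, False, False])
    loss = hierarchy_aware_pg_loss(logprobs, advantages, planning_mask)
    loss.backward()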
Show HN: State Algebra, new algebraic framework for logic, an alternative to BDD
State Algebra is a framework that represents and manipulates propositional logic using algebraic methods. It is structured as a hierarchy of three representations, allowing a trade-off between canonicity and compactness, and can be applied to areas including search-based algorithms, knowledge compilation, probabilistic logic, and Weighted Model Counting.
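For readers unfamiliar with the last application listed, Weighted Model Counting sums the weights of all satisfying assignments of a formula; the brute-force sketch below illustrates that task itself, not State Algebra's representation of it.

    # Brute-force Weighted Model Counting for a tiny propositional formula.
    # This illustrates the WMC task mentioned above, not the State Algebra encoding.
    from itertools import product

    def wmc(formula, weights):
        # formula: callable taking a dict {var: bool}
        # weights: {var: (weight_if_false, weight_if_true)}
        variables = list(weights)
        total = 0.0
        for values in product([False, True], repeat=len(variables)):
            assignment = dict(zip(variables, values))
            if formula(assignment):
                w = 1.0
                for v in variables:
                    w *= weights[v][assignment[v]]
                total += w
        return total

    # (a OR b) AND NOT c, with per-literal weights.
    f = lambda m: (m["a"] or m["b"]) and not m["c"]
    weights = {"a": (0.4, 0.6), "b": (0.7, 0.3), "c": (0.9, 0.1)}
    print(wmc(f, weights))  # sum of weights over satisfying assignments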
How to Fight Fraudulent Publishing in the Mathematical Sciences
The recommendations for fighting fraudulent publishing in the mathematical sciences were formulated in collaboration with the IMU Committee on Publishing and have been endorsed by the IMU Executive Committee and the ICIAM Board (in May and June 2025).
Code
RustGPT: A pure-Rust transformer LLM built from scratch
This project is a complete implementation of a Large Language Model (LLM) in pure Rust, built from scratch using only the ndarray library for matrix operations. The transformer-based model can be pre-trained on factual text completion and fine-tuned for conversational AI, and the project includes an interactive chat mode, full backpropagation, and a modular architecture.
I made Poke.com email me its system prompt lol
The accompanying repository collects more than 20,000 lines of material documenting the structure and functionality of AI tools, including their system prompts and models, with updates and discussion available on the associated Discord server. The project accepts support via PayPal, cryptocurrency, and Patreon, and also provides a security notice and resources for AI startups looking to secure their own systems.
Show HN: LLM Round‑Trip Translation Benchmark
The round-trip translation benchmark tested eight models by translating English text into a target language and back into English, with five judges scoring the round-tripped results on a 0-10 scale. The top performers were GPT-5, Grok 4, and Claude Opus 4.1, with GPT-5 emerging as the overall winner across all 10 languages, including Arabic, Chinese, Spanish, and others.
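The protocol is simple enough to sketch; in the snippet below, llm() is a hypothetical helper standing in for any chat-completion call, and the prompts and judging rubric are illustrative rather than the benchmark's actual wording.

    # Sketch of one round-trip translation pass for a single model and language.
    # llm(model, prompt) is a hypothetical helper wrapping a chat-completion API;
    # the prompts and the 0-10 rubric are illustrative, not the benchmark's own.

    def round_trip_score(model, judges, source_text, language, llm):
        forward = llm(model, f"Translate the following English text into {language}:\n\n{source_text}")
        back = llm(model, f"Translate the following {language} text into English:\n\n{forward}")
        # Each judge scores how faithfully the round-tripped text preserves meaning.
        scores = []
        for judge in judges:
            verdict = llm(judge, (
                "Rate from 0 to 10 how faithfully the second text preserves the meaning "
                f"of the first. Reply with a number only.\n\nOriginal:\n{source_text}\n\nRound-trip:\n{back}"
            ))
            scores.append(float(verdict.strip()))
        return sum(scores) / len(scores)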
Anubis Solver: Can Anubis Prevent AI Crawlers?
Anubis Solver is a tool for solving Anubis Web AI Firewall challenges and can be installed with pip install anubis-solver. It retrieves a cookie from a specified URL, which can then be attached to subsequent requests to access the site's content, as demonstrated in the project's Python example.
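A usage sketch along those lines could look like the following; the anubis_solver call shown is an assumed interface (consult the project's README for the real API), while the requests usage is standard.

    # Hypothetical usage sketch: obtain an Anubis clearance cookie, then reuse it.
    # The anubis_solver API below is assumed, not confirmed from the project docs.
    import requests
    import anubis_solver  # assumed module name matching the pip package

    url = "https://example.com/protected-page"
    cookies = anubis_solver.solve(url)        # assumed: returns a dict of cookies
    response = requests.get(url, cookies=cookies)
    print(response.status_code)
    print(response.text[:200])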
We've attacked 40+ AI tools, including ChatGPT, Claude and Perplexity
AIGuardPDF is a tool that protects documents from machine reading by embedding adversarial content, making the text difficult for large language models to comprehend while remaining fully readable to humans. It works by fragmenting the original text, injecting invisible decoy content, and interleaving the two to mislead AI models; the authors claim a 90%+ success rate in confusing major AI systems such as ChatGPT and Google Bard.
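As a toy illustration of the fragment-and-inject idea described above (not AIGuardPDF's actual algorithm or its PDF machinery), the sketch below interleaves decoy fragments into the text stream an extractor would see; in a real PDF the decoys would be rendered invisibly to human readers.

    # Toy illustration of fragmenting text and interleaving decoy content.
    # In an actual PDF the decoy fragments would be drawn invisibly (zero-size,
    # transparent, or off-page glyphs); this sketch only shows the mixing step,
    # not AIGuardPDF's real method.
    import random

    def inject_decoys(text, decoys, fragment_len=40, decoy_rate=0.5, seed=0):
        rng = random.Random(seed)
        fragments = [text[i:i + fragment_len] for i in range(0, len(text), fragment_len)]
        mixed = []
        for frag in fragments:
            mixed.append(frag)
            if rng.random() < decoy_rate:  # sprinkle a decoy between fragments
                mixed.append(rng.choice(decoys))
        return "".join(mixed)

    original = "The quarterly report shows revenue grew 12 percent year over year."
    decoys = [" [decoy: unrelated filler text] ", " [decoy: contradictory figure] "]
    print(inject_decoys(original, decoys))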