Tuesday August 12, 2025

GPT-OSS-120B runs on just 8GB VRAM, researchers develop an analytic theory of creativity in convolutional diffusion models, and the GLM-4.5V open-source multimodal large language model achieves state-of-the-art performance on various benchmarks.

News

GPT-OSS-120B runs on just 8GB VRAM & 64GB+ system RAM

Hand-picked selection of articles on AI fundamentals/concepts

This collection of articles covers AI fundamentals and concepts, organized into categories such as algorithms, architectures, data/training, speech, vision, and NLP, so readers can navigate straight to the topics that interest them.

Token growth indicates future AI spend per dev

Kilo has passed 1 trillion tokens a month on OpenRouter for the first time. The open-source family of AI coding tools is growing rapidly as users switch from Cursor and Claude, which have started throttling their users because of rising application inference costs. The industry assumed that application inference costs would fall along with raw inference costs; instead, they have grown 10x over the last two years, pushing users towards open-source tools that let them manage their own costs and avoid throttling.
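To put the 10x figure in per-year terms, the arithmetic can be sketched as below. This assumes steady compound growth across the two years, which the article summary does not state; the function name `annualized_growth` is made up for illustration.

```python
def annualized_growth(total_multiple: float, years: float) -> float:
    """Per-year growth multiple, ASSUMING steady compounding over the period."""
    return total_multiple ** (1.0 / years)


# 10x over two years, as quoted in the summary above.
per_year = annualized_growth(10.0, 2)
print(f"{per_year:.2f}x per year")
```

Under that assumption, 10x over two years works out to a little over 3x per year.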

I've seen 12 people hospitalized after losing touch with reality because of AI

A psychiatrist claims to have seen 12 people hospitalized in 2025 due to "AI psychosis," a condition where individuals lose touch with reality because of artificial intelligence. The psychiatrist is observing a similar pattern online, suggesting that this phenomenon is spreading rapidly.

Flock Now Using AI to Report to Police If Our Movement Patterns Are "Suspicious"

The surveillance company Flock is using AI to analyze Americans' movement patterns and report individuals to the police if their patterns are deemed "suspicious," raising concerns about privacy and civil liberties. Flock's system, which tracks license plate data nationwide, is now being used to generate suspicion and target individuals rather than only to support investigations that start from existing suspicion, which the ACLU argues is a dangerous expansion of surveillance infrastructure.

Research

Simulating the U.S. Senate: An LLM-Driven Agent Approach (2024)

Researchers developed virtual agents using large language models to simulate discussions among US Senate Intelligence Committee members, demonstrating the agents' ability to engage in realistic debate and find bipartisan solutions. The simulation shows promise as a tool for understanding and improving legislative processes, with potential future applications in policy testing and negotiation.

Gemini Robotics: Bringing AI into the Physical World

Gemini Robotics is a new family of AI models designed for robotics, built on the foundation of Gemini 2.0, which enables robots to execute complex manipulation tasks and follow diverse instructions. The models, including Gemini Robotics and Gemini Robotics-ER, demonstrate capabilities such as object detection, grasp prediction, and adaptation to new tasks and environments, marking a significant step towards developing general-purpose robots that can operate effectively in the physical world.

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

GLM-4.5 is a large language model with 355B parameters that achieves strong performance on various tasks, including reasoning and coding, despite having fewer parameters than some competitors. The model, which also has a compact 106B parameter version called GLM-4.5-Air, is open-source and available for research, with code and models accessible online.

Silent Data Corruption by 10x Test Escapes Threatens Reliable Computing

Too many defective compute chips are passing manufacturing tests, posing a significant threat to reliable computing due to silent data corruptions. A three-pronged approach is proposed to address this issue, including quick diagnosis of defective chips, in-field detection, and new test experiments to improve detection techniques and understand their effectiveness.

An analytic theory of creativity in convolutional diffusion models

Researchers have developed a theory of creativity in convolutional diffusion models, identifying inductive biases that enable these models to generate original images by combining local patches from their training data. This theory, which introduces local score and equivariant local score machines, can quantitatively predict the outputs of trained models with high accuracy and reveals a mechanism of creativity based on mixing and matching local patches at different scales and locations.

Code

GLM-4.5V: An open-source multimodal large language model from Zhipu AI

The GLM-V project is an open-source repository containing the GLM-4.5V and GLM-4.1V series models, which are vision-language models designed to enhance reasoning capabilities and enable complex problem-solving. The models have achieved state-of-the-art performance on various benchmarks and can handle diverse types of visual content, including images, videos, and documents, with features such as thinking mode and hybrid training.

Show HN: Enter your domain and my open-source agent will hack it

Strix is an open-source, AI-powered security testing platform that simulates hacker attacks on applications to identify vulnerabilities, providing a range of tools and features for dynamic testing and exploitation. It is designed for developers and security teams, offering seamless integration into existing workflows, automated patching, and detailed reporting, and is currently in alpha with rapid updates and improvements expected.

LLM prompts as versioned, composable software artifacts

PAL (Prompt Assembly Language) is a framework for managing large language model (LLM) prompts as versioned, composable software artifacts, treating prompt engineering with the same rigor as software engineering. It provides features such as modular components, template systems, dependency management, and evaluation frameworks to help users create, test, and validate LLM prompts in a structured and efficient way.
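As a rough illustration of what "versioned, composable" can mean in practice, here is a minimal standard-library sketch. It is not PAL's actual API: `PromptComponent`, `compose`, and `fingerprint` are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from hashlib import sha256
from string import Template


@dataclass(frozen=True)
class PromptComponent:
    """Hypothetical reusable prompt fragment with an explicit version."""
    name: str
    version: str
    template: str  # uses $placeholder syntax from string.Template

    def render(self, **variables: str) -> str:
        # substitute() raises KeyError if a placeholder has no value,
        # so a broken prompt fails loudly instead of shipping with gaps.
        return Template(self.template).substitute(**variables)


def compose(components, **variables) -> str:
    """Join rendered components into one prompt, in the order given."""
    return "\n\n".join(c.render(**variables) for c in components)


def fingerprint(components) -> str:
    """Short hash that changes whenever any name, version, or template changes."""
    payload = "|".join(f"{c.name}@{c.version}:{c.template}" for c in components)
    return sha256(payload.encode("utf-8")).hexdigest()[:12]


role = PromptComponent("role", "1.0.0", "You are a $persona.")
task = PromptComponent("task", "2.1.0", "Summarize the following text: $text")

print(compose([role, task], persona="reviewer", text="..."))
print(fingerprint([role, task]))
```

Recording the fingerprint next to evaluation results is one way to tie a score back to the exact prompt revision that produced it.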

Show HN: Real-time privacy protection for smart glasses

This privacy infrastructure for smart glasses is a real-time filter that sits between the camera and the app, automatically ensuring compliance by blurring the faces of non-consenting individuals, managing consent, and processing live video offline. The filter can be used with various camera-based apps, such as AI assistants, social apps, and content creation tools, to provide automatic protection and compliant recording without privacy risks.

LangDiff: Break Limitations of LLM JSON Streaming

LangDiff is a Python library that enables streaming structured outputs from large language models (LLMs) to frontends, providing intelligent partial parsing and automatic JSON Patch generation for efficient synchronization. It allows developers to build responsive AI applications where backend structures and frontend experiences can evolve independently, solving problems related to traditional streaming approaches, schema evolution, and implementation detail leakage.
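The JSON Patch idea can be sketched with the standard library alone. The code below is not LangDiff's API; `escape` and `diff` are hypothetical helpers, and the snapshots stand in for successive partial parses of a streamed response. It emits RFC 6902-style operations describing how each snapshot differs from the previous one, so a frontend could apply small updates instead of re-reading the whole object.

```python
from __future__ import annotations

from typing import Any


def escape(token: str) -> str:
    # JSON Pointer (RFC 6901) escaping: "~" must be replaced before "/".
    return token.replace("~", "~0").replace("/", "~1")


def diff(old: Any, new: Any, path: str = "") -> list[dict]:
    """Return RFC 6902-style operations that turn `old` into `new`."""
    if old == new:
        return []
    if isinstance(old, dict) and isinstance(new, dict):
        ops = []
        for key in old:
            if key not in new:
                ops.append({"op": "remove", "path": f"{path}/{escape(key)}"})
        for key, value in new.items():
            child = f"{path}/{escape(key)}"
            if key not in old:
                ops.append({"op": "add", "path": child, "value": value})
            else:
                ops.extend(diff(old[key], value, child))
        return ops
    if isinstance(old, list) and isinstance(new, list) and len(new) >= len(old):
        ops = []
        for index, (before, after) in enumerate(zip(old, new)):
            ops.extend(diff(before, after, f"{path}/{index}"))
        # Streaming output mostly grows lists, so new items become appends.
        for item in new[len(old):]:
            ops.append({"op": "add", "path": f"{path}/-", "value": item})
        return ops
    # Scalars, type changes, and shrinking lists: replace the value wholesale.
    return [{"op": "replace", "path": path, "value": new}]


# Hypothetical snapshots of one object as more of the model output arrives.
snapshots = [
    {},
    {"title": "Pla"},
    {"title": "Plan", "steps": []},
    {"title": "Plan", "steps": ["Draft"]},
]
for before, after in zip(snapshots, snapshots[1:]):
    print(diff(before, after))
```

A growing string is sent here as a full `replace`; a real library might use a more compact append operation for that case.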
