Monday June 30, 2025

Gartner forecasts a steep cancellation rate for overpriced AI projects, researchers propose a reward-guided tree search to enhance LLM reasoning, and Octelium emerges as a free, open-source alternative for zero-trust resource access.

News

AI slop security reports submitted to curl

The curl bug bounty program on HackerOne has received 17 reports claiming security vulnerabilities such as buffer overflows, format string vulnerabilities, and memory leaks that could supposedly enable remote code execution or sensitive information access. The reports are publicly available, and the curl project now maintains a policy of instantly banning reporters who submit low-quality or AI-generated reports.

Generative AI's failure to induce robust models of the world

The author argues that Large Language Models (LLMs) fail to build and maintain adequate, interpretable, and dynamically updated models of the world, a capacity fundamental to human and animal cognition. The point is illustrated by the author's conversation with chess legend Garry Kasparov: even in a rule-based game like chess, LLMs can be led astray, showing that they do not understand the world in a stable, trustworthy way.

The AI Backlash Keeps Growing Stronger

Duolingo's decision to become an "AI-first" company, replacing contractors with automation, sparked outrage on social media, with many young users expressing anger and deleting the app. This backlash is part of a larger trend of growing animosity towards AI, with concerns about job displacement, environmental damage, mental health impacts, and copyright violations contributing to a shift in public perception, from initial awe to widespread criticism.

Show HN: A tool to benchmark LLM APIs (OpenAI, Claude, local/self-hosted)

The linked page is the output of an LLM benchmarking platform, showing test results and statistics for various models and prompts. The results include metrics such as first-token response time, output speed, and success rate, though most of the data is currently empty or zeroed, suggesting that testing is either incomplete or has only just begun.
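The two core metrics such a benchmark reports, time-to-first-token and output speed, can be measured from any streaming response. The sketch below is a minimal, hypothetical version that times an arbitrary iterable of text chunks; `fake_stream` stands in for a real model API and is not part of the tool.

```python
import time

def benchmark_stream(stream):
    """Measure time-to-first-token and output speed for a token stream.

    `stream` is any iterable yielding text chunks (e.g. a streaming API
    response). Returns (ttft_seconds, tokens_per_second, token_count).
    """
    start = time.perf_counter()
    first_token_at = None
    count = 0
    for _chunk in stream:
        now = time.perf_counter()
        if first_token_at is None:
            first_token_at = now  # first chunk arrived
        count += 1
    total = time.perf_counter() - start
    ttft = (first_token_at - start) if first_token_at is not None else float("inf")
    speed = count / total if total > 0 else 0.0
    return ttft, speed, count

# Stub stream standing in for a real model API (hypothetical):
def fake_stream(n=5, delay=0.01):
    for i in range(n):
        time.sleep(delay)
        yield f"tok{i}"

ttft, speed, count = benchmark_stream(fake_stream())
```

The same harness works unchanged against OpenAI-style or self-hosted endpoints, since both expose responses as chunk iterators.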

AI agents get office tasks wrong around 70% of time, and many aren't AI at all

Gartner predicts that over 40% of agentic AI projects will be cancelled by 2027 due to rising costs, unclear business value, or insufficient risk controls, with many vendors exaggerating their products' capabilities through "agent washing". Researchers at Carnegie Mellon University have developed a benchmark to test AI agents' performance on common workplace tasks, finding that even the best-performing models can only complete around 30% of tasks autonomously.

Research

Enhancing LLM Reasoning with Reward-Guided Tree Search

Test-time scaling has shown promise in improving the accuracy of large language models by allocating more computational resources during inference, but developing effective reasoning approaches remains a challenge. This paper presents a framework called STILL-1, which uses reward-guided tree search algorithms to enhance the reasoning abilities of large language models, and demonstrates its effectiveness on mathematical reasoning tasks across four challenging datasets.
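The core idea, expanding candidate reasoning steps and keeping only those a reward model scores highly, can be shown on a toy problem. The sketch below is a beam-style stand-in, not the paper's implementation: a real system would expand nodes with an LLM and score partial reasoning traces with a learned reward model.

```python
import heapq

def reward_guided_search(expand, reward, root, beam_width=3, depth=4):
    """Toy reward-guided tree search (beam-style).

    At each step, every frontier node is expanded into candidate
    children, each child is scored by a reward function, and only the
    top-`beam_width` children survive to the next level.
    """
    frontier = [root]
    for _ in range(depth):
        candidates = [child for node in frontier for child in expand(node)]
        frontier = heapq.nlargest(beam_width, candidates, key=reward)
    return max(frontier, key=reward)

# Stand-in task: build a digit sequence whose sum is close to 10.
target = 10
expand = lambda seq: [seq + [d] for d in range(10)]   # one digit per child
reward = lambda seq: -abs(sum(seq) - target)          # closer sum = higher reward

best = reward_guided_search(expand, reward, root=[], beam_width=5, depth=4)
```

Swapping the toy `expand` and `reward` for an LLM step generator and a process reward model recovers the general shape of the approach.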

Evaluating World Models with LLM for Decision Making

World models, particularly those leveraging Large Language Models (LLMs), are crucial in decision making, with models like MuZero and Dreamer achieving success in complex tasks. This work evaluates the effectiveness of advanced LLMs, such as GPT-4o and GPT-4o-mini, as world models in decision making across various environments and tasks, revealing key observations about their performance and limitations.

Uncovering and addressing the secret water footprint of AI models

The water footprint of artificial intelligence, including the massive amounts of water used to train models like GPT-3, has gone largely unnoticed despite its significant impact, with global AI demand projected to account for 4.2-6.6 billion cubic meters of water withdrawal by 2027. The authors argue that the AI industry must take responsibility for this footprint and work towards sustainability through a holistic approach that considers water and carbon footprints together.

Universal pre-training by iterated random computation

Researchers investigated using randomly generated data to pre-train a model, providing theoretical justification and empirical evidence that this approach can be effective. The results show that pre-training with synthetic data can lead to improved performance, including zero-shot learning and faster convergence, especially when fine-tuned with real-world data.
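One simple way to produce such data is to sample a random program and record its output, so the sequences carry structure but no real-world content. The sketch below is a hypothetical stand-in using a randomly sampled finite automaton; the paper's actual construction of iterated random computation may differ.

```python
import random

def random_automaton_data(n_states=8, vocab=16, length=32, n_seqs=4, seed=0):
    """Generate synthetic sequences from a randomly sampled automaton.

    Each sequence is produced by iterating random transition/emission
    tables, giving structured (non-i.i.d.) token streams suitable as a
    toy pre-training corpus.
    """
    rng = random.Random(seed)
    # Random deterministic transition and emission tables.
    trans = [[rng.randrange(n_states) for _ in range(vocab)]
             for _ in range(n_states)]
    emit = [rng.randrange(vocab) for _ in range(n_states)]
    seqs = []
    for _ in range(n_seqs):
        state = rng.randrange(n_states)
        seq = []
        for _ in range(length):
            tok = emit[state]        # emit a token for the current state
            seq.append(tok)
            state = trans[state][tok]  # step the automaton
        seqs.append(seq)
    return seqs

data = random_automaton_data()
```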

WorldVLA: Towards Autoregressive Action World Model

WorldVLA is a unified framework that combines action and image understanding and generation, using a world model to predict future images and an action model to generate subsequent actions based on image observations. The model outperforms standalone action and world models, and an attention mask strategy is proposed to address the issue of error propagation when generating sequences of actions in an autoregressive manner.
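One plausible reading of the masking idea is that action tokens attend to observation tokens but not to previously generated actions, so a bad action cannot leak into the next one. The sketch below builds such a mask; it is a hypothetical illustration of the concept, not WorldVLA's actual mask.

```python
def action_chunk_mask(n_obs, n_act):
    """Build a toy attention mask over [obs tokens | action tokens].

    mask[i][j] == True means token i may attend to token j. Observation
    tokens attend causally among themselves; each action token attends
    to all observation tokens but only to itself among actions, cutting
    off error propagation between generated actions.
    """
    n = n_obs + n_act
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j < n_obs and j <= i:      # past/current observation tokens
                mask[i][j] = True
            elif j >= n_obs and j == i:   # action tokens see only themselves
                mask[i][j] = True
    return mask

m = action_chunk_mask(n_obs=3, n_act=2)
```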

Code

Show HN: Octelium – FOSS Alternative to Teleport, Cloudflare, Tailscale, Ngrok

Octelium is a free and open-source, self-hosted platform that provides a unified solution for zero-trust resource access, offering a modern alternative to remote access VPNs and similar tools. It supports various use cases, including remote access VPN, zero-trust network access, self-hosted infrastructure for secure tunnels, API gateway, AI gateway, and more, with features like dynamic secret-less access, identity-based access control, and scalability.

AI-SDK-cpp: Modern C++ AI SDK

AI-SDK-cpp is a modern C++ toolkit that enables developers to build AI-powered applications with popular model providers like OpenAI and Anthropic, providing a unified and easy-to-use API. It supports text generation, streaming, multi-turn conversations, error handling, tool calling, and async tool execution, with plans to add additional providers, embeddings, and image generation support in the future.

Show HN: Superclass – Classify Files, PDF, Images, Docx etc. with GPT

Superclass is a document analysis tool that combines advanced text extraction with AI-powered classification, supporting multiple document formats and providing both CLI and HTTP server interfaces. It offers various features, including category detection, content summarization, keyword extraction, and sentiment analysis, with support for multiple AI providers such as OpenAI, Anthropic, and Azure OpenAI.

Show HN: I built an AI chatbot that learns from your website to answer questions

This tutorial shows how to build a "set and forget" AI chatbot that learns directly from a live website, keeping its knowledge up-to-date automatically without manual updates. The chatbot uses an intelligent agent-based architecture to crawl and extract content, make decisions, and generate comprehensive answers, and can be hosted on a web server and embedded into a website using provided JavaScript code.
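The retrieval half of such a chatbot can be reduced to ranking crawled pages against the user's question. The sketch below is a deliberately minimal, hypothetical version using word overlap; a real system would crawl the live site on a schedule, chunk and embed the text, and hand the top results to an LLM for answer generation.

```python
def retrieve(pages, question, k=2):
    """Rank crawled pages by word overlap with the question.

    `pages` maps URL -> extracted text (stand-in for the crawler's
    output). Returns the top-k (url, text) pairs for the LLM to answer
    from.
    """
    q = set(question.lower().split())
    scored = sorted(
        pages.items(),
        key=lambda kv: len(q & set(kv[1].lower().split())),
        reverse=True,
    )
    return scored[:k]

# Toy crawl output (hypothetical URLs and text):
pages = {
    "/pricing": "our pricing starts at ten dollars per month",
    "/about": "we are a small team building developer tools",
}
top = retrieve(pages, "how much does pricing cost per month?", k=1)
```

Because the corpus is rebuilt from the live site on each crawl, the bot's answers track the website without manual updates.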

Show HN: AGI-SaaS v1.0.0 – Modular Python RAG Framework for LLM Pipelines

AGI-SaaS is a modular artificial general intelligence project designed for building intelligent bots, AI APIs, and SaaS products, with an emphasis on cognitive evolution through extensible plugins and a flexible architecture. The code is freely available to read and modify for personal use, but any commercial use is subject to a 0.8% royalty clause and requires authorization.