Saturday — May 24, 2025
Google faces backlash over AI Mode's use of publisher content, Microsoft integrates AI features into Notepad, and the Ascend Institute presents GremlinGPT, an autonomous, self-evolving AI system.
News
Positional preferences, order effects, prompt sensitivity undermine AI judgments
Large Language Models (LLMs) are being used to make decisions in sensitive areas, but positional preferences, order effects, and prompt sensitivity make their judgments biased and unreliable. Research has shown that LLMs exhibit cognitive biases similar to those of humans, such as serial position, framing, and anchoring effects, and that even small changes in prompt phrasing or labeling can significantly alter their decisions. Anyone using LLMs for decision-making needs to account for these factors.
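One common way to probe the order effects described above is to ask the same question twice with the options swapped and count how often the verdict changes. The sketch below assumes a hypothetical `judge(first, second)` callable that returns 0 when it prefers the option shown first and 1 otherwise; the two toy judges stand in for a real model call and are not taken from the research summarized here.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical judge interface (an assumption for this sketch):
# returns 0 if the option shown first wins, 1 if the option shown second wins.
Judge = Callable[[str, str], int]


def order_consistent(judge: Judge, a: str, b: str) -> bool:
    """True if the judge picks the same answer regardless of presentation order."""
    winner_ab = a if judge(a, b) == 0 else b  # a shown first
    winner_ba = b if judge(b, a) == 0 else a  # b shown first
    return winner_ab == winner_ba


def flip_rate(judge: Judge, pairs: Sequence[Tuple[str, str]]) -> float:
    """Fraction of pairs whose verdict changes when the order is swapped."""
    flips = sum(not order_consistent(judge, a, b) for a, b in pairs)
    return flips / len(pairs)


def always_first(first: str, second: str) -> int:
    """Toy judge with pure positional bias: always prefers the first slot."""
    return 0


def prefers_longer(first: str, second: str) -> int:
    """Toy judge that looks only at content (length), not position."""
    return 0 if len(first) >= len(second) else 1


PAIRS = [("short", "a much longer answer"), ("medium length", "tiny")]

if __name__ == "__main__":
    print("always_first flip rate:", flip_rate(always_first, PAIRS))
    print("prefers_longer flip rate:", flip_rate(prefers_longer, PAIRS))
```

A flip rate near zero does not show that a judge is unbiased in general; it only rules out this one kind of order sensitivity on the pairs tested.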
KumoRFM: A Foundation Model for In-Context Learning on Relational Data
KumoRFM is a Relational Foundation Model that can make accurate predictions over relational databases across various predictive tasks without requiring task-specific training, outperforming conventional approaches by 2-8% on average. The model uses in-context learning, a novel Relational Graph Transformer, and a table-invariant encoding scheme to reason within arbitrary multi-modal data across tables, and can be fine-tuned for specific tasks to improve performance by 10-30% on average.
Show HN: I built a more productive way to manage AI chats
ContextChat is a platform that allows users to easily set up multiple projects and ingest content from various sources, including web, files, and GitHub, to create a unified context for AI-powered conversations. The platform offers a range of features, including a context builder, multiple chats per project, and a pay-as-you-go credit system, making it a flexible and cost-effective alternative to traditional AI chat tools.
Google's AI Mode is 'the definition of theft,' publishers say
Publishers are criticizing Google's new AI Mode, calling it "the definition of theft" because it uses their content without providing traffic or revenue in return. Google considered allowing publishers to opt out of the AI tools but ultimately decided against it, so publishers who want to keep their content from being used must opt out of Search entirely.
Microsoft dumps AI into Notepad as 'Copilot all the things' mania takes hold
Microsoft has updated its Notepad app with AI-powered features, including a "Write" feature that uses Copilot to generate text for users. The update is part of Microsoft's efforts to integrate AI into its built-in Windows apps, with similar updates also coming to the Paint app. The new features allow users to generate text and images using AI prompts, but it's unclear whether users actually wanted or needed these additions to the simple text editor.
Research
Understanding Generative AI Capabilities in Everyday Image Editing Tasks
Researchers analyzed 83,000 image editing requests from a Reddit community to understand what types of edits people want and how well AI editors can fulfill them, finding that only about 33% of requests can be successfully handled by current AI editors. AI editors tend to struggle with precise editing tasks, such as preserving the identity of people and animals, but perform better on more creative and open-ended tasks.
Beyond Semantics: Unreasonable Effectiveness of Reasonless Intermediate Tokens
Researchers investigated the role of intermediate tokens, or "thoughts," in large reasoning models, finding that the accuracy of these traces is only loosely connected to solution accuracy and that models can still produce correct solutions with invalid reasoning traces. Training models on noisy or corrupted traces did not significantly impact performance, which challenges the idea that intermediate tokens induce predictable reasoning behaviors and cautions against over-interpreting them as evidence of human-like reasoning in language models.
A Formal Proof of Complexity Bounds on Diophantine Equations
The authors have formalized a construction of Diophantine equations with bounded complexity in Isabelle/HOL, which is a key step in showing that a certain class of Diophantine equations is undecidable. This work builds on previous research that identified a universal pair for integer unknowns and involves the formalization of number theory concepts and the development of metaprogramming infrastructure to handle complex polynomial definitions.
Prime Path Coverage in the GNU Compiler Collection
GNU Compiler Collection 15 introduces prime path coverage, a structural coverage metric that balances the number of tests against coverage by requiring loops to be taken, taken more than once, and skipped. The implementation improves upon existing algorithms, reducing computational complexity and allowing for efficient tracking of candidate paths. Prime path coverage also subsumes modified condition/decision coverage (MC/DC).
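To make the metric concrete: a prime path is a simple path through the control-flow graph (no node repeated, except that the first and last node may coincide) that is not a proper subpath of any other simple path. The sketch below enumerates prime paths by brute force on a small, made-up graph for a single `while` loop. It illustrates the definition only and is not the algorithm GCC uses; the node numbering and function names are assumptions for this example.

```python
def simple_paths(graph):
    """Yield every simple path: no repeated node, except first == last (a cycle)."""
    def extend(path):
        yield path
        if len(path) > 1 and path[0] == path[-1]:
            return  # a closed cycle cannot be extended further
        for nxt in graph.get(path[-1], []):
            if nxt not in path or nxt == path[0]:
                yield from extend(path + (nxt,))

    for node in graph:
        yield from extend((node,))


def is_proper_subpath(short, long):
    """True if `short` appears contiguously inside the strictly longer `long`."""
    n = len(short)
    return n < len(long) and any(
        long[i:i + n] == short for i in range(len(long) - n + 1)
    )


def prime_paths(graph):
    """Simple paths that are not a proper subpath of any other simple path."""
    paths = set(simple_paths(graph))
    return sorted(
        p for p in paths if not any(is_proper_subpath(p, q) for q in paths)
    )


# Hypothetical control-flow graph of one while loop:
# 0 = entry, 1 = loop condition, 2 = loop body, 3 = exit
CFG = {0: [1], 1: [2, 3], 2: [1], 3: []}

if __name__ == "__main__":
    for path in prime_paths(CFG):
        print(path)
```

On this graph the five prime paths map onto the loop requirements: `(0, 1, 3)` skips the loop, `(0, 1, 2)` enters it, `(1, 2, 1)` and `(2, 1, 2)` require a repeated iteration, and `(2, 1, 3)` leaves after the body has run.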
What Lives? A meta-analysis of diverse opinions on the definition of life
The question of what constitutes life remains unanswered, with no single definition achieving universal acceptance despite significant progress in various fields. A novel methodological approach using large language models and semantic analysis revealed a continuous landscape of themes related to the definition of life, suggesting a unified conceptual space with differentiated perspectives rather than a binary taxonomic problem.
Code
Raif v1.1.0 – a Rails engine for LLM powered apps
Raif is a Ruby AI framework that allows developers to add AI-powered features to their Rails applications, supporting multiple LLM providers including OpenAI, Anthropic Claude, AWS Bedrock, and OpenRouter. It provides a range of features, including tasks, conversations, and agents, and can be easily integrated into existing Rails applications through a simple setup process and configuration of LLM providers.
Show HN: GremlinGPT – Local Self-Evolving AI (No Cloud, No APIs, Just Autonomy)
The Ascend Institute for Autonomous Sovereignty & Human Financial Liberation is developing a recursive AI system called GremlinGPT, which is designed to be autonomous, self-healing, and self-evolving. The project is currently seeking funding and support to secure infrastructure, including a dedicated GPU cluster and persistent vector DB, to reach full deployment and achieve stable memory growth, continuous mutation, and multi-agent orchestration.
Writing A Job Runner (In Elixir) (Again) (10 years later)
The author revisits a job runner they wrote in Elixir 10 years ago and, with minimal code changes, uses it as a detailed case study in implementing a job runner with the gen_stage library. The post explores the landscape of job processing, the architecture of work, and the producer-consumer pattern, highlighting how Elixir features such as processes and message passing make the language well suited to job processing and yield a more elegant, self-regulating solution.
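For readers unfamiliar with the producer-consumer pattern, the sketch below shows a rough analogue in Python. It does not use gen_stage and is not the author's code: a bounded `queue.Queue` stands in for demand-driven flow, so the producer blocks whenever consumers fall behind. The function name `run_jobs` and its parameters are assumptions made for this example.

```python
import queue
import threading


def run_jobs(jobs, worker, num_consumers=2, max_pending=4):
    """Run worker(job) for every job using a bounded queue and consumer threads."""
    pending = queue.Queue(maxsize=max_pending)  # put() blocks when the queue is full
    results = []
    results_lock = threading.Lock()
    stop = object()  # sentinel telling a consumer to exit

    def consume():
        while True:
            job = pending.get()
            if job is stop:
                break
            outcome = worker(job)
            with results_lock:
                results.append(outcome)

    consumers = [threading.Thread(target=consume) for _ in range(num_consumers)]
    for thread in consumers:
        thread.start()

    # Producer: slows down automatically because put() waits for free slots.
    for job in jobs:
        pending.put(job)
    for _ in consumers:
        pending.put(stop)  # one sentinel per consumer
    for thread in consumers:
        thread.join()
    return results


if __name__ == "__main__":
    print(sorted(run_jobs(range(10), lambda n: n * n)))
```

The bounded queue is what makes this version self-regulating: the producer cannot run further ahead of the consumers than `max_pending` jobs. Results arrive in completion order, so the usage line sorts them before printing.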
Show HN: Samchika – A Java Library for Fast, Multithreaded File Processing
Samchika is a Java library for fast and efficient file processing that uses multithreading to handle large files and CPU-intensive tasks in parallel. It offers a simple API and optional runtime statistics, and is suited to processing and analyzing large text files, with use cases including log analysis, ETL operations, and data transformation pipelines.
Show HN: HNRelevant – Add a "related" section to Hacker News
HNRelevant is a browser extension that adds a "Related" section to Hacker News, providing an instant list of related submissions and allowing users to customize search queries. The extension is available for Chrome, Firefox, and Microsoft Edge, and can also be installed as a userscript to support other browsers.