Wednesday — October 29, 2025
An LLM negotiates a $195k hospital bill down to $33k, Microsoft releases an open-source AI call center stack, and new research links the GenAI boom to reduced public acceptance of AI.
News
Using AI to negotiate a $195k hospital bill down to $33k
A user had Claude analyze a complex $195k hospital bill, identifying numerous violations of Medicare CPT billing rules, such as procedure unbundling. The analysis surfaced over $100k in improper charges, providing the basis for a negotiation that brought the bill down to $33k. The user stresses the importance of verifying the LLM's findings, which were cross-checked with ChatGPT.
EuroLLM: LLM made in Europe built to support all 24 official EU languages
EuroLLM is a new open-source LLM from a European consortium, designed to support all 24 official EU languages. The flagship 9B parameter model was trained on over 4 trillion multilingual tokens and claims to outperform similar-sized models. Base and instruction-tuned versions are available on Hugging Face, with plans to add multimodal capabilities.
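For context, here is a minimal sketch of loading the instruction-tuned checkpoint with the Hugging Face transformers library; the repository id and chat-template usage are assumptions based on the announcement, so check the model card on the Hub for the exact names.

```python
# Minimal sketch: loading the instruction-tuned EuroLLM checkpoint with Hugging Face
# transformers. The repo id "utter-project/EuroLLM-9B-Instruct" is an assumption based
# on the announcement; verify it against the Hugging Face hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "utter-project/EuroLLM-9B-Instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Ask for a short answer in one of the 24 supported EU languages.
messages = [{"role": "user",
             "content": "Vertaal naar het Nederlands: 'The model supports all official EU languages.'"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt",
                                       add_generation_prompt=True).to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```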
Poker Tournament for LLMs
A multi-agent poker simulation features several LLMs competing, with Gemini 2.5 Pro currently leading in winnings. The data reveals distinct, quantifiable playstyles for each model through detailed poker statistics (VPIP, PFR). Crucially, the LLMs generate real-time strategic reasoning for their actions, referencing opponent stats and building dynamic player notes to inform their decision-making and adapt their strategies.
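As an illustration of the statistics mentioned above, here is a small sketch (not the simulation's own code) of how VPIP and PFR are typically computed from per-hand records; the record fields are hypothetical.

```python
# Illustrative sketch (not the simulation's code): computing VPIP and PFR from
# per-hand records. Field names ("voluntarily_put_money_in", "raised_preflop")
# are hypothetical.
from dataclasses import dataclass

@dataclass
class HandRecord:
    player: str
    voluntarily_put_money_in: bool  # called or raised preflop (blinds don't count)
    raised_preflop: bool            # raised at least once preflop

def playstyle_stats(hands: list[HandRecord], player: str) -> dict[str, float]:
    mine = [h for h in hands if h.player == player]
    n = len(mine) or 1
    return {
        "VPIP": 100 * sum(h.voluntarily_put_money_in for h in mine) / n,
        "PFR": 100 * sum(h.raised_preflop for h in mine) / n,
    }

hands = [
    HandRecord("gemini-2.5-pro", True, True),
    HandRecord("gemini-2.5-pro", False, False),
    HandRecord("gemini-2.5-pro", True, False),
]
print(playstyle_stats(hands, "gemini-2.5-pro"))  # roughly VPIP 66.7, PFR 33.3
```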
We need a clearer framework for AI-assisted contributions to open source
AI coding tools have created an asymmetry: code generation is now cheap, but code review remains a costly bottleneck for open source maintainers, leading to an influx of low-quality, AI-generated PRs that consume disproportionate review time. The proposed solution is a binary framework: "prototypes" are idea demos not meant for merging and are shared via issues or branches, while "ready-to-review PRs" are submissions the contributor has fully vetted and vouches for. Contributors must take full ownership of AI-assisted code before submitting it, out of respect for the human review bottleneck.
Generative AI Image Editing Showdown
A qualitative comparison of SOTA text-instructed image editing models was conducted across 12 distinct challenges using single-shot prompts. The evaluation, which included models like Seedream 4, Gemini 2.5 Flash, and OpenAI gpt-image-1, tested capabilities such as spatial reasoning, object manipulation, and style preservation. Seedream 4 was the top performer with a 9/12 success rate, followed by Gemini 2.5 Flash at 7/12. The tests revealed significant limitations in current models, particularly in complex spatial reasoning tasks where all models failed to swap the positions of two objects.
Research
Reduced AI Acceptance After the Generative AI Boom: Evidence from a Two-Wave Survey
A large-scale survey conducted before and after the launch of ChatGPT reveals that the GenAI boom is significantly associated with reduced public acceptance of AI and an increased demand for human oversight in decision-making. The proportion of respondents finding AI "not acceptable at all" rose from 23% to 30%. These shifts also amplified existing social inequalities, widening educational, linguistic, and gender gaps and challenging industry assumptions about public readiness for AI deployment.
Friend or Foe: Delegating to an AI Whose Alignment Is Unknown
This work models the trade-off involved in selectively disclosing features to a potentially misaligned AI used for decision support. Disclosing more attributes increases the potential upside from an aligned AI but also amplifies the risk from a misaligned one. The optimal strategy is to disclose only those attributes that identify rare, high-need subpopulations, while pooling the remaining population to limit potential harm.
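A toy numerical sketch of that trade-off, with entirely hypothetical payoffs rather than anything from the paper: disclosing an attribute that identifies a small, high-need group can have positive expected value even when misalignment is possible, while a broad attribute with modest benefit does not.

```python
# Toy sketch of the disclosure trade-off (all numbers hypothetical, not from the
# paper). With probability p the AI is aligned and uses the disclosed attribute to
# help; otherwise it is misaligned and exploits it. Only the identified share of the
# population is affected by the disclosure.
def expected_value(p_aligned: float, share_identified: float,
                   benefit_if_aligned: float, harm_if_misaligned: float) -> float:
    """Expected payoff of disclosing one attribute for the identified share."""
    return share_identified * (p_aligned * benefit_if_aligned
                               - (1 - p_aligned) * harm_if_misaligned)

# Rare, high-need group: 2% of people, large benefit when the AI is aligned.
print(expected_value(p_aligned=0.7, share_identified=0.02,
                     benefit_if_aligned=100.0, harm_if_misaligned=40.0))  # positive
# Broad attribute: everyone identified, modest benefit, same per-person harm.
print(expected_value(p_aligned=0.7, share_identified=1.0,
                     benefit_if_aligned=5.0, harm_if_misaligned=40.0))    # negative
```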
Stop DDoS Attacking the Research Community with AI-Generated Survey Papers
The paper identifies a "survey paper DDoS attack" on the research community, caused by the proliferation of low-quality, redundant, and often hallucinated survey papers generated by LLMs. This phenomenon overwhelms researchers and erodes trust in the scientific record. The authors call for strong norms for AI-assisted writing, expert oversight, and the development of new infrastructures like "Dynamic Live Surveys" to safeguard the integrity of scholarly reviews.
Beliefs about Bots: How Employers Plan for AI in White-Collar Work
A randomized information intervention with German tax advisors reveals that firms systematically underestimate the automatability of high-skill roles. When informed of the risk, firms do not change short-term hiring plans but instead raise productivity and financial expectations. They also anticipate a shift towards new tasks involving legal tech and AI interaction, leading to increased intentions for training and technology adoption with only minor wage adjustments.
Extreme-temperature single-particle heat engine
Researchers created a single-particle engine with unprecedented temperature ratios by using noisy electric fields to synthesize effective thermal reservoirs at temperatures above 10^7 K. The extreme system exhibits giant thermodynamic fluctuations and dynamics that deviate from standard Brownian motion because of an effective position-dependent temperature. A theoretical model accounting for this multiplicative noise shows excellent agreement with the experimental data, offering a controllable platform for emulating complex stochastic processes.
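For intuition about the multiplicative-noise dynamics described, here is a generic Euler-Maruyama sketch of an overdamped particle in a harmonic trap with a position-dependent effective temperature; the parameters and temperature profile are arbitrary illustrative choices, not the authors' model.

```python
# Generic sketch (not the authors' model): overdamped Langevin dynamics with a
# position-dependent effective temperature T(x), i.e. multiplicative noise, integrated
# with an Euler-Maruyama scheme. All parameters are arbitrary illustrative values.
import numpy as np

rng = np.random.default_rng(0)
k, gamma, kB = 1.0, 1.0, 1.0          # trap stiffness, friction, Boltzmann constant
dt, steps = 1e-3, 100_000

def T(x):
    # Effective temperature rises from 1 at the trap center and saturates at 9 far away.
    return 1.0 + 8.0 * np.tanh(x**2)

x = 0.0
traj = np.empty(steps)
for i in range(steps):
    noise = np.sqrt(2 * kB * T(x) * dt / gamma) * rng.standard_normal()
    x += -(k / gamma) * x * dt + noise
    traj[i] = x

# Fluctuations are enhanced relative to a uniform bath at T=1 (variance kB*T/k = 1).
print("position variance:", traj.var())
```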
Code
Microsoft Releases AI Call Center Stack with Voice, SMS, and Memory
This open-source project is an AI-powered call center solution built on a serverless Azure architecture, using Azure Communication Services and LLMs like gpt-4.1 and gpt-4.1-nano. It handles inbound/outbound calls via API, featuring real-time audio streaming to minimize latency. The system implements RAG for secure access to private data, supports fine-tuning on historical conversations, and offers deep monitoring with Application Insights and OpenLLMetry. It is highly customizable, allowing for custom prompts, data schemas, and brand-specific voices.
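As a rough illustration of the grounding step described above (not the project's actual code), the sketch below looks up a caller's private record and injects it into the system prompt before the chat completion call; the data store, phone numbers, and model name are placeholders.

```python
# Simplified illustration of grounding an answer in the caller's private record
# (not the project's code). The record store, phone numbers, and model name are
# placeholders; the project itself runs on Azure with RAG over real data sources.
from openai import OpenAI

CLAIM_NOTES = {
    "+33612345678": "Claim #1042: water damage, expert visit scheduled 2025-11-03.",
    "+33698765432": "Claim #0871: windshield replacement, awaiting invoice.",
}

def answer_caller(phone_number: str, transcript: str) -> str:
    context = CLAIM_NOTES.get(phone_number, "No record found for this caller.")
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4.1",  # placeholder; the project's defaults may differ
        messages=[
            {"role": "system",
             "content": f"You are a call-center agent. Caller record:\n{context}\n"
                        "Answer only from this record; escalate anything else."},
            {"role": "user", "content": transcript},
        ],
    )
    return resp.choices[0].message.content

print(answer_caller("+33612345678", "When is the expert coming for my claim?"))
```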
Show HN: Dexto – Connect your AI Agents with real-world tools and data
Dexto is a toolkit for building stateful, agentic applications by orchestrating LLMs, tools, and data. It uses a configuration-driven framework where agent behavior is defined in YAML, allowing for easy swapping of models and tools without code changes. The platform provides a runtime with session management, persistent memory, and native multimodal support, accessible via CLI, Web UI, APIs, or a TypeScript SDK, and integrates with external servers via the Model Context Protocol (MCP).
Show HN: Pipelex – declarative language for repeatable AI workflows (MIT)
Pipelex is an open-source framework for building structured and repeatable AI workflows as an alternative to monolithic prompts. It uses a human-readable .plx language to define pipelines composed of modular steps ("pipes") that operate on strongly-typed data structures ("Concepts"). A key feature is the ability to generate a complete, multi-step workflow from a single natural language command, which can then be executed via CLI or a Python SDK across various local and cloud-based LLMs.
AI-Trader: Compares different LLMs trading in the market
AI-Trader is a project that pits multiple LLMs against each other in a fully autonomous NASDAQ 100 trading competition. The agents operate with zero human intervention, using a tool-driven architecture based on the MCP toolchain to perform market research, analysis, and trade execution. The framework includes a historical replay environment with anti-look-ahead controls for fair evaluation, and current results show several models outperforming the QQQ baseline.
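A minimal sketch of the anti-look-ahead idea (not the repository's implementation): the replay clock filters the bar history so the agent only sees data up to the simulated present, with agent_decide standing in for the LLM and tool pipeline.

```python
# Generic sketch of an anti-look-ahead replay loop (not AI-Trader's actual code): the
# agent only ever sees bars with timestamps at or before the simulated "now", so it
# cannot peek at future prices. `agent_decide` stands in for the LLM + MCP tool pipeline.
from datetime import date

# (date, symbol, close) bars, assumed sorted by date
BARS = [
    (date(2025, 10, 27), "NVDA", 140.2),
    (date(2025, 10, 28), "NVDA", 142.9),
    (date(2025, 10, 29), "NVDA", 141.1),
]

def visible_history(now: date):
    """Anti-look-ahead control: hide anything after the replay clock."""
    return [bar for bar in BARS if bar[0] <= now]

def agent_decide(history) -> str:
    # Placeholder for the LLM-driven research/analysis/execution step.
    return "BUY" if len(history) >= 2 and history[-1][2] > history[-2][2] else "HOLD"

for now, _, _ in BARS:
    print(now, agent_decide(visible_history(now)))
```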
AI Agents from Scratch
This repository provides a hands-on guide to building AI agents from scratch using local LLMs and node-llama-cpp, without relying on high-level frameworks. It features a structured learning path with code examples that progressively introduce concepts from basic LLM interaction to function calling, memory, and the ReAct pattern. The project's goal is to provide a deep, first-principles understanding of how agent architectures function before using production tools.
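The repository itself is JavaScript built on node-llama-cpp; as a language-agnostic illustration of the ReAct pattern it builds toward, here is a minimal Thought/Action/Observation loop with a stubbed model call.

```python
# Minimal sketch of the ReAct loop (the repo itself uses JavaScript and node-llama-cpp).
# `call_llm` is a stub; a real setup would call a local model and parse its output.
import re

TOOLS = {"calculator": lambda expr: str(eval(expr, {"__builtins__": {}}))}

def call_llm(prompt: str) -> str:
    # Stub: a real LLM would produce the next Thought/Action or a Final Answer.
    if "Observation:" not in prompt:
        return "Thought: I should compute this.\nAction: calculator[17 * 23]"
    return "Final Answer: 391"

def react(question: str, max_steps: int = 5) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = call_llm(prompt)
        if reply.startswith("Final Answer:"):
            return reply.removeprefix("Final Answer:").strip()
        match = re.search(r"Action: (\w+)\[(.+)\]", reply)
        tool, arg = match.group(1), match.group(2)
        observation = TOOLS[tool](arg)                  # run the requested tool
        prompt += f"{reply}\nObservation: {observation}\n"  # feed the result back
    return "gave up"

print(react("What is 17 * 23?"))  # 391
```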