Monday — August 25, 2025
Comet AI browser is vulnerable to prompt injection attacks, a new evaluation finds open models often outperform closed models for personal use cases, and researchers introduce DeepConf, a method to scale LLM reasoning with confidence scores.
News
Comet AI browser can get prompt injected from any site, drain your bank account
Zack Overflow is warning about the risks of AI-powered browsers, claiming that users can have their bank accounts drained through "prompt injection" while browsing sites like Reddit. Brave, a browser company, recently disclosed a security flaw in Perplexity's Comet browser that put users' sensitive information at risk, highlighting the potential dangers of AI browsing agents.
Evaluating LLMs for my personal use case
The author evaluated a range of AI models, including those from Google, Anthropic, and OpenAI, on 130 real prompts taken from their bash history. Almost all models performed well, so cost and latency were the key differentiators. Open models often outperformed closed models, and reasoning capabilities were generally unnecessary except where creativity was required, such as writing a poem.
DeepConf: Scaling LLM reasoning with confidence, not just compute
The authors introduce DeepConf, a test-time inference method that improves the reasoning of large language models (LLMs) by using the model's internal log-probabilities to derive localized confidence scores. DeepConf achieves state-of-the-art accuracy while reducing the number of generated tokens by up to 84.7%, making high-performance LLM reasoning more scalable and economically viable for real-world applications.
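To make the idea of confidence-filtered voting concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the function names, the window size, the keep fraction, and the toy log-probabilities are all hypothetical. It assumes each reasoning trace comes with the log-probabilities of the top candidate tokens at every position, scores a trace by its weakest sliding-window average, drops the least confident traces, and takes a confidence-weighted vote over the rest.

```python
from collections import defaultdict


def token_confidence(topk_logprobs):
    """Negative mean log-probability of the top-k candidates at one position.

    A peaked distribution leaves the runner-up tokens with very negative
    log-probabilities, so a larger value means higher confidence.
    """
    return -sum(topk_logprobs) / len(topk_logprobs)


def lowest_group_confidence(token_confs, window):
    """Smallest sliding-window mean over a non-empty list of token confidences."""
    if len(token_confs) <= window:
        return sum(token_confs) / len(token_confs)
    running = sum(token_confs[:window])
    lowest = running
    for i in range(window, len(token_confs)):
        running += token_confs[i] - token_confs[i - window]
        lowest = min(lowest, running)
    return lowest / window


def confident_vote(traces, window=4, keep_fraction=0.5):
    """Keep the most confident traces, then take a confidence-weighted vote.

    `traces` is a list of (answer, per_token_topk_logprobs) pairs.
    `window` and `keep_fraction` are hypothetical defaults for illustration.
    """
    scored = []
    for answer, per_token_topk in traces:
        confs = [token_confidence(t) for t in per_token_topk]
        scored.append((lowest_group_confidence(confs, window), answer))
    scored.sort(key=lambda s: s[0], reverse=True)
    kept = scored[: max(1, int(len(scored) * keep_fraction))]
    votes = defaultdict(float)
    for conf, answer in kept:
        votes[answer] += conf
    return max(votes, key=votes.get)


# Toy data: three hesitant traces answer "7", two confident traces answer "42".
hesitant = [[-1.0, -1.2, -1.3]] * 6
confident = [[-0.01, -6.0, -7.0]] * 6
traces = [("7", hesitant), ("7", hesitant), ("7", hesitant),
          ("42", confident), ("42", confident)]
print(confident_vote(traces))  # prints 42
```

A plain majority vote over the same five traces would pick "7". Filtering by confidence before voting is what lets the two confident traces win, and in an online setting the same score could be used to stop low-confidence traces early, which is where token savings would come from.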
Writing with LLM is not a shame
The use of AI to generate content raises questions about transparency. Some argue that disclosing AI use is essential, while others consider it unnecessary, drawing parallels to photo editing tools like Photoshop. The debate centers on credibility and on the value of the content itself: low-quality content is unlikely to have a significant impact whether or not AI was used.
A bubble that knows it's a bubble
The current AI investment bubble, characterized by excessive valuations and speculation, is reminiscent of past technological bubbles, such as the Railway Mania of the 1840s and the dot-com bubble of the 1990s, where revolutionary technologies were accompanied by unsustainable speculation and eventual crashes. Despite the warnings from experts, including OpenAI's CEO Sam Altman, investors continue to pour money into AI companies with negligible revenue, mirroring the patterns of previous bubbles where speculation eventually gave way to reality checks and significant losses.
Research
Evaluating Long-Term Conversational Memory of LLM Agents
Researchers have developed a machine-human pipeline for generating high-quality, long-term dialogues, leveraging large language models and techniques like retrieval-augmented generation, and used it to create LoCoMo, a dataset of very long-term conversations. The dataset is used to evaluate the long-term memory of models, revealing that even advanced language models struggle to understand lengthy conversations and to comprehend temporal and causal dynamics, lagging behind human performance.
Integer continued fractions for complex numbers
Researchers have developed a complex number extension of standard continued fractions, building on an algorithm by Lagrange and Gauss, and found that these new representations are unique and have useful properties. These complex continued fractions can also be interpreted geometrically as cutting sequences, providing a new perspective on the subject.
Zero-Shot Retrieval for Scalable Visual Search in a Two-Sided Marketplace
A visual search system was developed and deployed in Mercari's consumer-to-consumer marketplace, utilizing recent vision-language models for zero-shot image retrieval and achieving a 13.3% increase in retrieval metrics over the baseline. The system was shown to be effective in real-world use, with a one-week online test resulting in up to a 40.9% increase in transaction rate via image search, demonstrating the potential for zero-shot models to serve as a strong baseline for production use.
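The summary does not say how retrieval is implemented. One common pattern, assumed here purely for illustration, is to embed every listing image with a pretrained vision-language model and rank listings by cosine similarity to the query embedding, with no task-specific training. In the sketch below the three-dimensional vectors are hand-written stand-ins for real embeddings, and the item names and the `top_k` helper are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, catalog, k=2):
    """Return the ids of the k catalog items most similar to the query."""
    ranked = sorted(catalog.items(),
                    key=lambda kv: cosine(query, kv[1]),
                    reverse=True)
    return [item_id for item_id, _ in ranked[:k]]


# Stand-in embeddings; a real system would get these from an image encoder.
catalog = {
    "sneaker": [0.9, 0.1, 0.0],
    "boot": [0.7, 0.3, 0.1],
    "mug": [0.0, 0.1, 0.9],
}
query = [1.0, 0.1, 0.0]  # embedding of the photo the shopper uploaded
print(top_k(query, catalog))  # prints ['sneaker', 'boot']
```

At marketplace scale the exhaustive sort would normally be replaced by an approximate nearest-neighbor index, but the ranking criterion stays the same.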
An elementary introduction to information geometry
This survey describes the differential-geometric structures of information manifolds and states the fundamental theorem of information geometry. It provides a self-contained introduction to the necessary concepts, along with examples of how these information manifolds are applied in information sciences, omitting proofs for the sake of brevity.
AetherCode: Evaluating LLMs' Ability to Win in Premier Programming Competitions
Current evaluations of Large Language Models (LLMs) overstate their proficiency in competitive programming due to limitations in benchmark problem difficulty and scope, as well as low-quality test cases. AetherCode is a new benchmark that addresses these issues by drawing from premier competitions and incorporating comprehensive, expert-validated test suites to provide a more accurate measure of LLM capabilities.
Code
Show HN: Clearcam – Add AI object detection to your IP CCTV cameras
Clearcam is an app that turns RTSP-enabled cameras or old iPhones into AI-powered security cameras, offering features like live feeds, event notifications, and end-to-end encryption with a premium subscription. The app is available on the Apple App Store, and users can also install and run it on their computers using Homebrew or Python, with premium features accessible by signing up through the iOS app.
Show HN: A "Catalog of Catalogs" for Unified Metadata
Apache Gravitino is a high-performance, geo-distributed, and federated metadata lake that manages metadata directly in different sources, types, and regions, providing users with unified metadata access for data and AI assets. It offers key features such as unified metadata management, end-to-end data governance, and multi-engine compatibility, making it suitable for use cases like federated metadata discovery, multi-region metadata synchronization, and data and AI asset governance.
Show HN: Making hardware-integrated app development easier (open source)
Ubo App is a Python application that provides a unified interface and tools for developing and running hardware-integrated apps, optimized for Raspberry Pi devices. It offers a minimalistic UI for end-users to install and interact with developer apps, and supports various hardware-specific capabilities, with options for DIY hardware and remote API access.
I made a right-click selected text for AI chat extension
The AI Anywhere browser extension allows users to right-click on selected text on any webpage and send it directly to their preferred AI chat tool, choosing from 14 popular platforms. This extension streamlines research and saves time by eliminating the need for copy-pasting and enabling seamless transitions from reading content to asking AI questions about it.
Show HN: I tried coding theology – accidentally built AI accountability
The SRTA Governance Core is a Python implementation that provides a governance layer for AI systems, evaluating actions against policies and tracking responsibility. It is currently a prototype with a main engine, demo script, and integration examples, but has known issues and is intended for research purposes only.