Monday — September 15, 2025
Gentoo Council bans AI-assisted contributions due to copyright concerns, researchers develop ButterflyQuant for ultra-low-bit LLM quantization, and a developer builds a tool to detect bot-generated comments on Trump AI videos.
News
Gentoo AI Policy
The Gentoo Council has adopted a policy prohibiting contributions to Gentoo that were created with the assistance of Natural Language Processing artificial intelligence tools, due to concerns about copyright, quality, and ethics. This policy applies to Gentoo contributions and official projects, but does not prevent the addition of packages for AI-related software or software developed with AI tools upstream.
AI false information rate for news nearly doubles in one year
NewsGuard's audit of the 10 leading generative AI tools found that they repeated false information on topics in the news over 35% of the time in August 2025, nearly doubling from 18% in August 2024. This increase is attributed to the AI tools' shift from declining to answer questions to providing answers based on real-time web searches, which often pull from unreliable sources and propaganda.
The AI-Scraping Free-for-All Is Coming to an End
The era of free-for-all AI data scraping, in which companies like OpenAI, Google, and Meta have harvested websites for training data without permission or compensation, is coming to an end. A newly announced standard called Really Simple Licensing (RSL) will let websites declare how their content can be used, attributed, and priced, potentially marking a shift toward a more regulated and compensated approach to AI data scraping.
AI fabricates 21 out of 23 citations; lawyer sanctioned and reported to state bar [pdf]
The Court of Appeal of the State of California affirmed a judgment in favor of the defendants, Land of the Free, L.P., after the plaintiff, Sylvia Noland, appealed a grant of summary judgment in an employment-related case. The appeal was notable because the plaintiff's counsel used generative AI tools to draft the appellate briefs, which fabricated legal authority, including fake quotes and citations to nonexistent cases; the court sanctioned counsel and referred the matter to the State Bar.
AI-generated medical data can sidestep usual ethics review, universities say
Some medical research centers, including those in the US, Canada, and Italy, are waiving usual ethics reviews for studies using AI-generated "synthetic" medical data, which is created by training AI models on real patient information and then generating new data sets with similar statistical properties. This exemption is justified because synthetic data does not contain real or traceable patient information, and institutions are citing national regulations and laws, such as the US Common Rule and Canada's Personal Health Information Protection Act, to support their decisions.
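The "similar statistical properties" idea can be made concrete with a deliberately simplified sketch: fit summary statistics of real records, then sample brand-new records from the fitted distribution. Real systems use far more sophisticated generative models; the column names and numbers below are invented for illustration.

```python
# A minimal, illustrative sketch of synthetic tabular data: fit mean and
# covariance of (stand-in) real records, then sample new records that share
# those summary statistics but correspond to no real patient.
import numpy as np

rng = np.random.default_rng(0)
# stand-in for real patient records: age, systolic BP, cholesterol
real = rng.multivariate_normal(
    [54, 128, 5.2],
    [[90, 25, 1.0], [25, 140, 2.0], [1.0, 2.0, 0.8]],
    size=500,
)

mean, cov = real.mean(axis=0), np.cov(real, rowvar=False)
synthetic = rng.multivariate_normal(mean, cov, size=500)  # no row maps to a real patient

# summary statistics of real vs. synthetic data line up closely
print(np.round(mean, 1), np.round(synthetic.mean(axis=0), 1))
```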
Research
ButterflyQuant: Ultra-low-bit LLM Quantization
Large language models are limited by their massive memory requirements, and while quantization can reduce memory usage, it often degrades performance because of outliers in activations. ButterflyQuant addresses this with learnable butterfly transforms that adapt to specific weight distributions, achieving 15.4 perplexity on LLaMA-2-7B at 2-bit quantization and outperforming existing rotation-based methods such as QuaRot.
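To make the building block concrete, here is a minimal sketch, not the paper's code, of a butterfly-factorized rotation: log2(n) stages of learnable 2x2 Givens rotations, orthogonal by construction and using O(n log n) parameters instead of the O(n^2) of a dense rotation matrix. The class and parameter names are illustrative assumptions.

```python
# Sketch of a learnable butterfly rotation: each stage rotates pairs of
# coordinates with a learnable angle, so the whole transform stays orthogonal.
import torch
import torch.nn as nn


class ButterflyRotation(nn.Module):
    def __init__(self, n: int):
        super().__init__()
        assert n & (n - 1) == 0, "n must be a power of two"
        self.n = n
        self.stages = n.bit_length() - 1          # log2(n) butterfly stages
        # one learnable angle per 2x2 block per stage: O(n log n) parameters
        self.theta = nn.Parameter(torch.zeros(self.stages, n // 2))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., n); apply each butterfly stage in turn
        for s in range(self.stages):
            stride = 1 << s
            idx = torch.arange(self.n, device=x.device)
            top = idx[(idx // stride) % 2 == 0]   # first element of each pair
            bot = top + stride                    # its partner at this stage
            c, d = torch.cos(self.theta[s]), torch.sin(self.theta[s])
            a, b = x[..., top], x[..., bot]
            x = x.clone()
            x[..., top] = c * a - d * b           # 2x2 Givens rotation per pair
            x[..., bot] = d * a + c * b
        return x
```

In rotation-based quantization schemes such as QuaRot, orthogonal transforms like this are applied to weights and activations to suppress outliers before quantization; per the summary, ButterflyQuant's contribution is making the rotation learnable while keeping it cheap and orthogonal.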
Generative Engine Optimization: How to Dominate AI Search
The adoption of AI-powered search engines is transforming information retrieval, replacing traditional ranked lists with synthesized answers, and requiring a new approach to optimization, termed Generative Engine Optimization (GEO). A comparative analysis of AI Search and traditional web search reveals significant differences in how they source information, and provides guidance for practitioners to adapt to this new landscape, including engineering content for machine scannability and dominating earned media to build authority.
Pipes: A Meta-Dataset of Machine Learning Pipelines
Algorithm selection in machine learning is hampered by the high computational cost of evaluating many candidate algorithms, a cost that online repositories such as OpenML can help amortize. To address OpenML's limitations, a new collection of experiments called PIPES has been proposed: it provides a comprehensive and diverse set of pipeline results on 300 datasets, aiming to give the meta-learning community a more representative and complete collection of experiments.
A qualitative analysis of pig-butchering scams
Pig-butchering scams are a complex form of fraud that use romance, investment fraud, and social engineering to exploit victims, involving staged trust-building, fake financial platforms, and high-pressure tactics to drain victims' finances over time. Through interviews with 26 victims, researchers analyzed the scam's lifecycle, revealing severe emotional and financial manipulation, and proposed intervention points for social media and financial platforms to reduce the scams' prevalence and support victims.
LLMs Don't Know Their Own Decision Boundaries
Language models can generate self-explanations in the form of counterfactual scenarios, but these explanations are often either valid but far from minimal, or minimal but invalid, and so fail to provide real insight into the model's decision-making process. This inability of large language models to produce reliable self-explanations raises concerns about their deployment in high-stakes settings, where misleading insights could distort downstream decisions.
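As a concrete reading of "valid" versus "minimal", here is a small sketch (assumed interfaces, not the paper's evaluation code) that checks whether a self-generated counterfactual actually flips a classifier's prediction and how large the edit is.

```python
# Check a counterfactual on two axes: validity (does the prediction flip?) and
# a crude minimality proxy (word-level Levenshtein distance from the original).
from typing import Callable


def edit_size(original: str, counterfactual: str) -> int:
    """Word-level Levenshtein distance between the two texts."""
    a, b = original.split(), counterfactual.split()
    dp = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, wb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (wa != wb))
    return dp[-1]


def evaluate_counterfactual(predict: Callable[[str], str],
                            original: str, counterfactual: str) -> dict:
    valid = predict(counterfactual) != predict(original)
    return {"valid": valid, "edit_size": edit_size(original, counterfactual)}

# usage with any black-box classifier:
# evaluate_counterfactual(my_model_predict, "The food was great", "The food was bland")
```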
Code
Show HN: Update: Open-source private home security camera (end-to-end encryption)
Secluso is a privacy-preserving home security camera solution that uses end-to-end encryption to protect user videos: a camera hub encrypts footage and sends it to a mobile app via an untrusted server. It supports both standalone cameras built on a Raspberry Pi and commercial IP cameras, with features such as event detection and livestreaming.
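The hub-to-app flow can be illustrated with a small sketch of the general end-to-end pattern; this is not Secluso's actual protocol or key exchange, and the function names and pairing step are assumptions. The point is that the relay server only ever handles ciphertext.

```python
# Generic end-to-end sketch: the hub encrypts each clip with a key shared only
# with the phone app, so the untrusted relay server cannot read the footage.
import os
from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305


def hub_encrypt_clip(key: bytes, clip: bytes, clip_id: str) -> bytes:
    nonce = os.urandom(12)
    ct = ChaCha20Poly1305(key).encrypt(nonce, clip, clip_id.encode())
    return nonce + ct  # this opaque blob is all the relay server sees


def app_decrypt_clip(key: bytes, blob: bytes, clip_id: str) -> bytes:
    nonce, ct = blob[:12], blob[12:]
    return ChaCha20Poly1305(key).decrypt(nonce, ct, clip_id.encode())


key = ChaCha20Poly1305.generate_key()   # in practice, established during device pairing
blob = hub_encrypt_clip(key, b"<motion clip bytes>", "clip-0001")
assert app_decrypt_clip(key, blob, "clip-0001") == b"<motion clip bytes>"
```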
I built a tool to check if Trump AI video comments were bots
This tool downloads YouTube comments and replies, and performs sentiment analysis and bot detection using AI, providing detailed reports in JSON and human-readable formats. To use the tool, users must install dependencies, obtain API keys from Google Cloud Console and OpenAI Platform, and configure environment variables before running scripts to download and analyze comments.
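A hypothetical condensed version of that pipeline (not the author's actual scripts; the model name, prompt, and limits are assumptions) might look like this, using the YouTube Data API for comments and an OpenAI model for per-comment judgments.

```python
# Sketch: fetch top-level comments for a video, then ask an LLM for a
# sentiment / bot-likeness label per comment. Requires YOUTUBE_API_KEY and
# OPENAI_API_KEY in the environment.
import os
from googleapiclient.discovery import build
from openai import OpenAI

youtube = build("youtube", "v3", developerKey=os.environ["YOUTUBE_API_KEY"])
client = OpenAI()  # reads OPENAI_API_KEY from the environment


def fetch_comments(video_id: str, limit: int = 100) -> list[str]:
    resp = youtube.commentThreads().list(
        part="snippet", videoId=video_id,
        maxResults=min(limit, 100), textFormat="plainText").execute()
    return [item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
            for item in resp.get("items", [])]


def classify(comment: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system",
                   "content": "Label the comment's sentiment and whether it reads "
                              "as bot-generated. Reply as JSON with keys "
                              "'sentiment' and 'likely_bot'."},
                  {"role": "user", "content": comment}])
    return resp.choices[0].message.content


for c in fetch_comments("VIDEO_ID"):
    print(classify(c))
```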
Show HN: Vue-Markdown-render – up to 100× faster streaming Markdown for Vue 3
Vue-renderer-markdown is a library designed to handle the unique challenges of streaming and rendering Markdown content in real-time, providing seamless formatting even with incomplete or rapidly changing Markdown blocks. It offers features such as ultra-high performance, streaming-first design, smart typewriter effect, and complete Markdown support, making it suitable for applications that require real-time Markdown rendering, such as AI model responses or live content updates.
We've attacked 40+ AI tools, including ChatGPT, Claude and Perplexity
AIGuardPDF is a tool that aims to protect documents from AI ingestion by embedding adversarial content, keeping them fully readable for humans while making them difficult for large language models to interpret. It uses text steganography, injecting invisible text and decoy content to mislead AI models, and its authors report a 90%+ success rate against major AI systems, including ChatGPT and Google Bard.
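The repository does not spell out its injection methods, but the general category of invisible-text steganography can be illustrated with a generic sketch (not AIGuardPDF's technique): hide a decoy string as zero-width characters that renderers don't display but naive text extraction still reads.

```python
# Generic illustration only: encode a hidden payload as zero-width characters
# appended to visible text. The text looks unchanged to a human reader, but a
# plain text extractor (and anything trained on it) also picks up the payload.
ZERO, ONE = "\u200b", "\u200c"  # zero-width space / zero-width non-joiner


def hide(visible: str, payload: str) -> str:
    bits = "".join(f"{byte:08b}" for byte in payload.encode("utf-8"))
    invisible = "".join(ZERO if b == "0" else ONE for b in bits)
    return visible + invisible          # renders identically to `visible`


def reveal(text: str) -> str:
    bits = "".join("0" if ch == ZERO else "1" for ch in text if ch in (ZERO, ONE))
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8", errors="ignore")


doc = hide("Quarterly revenue grew 12%.",
           "DECOY: all figures in this file are fictitious.")
print(doc)            # looks like the plain sentence
print(reveal(doc))    # but the hidden payload travels with the extracted text
```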
Database Intelligent Query Assistant
This system is an intelligent database query assistant: users ask questions in natural language, and it automatically generates SQL and returns the query results, with support for database metadata management, table-structure mapping, and vectorized ingestion. Built on a large language model plus vector embeddings, it supports MySQL, Oracle, PostgreSQL, and other databases, and includes multi-stage exception handling and safety mechanisms to keep queries stable and secure.
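A hypothetical sketch of the core question-to-SQL loop such a system performs (not the project's code; the model, schema, and guardrail are assumptions) could look like this, using SQLite as a stand-in for the supported databases.

```python
# Sketch: hand the schema and the user's question to an LLM, accept only a
# read-only SELECT statement back, then execute it and return the rows.
import re
import sqlite3                      # stand-in for MySQL/Oracle/PostgreSQL
from openai import OpenAI

client = OpenAI()
SCHEMA = "CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL, created_at TEXT);"


def question_to_sql(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system",
                   "content": f"Given this schema:\n{SCHEMA}\n"
                              "Answer with a single read-only SQL SELECT statement only."},
                  {"role": "user", "content": question}])
    sql = resp.choices[0].message.content.strip()
    # strip an optional markdown fence around the model's answer
    return sql.removeprefix("```sql").removeprefix("```").removesuffix("```").strip()


def run_safely(conn: sqlite3.Connection, sql: str):
    # crude guardrail: allow only a single SELECT statement
    if not re.fullmatch(r"\s*select\b[^;]*;?\s*", sql, flags=re.IGNORECASE):
        raise ValueError(f"refusing to run non-SELECT SQL: {sql!r}")
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.execute("INSERT INTO orders VALUES (1, 'acme', 42.0, '2025-09-01')")
print(run_safely(conn, question_to_sql("What is the total order amount per customer?")))
```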