Monday — November 24, 2025
An AI trained on bacterial genomes creates never-before-seen proteins, LLM Council has models rank each other's work, and a new attack jailbreaks LLMs using game-theory scenarios.
News
73% of AI startups are just prompt engineering
An analysis of 200 funded AI startups, conducted by monitoring network traffic and tracing API calls, revealed that 73% are thin wrappers around third-party LLM APIs. Despite marketing claims of proprietary technology, these companies are predominantly repackaging models from OpenAI and Anthropic with a new UI. The investigation highlights a significant gap between the startups' marketing and their actual technical implementation.
Meet the AI workers who tell their friends and family to stay away from AI
AI raters and data workers who train models like Gemini and Grok are advising their personal networks to avoid using generative AI. Their distrust stems from firsthand experience with development processes that prioritize speed over quality, resulting in flawed and unreliable outputs. Workers cite issues like poor training data, inadequate evaluation guidelines, and unrealistic time constraints, reinforcing the "garbage in, garbage out" principle and revealing the fragility of current LLMs.
Tosijs-schema is a super lightweight schema-first LLM-native JSON schema library
tosijs-schema is a schema-first JavaScript library designed for LLM-native applications. It generates strict, token-efficient JSON schemas that are directly compatible with OpenAI's Structured Outputs and Anthropic's tool use, removing the need for adapters. The library offers O(1) validation on large datasets via a "prime-jump" sampling technique, positioning it as a lightweight, faster alternative to Zod for building AI agents.
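For context, here is a minimal Python sketch of the strict-schema format that OpenAI's Structured Outputs endpoint expects, which the summary says the library emits directly; the schema content is an invented example and tosijs-schema's own JavaScript generator API is not shown.

```python
# Illustrative only: the kind of strict JSON schema that Structured Outputs accepts
# (every object closed with additionalProperties: false, all fields required).
# This is not tosijs-schema's API, just the target format it is said to produce.
from openai import OpenAI

schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "tags"],
    "additionalProperties": False,
}

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # any Structured Outputs-capable model
    messages=[{"role": "user", "content": "Summarize this repo as JSON."}],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "repo_summary", "strict": True, "schema": schema},
    },
)
print(resp.choices[0].message.content)
```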
Insurers retreat from AI cover as risk of multibillion-dollar claims mounts
The Financial Times reports that insurers are pulling back from offering coverage for AI-related risks as the potential for multibillion-dollar claims tied to the technology mounts. The article is paywalled.
AI trained on bacterial genomes produces never-before-seen proteins
A generative model named Evo was trained like an LLM on a large dataset of bacterial genomes. By learning the genomic context where functionally related genes are clustered, the model can generate novel, functional proteins from prompts. In tests, Evo produced new antitoxins and CRISPR inhibitors with little or no sequence similarity to any known proteins, demonstrating a new approach to protein generation that operates at the nucleic acid level rather than focusing on protein structure.
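As a rough sketch of what prompting at the nucleotide level looks like, the snippet below uses a generic causal-LM interface; the model id, prompt, and generation settings are placeholders, not the authors' actual checkpoints or evaluation pipeline.

```python
# Illustrative only: prompt a genomic language model with nucleotide context and
# sample a continuation. Model id and prompt are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "example/genomic-causal-lm"  # placeholder id
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Genomic context upstream of a toxin gene; the model continues the sequence at the
# nucleotide level. Candidate open reading frames in the continuation would then be
# translated and screened for antitoxin activity (not shown).
prompt = "ATGGCTAAAGTTCAACTG"
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=300, do_sample=True, temperature=0.8)
print(tok.decode(out[0]))
```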
Research
An Economy of AI Agents
This paper surveys the economic implications of deploying autonomous AI agents capable of long-horizon planning and execution. It highlights open research questions about agent-human and agent-agent interactions, their influence on market and organizational structures, and the institutional frameworks a well-functioning agent economy would require.
User Location Disclosure Amplifies Regional Divisions on Chinese Social Media
An interrupted time series analysis of Sina Weibo's user location disclosure policy found it did not deter overseas users as intended, but instead suppressed domestic engagement on issues outside a user's home province, particularly for critical comments. This chilling effect was not driven by fear of the state, but by a user-led surge in regionally discriminatory replies that increased the social cost of cross-provincial engagement. The findings demonstrate how identity disclosure tools can reinforce state control without direct censorship by activating and leveraging existing social divisions.
Counterfactual World Models via Digital Twin-Conditioned Video Diffusion
This work introduces CWMDT, a framework for building counterfactual world models that can answer "what if" queries about visual scenes. Unlike traditional models operating on entangled pixel representations, CWMDT first creates a structured text "digital twin" of a scene. An LLM then reasons over this representation to apply a hypothetical intervention, and a video diffusion model generates the resulting counterfactual visual sequence from the modified text. This approach achieves state-of-the-art performance, demonstrating that LLM-editable representations provide a powerful control signal for video forward simulation.
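A schematic of the three-stage pipeline as described, with the stage implementations left as injected callables because the concrete models are not named in the summary:

```python
# Schematic of the three-stage counterfactual pipeline described above; the models
# behind each stage are passed in as callables since they are not specified here.
from typing import Callable, Sequence

def counterfactual_rollout(
    frames: Sequence,                                   # observed video frames
    intervention: str,                                  # e.g. "what if the red cup were removed?"
    extract_twin: Callable[[Sequence], str],            # stage 1: pixels -> structured text twin
    edit_twin: Callable[[str, str], str],               # stage 2: LLM applies the intervention
    render_video: Callable[[str, Sequence], Sequence],  # stage 3: text-conditioned video diffusion
) -> Sequence:
    twin = extract_twin(frames)                   # structured "digital twin" of the scene
    edited_twin = edit_twin(twin, intervention)   # LLM reasons over text and edits the twin
    return render_video(edited_twin, frames)      # diffusion renders the counterfactual sequence
```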
Cognitive Foundations for Reasoning and Their Manifestation in LLMs
A large-scale analysis of 170K reasoning traces reveals that LLMs rely on shallow forward chaining, in contrast with the hierarchical nesting and meta-cognitive monitoring humans use, especially on ill-structured problems. A meta-analysis of existing research shows a focus on easily quantifiable behaviors over the meta-cognitive controls that actually correlate with success. Building on these findings, the authors develop test-time reasoning guidance that scaffolds successful cognitive structures, improving performance by up to 60% by eliciting latent abilities that models possess but fail to deploy spontaneously.
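A minimal illustration of test-time scaffolding in this spirit; the wording below is invented for illustration, not the paper's actual guidance.

```python
# Invented example of a test-time scaffold (not the paper's prompts): wrap the problem
# with instructions that elicit the behaviors the analysis links to success, such as
# hierarchical decomposition and self-monitoring.
SCAFFOLD = (
    "Before answering: (1) decompose the problem into sub-goals, "
    "(2) state a plan, (3) after each step, check the intermediate result against "
    "the goal and revise the plan if the check fails.\n\n"
)

def scaffolded_prompt(problem: str) -> str:
    """Prepend the reasoning scaffold to a raw problem statement."""
    return SCAFFOLD + "Problem: " + problem
```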
Jailbreaking LLMs via Game-Theory Scenarios
Game-Theory Attack (GTA) is a scalable black-box jailbreak framework that formalizes the attack as a sequential stochastic game. By reframing the interaction as a game-theoretic scenario, it induces a "template-over-safety flip" in which the LLM prioritizes maximizing game payoffs over its safety alignment. Driven by an adaptive Attacker Agent, the method achieves an attack success rate (ASR) above 95% on a range of LLMs, demonstrating efficiency, scalability, and generalization across settings.
Code
LLM Council: query multiple LLMs and ask them to rank each other's work
LLM Council is a local web app that queries a configurable group of LLMs via OpenRouter. The system first gathers initial responses from all models for a given prompt. It then orchestrates a peer-review stage where each LLM anonymously ranks the outputs of the others. Finally, a designated "Chairman" LLM synthesizes all the initial responses and rankings into a single, consolidated answer.
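For a sense of the flow, here is a minimal sketch of the three stages against OpenRouter's OpenAI-compatible endpoint; the model ids and prompts are illustrative rather than the repo's actual configuration.

```python
# Sketch of the gather -> peer-rank -> chairman-synthesize flow described above.
# Model ids and prompt wording are illustrative, not the repo's actual code.
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="...")  # OpenRouter key

COUNCIL = ["openai/gpt-4o", "anthropic/claude-3.5-sonnet", "google/gemini-pro-1.5"]
CHAIRMAN = "openai/gpt-4o"

def ask(model: str, prompt: str) -> str:
    r = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return r.choices[0].message.content

def council(question: str) -> str:
    # Stage 1: each council model answers the question independently.
    answers = [ask(m, question) for m in COUNCIL]
    # Stage 2: each model ranks the responses, shown anonymously without attribution.
    numbered = "\n\n".join(f"Response {i + 1}:\n{a}" for i, a in enumerate(answers))
    rank_prompt = (
        f"Question: {question}\n\nRank the following anonymous responses from best "
        f"to worst and briefly justify the ranking.\n\n{numbered}"
    )
    rankings = [ask(m, rank_prompt) for m in COUNCIL]
    # Stage 3: a designated chairman synthesizes answers and rankings into one reply.
    synthesis_prompt = (
        f"Question: {question}\n\nAnswers:\n{numbered}\n\n"
        "Peer rankings:\n" + "\n\n".join(rankings) +
        "\n\nSynthesize a single, consolidated answer."
    )
    return ask(CHAIRMAN, synthesis_prompt)
```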
PasLLM: An Object Pascal inference engine for LLMs
PasLLM is a high-performance, CPU-only LLM inference engine written in pure Object Pascal, providing a dependency-free solution for local inference. It supports various architectures including Llama, Qwen, and Mixtral, and implements several custom 4-bit quantization formats (Q4*NL) for efficient, high-quality model deployment. GPU acceleration is a long-term goal, not expected before Q2 2026.
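To illustrate the general idea behind non-linear 4-bit quantization (this is a generic sketch in Python, not PasLLM's actual Q4*NL formats), a block of weights can be scaled and snapped to 16 non-uniformly spaced levels:

```python
# Generic non-linear 4-bit quantization sketch: normalize a block of weights by its
# absolute maximum, then snap each value to the nearest of 16 non-uniformly spaced
# levels (denser near zero, where most weights live), storing one 4-bit index per
# weight plus one scale per block. Not PasLLM's actual formats.
import numpy as np

# 16 signed levels spaced by a cubic curve: finer resolution near zero.
grid = np.linspace(-1, 1, 16)
LEVELS = np.sign(grid) * np.abs(grid) ** 3

def quantize_block(w: np.ndarray):
    scale = np.abs(w).max() + 1e-12
    idx = np.abs(w[:, None] / scale - LEVELS[None, :]).argmin(axis=1)  # 4-bit indices
    return idx.astype(np.uint8), scale

def dequantize_block(idx: np.ndarray, scale: float) -> np.ndarray:
    return LEVELS[idx] * scale

w = np.random.randn(64).astype(np.float32) * 0.05
idx, scale = quantize_block(w)
print("max reconstruction error:", np.abs(dequantize_block(idx, scale) - w).max())
```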
Show HN: Built a tool to solve the nightmare of chunking tables in PDF vs. Markdown
Smart Ingest Kit is a lightweight, open-source RAG ingestion toolkit that replaces static chunking with layout-aware parsing and smart heuristics. It applies different chunking strategies for various file types, such as code and research papers, to improve context. A key feature is its ability to preserve table structures from PDFs by converting them to Markdown before chunking.
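A minimal sketch of the two ideas in the summary, per-file-type chunking strategies and keeping Markdown-converted tables intact; the function names and heuristics are illustrative, not the toolkit's actual API.

```python
# Illustrative sketch of layout-aware chunking (not Smart Ingest Kit's API): dispatch
# a chunking strategy by file type, and keep a Markdown table recovered from a PDF in
# one piece so its header and rows land in the same chunk.
import re

def chunk_markdown(text: str, max_chars: int = 1200) -> list[str]:
    # Split on blank lines; a table (contiguous '|' rows with no blank line between
    # them) forms a single block and is therefore never split across chunks.
    blocks = re.split(r"\n\s*\n", text)
    chunks: list[str] = []
    current = ""
    for block in blocks:
        if current and len(current) + len(block) > max_chars:
            chunks.append(current)
            current = ""
        current = f"{current}\n\n{block}" if current else block
    if current:
        chunks.append(current)
    return chunks

def chunk_code(text: str) -> list[str]:
    # Code files: split at top-level definitions instead of fixed-size windows.
    return [part for part in re.split(r"\n(?=def |class )", text) if part.strip()]

STRATEGIES = {".md": chunk_markdown, ".py": chunk_code}

def ingest(path: str, text: str) -> list[str]:
    # Pick a chunking strategy by file type; fall back to the Markdown chunker.
    ext = path[path.rfind("."):] if "." in path else ""
    return STRATEGIES.get(ext, chunk_markdown)(text)
```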
Rep+: Fast AI-Powered HTTP Repeater in Chrome
rep+ is a lightweight Chrome DevTools extension, inspired by Burp Suite's Repeater, that integrates LLM capabilities for security testing. It leverages Anthropic's Claude models to automatically explain HTTP requests and suggest tailored attack vectors for vulnerabilities like IDOR and SQLi. Users can configure their own API key and select from different Claude models to enhance their web application security workflow directly within the browser.
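Under the hood this amounts to a Messages API call per captured request; a rough Python equivalent is sketched below, with the model id and prompt wording as illustrative assumptions (the real extension runs in DevTools with the user's own key).

```python
# Rough equivalent of what the extension asks Claude to do for a captured request.
# Model id and prompt wording are illustrative.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

raw_request = """GET /api/v1/orders/1042 HTTP/1.1
Host: shop.example.com
Cookie: session=abc123
"""

msg = client.messages.create(
    model="claude-3-5-sonnet-latest",   # illustrative model id
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Explain this HTTP request and suggest authorized security tests "
                   "(e.g. IDOR, SQL injection) to try against it:\n\n" + raw_request,
    }],
)
print(msg.content[0].text)
```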
Show HN: Curious about tones in Chinese? An extension for language learners
This Chrome extension for Chinese language learning integrates the OpenAI API for AI-powered analysis. It leverages gpt-4o with structured outputs to provide grammatical breakdowns of sentences and uses tts-1 for audio generation. The tool also features AnkiConnect integration for one-click flashcard creation, including AI-generated audio.
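A rough Python sketch of the two API calls the summary mentions, a structured gpt-4o breakdown followed by tts-1 audio; the schema fields and example sentence are invented for illustration, and the extension itself runs in the browser rather than in Python.

```python
# Sketch of the two OpenAI calls described above; schema fields are invented.
from openai import OpenAI

client = OpenAI()
sentence = "我昨天买了三本书。"

# Grammatical breakdown via Structured Outputs.
analysis = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": f"Break down this sentence: {sentence}"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "sentence_breakdown",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "words": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "hanzi": {"type": "string"},
                                "pinyin": {"type": "string"},
                                "meaning": {"type": "string"},
                            },
                            "required": ["hanzi", "pinyin", "meaning"],
                            "additionalProperties": False,
                        },
                    },
                },
                "required": ["words"],
                "additionalProperties": False,
            },
        },
    },
)
print(analysis.choices[0].message.content)

# Audio for the flashcard, generated with tts-1.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=sentence)
speech.stream_to_file("sentence.mp3")
```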