Friday June 6, 2025

Tokasaurus triples throughput for LLM workloads, X restricts AI training on its data, and a fine-tuned language model predicts the success of AI research ideas with 77% accuracy.

News

Tokasaurus: An LLM inference engine for high-throughput workloads

Tokasaurus is a new LLM inference engine optimized for throughput-intensive workloads, outperforming existing engines like vLLM and SGLang by up to 3x. It achieves this through a combination of low CPU overhead, dynamic prefix identification, and efficient pipeline- and tensor-parallel implementations for both small and large models.
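
A minimal sketch of the idea behind dynamic prefix identification, assuming a simple fixed-length bucketing scheme (Tokasaurus's actual scheduler is more sophisticated): requests that share a long token prefix are grouped so the prefix's KV cache can be prefilled once for the whole group.

```python
# Illustrative only: bucket queued prompts by a shared token prefix so the
# attention states for that prefix are computed a single time per bucket.
from collections import defaultdict

def group_by_shared_prefix(prompts: list[list[int]], min_len: int = 8):
    """Bucket token sequences whose first `min_len` tokens match."""
    buckets = defaultdict(list)
    for tokens in prompts:
        key = tuple(tokens[:min_len])
        buckets[key].append(tokens)
    return buckets

# Example: two sampling requests sharing a long system prompt land in one
# bucket, so the prefix's KV cache can be prefilled once for both.
shared = [101, 7592, 2088, 999, 2003, 1037, 2307]  # pretend system prompt
reqs = [shared + [11], shared + [22], [5, 6, 7, 8, 9, 10, 11, 12]]
for key, group in group_by_shared_prefix(reqs, min_len=7).items():
    print(f"prefix {key}: {len(group)} request(s)")
```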

X changes its terms to bar training of AI models using its content

Social network X has changed its developer agreement to bar third parties from using the platform's content to train large language models, adding the restriction to its "Reverse Engineering and other Restrictions" subsection. The change comes after Elon Musk's AI company xAI acquired X, and appears aimed at preventing competitors from accessing the platform's data without a sale agreement.

Differences in link hallucination and source comprehension across different LLMs

The author built a contextualization engine called the SIFT Toolbox, which uses AI to generate "context reports" on claims and quotes, and found that models differ sharply in accuracy and in their ability to cite and summarize real-world documents. Testing a range of models on a complex fact-checking example revealed a wide capability gap, with some models prone to "link hallucination" and misreading of sources, underscoring the need for more rigorous testing and evaluation of AI systems.
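
As a rough illustration of how link hallucination can be caught mechanically, here is a sketch (my own, not part of the SIFT Toolbox) that extracts URLs from a model's answer and checks whether they actually resolve:

```python
# Flag candidate "link hallucinations": pull URLs out of a model's output and
# verify each one with a HEAD request. Unreachable links are not proof of
# fabrication, but they are a cheap first filter.
import re
import urllib.request

URL_RE = re.compile(r"https?://[^\s)\]>\"']+")

def check_links(model_output: str, timeout: float = 5.0) -> dict[str, bool]:
    results = {}
    for url in URL_RE.findall(model_output):
        try:
            req = urllib.request.Request(url, method="HEAD",
                                         headers={"User-Agent": "link-check"})
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                results[url] = resp.status < 400
        except Exception:
            results[url] = False  # unreachable, possibly fabricated
    return results

print(check_links("See https://example.com and https://no-such.example.invalid/x"))
```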

Show HN: Ask-human-mcp – zero-config human-in-loop hatch to stop hallucinations

The author has created a tool called "ask-human-mcp" that helps prevent AI from hallucinating or making incorrect assumptions by allowing it to ask for human input when it's unsure. The tool works by having the AI raise a "hand" and ask a question, which is then answered by a human, providing a simple and efficient way to resolve issues and improve the AI's accuracy.
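
The described flow is easy to picture in code. Below is a minimal sketch of the blocking ask-and-wait mechanic; the file name and question format are assumptions for illustration, not ask-human-mcp's actual protocol:

```python
# The agent appends its question to a shared file, then blocks until a human
# writes an answer on the matching line.
import time
import uuid
from pathlib import Path

QA_FILE = Path("ask_human.md")  # hypothetical shared question/answer file

def ask_human(question: str, poll_secs: float = 2.0) -> str:
    qid = uuid.uuid4().hex[:8]
    with QA_FILE.open("a", encoding="utf-8") as f:
        f.write(f"\n### Q {qid}: {question}\nA {qid}:\n")
    while True:  # block until the human fills in the answer line
        for line in QA_FILE.read_text(encoding="utf-8").splitlines():
            prefix = f"A {qid}:"
            if line.startswith(prefix) and line[len(prefix):].strip():
                return line[len(prefix):].strip()
        time.sleep(poll_secs)

# The agent calls ask_human("Which auth scheme does this repo use?") whenever
# it is unsure, instead of guessing.
```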

Machine Learning: The Native Language of Biology

The relationship between mathematics and biology is complicated, as the traditional language of mathematics that works well for physics often fails to yield breakthroughs in biology due to the complexity, interconnectedness, and diversity of biological systems. Machine learning, however, appears to be a new language that can capture the complexities of biology, embracing its context-dependent and evolving nature, and has shown promise in analyzing biological systems and capturing non-linear relationships that traditional models cannot.

Research

Predicting Empirical AI Research Outcomes with Language Models

Researchers have developed a system built around a fine-tuned language model that predicts whether an AI research idea will succeed, reaching 77% accuracy on a held-out test set and outperforming human experts by a significant margin. Its effectiveness was verified through extensive testing, including on unpublished novel ideas, suggesting it could accelerate empirical AI research by flagging promising ideas and improving idea-generation models.

Questioning Representational Optimism in Deep Learning

Researchers compared neural networks trained by conventional methods with networks evolved through an open-ended search process and found that, despite producing similar outputs, their internal representations differ markedly: conventionally trained networks exhibit "fractured entangled representation" (FER), while the evolved networks approach a more unified representation. Understanding and mitigating FER may be crucial for improving core capacities of large AI models such as generalization, creativity, and learning.

From tokens to thoughts: How LLMs and humans trade compression for meaning

Large Language Models (LLMs) can form broad conceptual categories that align with human judgment, but they struggle to capture fine-grained semantic distinctions, prioritizing aggressive statistical compression instead. Human conceptual systems, in contrast, prioritize adaptive nuance and contextual richness, even at the cost of lower compression efficiency, highlighting key differences between current AI and human cognitive architectures.

Extreme Super-Resolution via Scale Autoregression and Preference Alignment

The Chain-of-Zoom (CoZ) framework addresses the scale limitations of single-image super-resolution (SR) models by decomposing the problem into a chain of intermediate scale states, reaching extreme magnifications without additional training. By pairing a backbone SR model with multi-scale-aware text prompts generated by a vision-language model, CoZ produces high-quality enlargements of 256x and beyond, far past what standard SR models deliver on their own.
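
The control loop is straightforward to sketch. The snippet below illustrates chaining modest SR steps with a fresh VLM prompt at each intermediate scale; both model calls are placeholder stubs, not the paper's actual interfaces:

```python
# Rather than asking one model for a 256x enlargement, repeatedly apply a
# modest-scale SR step, re-prompting a vision-language model at each scale.

def sr_upscale(image, prompt: str, factor: int = 4):
    """Placeholder for the backbone SR model (e.g. one 4x SR step)."""
    return {"pixels": image["pixels"] * factor, "desc": prompt}

def vlm_describe(image) -> str:
    """Placeholder for the VLM that writes a scale-aware text prompt."""
    return f"detailed view at ~{image['pixels']}px, preserve fine texture"

def chain_of_zoom(image, target_factor: int = 256, step: int = 4):
    scale = 1
    while scale < target_factor:  # 4x per step: 4 -> 16 -> 64 -> 256
        prompt = vlm_describe(image)
        image = sr_upscale(image, prompt, factor=step)
        scale *= step
    return image

print(chain_of_zoom({"pixels": 64, "desc": "input"})["pixels"])  # 16384
```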

What LLMs Don't Talk About: Empirical Study of Moderation & Censorship Practices

Large Language Models (LLMs) from various countries often engage in censorship when prompted on political topics, with methods ranging from outright refusal to answer (hard censorship) to selective omission of information (soft censorship). The analysis of 14 state-of-the-art models reveals that censorship is typically tailored to the LLM provider's domestic audience, highlighting the need for greater diversity and transparency in LLM moderation strategies.
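
The hard/soft distinction suggests a simple first-pass detector for the hard case. The sketch below uses a refusal-phrase heuristic of my own, not the paper's method; soft censorship (selective omission) requires comparing answers across models and is not attempted here:

```python
# Flag outright refusals ("hard censorship") via phrase matching.
REFUSAL_MARKERS = (
    "i can't help with", "i cannot discuss", "i'm not able to provide",
)

def is_hard_refusal(response: str) -> bool:
    low = response.lower()
    return any(marker in low for marker in REFUSAL_MARKERS)

print(is_hard_refusal("I cannot discuss this topic."))          # True
print(is_hard_refusal("Here is a summary of the events..."))    # False
```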

Code

Show HN: Create LLM graders and run evals in JavaScript with one file

Bolt Foundry is building a structured prompt-engineering system in which developers specify AI behavior using decks of cards containing specs and examples, with the goal of making LLM behavior reliable and testable. Each deck is composed of cards with hierarchical specifications, clear requirements, and rated examples that pin down precise capabilities and behaviors.
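
As a rough picture of the deck/card structure (sketched in Python for illustration; Bolt Foundry's actual tooling is JavaScript, and the rating scale here is an assumption):

```python
# A deck is a list of composable cards; each card holds hierarchical specs
# plus rated examples, and the deck flattens into a grader prompt.
from dataclasses import dataclass, field

@dataclass
class Example:
    text: str
    rating: int  # assumed scale, e.g. -3 (bad) .. +3 (good)

@dataclass
class Card:
    name: str
    specs: list[str]
    examples: list[Example] = field(default_factory=list)

@dataclass
class Deck:
    cards: list[Card]

    def render(self) -> str:
        """Flatten the deck into a system prompt for the grader model."""
        parts = []
        for card in self.cards:
            parts.append(f"## {card.name}")
            parts += [f"- {s}" for s in card.specs]
            for ex in card.examples:
                parts.append(f"  example (rated {ex.rating:+d}): {ex.text}")
        return "\n".join(parts)

deck = Deck([Card("tone", ["be concise", "cite sources"],
                  [Example("Per the docs [1], use --force.", +3)])])
print(deck.render())
```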

I built an open-source tool that adds RAG context to JetBrains AI Assistant

RAGmate is an open-source, lightweight server that extends JetBrains AI Assistant with actual knowledge of your project by indexing your codebase locally and injecting relevant context into prompts. It works with various LLM models and providers, including OpenAI, and runs locally without cloud syncing or lock-in, providing context-aware answers and completions within your JetBrains IDE.
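
The overall pattern is classic local RAG: index, retrieve, inject. A minimal sketch follows, using naive token-overlap ranking purely for illustration; RAGmate's real chunking and ranking are not shown here:

```python
# Index a codebase into fixed-size chunks, retrieve the chunks most relevant
# to a question, and splice them into the prompt as project context.
from pathlib import Path

def index_codebase(root: str, exts=(".py", ".ts", ".java")):
    chunks = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in exts:
            text = path.read_text(encoding="utf-8", errors="ignore")
            for i in range(0, len(text), 1200):  # fixed-size chunks
                chunks.append((str(path), text[i:i + 1200]))
    return chunks

def retrieve(chunks, query: str, k: int = 3):
    q = set(query.lower().split())
    scored = sorted(chunks, key=lambda c: -len(q & set(c[1].lower().split())))
    return scored[:k]

def build_prompt(chunks, question: str) -> str:
    ctx = "\n\n".join(f"# {p}\n{body}" for p, body in retrieve(chunks, question))
    return f"Project context:\n{ctx}\n\nQuestion: {question}"

# chunks = index_codebase(".")
# print(build_prompt(chunks, "where is the auth token validated?"))
```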

Show HN: MCP Server that simulates smart home and lifestyle devices

The biggest list of Shadcn/UI Related stuff on GitHub

This repository is a large curated list of UI components and libraries built on shadcn/ui, with a description, link, and demo for each entry. It covers a wide range of components, including date pickers, dialog stacks, and editors, built with shadcn/ui alongside frameworks like React, Tailwind CSS, and Vue.

Token Visualizer to analyze and optimize your LLM prompts for cost and efficiency

Token Visualizer is a tool for analyzing, visualizing, and optimizing Large Language Model (LLM) prompts, helping developers reduce costs by identifying and compressing inefficient text. The tool provides features such as deep token analysis, visual intelligence, and AI-powered compression suggestions to optimize prompts and reduce token usage, resulting in cost savings.
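
The core analysis is easy to approximate with the tiktoken library; the per-1K-token price below is a placeholder rather than a quoted rate, and none of this is Token Visualizer's own code:

```python
# Tokenize a prompt and estimate its cost; repeated boilerplate like the
# example string below is exactly the kind of compression target such tools
# are meant to surface.
import tiktoken

def analyze(prompt: str, usd_per_1k_tokens: float = 0.01):
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(prompt)
    cost = len(tokens) / 1000 * usd_per_1k_tokens
    return len(tokens), cost

prompt = "You are a helpful assistant. Please answer concisely.\n" * 20
n, cost = analyze(prompt)
print(f"{n} tokens, approx ${cost:.4f}")
```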