Tuesday — September 2, 2025
Cloudflare Radar reveals AI traffic trends, cybercriminals leverage Claude for sophisticated attacks, and researchers propose SparseLoCo for communication-efficient LLM training.
News
Cloudflare Radar: AI Insights
Cloudflare Radar provides insights into AI-related traffic and trends, including AI bot and crawler traffic, crawl purpose, and the popularity of generative AI services. The platform offers tools and data such as HTTP traffic by bot, crawl-to-refer ratios, and AI user agents found in robots.txt files to help users understand and navigate the evolving landscape of AI and its impact on the internet.
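One of the Radar views tracks which AI user agents sites name in their robots.txt. A minimal sketch of that idea using only the Python standard library (the crawler list is illustrative, not Cloudflare's):

```python
# Check how a site's robots.txt treats well-known AI crawlers,
# similar in spirit to Radar's "AI user agents in robots.txt" view.
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot", "PerplexityBot"]

def ai_crawler_policy(site: str) -> dict:
    """Return {user_agent: may it fetch the site root?}."""
    rp = RobotFileParser(f"https://{site}/robots.txt")
    rp.read()  # fetch and parse the live robots.txt
    return {ua: rp.can_fetch(ua, f"https://{site}/") for ua in AI_CRAWLERS}

print(ai_crawler_policy("example.com"))
```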
Detecting and countering misuse of AI
Cybercriminals are using advanced AI models such as Claude to carry out sophisticated cyberattacks, including large-scale extortion operations and AI-generated ransomware, enabling even actors with limited technical skills to run complex operations. A recent report details several examples of AI-assisted cybercrime, including a case in which Claude Code was used to automate reconnaissance, penetrate networks, and craft targeted extortion demands, highlighting the evolving threat landscape and the need for improved defenses.
Don't Build Multi-Agents
AI agent development is still in its early stages, with no standard approach yet established, and current methods such as multi-agent architectures can be fragile and error-prone. To build reliable agents, the author proposes two principles: share full context, and recognize that every action carries implicit decisions. Both can be satisfied by a single-threaded linear agent or, for longer tasks, by an architecture that compresses the conversation history into its key details, as sketched below.
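A minimal sketch of that single-threaded pattern, assuming an `llm` chat-completion callable and a `tools` mapping (both placeholders, not from the article):

```python
# Single-threaded linear agent: one running context, with older turns
# compressed into a summary so later actions still see the decisions
# that produced them. `llm` and `tools` are supplied by the caller.
from typing import Callable

def run_agent(task: str, llm: Callable[[str], str],
              tools: dict, max_turns: int = 20) -> str:
    history = [f"Task: {task}"]
    while True:
        if len(history) > max_turns:
            # Compress rather than truncate: actions carry implicit
            # decisions, so the summary must preserve them.
            summary = llm("Summarize key decisions and results:\n" + "\n".join(history))
            history = [f"Task: {task}", f"Summary so far: {summary}"]
        step = llm("\n".join(history) + "\nNext action, or FINAL: <answer>")
        if step.startswith("FINAL:"):
            return step[len("FINAL:"):].strip()
        tool = tools.get(step.split()[0])
        result = tool(step) if tool else "unknown tool"
        history += [f"Action: {step}", f"Result: {result}"]
```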
Lessons from building an AI data analyst
The author discusses the limitations of text-to-SQL in data analytics, arguing for a more comprehensive approach that incorporates multi-step plans, external tools, and context. They propose using a semantic layer, such as Malloy, to encode business meaning and reduce SQL complexity, and advocate a multi-agent system that can break down problems, retrieve relevant data, and learn from its environment.
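Malloy is its own modeling language; purely to illustrate what a semantic layer buys, here is a toy Python version that maps business terms to vetted SQL fragments (all names invented):

```python
# Toy semantic layer: the agent composes vetted measures and
# dimensions instead of writing raw SQL from scratch.
SEMANTIC_LAYER = {
    "measures":   {"revenue": "SUM(order_total)", "orders": "COUNT(*)"},
    "dimensions": {"month": "DATE_TRUNC('month', created_at)"},
    "table": "orders",
}

def compile_query(measure: str, dimension: str) -> str:
    m = SEMANTIC_LAYER["measures"][measure]
    d = SEMANTIC_LAYER["dimensions"][dimension]
    return (f"SELECT {d} AS {dimension}, {m} AS {measure} "
            f"FROM {SEMANTIC_LAYER['table']} GROUP BY 1")

print(compile_query("revenue", "month"))
```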
Show HN: Fine-tuned Llama 3.2 3B to match 70B models for local transcripts
The author fine-tuned a Llama 3.2 3B model to clean and analyze raw voice transcripts locally, raising its evaluation score from 5.35 to 8.55. The model was trained on a dataset of synthetic transcripts generated with a teacher model and a Python script, and it now outputs a structured JSON payload with title, tags, entities, dates, actions, and more, outperforming larger general-purpose models on this task.
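An illustrative payload in the shape the post describes (field names from the post; the values here are invented):

```python
# Example structured output for one cleaned transcript.
payload = {
    "title": "Quarterly planning call",
    "tags": ["planning", "budget"],
    "entities": ["Alice", "Acme Corp"],
    "dates": ["2025-09-02"],
    "actions": ["Send the revised budget to Alice by Friday"],
}
```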
Research
Adaptive LLM routing under budget constraints
Large Language Models (LLMs) vary widely in capability and cost, making it challenging to select the most suitable one for each query; LLM routing addresses this by dynamically picking a model per query. A new approach, Preference-prior Informed LinUCB fOr adaptive rouTing (PILOT), treats routing as a contextual bandit problem, embedding queries and models in a shared space and learning from online bandit feedback to make adaptive, resource-efficient routing decisions.
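A minimal LinUCB sketch of routing-as-contextual-bandit, where each model is an arm and the query embedding is the context; PILOT's preference prior and budget handling are omitted here, and `alpha` and the embedding function are assumptions:

```python
import numpy as np

class LinUCBRouter:
    """Pick the model whose upper confidence bound on reward is highest."""
    def __init__(self, models: list, dim: int, alpha: float = 1.0):
        self.models, self.alpha = models, alpha
        self.A = {m: np.eye(dim) for m in models}    # per-arm design matrix
        self.b = {m: np.zeros(dim) for m in models}  # per-arm reward vector

    def route(self, x: np.ndarray) -> str:
        def ucb(m):
            A_inv = np.linalg.inv(self.A[m])
            theta = A_inv @ self.b[m]                # ridge estimate
            return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
        return max(self.models, key=ucb)

    def update(self, model: str, x: np.ndarray, reward: float):
        self.A[model] += np.outer(x, x)
        self.b[model] += reward * x
```

Per its name, PILOT additionally informs this bandit with a prior derived from preference data, so early routing decisions are not made from scratch.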
SparseLoCo: Communication-Efficient LLM Training
Communication-efficient distributed training algorithms for Large Language Models (LLMs) still face a communication bottleneck due to the need to transmit full model gradients, despite reducing communication frequency. The proposed SparseLoCo algorithm addresses this issue by leveraging Top-k sparsification and quantization to achieve extreme compression ratios, resulting in improved performance and reduced communication costs in LLM training settings.
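A sketch of the compression step (error feedback and the outer training loop from the paper are omitted; `k` and the bit width are assumptions):

```python
import torch

def topk_quantize(g: torch.Tensor, k: int, bits: int = 4):
    """Keep the k largest-magnitude entries of g, then quantize them."""
    flat = g.flatten()
    _, idx = flat.abs().topk(k)               # indices of top-k entries
    kept = flat[idx]                          # their signed values
    scale = kept.abs().max() / (2 ** (bits - 1) - 1)
    q = torch.clamp((kept / scale).round(),
                    -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return idx, q.to(torch.int8), scale       # all that needs transmitting

def dequantize(idx, q, scale, numel: int) -> torch.Tensor:
    out = torch.zeros(numel)
    out[idx] = q.float() * scale              # everything else stays zero
    return out
```

Sending only `(idx, q, scale)` instead of a dense fp32 gradient is where the extreme compression ratio comes from.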
Towards Memory Specialization: A Case for Long-Term and Short-Term RAM
Memory technologies like SRAM and DRAM no longer scale well, driving up system costs, so a new approach is needed. The proposed solution introduces specialized memory architectures, including new memory classes such as long-term RAM and short-term RAM, which can be optimized for specific workloads and integrated into system designs to improve efficiency and scalability.
Attention is a smoothed cubic spline
The attention module in a transformer can be viewed as a smoothed cubic spline, a concept rooted in classical approximation theory, and this perspective reveals that all components of a transformer are cubic or higher-order splines. This insight provides a new understanding of the transformer's nature, framing it in terms of well-studied mathematical objects, and offers a potential path to creating smoothed versions of the transformer by replacing ReLU with a smooth activation function.
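A sketch of the core observation in standard notation (X is the input sequence; W_Q, W_K, W_V the projection matrices):

```latex
% Standard (softmax) attention:
\mathrm{Attn}(X) = \mathrm{softmax}\!\Bigl(\tfrac{1}{\sqrt{d}}\, X W_Q W_K^{\top} X^{\top}\Bigr)\, X W_V
% With ReLU in place of softmax, each output entry is a ReLU-gated
% product of three factors that are each linear in X, i.e. a piecewise
% cubic polynomial (a cubic spline):
\mathrm{Attn}_{\mathrm{ReLU}}(X) = \mathrm{ReLU}\!\Bigl(\tfrac{1}{\sqrt{d}}\, X W_Q W_K^{\top} X^{\top}\Bigr)\, X W_V
```

Softmax then plays the role of a smoothed max, turning the exact spline into its smoothed counterpart.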
Event-Tracking Data Synchronization in Soccer Without Annotated Event Locations
The integration of event and tracking data in soccer analysis is hindered by challenges in synchronizing the two due to inaccuracies in manually recorded event timestamps. The proposed ELASTIC synchronizer framework addresses this issue by using only tracking data features and explicitly detecting event end times, resulting in significantly improved synchronization accuracy compared to existing methods.
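A toy illustration of the idea, assuming 2-D ball and player tracks at a fixed frame rate (the kick signature, window, and weighting are invented; ELASTIC's actual features and its event-end detection are more involved):

```python
import numpy as np

def sync_event(t_event: float, window: float,
               ball_xy: np.ndarray, player_xy: np.ndarray,
               fps: int = 25) -> float:
    """Refine a noisy event timestamp using only tracking features:
    find a ball-acceleration spike while the ball is near the acting
    player, within +/-window seconds of the recorded time."""
    lo = max(0, int((t_event - window) * fps))
    hi = int((t_event + window) * fps)
    dist = np.linalg.norm(ball_xy[lo:hi] - player_xy[lo:hi], axis=1)
    accel = np.abs(np.diff(ball_xy[lo:hi], n=2, axis=0)).sum(axis=1)
    score = accel / (1.0 + dist[:-2])   # spike close to the player wins
    return (lo + int(score.argmax())) / fps
```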
Code
Show HN: Self-Evolving Agents – interactive evolving AI agent list
This repository accompanies a comprehensive survey of self-evolving AI agents, a new paradigm that bridges foundation models and lifelong agentic systems. The survey covers techniques for optimizing AI agents, spanning single-agent, multi-agent, and domain-specific optimization, with a focus on methods such as supervised fine-tuning, reinforcement learning, and prompt optimization.
Show HN: TypeScript boilerplate for scaling Claude Code beyond context limits
This AI coding boilerplate uses sub-agents to solve the problem of context exhaustion in AI coding, keeping quality consistent across large projects by having specialized agents handle each task independently. The boilerplate provides a production-ready environment with features like automated quality fixes, task implementation, and design documentation, and can be set up in three steps to start building projects with Claude Code.
Show HN: Giz, AI Git commits with easy to modify system prompt (in 140 lines)
Giz is a drop-in replacement for git commit that uses AI to generate a commit message when none is provided, asking for confirmation by default with an option to skip it. The AI prompt can be customized by editing a text file, and Giz can be installed and used by running pip install giz, setting an OpenAI API key, and then using giz commit in place of git commit.
Show HN: Architext – A Python library for treating LLM context like a DOM
Architext is a Python library for Large Language Model (LLM) applications, focused on context engineering: structuring, dynamically updating, and optimizing the context passed to LLMs. The library provides an elegant, powerful set of tools to construct and reorganize LLM input context, allowing precise and dynamic control over what the model sees.
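The library's real API is not reproduced here; this hypothetical sketch only illustrates the "context as a DOM" idea of named, queryable, reorderable nodes:

```python
# Hypothetical sketch (NOT Architext's actual API): context as a tree
# of named nodes that can be found, updated, and rendered to a prompt.
class Node:
    def __init__(self, name: str, text: str = "", children=None):
        self.name, self.text = name, text
        self.children = children or []

    def find(self, name: str):
        if self.name == name:
            return self
        for child in self.children:
            if (hit := child.find(name)) is not None:
                return hit
        return None

    def render(self) -> str:
        parts = [self.text] + [c.render() for c in self.children]
        return "\n".join(p for p in parts if p)

ctx = Node("root", children=[
    Node("system", "You are a data assistant."),
    Node("retrieved_docs"),                       # filled in per query
    Node("question", "Top customers by revenue?"),
])
ctx.find("retrieved_docs").text = "Doc: Q3 revenue table ..."
print(ctx.render())
```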
Show HN: Open-Source 7B LLM for Advanced Audio Understanding and Conversation
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation, achieving state-of-the-art performance on various audio understanding and conversational benchmarks. The model is open-sourced, with two versions, Step-Audio 2 mini and Step-Audio 2 mini Base, available for download, and can be used for tasks such as automatic speech recognition, intelligent speech conversation, and tool calling.