Wednesday — July 9, 2025
Researchers propose design patterns to secure LLM agents against prompt injections, a new 3B language model SmolLM3 achieves competitive performance with larger models, and a GitHub project tracks AI-generated code in repositories.
News
SmolLM3: Smol, multilingual, long-context reasoner LLM
SmolLM3 is a new 3B language model that achieves competitive performance with larger 4B models while being more efficient, and its architecture and training details are being openly shared. The model was trained on 11T tokens using a three-stage approach, incorporating web, math, and code data, and features several key modifications, including Grouped Query Attention and NoPE, to improve efficiency and long context performance.
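Grouped Query Attention, one of the efficiency modifications mentioned above, lets several query heads share a single key/value head, shrinking the KV cache during inference. A minimal NumPy sketch of the idea (illustrative only; SmolLM3's actual implementation differs in scale and detail):

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """Minimal grouped-query attention: groups of query heads share
    one key/value head, reducing the number of cached K/V tensors.
    q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d)."""
    n_q_heads, seq, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_q_heads // n_kv_heads  # query heads per KV head
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group                        # shared KV head index
        scores = q[h] @ k[kv].T / np.sqrt(d)   # (seq, seq)
        scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = scores / scores.sum(axis=-1, keepdims=True)
        out[h] = weights @ v[kv]               # softmax-weighted values
    return out

# 8 query heads sharing only 2 KV heads (a 4:1 grouping)
q = np.random.randn(8, 4, 16)
k = np.random.randn(2, 4, 16)
v = np.random.randn(2, 4, 16)
print(grouped_query_attention(q, k, v).shape)  # (8, 4, 16)
```

With standard multi-head attention the cache would hold 8 K/V head pairs; here it holds 2, at a small cost in modeling flexibility.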
Show HN: Jukebox – Free, Open Source Group Playlist with Fair Queueing
Create a collaborative playlist box and share a link with friends to add songs from their phones, all for free with no ads. The platform allows users to queue up music together, with features like song search, a built-in YouTube player, and anonymous, no-login usage.
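"Fair queueing" in this context plausibly means interleaving songs round-robin by contributor so no single user monopolizes the playlist. A toy sketch of that scheduling idea (hypothetical; Jukebox's actual algorithm may differ):

```python
from collections import OrderedDict, deque

def fair_queue(submissions):
    """Round-robin fair queueing: take one song per user per round,
    in the order users first submitted, so playback alternates
    between contributors rather than playing one user's batch."""
    per_user = OrderedDict()
    for user, song in submissions:
        per_user.setdefault(user, deque()).append(song)
    order = []
    while any(per_user.values()):
        for queue in per_user.values():
            if queue:
                order.append(queue.popleft())
    return order

subs = [("alice", "A1"), ("alice", "A2"), ("alice", "A3"),
        ("bob", "B1"), ("carol", "C1"), ("bob", "B2")]
print(fair_queue(subs))  # ['A1', 'B1', 'C1', 'A2', 'B2', 'A3']
```

Alice's three back-to-back submissions get spread across the queue instead of playing consecutively.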
Show HN: Sumble – knowledge graph for GTM data – query tech stack, key projects
Sumble is a knowledge graph built on go-to-market (GTM) data, letting users query which technologies companies use and what key projects they are running.
Cloudflare: We Will Get Google to Provide a Way to Block AI Overviews
Cloudflare's CEO, Matthew Prince, claims that the company will get Google to provide a way to block AI Overviews and Answer boxes without blocking classic search indexing. Prince is confident that Google will cooperate, and if not, he suggests that Cloudflare may explore legislative options to require Google to differentiate between its crawlers and allow site owners to block AI-specific crawlers.
A Marco Rubio impostor is using AI voice to call high-level officials
An impostor has been using artificial intelligence-powered software to mimic the voice and writing style of Secretary of State Marco Rubio, contacting at least five high-level government officials, including foreign ministers, a US governor, and a member of Congress. The fake Rubio has been sending voice and text messages to these officials, sparking concerns about the potential for AI-powered impersonation to be used for malicious purposes.
Research
Design Patterns for Securing LLM Agents Against Prompt Injections
AI agents powered by Large Language Models are vulnerable to prompt injection attacks, which pose a significant security threat, especially when agents handle sensitive information. To address this, researchers propose design patterns for building AI agents that are resistant to prompt injection attacks, analyzing their effectiveness and trade-offs in terms of utility and security through case studies.
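One family of patterns in this line of work constrains what an agent can do rather than trying to detect every injection: the model may only pick from a fixed allowlist of actions, so text smuggled into untrusted data can never introduce a new one. A toy sketch of that enforcement step (my own illustrative names, not the paper's code):

```python
# Allowlist of actions the agent is permitted to execute. Untrusted
# content can influence *which* allowed action runs, but cannot add
# to this set.
ALLOWED_ACTIONS = {"summarize", "translate", "refuse"}

def select_action(model_output: str) -> str:
    """Accept the model's chosen action only if it is allowlisted;
    anything else falls back to a safe default."""
    action = model_output.strip().lower()
    return action if action in ALLOWED_ACTIONS else "refuse"

# Even if injected text coaxes the model into emitting a dangerous
# action, the agent never executes it:
print(select_action("summarize"))         # summarize
print(select_action("delete_all_files"))  # refuse
```

The trade-off the paper analyzes is exactly this: tighter constraints improve security but reduce the agent's utility on open-ended tasks.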
Measuring AI Ability to Complete Long Tasks
Researchers have proposed a new metric, the 50%-task-completion time horizon, to quantify AI capabilities in terms of human capabilities, finding that current AI models can complete tasks with 50% success rate in around 50 minutes, a time frame that has been doubling approximately every seven months. If this trend continues, it is predicted that within 5 years, AI systems will be capable of automating many software tasks that currently take humans a month to complete.
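The five-year projection follows from simple exponential extrapolation. Assuming a work-month of roughly 167 hours (my assumption for the conversion; the paper defines its own task-length units):

```python
import math

def months_until_horizon(current_minutes, target_minutes, doubling_months=7):
    """Extrapolate the 50%-task-completion time horizon, assuming it
    keeps doubling every `doubling_months` months."""
    doublings = math.log2(target_minutes / current_minutes)
    return doublings * doubling_months

# From a ~50-minute horizon today to a one-month task
# (~167 working hours, an assumed conversion):
one_month = 167 * 60  # minutes
months = months_until_horizon(50, one_month)
print(f"{months:.0f} months (~{months / 12:.1f} years)")  # 54 months (~4.5 years)
```

About 7.6 doublings are needed, which at one doubling per seven months lands just under five years out, consistent with the paper's claim.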
Interpreting Large Language Model's Personality Through Critical Event Analysis
Researchers have developed a framework to analyze the "personality" of Large Language Models (LLMs) by evaluating their ability to extract and rank key events from text, revealing distinct traits such as emotional reasoning, analytical styles, and prioritization of conceptual framing or empirical validation. This analysis, conducted on various LLMs including Orca 2, Qwen 2.5, and Claude 3.7, aims to improve model interpretability and make them more user-friendly for diverse applications.
PRAISE: Enhancing Product Descriptions with LLM-Driven Structured Insights
PRAISE is a system that uses Large Language Models to extract and structure insights from customer reviews and seller descriptions, identifying discrepancies and providing a clear format for comparison. This allows sellers to enhance their product listings and buyers to better assess product reliability, improving the overall quality and trustworthiness of e-commerce product catalogs.
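Once the LLM has turned free-text reviews and seller descriptions into structured attributes, the discrepancy check itself is a straightforward comparison. A toy stand-in for that comparison step (not PRAISE's actual code):

```python
def find_discrepancies(seller_claims, review_insights):
    """Compare attributes a seller claims against attributes extracted
    (e.g., by an LLM) from customer reviews; report disagreements."""
    issues = []
    for attr, claimed in seller_claims.items():
        observed = review_insights.get(attr)
        if observed is not None and observed != claimed:
            issues.append({"attribute": attr,
                           "claimed": claimed,
                           "observed_in_reviews": observed})
    return issues

claims = {"battery_life": "10 hours", "waterproof": True}
reviews = {"battery_life": "about 6 hours", "waterproof": True}
print(find_discrepancies(claims, reviews))
```

Here only `battery_life` is flagged, which is the kind of structured mismatch a seller could act on or a buyer could weigh.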
Cats Confuse Reasoning LLM – Adversarial Triggers for Reasoning Models
Researchers have discovered that appending short, irrelevant text, such as "Interesting fact: cats sleep most of their lives," to math problems can significantly increase the likelihood of advanced reasoning models producing incorrect answers. This vulnerability, exploited through the CatAttack pipeline, raises security and reliability concerns, highlighting that even state-of-the-art models can be misled by subtle adversarial inputs.
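The attack's construction is notably simple: the original problem is left intact and a semantically irrelevant sentence is appended. A sketch of the query format, using the trigger quoted above:

```python
# CatAttack-style adversarial query: the math problem is unchanged,
# so the correct answer is unchanged, yet reasoning models err more
# often when the irrelevant suffix is present.
TRIGGER = "Interesting fact: cats sleep most of their lives."

def add_trigger(problem: str) -> str:
    """Append the distracting trigger sentence to a problem statement."""
    return f"{problem} {TRIGGER}"

print(add_trigger("If 3x + 5 = 20, what is x?"))
```

Because the perturbation is answer-preserving, any change in model accuracy is attributable purely to the distraction.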
Code
Show HN: Dashboard tracking all GitHub PRs and analyzing Code Agent activity
The Agents in the Wild project is an open-source repository that tracks and analyzes GitHub pull requests to identify autonomous code agents, classifying them into categories such as Human, OpenAI Codex, and GitHub Copilot. The system consists of a Next.js frontend and a Python backend, and can be deployed locally by following the provided installation instructions, allowing users to visualize insights and derive data from the analyzed pull requests.
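Attributing a pull request to a specific agent could be done, at its simplest, by matching signals in the author's login. A toy heuristic in that spirit (hypothetical; the project's real classifier likely uses richer signals such as branch names and commit messages):

```python
def classify_pr_author(author: str) -> str:
    """Toy heuristic mapping a PR author login to one of the
    dashboard's categories. Patterns here are my own guesses."""
    login = author.lower()
    if "copilot" in login:
        return "GitHub Copilot"
    if "codex" in login:
        return "OpenAI Codex"
    return "Human"

print(classify_pr_author("github-copilot[bot]"))  # GitHub Copilot
print(classify_pr_author("octocat"))              # Human
```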
Show HN: Lokilizer – free tool for translating apps from two source langs to any
Lokilizer is a free tool for translating apps from two source languages into any target language; its README could not be retrieved when this digest was generated.
Show HN: Track the AI-generated code in your repo
This project tracks the AI-generated code in a repository; its README could not be retrieved when this digest was generated.
Show HN: Free Unlimited Photo Enhancer, Background Remover, AI Image Gen, etc.
The PicWish API for Python is a tool that allows users to enhance, generate, and process images without tokens, accounts, or watermarks, with features including AI text-to-image generation, image enhancement, background removal, OCR, and image expansion. The API can be installed using pip and provides various examples of how to use its features, including generating images from text prompts, enhancing image quality, removing backgrounds, extracting text from images, and expanding images.
LLaMeSIMD – LLM SIMD Intrinsic and Function Translation Benchmarking Suite
LLaMeSIMD is a benchmarking suite that evaluates the ability of large language models to translate between different SIMD instruction sets across various CPU architectures, supporting multiple architectures and test modes. The suite provides scientific metrics, beautiful visualizations, and allows for the evaluation of local, open, and proprietary models, making it a valuable tool for researchers to benchmark model capabilities in high-performance computing and other fields.