Researchers develop CCPS to improve LLM confidence estimation, a custom CLI coding agent is built with Pydantic-AI to enhance coding workflows, and SwiftAI is introduced as an open-source library to easily build LLM features on iOS/macOS.
A self-proclaimed AI hater argues the technology is fundamentally flawed, a new framework called Active Reading improves large language models' knowledge absorption, and PageIndex, a vectorless RAG system, achieves 98.7% accuracy on the FinanceBench benchmark without vector embeddings.
Google introduces Gemini 2.5 Flash Image for state-of-the-art image generation, researchers develop Jet-Nemotron, claiming breakthrough gains in LLM inference speed, and Sideko launches a hybrid deterministic/LLM generator for automating API work.
Researchers find Agentic AI browsers vulnerable to scams, a study proposes the open-source AnalogSeeker language model for analog circuit design, and developers release Agent-C, an ultra-lightweight 4KB AI agent written in C.
Comet AI browser is vulnerable to prompt injection attacks, a new evaluation finds open models often outperform closed models for personal use cases, and researchers introduce DeepConf, a method to scale LLM reasoning with confidence scores.
Google reduces AI query energy cost by 33 times, developers share tips for working with LLM coding agents like Claude Code, and researchers develop a complex-number extension of standard continued fractions with unique representations.
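For reference, the classical (real-valued) continued fraction algorithm that the complex-number extension generalizes can be sketched in a few lines; this is a minimal illustration of the standard construction, not the researchers' extension:

```python
from fractions import Fraction

def continued_fraction(x: Fraction, max_terms: int = 10) -> list[int]:
    """Expand x as [a0; a1, a2, ...] where x = a0 + 1/(a1 + 1/(a2 + ...))."""
    terms = []
    for _ in range(max_terms):
        a = x.numerator // x.denominator  # integer part (floor)
        terms.append(a)
        frac = x - a                      # fractional remainder
        if frac == 0:
            break
        x = 1 / frac                      # recurse on the reciprocal
    return terms

print(continued_fraction(Fraction(415, 93)))  # [4, 2, 6, 7]
```

The complex-valued case replaces the floor step with a rounding rule on Gaussian integers, which is where uniqueness of representation becomes nontrivial.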
Google reduces AI query energy cost by 33 times, expert programmers leverage LLM "vibe coding" for workflow efficiency, and researchers find LLMs exhibit human biases when generating random sequences.
AWS CEO Matt Garman calls replacing junior staff with AI "the dumbest thing I've ever heard", researchers introduce FormalGrad, a method integrating formal methods with gradient-based LLM refinement, and developers create DiffMem, a git-based memory backend for AI agents using Markdown files and Git for temporal evolution tracking.
Tidewave Web launches an in-browser coding agent for Rails and Phoenix, researchers find that AI-generated code can create a "bus factor of zero" posing significant maintenance risks, and Luminal introduces an open-source search-based GPU compiler for deep learning models.
DeepSeek-V3.1-Base model boasts 685B parameters, researchers identify six challenges to AI-assisted codebase generation, and the Reflect project introduces a physical AI assistant that communicates through sound, light, and color.
US companies have invested $40B in Generative AI with little return, researchers have found malicious LLMs can extract personal info from users, and developers have introduced Whispering, an open-source local-first dictation app.
AI-related posts have reached an all-time high on Hacker News, a new GitHub Action called PromptProof tests LLM prompts in CI/CD pipelines, and researchers have found that Apple Silicon's unified memory architecture makes it a cost-effective option for on-device large language model inference.
Researchers propose a 2-bit quantization framework for complex-valued large language models, developers release UrbanOS-PoC, a sovereign and self-healing AI architecture for city mobility, and a new neural network architecture based on the Tversky similarity function achieves notable improvements in image recognition and language modeling.
Codeberg's Anubis challenge is being solved by AI crawlers, researchers propose the Fairy±i framework for 2-bit complex LLMs, and the Nabu Android app combines Text-to-Speech and chat capabilities with on-device large language models.
Google introduces Gemma 3 270M, a compact AI model for hyper-efficient task-specific fine-tuning, researchers demonstrate GPT-5's state-of-the-art performance in multimodal medical reasoning, and developers release YAMS, a persistent memory system for large language models with features like content-addressed storage and semantic search.
Illinois bans AI therapy due to safety concerns, a new tool called StackBench audits how well coding agents use library documentation, and researchers develop self-evolving AI agents that adapt to dynamic environments through automatic enhancement.
Researchers find LLMs are poor at logical inference, Qodo Command scores 71.2% on SWE-bench Verified, and a new open-source platform called Omnara enables real-time monitoring and control of AI agents like Claude Code.
GPT-OSS-120B runs on just 8GB VRAM, researchers develop an analytic theory of creativity in convolutional diffusion models, and the GLM-4.5V open-source multimodal large language model achieves state-of-the-art performance on various benchmarks.
OpenAI releases gpt-oss-20b and gpt-oss-120b models, researchers propose design patterns to secure LLM agents against prompt injections, and a new CLI tool called Uwu generates shell commands inline with GPT-5.
A lawyer advocates for the deceased to have the right to delete their digital data, a developer uses a browser setup with multiple free AI models to code efficiently, and researchers propose design patterns to secure LLM agents against prompt injection attacks.
OpenAI brings back GPT-4o for paid users after backlash, researchers develop a framework to measure the environmental impact of large language models, and a new benchmark finds ClickHouse up to 16.8x faster than Postgres for LLM chat interactions.
OpenAI's GPT-5 model achieves state-of-the-art results on coding benchmarks, researchers discover declining medical safety messaging in generative AI models, and Octofriend, a coding agent, allows users to switch between GPT-5 and Claude models mid-conversation.
The author of the "enigo" library was rejected by Anthropic despite his code being used in their AI project, Kitten TTS introduces a 25MB CPU-only open-source voice model, and researchers propose physics-based ASICs to solve the compute crisis in AI training.
OpenAI releases advanced open-weight reasoning models, an engineer overcomes AI imposter syndrome by realizing AI's limitations, and researchers discover that removing a single "super weight" parameter can significantly degrade a Large Language Model's performance.
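The "super weight" finding can be illustrated with a toy experiment; this is a hedged sketch (a single linear layer with one hand-planted outsized weight, not the paper's method or any real model) showing how zeroing one dominant parameter shifts the output:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))   # small toy weight matrix
W[3, 5] = 50.0                # hypothetical "super weight": one outsized entry
x = rng.normal(size=8)        # toy input vector

y = W @ x

# Ablate the single largest-magnitude weight, as a stand-in for
# removing a super weight from a trained model.
W_ablated = W.copy()
idx = np.unravel_index(np.argmax(np.abs(W_ablated)), W_ablated.shape)
W_ablated[idx] = 0.0
y_ablated = W_ablated @ x

rel_change = np.linalg.norm(y - y_ablated) / np.linalg.norm(y)
print(f"relative output change from zeroing one weight: {rel_change:.2f}")
```

In a real LLM the effect shows up as a large jump in perplexity rather than a vector-norm change, but the mechanism is the same: one parameter carries a disproportionate share of the computation.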
Perplexity AI faces backlash for stealth crawling, researchers develop a robust framework for spiking neural networks on low-end FPGAs, and a tiny reasoning layer called WFGY boosts LLM output accuracy by 22.4%.