Tuesday October 7, 2025

Write-It-Down.com launches a personal finance tracker, Google DeepMind introduces CodeMender, an AI agent for code security, and researchers unveil the Dragon Hatchling, a new Large Language Model architecture inspired by the brain's scale-free biological networks.

News

Show HN: Write It Down – Personal finance tracker

Write-It-Down.com offers a personal finance tracker that lets users record their income and expenses in a simple, organized way for a one-time payment of $14, with no subscription fees. The tracker is built on Google Sheets, provides a clean and minimal dashboard, and lets users define their own categories and access their data at any time.

The AI bubble is 17 times the size of the dot-com frenzy and four times subprime

Artificially low interest rates have fueled excessive investment in AI, creating a bubble 17 times the size of the dot-com bubble and four times the size of the subprime bubble, according to research firm MacroStrategy Partnership. The firm argues that large language models have already hit scaling limits and that the bubble is likely to burst, leading to a recession and a prolonged period of deflation; it recommends investments in resources, emerging markets, and gold equities.

CodeMender: an AI agent for code security

CodeMender is a new AI agent from Google DeepMind that automatically finds and fixes software security vulnerabilities. The agent works both reactively, generating and applying patches for newly discovered flaws, and proactively, rewriting existing code to eliminate entire classes of vulnerabilities, and DeepMind reports that it has already contributed security fixes upstream to open-source projects.

Launch HN: Grapevine (YC S19) – A company GPT that actually works

Grapevine is an AI agent that searches across a company's documents, code, and communications to answer questions, with the goal of saving time and increasing productivity. The platform is secure, with features such as encryption and isolated databases, can be set up in under 30 minutes, and learns and improves over time.

The (economic) AI apocalypse is nigh

The author believes that an economic apocalypse is imminent due to the AI bubble, which is driven by monopolists claiming that AI can replace human workers, despite the fact that AI companies have poor "unit economics" and are hemorrhaging money. The author thinks that the bubble will eventually burst, causing widespread economic damage, and that the only way to mitigate this is to puncture the bubble as soon as possible by revealing the false claims about AI's capabilities.

Research

The Dragon Hatchling: The Missing Link Between the Transformer and Models of the Brain

The Dragon Hatchling (BDH) is a new Large Language Model architecture inspired by the brain's scale-free biological networks, offering strong theoretical foundations, interpretability, and performance comparable to Transformer models like GPT2. BDH's biologically plausible design, which relies on synaptic plasticity and Hebbian learning, allows for interpretability of state and demonstrates monosemanticity in language tasks, making it a promising approach to achieving Universal Reasoning Models.
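
Since the summary leans on Hebbian learning, here is a minimal sketch of a Hebbian-style weight update in NumPy. It illustrates only the general "fire together, wire together" rule the paper builds on, not the BDH architecture itself; the layer sizes and learning rate are arbitrary choices for the example.

    import numpy as np

    def hebbian_update(weights, pre, post, lr=0.01):
        """Strengthen connections between co-active units:
        neurons that fire together, wire together."""
        return weights + lr * np.outer(post, pre)

    rng = np.random.default_rng(0)
    n_pre, n_post = 8, 4
    W = np.zeros((n_post, n_pre))

    # Repeatedly present a correlated pre/post activity pattern; the
    # corresponding synapses grow while uncorrelated ones stay at zero.
    pre_pattern = (rng.random(n_pre) > 0.5).astype(float)
    post_pattern = (rng.random(n_post) > 0.5).astype(float)
    for _ in range(100):
        W = hebbian_update(W, pre_pattern, post_pattern)

    print(W.round(2))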

The Command Line GUIde: Graphical Interfaces from Man Pages via AI

The command line shell remains a powerful tool in modern operating systems, but its text-based syntax can be daunting for users, whereas graphical interfaces offer a more intuitive way to discover and invoke actions. A system called GUIde uses AI to translate command line documentation into graphical interface specifications, automatically creating user-friendly interfaces for command line tools and making their functionality more accessible.
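
As an illustration of the general idea (not the paper's actual pipeline), the hypothetical sketch below feeds a man page to an LLM and asks for a structured GUI specification; the prompt, model choice, and JSON schema are assumptions made for the example.

    import json, subprocess
    from openai import OpenAI  # pip install openai

    client = OpenAI()

    def gui_spec_from_man_page(command: str) -> dict:
        # Grab the tool's documentation straight from `man`.
        man_text = subprocess.run(["man", command],
                                  capture_output=True, text=True).stdout
        # Illustrative prompt and schema, not the paper's actual ones.
        prompt = (
            "Read this man page and return JSON describing a GUI for the tool: "
            '{"title": str, "widgets": [{"flag": str, "label": str, '
            '"type": "checkbox|text|file"}]}.\n\n'
            + man_text[:8000]  # keep the prompt small for the example
        )
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"},
        )
        return json.loads(resp.choices[0].message.content)

    # A widget list like this could then be rendered with any GUI toolkit
    # and mapped back to command-line flags when the user clicks "Run".
    print(gui_spec_from_man_page("ls"))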

Expected Attention: KV Cache Compression by Estimating Attention

Memory consumption of the key-value (KV) cache is a significant bottleneck for large language model inference, and existing pruning methods are limited because attention scores are often not available (efficient kernels such as FlashAttention never materialize them). The proposed Expected Attention method overcomes this by estimating each KV pair's importance from predicted future attention scores, enabling effective compression without performance degradation; the method is released within KVPress, a library for implementing and benchmarking KV cache compression methods.
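
A much-simplified sketch of the idea: if future queries are modeled as Gaussian, each cached key can be scored by the attention mass it is expected to receive, and the lowest-scoring KV pairs dropped. The estimator below is illustrative only; the paper's exact formulation and the KVPress implementation differ in detail.

    import torch

    def expected_attention_scores(keys, q_mean, q_cov, scale):
        # For q ~ N(q_mean, q_cov), E[exp(q.k / scale)] is proportional to
        # exp(q_mean.k / scale + k^T q_cov k / (2 scale^2)).
        mean_term = keys @ q_mean / scale
        var_term = torch.einsum("nd,de,ne->n", keys, q_cov, keys) / (2 * scale**2)
        return torch.softmax(mean_term + var_term, dim=-1)

    def compress_kv(keys, values, q_mean, q_cov, keep_ratio=0.5):
        d = keys.shape[-1]
        scores = expected_attention_scores(keys, q_mean, q_cov, scale=d**0.5)
        keep = torch.topk(scores, k=int(keep_ratio * keys.shape[0])).indices
        return keys[keep], values[keep]

    # Toy usage: 128 cached tokens with 64-dimensional heads.
    keys, values = torch.randn(128, 64), torch.randn(128, 64)
    q_mean, q_cov = torch.zeros(64), 0.1 * torch.eye(64)
    k2, v2 = compress_kv(keys, values, q_mean, q_cov)
    print(k2.shape)  # torch.Size([64, 64])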

Mojo: MLIR-Based Performance-Portable HPC Science Kernels on GPUs

Mojo, a novel language built on LLVM's MLIR compiler infrastructure, achieves performance competitive with CUDA and HIP for memory-bound kernels on NVIDIA and AMD GPUs. Performance gaps remain for atomic operations on AMD GPUs and for fast-math compute-bound kernels on both vendors' hardware, but the results suggest Mojo could close significant gaps in the Python ecosystem for scientific computing and AI.

Code

Show HN: I built an open-source AI data layer that connects any LLM to any data

Bag of words is an open-source AI data layer that connects any large language model (LLM) to any data source, providing centralized context management, trust, observability, and control. It allows users to chat with their data to build charts, dashboards, and scheduled reports, and supports a wide range of LLM providers and data sources, including databases, warehouses, and services like Snowflake, BigQuery, and Tableau.

Show HN: XedOut (A Safari Extension filter for X.com)

XedOut is a Safari extension that uses AI-powered content analysis to filter unwanted content on X (formerly Twitter), allowing users to hide posts containing videos, images, or specific topics. The extension utilizes OpenAI's GPT-4o-mini model and can be customized with user-defined filtering criteria, providing a more personalized browsing experience.
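
The extension itself runs inside Safari, but the kind of classification call it describes can be sketched in a few lines; the prompt, helper name, and YES/NO protocol below are assumptions for illustration, not XedOut's actual code.

    from openai import OpenAI  # pip install openai

    client = OpenAI()

    def should_hide(post_text: str, criteria: str) -> bool:
        # Ask gpt-4o-mini whether the post matches the user's filter criteria.
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system",
                 "content": "Answer only YES or NO: does the post match the filter criteria?"},
                {"role": "user",
                 "content": f"Criteria: {criteria}\n\nPost: {post_text}"},
            ],
            max_tokens=3,
        )
        return resp.choices[0].message.content.strip().upper().startswith("Y")

    print(should_hide("Huge crypto giveaway, click the link!",
                      "crypto spam and engagement bait"))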

Show HN: Open-source testing framework for AI agents with semantic validation

SemanticTest is a composable, pipeline-based testing framework for AI systems and APIs that allows users to build complex test scenarios using simple, reusable blocks with semantic validation. It provides a range of features, including composable blocks, pipeline architecture, and semantic evaluation using GPT-4, to make testing AI systems easier and more effective.
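
The summary does not show the framework's API, so the sketch below only illustrates the general pattern it describes: small reusable blocks composed into a pipeline, with one block performing a semantic check. All names are hypothetical and the LLM judge is stubbed out.

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Block:
        name: str
        run: Callable[[dict], dict]  # takes and returns the pipeline context

    def pipeline(blocks):
        def execute(ctx: dict) -> dict:
            for block in blocks:
                ctx = block.run(ctx)
                print(f"[{block.name}] ok")
            return ctx
        return execute

    def call_api(ctx):
        ctx["response"] = "Your order #123 ships tomorrow."  # stand-in for a real API call
        return ctx

    def assert_semantic(criterion):
        def run(ctx):
            # A real framework would ask an LLM judge whether the response
            # satisfies `criterion`; here the verdict is faked with a keyword check.
            assert "ship" in ctx["response"].lower(), f"semantic check failed: {criterion}"
            return ctx
        return Block("assert_semantic", run)

    test = pipeline([Block("call_api", call_api),
                     assert_semantic("confirms the shipping date")])
    test({})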

Self hosted LLM cost monitoring

Pulsecost-oss is an open-source proxy and dashboard for optimizing LLM (Large Language Model) usage costs, providing features such as logging, caching, and cost estimation. The platform offers a simple dashboard with KPIs, charts, and logs, and works with multiple databases, including Postgres, to help users take control of their LLM costs.
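
Cost estimation of this kind is simple arithmetic over token counts; the sketch below shows the general calculation such a proxy can perform, with an illustrative price table rather than the project's actual data.

    # USD per million tokens (input, output); illustrative values only.
    PRICE_PER_MTOK = {
        "gpt-4o-mini": (0.15, 0.60),
    }

    def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
        p_in, p_out = PRICE_PER_MTOK[model]
        return (prompt_tokens * p_in + completion_tokens * p_out) / 1_000_000

    # A proxy sees token counts in each upstream response and can log and aggregate this.
    print(f"${estimate_cost('gpt-4o-mini', 12_000, 800):.6f}")  # $0.002280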

GPT-2 implementation in Modular MAX

The Modular MAX implementation of GPT-2 supports features like GPU acceleration, paged KV caching, and flash attention, and can be run using the command max serve --model openai-community/gpt2 --custom-architectures ../max-gpt-2. On an Nvidia RTX 5090 GPU, the model achieves token generation speeds of up to 250.1 tokens per second with a warm cache, significantly outperforming its cold cache performance.
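
Assuming the server exposes its usual OpenAI-compatible API on the default port (an assumption worth checking against your setup), a quick way to exercise the served model is with the OpenAI Python client:

    from openai import OpenAI  # pip install openai

    # Endpoint and port are assumed defaults for `max serve`; adjust if needed.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    resp = client.completions.create(
        model="openai-community/gpt2",  # same name as passed to --model
        prompt="The Modular MAX framework",
        max_tokens=32,
    )
    print(resp.choices[0].text)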