Monday — December 29, 2025
AI demand for DRAM pushes device prices up, automated multi-turn jailbreaks expose LLM vulnerabilities, and a Z80-μLM fits in 40KB.
Interested in AI engineering? Let's talk
News
AI Slop Report: The Global Rise of Low-Quality AI Videos
AI-generated "slop" and "brainrot" videos are rapidly proliferating on platforms like YouTube, characterized by low-quality, often nonsensical content designed to grab attention and farm views. A study analyzed trending YouTube channels globally and new user feeds, finding Spain leads in AI slop channel subscribers (20.22M) and South Korea in views (8.45B). Notably, 33% of a new YouTube Shorts feed consisted of brainrot videos. This content poses challenges for legitimate creators, advertisers, and the information ecosystem, potentially eroding trust and mental faculties due to its pervasive nature and the "illusory truth effect."
Rich Hickey: Thanks AI
Rich Hickey critically assesses current AI/LLM advancements, prompted by a "sycophantic" email from an LLM. He accuses "AI purveyors" of pirating creative works, degrading education, harming the environment, wasting developer resources, eliminating entry-level jobs, and replacing human interaction with ineffective AI. Hickey argues that "agentic 'AI'" will flood communication channels with "BS," ultimately creating more problems than it solves and representing a significant "con."
As AI gobbles up chips, prices for devices may rise
AI's escalating demand for memory chips, particularly for cloud computing and data centers, has created a significant shortage of DRAM, with demand exceeding supply by 10%. This imbalance is driving sharp price increases for memory chips, projected to continue into 2026. Chipmakers are prioritizing high-end memory for AI workloads, reducing availability for other devices like PCs and smartphones, which will likely see price hikes. This structural shift in demand, driven by the inherent memory requirements of AI training and inference systems, has no short-term production solution.
2 in 3 Americans think AI will cause major harm to humans in the next 20 years [pdf] (2024)
Pew Research Center data highlights a perception gap regarding AI: U.S. adults are predominantly concerned (51%) about its increased use, whereas AI experts are more excited (47%). Experts are significantly more optimistic about AI's long-term positive impact on the U.S., particularly in medical care and productivity, contrasting with the public's mixed-to-negative outlook. Both groups foresee job displacement, notably for cashiers and software engineers, and express a strong desire for greater control over AI's application, coupled with low confidence in government and corporate regulation. Top concerns for adults include AI impersonation and personal data misuse, while experts also prioritize inaccurate information and bias. Awareness and use of AI chatbots are near-universal among experts, who find them highly beneficial; adoption among the general public remains much lower.
'PromptQuest' is the worst game of 2025 (trying to make chatbots work)
The article criticizes the current state of AI chatbots, likening the effort to craft effective prompts to playing frustrating 1980s text adventure games, dubbed "PromptQuest." The author highlights issues such as inconsistent responses from the same prompt, varying behavior across different chatbot versions (e.g., Microsoft Copilot), and the need to constantly adapt prompts as underlying LLM models update. This "PromptQuest" experience, characterized by trial-and-error to achieve desired outputs, undermines the promised productivity benefits of AI.
Research
Designing Predictable LLM-Verifier Systems for Formal Method Guarantee
This work introduces an LLM-Verifier Convergence Theorem, providing the first formal framework with provable termination guarantees for multi-stage software verification pipelines leveraging LLMs. It models the interaction as a sequential absorbing Markov Chain across four engineering stages, proving almost sure convergence to a verified state and deriving a precise latency bound of $\mathbb{E}[n] \leq 4/\delta$. Extensive empirical validation confirms the theoretical predictions, enabling predictable resource planning and performance budgeting for safety-critical software.
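A back-of-the-envelope sketch of where a bound of this shape comes from, under our reading (not necessarily the paper's exact setup): if each of the four stages is retried independently until its verifier passes, and each attempt succeeds with probability at least $\delta$, then each stage is a geometric random variable, so
$$\mathbb{E}[\text{attempts at stage } i] = \frac{1}{\delta_i} \leq \frac{1}{\delta}, \qquad \mathbb{E}[n] = \sum_{i=1}^{4} \frac{1}{\delta_i} \leq \frac{4}{\delta},$$
and since each geometric variable is finite almost surely, the pipeline reaches the absorbing (verified) state almost surely.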
Automating Deception: Scalable Multi-Turn LLM Jailbreaks
Multi-turn conversational attacks that exploit psychological persuasion principles such as the foot-in-the-door (FITD) technique bypass LLM safety alignment, a problem compounded by the fact that multi-turn jailbreak datasets have so far been built manually and remain small. This work introduces an automated pipeline for generating large-scale, psychologically grounded multi-turn jailbreak datasets, operationalizing FITD into 1,500 scenarios. Evaluation of seven LLMs revealed that GPT-family models are significantly vulnerable to conversational history (up to a 32% increase in attack success rate), while Google's Gemini 2.5 Flash showed exceptional resilience and Anthropic's Claude 3 Haiku strong resistance, highlighting critical differences in how safety architectures handle context.
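A minimal evaluation-side sketch (our illustration, not the paper's code) of how the reported numbers can be computed: attack success rate (ASR) per model and condition, and the percentage-point increase attributable to conversational history. Field names are hypothetical.

```python
# Hypothetical sketch: compute attack success rate (ASR) per model for
# single-turn vs. multi-turn prompts, and the resulting delta.
from collections import defaultdict

def asr_by_condition(results):
    """results: iterable of dicts like
    {"model": "gpt-4o", "condition": "multi_turn", "success": True}"""
    counts = defaultdict(lambda: [0, 0])  # (successes, total) per (model, condition)
    for r in results:
        key = (r["model"], r["condition"])
        counts[key][0] += int(r["success"])
        counts[key][1] += 1
    return {k: s / t for k, (s, t) in counts.items()}

def multi_turn_delta(asr, model):
    """Percentage-point ASR increase from conversational history."""
    return asr[(model, "multi_turn")] - asr[(model, "single_turn")]
```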
Memelang: Terse SQL uses "axial grammar" for LLM generation
This paper introduces axial grammar for structured generation in LLM tool use, enabling compact, deterministically parsable intermediate representations (IRs). This grammar recovers multi-dimensional structure from linear token sequences via rank-specific separators, allowing a single left-to-right pass for coordinate assignment and parsing without complex surface syntax. Memelang, an LLM-emittable query language built on axial grammar, uses fixed coordinate roles for table/column/value slots, supports features like coordinate-stable relative references and implicit context carry-forward, and compiles to parameterized PostgreSQL SQL.
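To make the idea concrete, here is a toy illustration of rank-specific separators in Python; the separator characters and coordinate layout are our invention, not Memelang's actual syntax. Each separator advances the counter at its rank and resets all lower-rank counters, so one left-to-right pass assigns every token a multi-dimensional coordinate without nested brackets.

```python
# Illustrative sketch of rank-specific separators (not Memelang's real syntax):
# a single left-to-right pass recovers multi-dimensional coordinates.
def assign_coordinates(tokens, separators=(";", ",")):
    # separators[0] advances axis 0, separators[1] advances axis 1, etc.
    coords = [0] * (len(separators) + 1)  # extra axis: position within a cell
    out = []
    for tok in tokens:
        if tok in separators:
            rank = separators.index(tok)
            coords[rank] += 1
            for lower in range(rank + 1, len(coords)):
                coords[lower] = 0         # reset all lower-rank counters
        else:
            out.append((tuple(coords), tok))
            coords[-1] += 1
    return out

# assign_coordinates(["a", "b", ",", "c", ";", "d"])
# -> [((0,0,0),'a'), ((0,0,1),'b'), ((0,1,0),'c'), ((1,0,0),'d')]
```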
Beyond Context: Large Language Models' Failure to Grasp Users' Intent
Current LLM safety mechanisms are systematically vulnerable because they fail to model context and user intent, allowing circumvention through techniques like emotional framing and academic justification. In an empirical evaluation of SOTA LLMs, reasoning-enabled configurations amplified exploitation by increasing factual precision without addressing intent; Claude Opus 4.1 was a notable exception. The authors argue for an architectural shift that treats contextual understanding and intent recognition as core safety capabilities.
A Profit-Based Measure of Lending Discrimination
Researchers introduce a profit-based measure for auditing algorithmic lending for discrimination in loan pricing. Applied to personal loans on a fintech platform, it shows that loans to men and to Black borrowers yielded lower profits, indicating those groups received relatively favorable terms given their realized risk. The disparity was attributed to miscalibration in the underwriting model, which underestimated credit risk for Black borrowers and overestimated it for women. The study suggests that explicitly including race and gender could correct these disparities, illustrating a tension between competing notions of fairness.
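A minimal sketch of what a profit-based audit looks like in code (our illustration, not the paper's method; field names are hypothetical): compare mean realized profit per dollar lent across groups, where a significantly lower mean for a group suggests relatively favorable pricing given realized risk.

```python
# Hypothetical sketch: mean realized profit per group and the gap between groups.
import statistics

def realized_profit(loan):
    """Interest and fees actually collected minus default losses, per dollar lent."""
    return (loan["interest_collected"] + loan["fees"] - loan["chargeoff"]) / loan["principal"]

def mean_profit_by_group(loans, group_key):
    groups = {}
    for loan in loans:
        groups.setdefault(loan[group_key], []).append(realized_profit(loan))
    return {g: statistics.mean(vals) for g, vals in groups.items()}

# by_race = mean_profit_by_group(loans, "race")
# gap = by_race["white"] - by_race["black"]   # positive gap: Black borrowers priced favorably
```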
Code
Show HN: Self-growing neural networks via a custom Rust-to-LLVM compiler
NOMA is an experimental systems programming language for ML that implements reverse-mode automatic differentiation as a compiler pass (LLVM IR) rather than a runtime library. It treats training loops as first-class language constructs and model parameters as explicit, growable memory buffers. This design enables dynamic topology changes, like network growth, during training while preserving optimizer state, and compiles to small, standalone native binaries.
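For intuition, here is what "growable parameters that preserve optimizer state" means, sketched in Python/NumPy rather than NOMA: when a layer gains units, its weight buffer and the optimizer's per-parameter state (here, SGD momentum) are extended together, so existing entries keep training without a reset.

```python
# Conceptual sketch (NumPy, not NOMA): grow a layer while keeping optimizer state.
import numpy as np

class GrowableLayer:
    def __init__(self, in_dim, out_dim):
        self.w = np.random.randn(in_dim, out_dim) * 0.01
        self.momentum = np.zeros_like(self.w)   # optimizer state tied to the buffer

    def grow_outputs(self, extra):
        new_cols = np.random.randn(self.w.shape[0], extra) * 0.01
        self.w = np.concatenate([self.w, new_cols], axis=1)
        # New units start with zero momentum; existing state is preserved.
        self.momentum = np.concatenate(
            [self.momentum, np.zeros((self.w.shape[0], extra))], axis=1)

    def sgd_step(self, grad, lr=0.01, beta=0.9):
        self.momentum = beta * self.momentum + grad
        self.w -= lr * self.momentum
```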
Show HN: Z80-μLM, a 'Conversational AI' That Fits in 40KB
Z80-μLM is a retrocomputing micro language model designed to run on a Z80 processor with 64KB of RAM, fitting into a 40KB .COM binary. It uses quantization-aware training (QAT), 2-bit weight quantization, and 16-bit integer inference with no floating-point math. Input is encoded via trigram hashing, which makes it typo-tolerant and word-order invariant, and responses are generated autoregressively, character by character. While not a general chatbot, it shows personality through terse, context-limited replies and demonstrates the feasibility of highly constrained LLM-like systems.
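A rough reconstruction of trigram hash encoding from the description (in Python, not the actual Z80 assembly; bucket count and hash are our choices): the input is reduced to a fixed-size bag of hashed per-word character trigrams, so reordering words leaves the encoding unchanged and a single typo only perturbs a few buckets.

```python
# Illustrative sketch: typo-tolerant, word-order-invariant trigram hash features.
def trigram_features(text, n_buckets=256):
    features = [0] * n_buckets
    for word in text.lower().split():
        padded = f" {word} "                 # pad so short words still yield trigrams
        for i in range(len(padded) - 2):
            tri = padded[i:i + 3]
            h = 0
            for ch in tri:                   # tiny multiplicative hash, 8-bit friendly
                h = (h * 31 + ord(ch)) % n_buckets
            features[h] = 1                  # presence bag, not counts
    return features

# trigram_features("hello world") == trigram_features("world hello")
```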
Julie – an open-source, screen-aware multimodal desktop AI assistant
Julie is an open-source, lightweight, screen-aware desktop AI assistant designed to reduce context switching by understanding current screen content. It leverages Groq (Llama 3 70B & Llama 4 Scout) for instant, reasoning-heavy responses, allowing users to interact via voice or text without breaking focus. Positioned as a practical, non-agentic tool, it offers an invisible interface and vision capabilities for macOS and Windows.
Terence Tao: AI contributions to Erdős problems
The Erdős problem database is a community-curated collection of 1120 mathematical problems, with 637 currently open. Notably, AI tools have assisted in solving some problems, and 273 problem statements are formalized in Lean, a proof assistant, with 32 proofs or disproofs also formalized in Lean. The database also integrates with the OEIS.
AI Contributions to Erdős Problems
This community database catalogs 1120 Erdős problems, detailing their status (e.g., 274 proved, 101 disproved, 637 open). Notably for AI/LLM researchers, several problems have received assistance from AI tools, and 273 problem statements are formalized in Lean, with 32 proofs/disproofs also formalized in Lean. The project actively seeks community contributions, especially for linking problems to OEIS sequences.