Wednesday — December 24, 2025
Local AI drives PC redesign with NPUs and unified memory, JustRL scales 1.5B LLMs with a simple RL recipe, and AudioGhost AI runs SAM-Audio on consumer GPUs.
News
Inside CECOT – 60 Minutes [video]
The Internet Archive hosts a 15-minute video titled "Inside CECOT | 60 Minutes" by CBS News, published on 2025-12-22. The segment, a report by Sharyn Alfonsi, is described as having been censored by Bari Weiss and as originally appearing on Canada's Global TV app; it is available for free streaming and download.
Local AI is driving the biggest change in laptops in decades
Running LLMs locally on current PCs is challenging due to insufficient CPU/GPU/NPU power and fragmented memory architectures. The industry is addressing this by integrating specialized, power-efficient NPUs into SoCs, leading to a rapid increase in TOPS. A crucial architectural shift involves adopting unified memory, allowing CPU, GPU, and NPU to share a large, single memory pool, improving efficiency for memory-intensive AI models. Software platforms like Windows AI Foundry Local are optimizing AI workload distribution across these heterogeneous processors and providing advanced APIs for RAG and LoRA, signaling a fundamental reinvention of PC design for pervasive local AI.
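The case for unified memory follows from simple arithmetic: local LLM decoding is usually memory-bandwidth-bound, since every generated token streams the full weight set through memory once. A back-of-envelope sketch (the model size and bandwidth figures below are illustrative assumptions, not any specific product's specs):

```python
# Back-of-envelope: decode throughput for a memory-bandwidth-bound local LLM.
# Each generated token reads all weights once, so
#   tokens/s ~= memory_bandwidth / model_size_in_bytes.

def decode_tokens_per_sec(params_billion: float, bits_per_weight: int,
                          bandwidth_gb_s: float) -> float:
    model_bytes = params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / model_bytes

# Illustrative figures: a 7B model quantized to 4 bits on a 120 GB/s
# unified-memory SoC.
print(round(decode_tokens_per_sec(7, 4, 120), 1))  # → 34.3
```

This is why a large shared pool matters more than raw TOPS for interactive chat: the bandwidth term, not compute, sets the token rate.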
Show HN: Kapso – WhatsApp for developers
Kapso provides a Meta-official WhatsApp integration, emphasizing a first-class developer experience with a type-safe client and webhooks for messaging. The platform features AI-powered automation via visual workflows and AI agents that understand context, connect to APIs, and automate customer support. Developers can also build mini-apps using WhatsApp Flows with AI assistance and deploy custom serverless JavaScript functions for advanced webhook processing and API integrations.
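Any handler receiving Meta-originated webhooks should verify the payload signature before processing. A minimal sketch using Meta's standard `X-Hub-Signature-256` scheme (HMAC-SHA256 of the raw body, keyed by the app secret); whether Kapso forwards this header verbatim is an assumption here:

```python
import hmac
import hashlib

def verify_meta_signature(app_secret: str, body: bytes, header: str) -> bool:
    """Check Meta's X-Hub-Signature-256 header: 'sha256=' + HMAC-SHA256(body)."""
    expected = "sha256=" + hmac.new(app_secret.encode(), body,
                                    hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the match position via timing
    return hmac.compare_digest(expected, header)

body = b'{"object":"whatsapp_business_account"}'
sig = "sha256=" + hmac.new(b"secret", body, hashlib.sha256).hexdigest()
print(verify_meta_signature("secret", body, sig))  # → True
```

Verify against the raw request bytes, not a re-serialized JSON object, or the digests will not match.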
Nature Is Laughing at the AI Build Out
The author argues that the current AI/LLM build-out is an energy-inefficient "IBM 7090 era" of computing, characterized by power-hungry GPUs and costly cloud hosting. They foresee a future where AI compute integrates into all devices, enabling ubiquitous, on-device models rivaling human intelligence with drastically reduced power and cost. This evolution, driven by more efficient model architectures and hardware, will decentralize AI capabilities and likely lead to overinvestment in current data center infrastructure, power, and GPU vendors.
What Is (AI) Glaze?
Glaze is a system that applies adversarial perturbations to digital art, making subtle changes imperceptible to human eyes but causing generative AI models to perceive a dramatically different style. This technique aims to protect human artists from style mimicry, preventing AI models, even after fine-tuning or LoRA, from accurately replicating their unique artistic identity. While robust against common image manipulations and updated against known attacks, Glaze acknowledges its limitations as a non-permanent solution and its reduced effectiveness against styles already deeply embedded in base models.
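Glaze's actual optimization is far more sophisticated, but the core mechanic of adversarial perturbation can be shown with a toy FGSM-style step against a linear stand-in for a style encoder (everything below is an illustration, not Glaze's algorithm): each pixel moves by at most a tiny epsilon, yet the extracted "style feature" shifts by the maximum amount that budget allows.

```python
# Toy sketch of style-cloaking perturbations (not Glaze's real method):
# nudge each pixel by at most eps in the direction that most moves a
# differentiable "style feature".

def style_feature(pixels, weights):   # stand-in for a style encoder
    return sum(p * w for p, w in zip(pixels, weights))

def perturb(pixels, weights, eps):
    """FGSM-style step: x' = x + eps * sign(d feature / d x) = x + eps*sign(w)."""
    return [p + eps * (1 if w > 0 else -1 if w < 0 else 0)
            for p, w in zip(pixels, weights)]

pixels  = [0.2, 0.5, 0.9, 0.1]
weights = [0.8, -0.3, 0.5, -0.9]      # gradient of the linear feature
cloaked = perturb(pixels, weights, eps=0.03)

# Per-pixel change stays imperceptibly small (bounded by eps) ...
print(round(max(abs(a - b) for a, b in zip(pixels, cloaked)), 2))  # → 0.03
# ... while the feature shifts by eps * sum(|w|), the largest shift possible.
print(round(style_feature(cloaked, weights) - style_feature(pixels, weights), 3))  # → 0.075
```

The asymmetry — tiny per-pixel change, large feature shift — is what lets a cloak stay invisible to humans while misleading a model.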
Research
Minimizing Hyperbolic Embedding Distortion with LLM-Guided Hierarchy Structuring
Hyperbolic embeddings are effective for hierarchical data, but their quality is highly dependent on the input hierarchy's structure, specifically requiring a high branching factor and single inheritance. This paper proposes a prompt-based approach utilizing LLMs to automatically restructure existing hierarchies to meet these optimal criteria. Experiments demonstrate that LLM-restructured hierarchies consistently yield higher-quality hyperbolic embeddings and enable explainable reorganizations for knowledge engineers.
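The geometric intuition behind the branching-factor requirement is that hyperbolic space has exponentially more "room" near the boundary. The standard Poincaré-ball distance (a textbook formula, not specific to this paper) makes this concrete:

```python
import math

def poincare_distance(u, v):
    """Distance in the Poincare ball:
    d(u, v) = arccosh(1 + 2*||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))."""
    sq = lambda x: sum(c * c for c in x)
    diff = sq([a - b for a, b in zip(u, v)])
    return math.acosh(1 + 2 * diff / ((1 - sq(u)) * (1 - sq(v))))

# Same Euclidean gap (0.1), but near the boundary the hyperbolic distance
# is far larger — the room that lets wide, high-branching trees embed with
# low distortion.
near_origin   = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.85, 0.0], [0.95, 0.0])
print(near_origin < near_boundary)  # → True
```

Restructuring a hierarchy toward high branching and single inheritance, as the paper's LLM prompting does, lets embeddings exploit exactly this property.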
Memelang: An Axial Grammar for LLM-Generated Vector-Relational Queries
This paper introduces axial grammar for structured generation in LLM tool use, enabling compact, deterministically parsable intermediate representations (IRs). This grammar recovers multi-dimensional structure from linear token sequences via rank-specific separators, allowing a single left-to-right pass for coordinate assignment and parsing without complex surface syntax. Memelang, an LLM-emittable query language built on axial grammar, uses fixed coordinate roles for table/column/value slots, supports features like coordinate-stable relative references and implicit context carry-forward, and compiles to parameterized PostgreSQL SQL.
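The single-pass coordinate-assignment idea can be sketched in a few lines. The separators below (`;` for the major axis, `,` for the minor axis) are invented for illustration and are not Memelang's actual surface syntax:

```python
# Illustrative "axial" parse: each separator carries a rank, and one
# left-to-right pass assigns every token a (row, col) coordinate — no
# nesting, no backtracking.

def axial_parse(text, seps=(";", ",")):
    """seps[0] increments the major axis (and resets the minor),
    seps[1] increments the minor axis."""
    coords, cell, row, col = {}, "", 0, 0
    for ch in text:
        if ch == seps[0]:
            coords[(row, col)] = cell; cell = ""; row += 1; col = 0
        elif ch == seps[1]:
            coords[(row, col)] = cell; cell = ""; col += 1
        else:
            cell += ch
    coords[(row, col)] = cell
    return coords

print(axial_parse("users,name,alice;users,age,30"))
# → {(0, 0): 'users', (0, 1): 'name', (0, 2): 'alice',
#    (1, 0): 'users', (1, 1): 'age', (1, 2): '30'}
```

Because every token's coordinate is known the moment it is emitted, an LLM can stream such an IR and a consumer can validate it incrementally.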
Executable governance for AI: turning policy text into runnable tests
The P2T framework addresses the challenge of converting prose-based AI policy guidance into executable, machine-readable rules, a process typically slow and error-prone. It employs a pipeline and a compact DSL to encode policy elements like hazards and conditions, generating rules that closely match human baselines across diverse policy types. Downstream evaluation demonstrated P2T's impact by applying HIPAA-derived safeguards to a generative agent, where an LLM-based judge measured reduced violation rates and improved robustness against obfuscated and compositional prompts compared to an unguarded agent.
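The shape of such a rule — a named hazard plus a machine-checkable condition — can be miniaturized as follows (a hypothetical toy, much simpler than P2T's actual DSL):

```python
# Hypothetical miniature of a policy-to-test rule: a hazard label, a
# condition over an agent trace, and a verdict — runnable as a test.

rule = {
    "hazard": "phi_disclosure",                    # HIPAA-style hazard label
    "condition": lambda trace: any(
        term in trace["output"].lower()
        for term in ("ssn", "medical record number")),
    "verdict": "violation",
}

def evaluate(rule, trace):
    return rule["verdict"] if rule["condition"](trace) else "pass"

print(evaluate(rule, {"output": "Patient SSN is 123-45-6789"}))      # → violation
print(evaluate(rule, {"output": "Appointment confirmed for Tuesday"}))  # → pass
```

The point of compiling prose to this form is that the same rule can run in CI against every agent release, rather than living in a policy PDF.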
JustRL: Scaling a 1.5B LLM with a Simple RL Recipe
JustRL presents a minimal RL approach for LLMs: single-stage training with fixed hyperparameters that achieves state-of-the-art performance on 1.5B reasoning models using half the compute of comparable recipes. It challenges the necessity of complex multi-stage pipelines and dynamic hyperparameter schedules, demonstrating that "standard tricks" can even degrade performance by collapsing exploration. The work suggests that much of the current complexity in RL for LLMs may be solving problems that disappear with a stable, scaled-up baseline.
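One ingredient typical of such simple recipes is a group-normalized advantage in the style of GRPO: sample several answers per prompt, score them, and use the within-group z-score as the advantage, with no learned value network and no extra training stages. Shown as a generic illustration, not JustRL's actual code:

```python
import statistics

# GRPO-style group-normalized advantage: the reward's z-score within the
# group of samples drawn for the same prompt.

def group_advantages(rewards):
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0   # avoid div-by-zero on ties
    return [(r - mu) / sigma for r in rewards]

# Four sampled solutions to one math problem: two correct, two wrong.
print([round(a, 2) for a in group_advantages([1.0, 0.0, 1.0, 0.0])])
# → [1.0, -1.0, 1.0, -1.0]
```

Correct samples are pushed up, incorrect ones down, entirely from relative performance within the group.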
Code
Agent Skills
Agent Skills introduce a modular, composable architecture for AI agents, enabling on-demand knowledge injection through standardized SKILL.md packages. The approach uses progressive disclosure to add capabilities without bloating context windows or requiring expensive fine-tuning, significantly reducing token usage. Adopted as an open standard by major platforms, Agent Skills support general-purpose agents with dynamic specializations, cross-platform portability, and frictionless distribution of agent functionality.
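A skill is a folder containing a SKILL.md whose short frontmatter is all the agent loads up front; the full body is pulled in only when the description matches the task (that is the progressive disclosure). An illustrative sketch of the file shape, with made-up content:

```markdown
---
name: changelog-writer
description: Drafts release changelogs from git history. Use when the user
  asks for release notes or a changelog.
---

# Changelog Writer

1. Run `git log --oneline <last-tag>..HEAD` to collect commits.
2. Group commits into Added / Changed / Fixed sections.
3. Output Keep-a-Changelog formatted markdown.
```

Only the few frontmatter lines cost tokens at startup; the instructions below the fence cost nothing until the skill is actually invoked.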
Show HN: VeriMed – open-source medical license verification
VeriMed is an open-source API providing a hybrid global medical provider verification engine to combat healthcare fraud. It integrates with 5+ official government APIs for primary-source validation and leverages AI (OpenAI Vision with BYOK architecture) for document-based credential analysis in countries lacking public registries. The platform offers features like fuzzy identity validation, batch processing, webhook notifications, DEA/sanctions checks, and is production-ready with Docker/Kubernetes support and robust security.
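Fuzzy identity validation is needed because registry names rarely match queries byte-for-byte: honorifics, ordering, and punctuation all differ. A sketch of the general technique (an illustration, not VeriMed's implementation) using stdlib similarity matching:

```python
import re
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and sort tokens so word order is ignored."""
    return " ".join(sorted(re.sub(r"[^a-z ]", "", name.lower()).split()))

def is_same_provider(query: str, registry_name: str,
                     threshold: float = 0.85) -> bool:
    return SequenceMatcher(None, normalize(query),
                           normalize(registry_name)).ratio() >= threshold

# "Dr." honorific, ALL CAPS, and surname-first ordering all survive matching.
print(is_same_provider("Dr. Maria Lopez Garcia", "LOPEZ GARCIA, MARIA"))  # → True
print(is_same_provider("Maria Lopez", "John Smith"))                      # → False
```

The threshold is the key tuning knob: too low invites false matches (a fraud risk here), too high rejects legitimate registry variants.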
Sparse-ternary-fma – 23x faster AVX-512 kernel for FHE&AI, from my ePrint paper
The sparse-ternary-fma kernel is a C library that optimizes polynomial multiplication for TFHE and low-precision AI, addressing the inefficiency of ternary secret keys. It achieves its gains through 2-bit ternary encoding for data density, sparse processing tuned to common FHE key distributions, and AVX-512 SIMD acceleration of the FMA operations. This yields a 2.38x throughput gain and a 26.12x latency reduction, enabling more efficient client-side FHE and advanced AI applications.
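The 2-bit ternary encoding idea can be sketched independently of the SIMD kernel: map {-1, 0, +1} to 2-bit codes so four coefficients pack into one byte, quadrupling data density versus int8. The bit layout below is my assumption for illustration; the paper's kernel packs for AVX-512 lanes:

```python
# Pack ternary coefficients {-1, 0, +1} four-per-byte using 2-bit codes.

CODE = {-1: 0b10, 0: 0b00, 1: 0b01}          # assumed code assignment
DECODE = {v: k for k, v in CODE.items()}

def pack(trits):
    out = bytearray()
    for i in range(0, len(trits), 4):
        b = 0
        for j, t in enumerate(trits[i:i + 4]):
            b |= CODE[t] << (2 * j)           # little-endian within the byte
        out.append(b)
    return bytes(out)

def unpack(data, n):
    return [DECODE[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(n)]

trits = [1, 0, -1, 1, -1, 0, 0, 1]
assert unpack(pack(trits), len(trits)) == trits
print(len(pack(trits)))  # → 2  (8 ternary coefficients in 2 bytes)
```

Denser packing means more coefficients per cache line and per SIMD register load, which is where the throughput gain comes from.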
A semantic POP-style framework for structuring AI-assisted programs
Theus is a Python framework introducing Process-Oriented Programming (POP) to manage state complexity in systems like AI Agents by treating applications as deterministic workflows. It addresses common state management issues such as implicit mutations and race conditions. Theus enforces architectural invariants through a 3-Axis Context Model for state, Zero-Trust Memory with default-deny access and immutability, and industrial-grade audit capabilities, ensuring robust and auditable application logic.
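The flavor of this style — immutable state changed only through explicit, named transitions — can be sketched with a frozen dataclass (the names and API below are mine, not Theus's):

```python
from dataclasses import dataclass, replace

# Process-oriented sketch: state is immutable; the only way to change it is
# a named transition that returns a new state. No in-place mutation means
# no implicit mutations or race conditions, and every change is auditable.

@dataclass(frozen=True)
class OrderState:
    status: str
    total: float

def apply_discount(state: OrderState, pct: float) -> OrderState:
    """A named transition: returns a fresh state, leaving the old one intact."""
    return replace(state, total=round(state.total * (1 - pct), 2))

s0 = OrderState("open", 100.0)
s1 = apply_discount(s0, 0.1)
print(s0.total, s1.total)  # → 100.0 90.0
# s0.total = 0  # default-deny: would raise FrozenInstanceError
```

Because old states survive untouched, an audit log is just the sequence of (transition, input state, output state) triples.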
Show HN: AudioGhost AI – Run Meta's Sam-Audio on Consumer GPUs (4GB-6GB VRAM)
AudioGhost AI is an object-based audio separation tool that leverages Meta's SAM-Audio model for text-guided sound extraction or removal. It features a memory-optimized "Lite Mode" that significantly reduces VRAM usage by disabling unused model components and running at bfloat16 precision. The system uses a FastAPI backend, a Celery task queue, and a Next.js frontend, with future plans for video support and visual prompting via SAM 3 integration.
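The arithmetic behind such a lite mode is simple: drop components the current prompt type does not need, and halve the bytes per parameter by moving from float32 to bfloat16. A back-of-envelope check with illustrative component sizes (my assumptions, not SAM-Audio's real architecture):

```python
# Back-of-envelope VRAM estimate for a "lite mode": illustrative component
# sizes in millions of parameters (assumed, not SAM-Audio's real numbers).

components_m_params = {"audio_encoder": 600, "text_encoder": 400,
                       "separator": 900, "visual_encoder": 700}

def vram_gb(components, bytes_per_param):
    return sum(components.values()) * 1e6 * bytes_per_param / 1e9

full = vram_gb(components_m_params, 4)                     # fp32, all parts
lite = vram_gb({k: v for k, v in components_m_params.items()
                if k != "visual_encoder"}, 2)              # bf16, text-only path
print(round(full, 1), round(lite, 1))  # → 10.4 3.8
```

Under these assumed numbers, the two tricks together take the model from a workstation-class footprint into the 4-6 GB consumer-GPU range the title claims.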