Saturday December 20, 2025

AI-assisted reverse engineering exposes TP-Link camera flaws, BRAID enhances LLM reasoning with Mermaid graphs, and Linggen provides a local-first memory layer for AI tools.

News

TP-Link Tapo C200: Hardcoded Keys, Buffer Overflows and Privacy

AI-assisted reverse engineering, leveraging LLMs like Grok and Claude with tools like GhidraMCP, uncovered multiple pre-authentication vulnerabilities in the TP-Link Tapo C200 camera. These include hardcoded SSL private keys enabling man-in-the-middle attacks; buffer and integer overflows causing denial of service via ONVIF SOAP XML and HTTPS Content-Length parsing; and unauthenticated APIs (connectAp, scanApList) allowing WiFi hijacking, DoS, and precise camera geolocation via BSSID enumeration. The disclosure process highlighted a conflict of interest, with TP-Link acting as the CVE Numbering Authority (CNA) for its own products.
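
A minimal sketch of the geolocation angle, assuming a JSON-style unauthenticated endpoint; the request path and payload shape below are illustrative guesses, not the camera's documented API:

```python
# Hypothetical probe of the unauthenticated scanApList API described in the
# write-up. Endpoint path and payload are assumptions for illustration only;
# consult the original research for the camera's actual interface.
import requests

CAMERA = "https://192.168.0.50"  # assumed LAN address of the Tapo C200

def scan_ap_list():
    # The camera's TLS cert chains to a hardcoded key, so verification is
    # disabled here purely to reach the device in a lab setting.
    resp = requests.post(
        f"{CAMERA}/",  # assumed endpoint; the real path may differ
        json={"method": "scanApList", "params": {}},
        verify=False,
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    # Each BSSID the camera can see is a candidate lookup key in a public
    # WiFi geolocation database, which is what makes precise geolocation
    # possible without authentication.
    print(scan_ap_list())
```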

LLM Year in Review

2025 was a transformative year for LLMs, marked by the emergence of Reinforcement Learning from Verifiable Rewards (RLVR) as a new training paradigm that fostered reasoning capabilities and consumed significant compute. This led to a deeper understanding of LLM intelligence as "jagged" or "ghost-like": excelling in verifiable domains while challenging benchmark reliability. The year also saw the rise of specialized LLM app layers (e.g., Cursor) and powerful local agents (e.g., Claude Code) leveraging private context. Additionally, "vibe coding" democratized software creation, and models like Gemini's Nano Banana foreshadowed future LLM GUIs integrating multimodal generation.

Using AI Generated Code Will Make You a Bad Programmer

The author contends that relying on AI to generate code, as opposed to using it as a learning aid, hinders a developer's growth and leads to becoming a "bad programmer." This practice is argued to rob individuals of crucial learning opportunities, cause existing coding skills to atrophy, and foster dependency on tools ultimately aimed at replacing human developers. Additional concerns include the legal ambiguity surrounding ownership of AI-generated code and the erosion of professional pride and respect for one's craft. The author suggests corporations promote AI coding tools to reduce reliance on human programmers.

We ran Anthropic’s interviews through structured LLM analysis

An LLM-powered analysis of 1,250 interviews on AI adoption found that 85.7% of users experience unresolved tensions, highlighting cognitive dissonance as a common state rather than a barrier. Creatives exhibit high identity threat and meaning disruption despite increasing AI use, while scientists thrive by treating AI as a tool and rigorously verifying outputs. The primary trust destroyer identified was AI hallucinations, specifically their confident inaccuracy.
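
A rough sketch of the structured-analysis step, assuming an OpenAI-style client and an invented coding schema (not the authors' actual prompt or fields):

```python
# Map each interview transcript to a fixed set of coded fields, then
# aggregate across transcripts to get population-level figures like the
# 85.7% unresolved-tension rate. Schema and model choice are assumptions.
import json
from openai import OpenAI

client = OpenAI()

PROMPT = """Code this interview excerpt. Reply with JSON only:
{"unresolved_tension": true_or_false, "identity_threat": "low|medium|high",
 "trust_breakers": ["list", "of", "strings"]}

Excerpt:
"""

def code_interview(transcript: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model choice
        messages=[{"role": "user", "content": PROMPT + transcript}],
        response_format={"type": "json_object"},  # force parseable output
    )
    return json.loads(resp.choices[0].message.content)
```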

Engineers who dismiss AI

The article critiques engineers who dismiss modern AI coding tools based on outdated experiences, often citing poor performance from 2022. It argues that contemporary LLM-powered tools like Claude Code and Cursor have dramatically improved, now capable of codebase-wide understanding and complex refactoring. The author asserts that while these tools are imperfect, engineers who refuse to adopt them risk falling behind peers who leverage them for increased productivity.

Research

GEMM Performance Optimization Across Generations of Ryzen AI NPUs

This paper presents a systematic methodology for optimizing GEMM (general matrix multiplication) workloads, a kernel central to deep learning performance, on AMD's Ryzen AI XDNA and XDNA2 NPUs. By exploiting unique architectural features and addressing system-level bottlenecks, the work achieves state-of-the-art throughputs of up to 38.05 TOPS at int8 and 14.71 TOPS at bf16 precision on XDNA2.
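
For context on the headline numbers, a back-of-envelope TOPS measurement using the standard 2·M·N·K operation count (NumPy on CPU, purely to show the arithmetic; the paper's figures come from hand-optimized NPU kernels):

```python
import time
import numpy as np

M = N = K = 2048
a = np.random.rand(M, K).astype(np.float32)
b = np.random.rand(K, N).astype(np.float32)

start = time.perf_counter()
c = a @ b
elapsed = time.perf_counter() - start

ops = 2 * M * N * K          # one multiply + one add per MAC
tops = ops / elapsed / 1e12  # tera-operations per second
print(f"{tops:.3f} TOPS")    # compare: 38.05 TOPS int8 on XDNA2
```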

Agnosticism About Artificial Consciousness

The text argues that assessing AI consciousness must adhere to Evidentialism, relying solely on scientific evidence. It contends that current evidence is insufficient to determine if AI can have conscious experiences, advocating for an agnostic stance. Both biological and functional views are critiqued for overestimating what scientific data, primarily derived from conscious organisms, can tell us about artificial consciousness.

Braid: Bounded reasoning for LLMs using symbolic Mermaid graphs

This paper introduces BRAID, a structured prompting framework utilizing Mermaid-based instruction graphs to enable LLMs to reason structurally, addressing the nonlinear relationship between performance, cost, and token usage. Evaluated across multiple GPT model tiers on benchmarks like AdvancedIF and GSM-Hard, BRAID significantly enhances reasoning accuracy and cost efficiency for autonomous agents in production systems, establishing it as a scalable inference optimization technique.
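
A minimal sketch of the idea, with an invented Mermaid instruction graph rather than one of BRAID's actual templates:

```python
# Express the reasoning plan as a Mermaid flowchart and prepend it to the
# task, so the model follows bounded, explicit steps instead of free-form
# chain-of-thought. The graph below is an illustrative example only.
MERMAID_PLAN = """\
flowchart TD
    A[Parse the question] --> B[Extract known quantities]
    B --> C{Is a formula needed?}
    C -- yes --> D[Choose formula and substitute]
    C -- no --> E[Reason directly]
    D --> F[Compute and sanity-check result]
    E --> F
    F --> G[Emit final answer only]
"""

def build_prompt(question: str) -> str:
    return (
        "Follow this instruction graph exactly, visiting nodes in order:\n"
        + MERMAID_PLAN
        + f"\nQuestion: {question}"
    )

print(build_prompt("A train travels 120 km in 1.5 h. Average speed?"))
```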

Evaluating Large Language Models in Scientific Discovery

A new scenario-grounded benchmark, the SDE framework, evaluates LLMs on iterative scientific discovery tasks across biology, chemistry, materials, and physics, moving beyond decontextualized knowledge tests. It assesses models at both question and project levels, requiring hypothesis generation, experiment design, and result interpretation. Initial evaluations reveal a consistent performance gap relative to general science benchmarks, diminishing returns from scaling, and systematic weaknesses across top LLMs, suggesting current models are far from scientific "superintelligence" despite showing promise in guided discovery.
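
A schematic of the two scoring granularities, with invented names and a stand-in aggregation rule (see the paper for the actual protocol):

```python
from dataclasses import dataclass

@dataclass
class Step:
    kind: str      # "hypothesis" | "experiment_design" | "interpretation"
    response: str
    score: float   # graded against a scenario-specific rubric

def question_score(steps: list[Step]) -> float:
    # Question-level credit treats each step independently.
    return sum(s.score for s in steps) / len(steps)

def project_score(steps: list[Step]) -> float:
    # Project-level credit depends on the whole discovery chain; taking the
    # minimum is one plausible rule so a weak step cannot average away.
    return min(s.score for s in steps)
```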

Advancing Low-Light Raw Enhancement by Retasking Diffusion Models for Camera ISP

Low-light image enhancement, traditionally handled by camera ISPs, increasingly relies on deep networks, but existing regression-based models often oversmooth images. While recent attempts to train diffusion models from scratch show promise, they still struggle with detail recovery and color accuracy. This work introduces a framework that retasks pre-trained generative diffusion models for the camera ISP to enhance low-light raw images, achieving state-of-the-art perceptual quality.
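
A loose analogy for the retasking pattern, using Hugging Face diffusers' img2img pipeline on RGB images; the paper itself operates on raw sensor data inside the ISP, so this only illustrates reusing a pre-trained generative prior instead of training from scratch:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Load an off-the-shelf generative prior rather than training from scratch.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

dark = Image.open("low_light.png").convert("RGB")  # assumed input file
out = pipe(
    prompt="a well-exposed, noise-free photograph",
    image=dark,
    strength=0.35,       # low strength preserves scene content
    guidance_scale=7.0,
).images[0]
out.save("enhanced.png")
```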

Code

Show HN: I open-sourced my Go and Next B2B SaaS Starter (deploy anywhere, MIT)

This enterprise-grade SaaS boilerplate is built with Next.js 16 and Go 1.25, featuring a robust backend that leverages PostgreSQL with pgvector for vector similarity search. It integrates the OpenAI API for LLM capabilities, including a RAG pipeline and vector embeddings, and utilizes Mistral AI for OCR services. Key features include multi-tenancy, RBAC, billing, and a ready-to-use AI & RAG pipeline.
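
A minimal sketch of the retrieval step such a stack implies, with assumed table and column names (the boilerplate's backend is Go; Python is used here only for brevity):

```python
# Embed the query with the OpenAI API, then run a pgvector nearest-neighbor
# search. The "documents" table and "embedding" column are assumptions, not
# the starter's actual schema.
import psycopg2
from openai import OpenAI

client = OpenAI()

def retrieve(query: str, k: int = 5) -> list[str]:
    emb = client.embeddings.create(
        model="text-embedding-3-small", input=query
    ).data[0].embedding
    vec = "[" + ",".join(str(x) for x in emb) + "]"  # pgvector literal
    conn = psycopg2.connect("dbname=saas")  # assumed DSN
    with conn, conn.cursor() as cur:
        # "<->" is pgvector's distance operator (L2 by default).
        cur.execute(
            "SELECT content FROM documents "
            "ORDER BY embedding <-> %s::vector LIMIT %s",
            (vec, k),
        )
        return [row[0] for row in cur.fetchall()]
```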

Show HN: Linggen – A local-first memory layer for your AI (Cursor, Zed, Claude)

Linggen is a local-first application that indexes codebases and tribal knowledge to provide persistent, cross-project context for AI tools like Cursor, Zed, and Claude. It bridges the "context gap," enabling AIs to understand architecture, dependencies, and long-term decisions without manual input. Utilizing LanceDB for on-machine vector search, Linggen ensures data privacy and offers features like a System Map for dependency visualization, with users interacting via MCP-enabled IDEs to dynamically load relevant memory.
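
A generic LanceDB sketch of the local-first pattern Linggen builds on; the table layout is invented, not Linggen's actual schema:

```python
import lancedb

# LanceDB is embedded: the database is a plain directory on disk, so no
# data leaves the machine.
db = lancedb.connect("./memory")
table = db.create_table(
    "knowledge",
    data=[
        {"vector": [0.1, 0.9], "text": "auth service owns all session state"},
        {"vector": [0.8, 0.2], "text": "billing talks to Stripe via webhooks"},
    ],
)

# An MCP tool call would embed the user's question and search locally:
hits = table.search([0.15, 0.85]).limit(1).to_list()
print(hits[0]["text"])
```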

Building Blocks for Agents in C++

agent.cpp is a C++ library for building local LLM agents, leveraging llama.cpp for on-device inference with GGUF models. It provides core components like an agent loop for orchestrating model calls and tool executions, configurable instructions, and callbacks for lifecycle hooks, context manipulation, and human-in-the-loop interactions. The library supports custom tool integration with defined schemas and execution functions, and offers straightforward CMake integration with support for llama.cpp's hardware acceleration.
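
The loop shape, sketched generically in Python; agent.cpp's actual API is C++, so this only illustrates the pattern, not the library's interface:

```python
# Core agent loop: call the model, execute any requested tool, feed the
# result back, repeat until the model emits a final answer.
def agent_loop(model, tools: dict, user_msg: str, max_turns: int = 8):
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_turns):
        reply = model(messages)            # one on-device inference call
        if reply.get("tool") is None:
            return reply["content"]        # final answer, loop ends
        fn = tools[reply["tool"]]          # look up tool by schema name
        result = fn(**reply["arguments"])  # execute the tool
        # Lifecycle hooks / human-in-the-loop callbacks would fire here.
        messages.append({"role": "tool", "content": str(result)})
    raise RuntimeError("agent did not converge within max_turns")
```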

We built a universal installer for agent skills based on the new open standard

AI Agent Skills is a universal repository centralizing diverse capabilities for AI agents, addressing the fragmentation of skills across platforms. It offers an npx command-line tool to easily install skills, such as frontend-design or pdf, for various compatible agents including Claude Code, Cursor, and GitHub Copilot. All skills adhere to the open Agent Skills specification, and users can also create custom skills following this standard.

Show HN: Helix – AI-powered API mocking with strict schema enforcement

Helix is an AI-powered API mocking server that generates realistic, schema-compliant data. It leverages various LLM providers, including DeepSeek, Groq, and local Ollama models, to produce mock responses that strictly adhere to user-defined schemas, ensuring structural consistency. Key features include context-aware sessions, Redis caching, chaos engineering, and a comprehensive CLI for streamlined setup and management, enabling efficient development and testing.
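
A generic sketch of the generate-validate-retry pattern that strict schema enforcement implies; this is an illustration of the technique, not Helix's implementation:

```python
import json
import jsonschema

USER_SCHEMA = {  # example schema a developer might register for an endpoint
    "type": "object",
    "properties": {"id": {"type": "integer"}, "email": {"type": "string"}},
    "required": ["id", "email"],
}

def mock_response(generate, retries: int = 3) -> dict:
    # "generate" is any LLM backend that returns a JSON string for a schema.
    for _ in range(retries):
        candidate = json.loads(generate(USER_SCHEMA))
        try:
            jsonschema.validate(candidate, USER_SCHEMA)
            return candidate  # structurally consistent mock data
        except jsonschema.ValidationError:
            continue  # regenerate rather than return malformed data
    raise ValueError("could not produce schema-compliant mock")
```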
