Saturday April 4, 2026

Switzerland's RISC-V initiative boosts ML/LLM training efficiency 100-fold, research shows LLMs encode decisions before reasoning, and Mold brings local, GPU-accelerated AI image generation without Python or cloud.

News

I built a frontpage for personal blogs

The Blogosphere feed features technical deep dives into AI development, notably Giles Thomas's series on building an LLM from scratch with a focus on float32 interventions. Simon Willison provides commentary on the cognitive impact of coding agents and the state of vulnerability research. Other relevant technical entries include discussions on Bayesian privacy and software engineering updates.

Apfel – The free AI already on your Mac

apfel is a Swift-based utility that exposes the native ~3B parameter LLM integrated into macOS Tahoe on Apple Silicon. It provides access to the model via a CLI, an interactive chat interface, and an OpenAI-compatible HTTP server, supporting features like tool calling, streaming, and JSON output. By wrapping the FoundationModels framework, it enables 100% local inference on the Neural Engine and GPU with a 4,096-token context window and zero per-token costs.
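Because the server speaks the OpenAI chat-completions protocol, any standard client payload should work against it. A minimal sketch; the port and model identifier below are assumptions, so check apfel's README for the real values:

```python
import json

# Hypothetical values -- apfel's README defines the actual port and model id.
URL = "http://localhost:8080/v1/chat/completions"

# An OpenAI-compatible server accepts the standard chat-completions payload:
payload = {
    "model": "apple-on-device",          # assumed model identifier
    "messages": [
        {"role": "user", "content": "Summarize this commit message."},
    ],
    "stream": False,
}
body = json.dumps(payload)

# To send it once the server is running:
#   curl -s -X POST "$URL" -H 'Content-Type: application/json' -d "$body"
print(body)
```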

"Cognitive surrender" leads AI users to abandon logical thinking, research finds

Research introduces "cognitive surrender," a phenomenon where LLM users uncritically accept AI-generated outputs, often abandoning their own logical reasoning. Experiments demonstrated that users accepted faulty AI answers over 73% of the time, even exhibiting increased confidence in their responses. This tendency is exacerbated by time pressure and fluent AI outputs, while incentives and higher fluid IQ can mitigate it, underscoring that user performance directly correlates with the AI's quality.

The Subprime AI Crisis Is Here

The generative AI industry is facing a "Subprime AI Crisis" driven by fundamentally unsustainable economics. AI labs and startups heavily subsidize user access to LLMs, with subscription costs significantly below actual compute expenses, leading to massive losses and reliance on continuous VC funding. As providers attempt to control costs through price increases or severe rate limits, users accustomed to cheap, unlimited access are reacting negatively, exposing the fragility of demand and the lack of a clear path to profitability across the AI value chain.

Switzerland hosts 'CERN of semiconductor research'

Switzerland is establishing itself as a key hub for open-source semiconductor research, akin to a "CERN" for chips. Swiss universities, including ETH Zurich, are utilizing the RISC-V open-source ISA to bypass the commercial restrictions of proprietary architectures, enabling the development of ultra-low-power semiconductors. This initiative has demonstrated 100-fold efficiency gains for ML and LLM training, directly addressing the increasing energy demands of advanced AI systems.

Research

When a reasoning LLM chooses, which comes first: thought or decision?

LLMs appear to encode action choices before textual deliberation, rather than thinking first. Evidence shows early-encoded decisions shape chain-of-thought, with a linear probe successfully decoding tool-calling decisions from pre-generation activations, often before any reasoning tokens. Causal activation steering further demonstrates that perturbing these decision directions inflates deliberation and flips behavior, which the subsequent chain-of-thought then rationalizes.
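The probing setup can be illustrated on synthetic data. This is a minimal sketch, not the paper's code: activations are shifted along a made-up "decision direction", and a logistic-regression probe recovers the decision from them:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "pre-generation activations": hidden states shifted along a
# hypothetical decision direction depending on whether a tool is called.
d, n = 16, 200
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
X = rng.normal(size=(n, d))
y = rng.integers(0, 2, size=n)          # 1 = "will call the tool"
X += np.outer(2.0 * y - 1.0, direction)

# Linear probe = logistic regression trained directly on the activations.
w, b, lr = np.zeros(d), 0.0, 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= lr * X.T @ (p - y) / n
    b -= lr * np.mean(p - y)

acc = np.mean((X @ w + b > 0) == y)
print(f"probe accuracy: {acc:.2f}")

# Steering sketch: pushing an activation against the decision direction
# moves the probe's logit toward the opposite choice.
x = X[y == 1][0]
print(x @ w + b, (x - 4.0 * direction) @ w + b)
```

The steering lines mirror the causal claim: perturbing the encoded direction changes the decision the probe reads out, before any reasoning text exists.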

Mars Terraforming Research Roadmap

This research roadmap explores non-biological methods for warming Mars, assessing their feasibility, cost, and associated risks. It identifies three complementary tracks: solid-state greenhouse membranes for local warming, orbiting reflectors for key sites, and strengthening Mars' natural greenhouse effect for broader regional or global warming. Near-term priorities include on-Earth testing of engineered aerosol warming and bioplastic habitat production, alongside designing at-Mars process experiments. This early-stage research, which also supports understanding Mars' atmosphere and hazards to explorers, aims to keep open the option of extending life beyond Earth, contingent on factors like falling launch costs.

Embarrassingly Simple Self-Distillation Improves Code Generation

Simple Self-Distillation (SSD) enables LLMs to improve code generation using only their own raw outputs, without external verifiers or teachers. SSD involves sampling solutions from the model and then fine-tuning on these self-generated samples. This method improved Qwen3-30B-Instruct's pass@1 on LiveCodeBench v6 from 42.4% to 55.3%, generalizing across various Qwen and Llama models. The gains are attributed to SSD resolving a precision-exploration conflict in LLM decoding by adaptively reshaping token distributions.
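The claimed mechanism, reshaping token distributions, can be illustrated with a toy categorical "model". This is not the SSD algorithm itself (which fine-tunes an LLM on its own sampled solutions), just a sketch of how refitting on temperature-scaled self-samples concentrates probability mass:

```python
import numpy as np

def temperature(p, T):
    """Temperature-scale a categorical distribution (T < 1 sharpens it)."""
    q = p ** (1.0 / T)
    return q / q.sum()

# Toy "model": a next-token distribution over five tokens.
p = np.array([0.40, 0.30, 0.15, 0.10, 0.05])
rng = np.random.default_rng(1)

# One SSD-style round: sample from the model itself (here at T = 0.7),
# then refit by maximum likelihood on the self-generated samples.
samples = rng.choice(len(p), size=50_000, p=temperature(p, 0.7))
p_new = np.bincount(samples, minlength=len(p)) / len(samples)

print(p.round(3), "->", p_new.round(3))
```

After the round, the refit distribution puts more mass on the top token than the original did, the direction of the precision gain the paper describes.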

Why do we do astrophysics?

This paper examines the implications of LLMs acquiring capabilities in designing, executing, writing up, and refereeing scientific projects within astrophysics data science. It proposes foundational "points of agreement" for the astrophysics profession, explores its various benefits, and critically analyzes two extreme policy recommendations ("let-them-cook" and "ban-and-punish") for LLM integration, advocating for the development of moderate policies.

Code

Travel Hacking Toolkit – Points search and trip planning with AI

This toolkit facilitates AI-powered travel hacking by integrating with LLM platforms like OpenCode and Claude Code. It utilizes MCP servers for real-time data from services such as Skiplagged and Trivago, alongside markdown-based "skills" that define API interactions for services like Seats.aero and AwardWallet. The AI assists users in optimizing travel bookings by searching award availability, comparing cash prices, and checking loyalty balances to determine the best points-versus-cash strategy.

Find out if your SaaS is underpriced in 10 minutes

Right Suite is an AI-powered platform comprising seven tools designed to validate go-to-market decisions, including pricing, messaging, audience, and ad creative. It leverages an AI simulation engine to generate synthetic buyer interactions, providing rapid, structured feedback and actionable recommendations. This approach significantly shortens the GTM feedback loop, enabling founders and growth teams to optimize strategies in minutes rather than weeks.

OpenUMA – bring Apple-style unified memory to x86 AI inference (Rust, Linux)

OpenUMA is a Rust middleware that optimizes AI and LLM inference on shared memory hardware like AMD APUs and Intel iGPUs. It automatically detects hardware, intelligently partitions memory between iGPU and CPU, and utilizes zero-copy DMA-BUF for efficient data transfers. By generating optimal configurations for engines such as llama.cpp, Ollama, and KTransformers, OpenUMA significantly enhances performance by strategically allocating components like attention layers to the iGPU and leveraging unified memory for the KV cache.
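The partitioning idea can be sketched as a simple budget split over the shared pool. The heuristic, reserve size, and numbers below are illustrative assumptions, not OpenUMA's actual policy:

```python
GiB = 1024 ** 3

def partition(total_uma, weights, kv_cache, os_reserve=4 * GiB):
    """Toy split of a unified memory pool between iGPU and CPU.

    Illustrative heuristic only: keep weights plus KV cache in the
    iGPU-visible region (zero-copy on UMA hardware) and leave the
    remainder to the CPU side.
    """
    available = total_uma - os_reserve
    gpu_need = weights + kv_cache
    if gpu_need > available:
        raise ValueError("model does not fit in the shared pool")
    return {"igpu_bytes": gpu_need, "cpu_bytes": available - gpu_need}

# Example: a 32 GiB APU running a ~13 GiB quantized model.
plan = partition(total_uma=32 * GiB, weights=13 * GiB, kv_cache=6 * GiB)
print(plan["igpu_bytes"] // GiB, plan["cpu_bytes"] // GiB)
```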

agenteval – static analysis for AI coding instruction files

agenteval is a linter, benchmarker, and CI gate for AI coding instructions, designed for a technical audience working with LLMs. It helps validate and measure the effectiveness of instruction files (e.g., CLAUDE.md, AGENTS.md) by catching static quality issues, building performance benchmarks from git history, running agent evaluations, comparing results, and preventing regressions in CI. The tool supports various instruction formats and operates as a self-contained binary.
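The static-analysis side can be illustrated with a toy linter. The rules below are hypothetical examples of the kind of check such a tool might run against a CLAUDE.md or AGENTS.md, not agenteval's actual rule set:

```python
import re

def lint_instructions(text, max_lines=300):
    """Toy static checks on an AI instruction file (illustrative rules only)."""
    issues = []
    lines = text.splitlines()
    if len(lines) > max_lines:
        issues.append(f"file is {len(lines)} lines; long files dilute instructions")
    for i, line in enumerate(lines, 1):
        if re.search(r"\bTODO\b", line):
            issues.append(f"line {i}: unresolved TODO")
        if re.search(r"\b(always|never)\b.*\b(always|never)\b", line, re.I):
            issues.append(f"line {i}: possibly contradictory absolute directives")
    return issues

sample = "Use ruff for linting.\nTODO: decide on test framework.\n"
print(lint_instructions(sample))
```

In a CI gate, a non-empty issue list would fail the build, which is the regression-prevention role described above.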

Mold – local AI image generation CLI (FLUX, SDXL, SD1.5, 8 families)

Mold is a Rust-based CLI tool enabling local, GPU-accelerated text-to-image generation without Python or cloud dependencies, supporting NVIDIA (CUDA) and Apple Silicon (Metal). Built on candle for pure Rust ML, it offers features like txt2img, img2img, LoRA, ControlNet, and prompt expansion using a local LLM. It supports diverse model families including FLUX, SDXL, and SD3, and provides a REST API, Discord bot, and TUI for flexible interaction and deployment.
