Tuesday February 25, 2025

Apple commits $500 billion to US investment including AI server production, Claude 3.7 Sonnet ships with Claude Code for agentic coding from the terminal, and SIFT boosts LLMs to state-of-the-art accuracy on AIME2024.

News

Apple says it will add 20k jobs, spend $500B, produce AI servers in US

Apple plans to add 20,000 new jobs in the US and spend $500 billion over the next four years, with investments including a new server manufacturing facility in Texas to produce AI servers. The move is seen as an effort to mitigate the impact of US President Donald Trump's tariffs on goods imported from China.

Claude 3.7 Sonnet and Claude Code

Claude 3.7 Sonnet is a new hybrid reasoning model that combines the capabilities of a large language model with extended thinking, letting it produce either near-instant responses or visible step-by-step reasoning. The model is available on all Claude plans and shows strong improvements in coding and front-end web development. Alongside it, Anthropic introduced Claude Code, a command line tool for agentic coding that lets developers delegate substantial engineering tasks to Claude directly from the terminal.

Microsoft cancels leases for AI data centers, analyst says

Microsoft has canceled some leases for US data center capacity totaling a couple of hundred megawatts, according to TD Cowen, raising concerns that the company may have secured more AI computing capacity than it will need. The cancellations come despite Microsoft's pledge to invest $80 billion in computing capacity and have prompted questions from Wall Street about longer-term demand for AI.

It's still worth blogging in the age of AI

The author argues that blogging is still worthwhile despite the rise of AI tools like ChatGPT: it lets individuals learn and think critically, and it creates a durable record of their knowledge and expertise. While blogging may not bring fame or a large following, it builds a portfolio of writing that demonstrates expertise, which can be valuable for career advancement and other professional contexts.

Google Co-Scientist AI fed previous paper with the answer in it

Google's AI Co-Scientist tool was reported to have solved a "superbug" problem in two days, but it was later revealed that the tool had been fed a previous paper by the research team that already contained the answer. This has raised questions about the tool's actual ability to make new scientific discoveries, with critics arguing that it is simply reassembling existing information rather than generating new insights.

Research

Computer Simulation of Neural Networks Using Spreadsheets (2018)

The article surveys approaches to simulating neural networks in a spreadsheet environment, including add-ins, macros, and standard spreadsheet tools, and argues for developing training methods to match. It also traces the historical roots of computational neuroscience, highlighting key figures and models such as Rashevsky's neurodynamics and the McCulloch-Pitts neuron, and suggests that mastering these historical models is essential for acquiring neural simulation competence in a spreadsheet environment.
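
The McCulloch-Pitts neuron mentioned above maps naturally onto spreadsheet formulas, one cell per weighted sum and threshold. As a rough illustration (not the article's own code, and with illustrative weights), here is the model in Python:

    def mp_neuron(inputs, weights, threshold):
        """Fire (output 1) when the weighted input sum reaches the threshold."""
        total = sum(x * w for x, w in zip(inputs, weights))
        return 1 if total >= threshold else 0

    # An AND gate as a single neuron: it fires only when both inputs are 1.
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", mp_neuron([a, b], [1, 1], threshold=2))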

SIFT: Grounding LLM Reasoning in Contexts via Stickers

Large language models can misinterpret their context, leading to errors in reasoning and calculation. A new approach called Stick to the Facts (SIFT) addresses this by using model-generated Stickers to emphasize key information and refine predictions. SIFT improves performance across various models and benchmarks, including a new state-of-the-art accuracy of 85.67% on AIME2024 with the DeepSeek-R1 model.
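
The paper's exact procedure is more involved, but the general Sticker pattern described above can be sketched as follows; the llm function and the prompts are placeholders, not the paper's implementation:

    def llm(prompt: str) -> str:
        """Placeholder for any chat-completion call."""
        raise NotImplementedError("plug in a real model call")

    def sift_answer(query: str) -> str:
        # Extract the key facts (the "Sticker") from the query itself.
        sticker = llm("Extract the key facts needed to answer:\n" + query)
        # Answer both with and without the Sticker emphasized.
        direct = llm(query)
        grounded = llm(query + "\n\nKey facts to stick to:\n" + sticker)
        if direct == grounded:
            return grounded
        # Disagreement suggests the context was misread: refine the Sticker
        # and answer again with the corrected emphasis.
        sticker = llm("Rewrite these facts so they match the question exactly:\n"
                      "Question: " + query + "\nFacts: " + sticker)
        return llm(query + "\n\nKey facts to stick to:\n" + sticker)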

Latent computing by biological neural networks: A dynamical systems framework

Neural circuits can maintain stable outputs even as individual neurons and populations exhibit representational drift, motivating a dynamical systems framework built around latent processing units that support robust coding and computation. The framework captures key attributes of neural computation, including how high-dimensional dynamics can arise from low-dimensional computations and how representations remain stable despite variable single-cell activity, providing a foundation for understanding how large neural systems perform robust computations.
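
A toy sketch of the core idea, with made-up dimensions: a stable 2-D latent computation is embedded into 100 neurons, and even after the embedding drifts, a readout matched to the new embedding recovers the same latent state:

    import numpy as np

    rng = np.random.default_rng(0)

    # Stable low-dimensional computation: a 2-D rotation (the latent unit).
    theta = 0.1
    A = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

    z = np.array([1.0, 0.0])
    for _ in range(50):
        z = A @ z                          # low-dimensional dynamics

    E1 = rng.standard_normal((100, 2))     # embedding into 100 neurons
    E2 = rng.standard_normal((100, 2))     # "drifted" embedding

    r1, r2 = E1 @ z, E2 @ z                # single-cell activity differs...
    z1 = np.linalg.pinv(E1) @ r1           # ...but a readout matched to each
    z2 = np.linalg.pinv(E2) @ r2           # embedding recovers the same latent
    print(np.allclose(z1, z2))             # True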

Is this the simplest (and most surprising) sorting algorithm ever? (2021)

An extremely simple sorting algorithm is presented which, despite appearing incorrect at first glance, is proven correct. The paper compares it to other simple sorting algorithms and analyzes its unusual properties.
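
For reference, here is the algorithm as the paper presents it, rendered in Python; the comparison looks inverted, yet the result is sorted ascending:

    def simplest_sort(a):
        """Sorts ascending, even though the comparison looks backwards."""
        n = len(a)
        for i in range(n):
            for j in range(n):
                if a[i] < a[j]:
                    a[i], a[j] = a[j], a[i]
        return a

    print(simplest_sort([5, 2, 4, 1, 3]))   # [1, 2, 3, 4, 5]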

TabulaROSA: Tabular OS for Massively Parallel Heterogeneous Compute Engines (2018)

The traditional role of operating systems is evolving to accommodate diverse computing hardware, with a new model emerging where the controlling processor is separate from the compute engines performing computations. A proposed Tabular Operating System Architecture (TabulaROSA) uses database operations to manage resources, demonstrating up to 20x higher performance than Linux while handling massively parallel processes in simulations.
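
Not the paper's implementation: a toy sketch of the core idea, with hypothetical field names, in which process management becomes database-style insert, delete, and select operations on a table of rows:

    processes = []   # the "process table"

    def spawn(pid, engine, task):
        processes.append({"pid": pid, "engine": engine, "task": task})  # INSERT

    def terminate(pid):
        processes[:] = [row for row in processes if row["pid"] != pid]  # DELETE

    def on_engine(engine):
        return [row for row in processes if row["engine"] == engine]    # SELECT

    spawn(1, "gpu0", "matmul")
    spawn(2, "gpu1", "fft")
    print(on_engine("gpu0"))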

Code

DeepSeek Open Source FlashMLA – MLA Decoding Kernel for Hopper GPUs

FlashMLA is an efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences and reaching up to 3000 GB/s in memory-bound and 580 TFLOPS in compute-bound configurations. It can be installed and benchmarked with Python, and usage centers on importing the flash_mla module and calling get_mla_metadata and flash_mla_with_kvcache.
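
A usage sketch built around the two functions named above; the argument lists and tensor shapes are assumptions patterned on the repository's README, so verify them against the actual code:

    from flash_mla import get_mla_metadata, flash_mla_with_kvcache

    def mla_decode(q, kvcache, block_table, cache_seqlens, dv,
                   num_layers, s_q, h_q, h_kv):
        # Scheduling metadata is computed once per batch from the cached
        # sequence lengths, then reused across every layer.
        tile_scheduler_metadata, num_splits = get_mla_metadata(
            cache_seqlens, s_q * h_q // h_kv, h_kv)
        outputs = []
        for _ in range(num_layers):
            o, lse = flash_mla_with_kvcache(
                q, kvcache, block_table, cache_seqlens, dv,
                tile_scheduler_metadata, num_splits, causal=True)
            outputs.append(o)
        return outputs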

DeepSeek open source DeepEP – library for MoE training and inference

DeepEP is a communication library designed for Mixture-of-Experts (MoE) and expert parallelism, providing high-throughput and low-latency all-to-all GPU kernels, as well as support for low-precision operations. The library offers optimized kernels for asymmetric-domain bandwidth forwarding, low-latency kernels for inference decoding, and a hook-based communication-computation overlapping method, making it suitable for both training and inference tasks.
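
This is not DeepEP's API, but a single-process Python sketch of the dispatch/combine pattern that its all-to-all kernels implement across GPUs:

    def dispatch(tokens, expert_of, rank_of_expert, n_ranks):
        """Bucket tokens by destination rank: the all-to-all 'send' side."""
        buckets = [[] for _ in range(n_ranks)]
        for i, tok in enumerate(tokens):
            buckets[rank_of_expert[expert_of[i]]].append((i, tok))
        return buckets

    def combine(processed_buckets, n_tokens):
        """Return processed tokens to their original slots: the reverse all-to-all."""
        out = [None] * n_tokens
        for bucket in processed_buckets:
            for i, tok in bucket:
                out[i] = tok
        return out

    tokens = ["t0", "t1", "t2", "t3"]
    expert_of = [0, 2, 1, 2]              # chosen expert per token
    rank_of_expert = {0: 0, 1: 0, 2: 1}   # experts 0-1 on rank 0, expert 2 on rank 1
    buckets = dispatch(tokens, expert_of, rank_of_expert, n_ranks=2)
    processed = [[(i, t.upper()) for i, t in b] for b in buckets]  # stand-in expert work
    print(combine(processed, len(tokens)))  # ['T0', 'T1', 'T2', 'T3']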

Show HN: Orra – The missing glue layer for production-ready multi-agent apps

Orra is a platform for building production-ready multi-agent applications that can handle complex real-world interactions, using intelligent reasoning to coordinate tasks across existing stacks, agents, and tools. It provides features such as smart pre-evaluated execution plans, domain grounding, durable execution, and automatic service health monitoring, allowing developers to create scalable and reliable multi-agent applications.

Muon Is Scalable for LLM Training

The Muon optimizer has been scaled up to train large language models, achieving roughly 2× the computational efficiency of AdamW. Using it, the team trained Moonlight, a Mixture-of-Experts model with 16B total and 3B activated parameters, which reaches better performance with fewer training FLOPs than prior models. Both Moonlight and the distributed Muon implementation are open-sourced, and the pretrained model can be downloaded and used with the Hugging Face Transformers library.
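
A minimal loading sketch using the standard Transformers API; the model id is inferred from the release name and should be verified against the actual Hugging Face model card:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Hypothetical model id inferred from the release name: check the
    # model card on Hugging Face before relying on it.
    model_id = "moonshotai/Moonlight-16B-A3B"

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", trust_remote_code=True)

    inputs = tokenizer("1 + 1 =", return_tensors="pt")
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))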

Containers are bloated. We built a runtime tool to remove bloat and CVEs

BLAFS is a bloat-aware filesystem for container debloating that detects and removes unused files, reducing container size by up to 95% while maintaining functionality. It works by converting the container to the BLAFS filesystem, running profiling workloads to track file usage, and then debloating the container to retain only the used files.
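
BLAFS itself works at the filesystem layer, but the usage-based idea can be sketched in a few lines of Python, assuming a list of file paths recorded during the profiling run:

    import os

    def debloat(root, used_paths):
        """Remove every file under root that the profiling run never touched."""
        used = {os.path.realpath(p) for p in used_paths}
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                path = os.path.realpath(os.path.join(dirpath, name))
                if path not in used:
                    os.remove(path)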