Monday — June 17, 2024
Meta scales to 600,000 GPUs with OpsPlanner managing a million operations daily, Stanford's TextGrad boosts Google-Proof QA accuracy by 4%, and NumPy 2.0.0 marks its first major release since 2006.
News
Maintaining large-scale AI capacity at Meta
It will surprise absolutely no one that Meta has developed one of the largest AI training infrastructures globally. They are set to scale to 600,000 GPUs within a year and are running thousands of training jobs daily (!). AI training at Meta requires strict capacity guarantees, low interruption rates, and host consistency. To maintain performance, Meta uses bespoke hardware, high-speed backend networks, and a flexible software stack (basically everything is custom). Maintenance is handled through a technique called “maintenance trains”, ensuring minimal capacity downtime and gradual rollouts to avoid disruptions. Their OpsPlanner orchestrator manages up to a million operations daily, ensuring upgrades are applied safely.
OpenAI and Microsoft Azure to deprecate GPT-4 32K
Erik Meijer (the Dutch computer scientist) has raised concerns on X about OpenAI's quiet removal of GPT-4 32K mentions and its apparent lack of availability. He also notes that Azure will deprecate GPT-4 32K in September. While GPT-4 Turbo and other models like Claude offer 128K-token input contexts, their output context remains limited to 4K tokens!
McDonald's will stop testing AI to take drive-thru orders, for now
McDonald’s is ending its AI drive-thru ordering partnership with IBM by July 26th, 2024 (thank goodness). After testing the system in over 100 restaurants, the company still believes in a future for voice-ordering solutions despite pulling the plug on IBM. This move might pave the way for a collaboration with Google, which has a pending deal that includes an employee guidance chatbot named "Ask Pickles." Meanwhile, other fast-food chains like White Castle, Carl’s Jr., and Hardee’s are also experimenting with AI drive-thru systems, often supported by remote human operators. Beyond drive-thru AI, McDonald’s continues to innovate with mobile ordering, in-store kiosks, drone deliveries, kitchen robots, and AI hiring tools.
Research
Creativity Has Left the Chat: The Price of Debiasing Language Models
This paper examines the impact of RLHF on the creativity of LLMs like Llama-2, focusing on syntactic and semantic diversity. The results show that aligned models produce less diverse outputs, form distinct clusters in embedding space, and tend toward "attractor states," indicating reduced creativity. These results are pretty intuitive to anyone using LLMs daily, but they are especially problematic for marketers using LLMs for tasks that require high creativity, such as copywriting and ad creation. The balance between consistency and creativity is hard to strike.
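To make the diversity claim concrete, here is an illustrative sketch (not the paper's code): one common proxy for semantic diversity is the mean pairwise cosine distance between output embeddings. Outputs collapsing into a tight cluster around an "attractor" would score close to zero; the synthetic `spread` and `clustered` arrays below stand in for embeddings from a base and an aligned model.

```python
import numpy as np

def mean_pairwise_cosine_distance(embeddings: np.ndarray) -> float:
    """Average cosine distance over all distinct pairs of embeddings."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T                       # pairwise cosine similarities
    n = len(embeddings)
    off_diag = sims[~np.eye(n, dtype=bool)]        # drop self-similarities
    return float(np.mean(1.0 - off_diag))

rng = np.random.default_rng(0)
spread = rng.normal(size=(50, 8))                  # diverse outputs
clustered = rng.normal(size=(50, 8)) * 0.05 + 1.0  # outputs near one "attractor"
```

With these synthetic embeddings, the diverse set scores a much larger mean distance than the clustered one, mirroring the reduced-diversity effect the paper reports.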
TextGrad: Automatic "Differentiation" via Text
Combining multiple LLMs (commonly known as “chaining”) results in highly complex non-deterministic systems. To address optimization challenges for these compound AI systems, Stanford researchers introduce a new framework called TextGrad. TextGrad utilizes textual feedback from LLMs to optimize various components, making it flexible and simple to use with PyTorch-like syntax. It can handle diverse tasks without requiring tuning and improves outcomes noticeably: zero-shot accuracy on Google-Proof QA rises from 51% to 55%, LeetCode-Hard coding problems see a 20% gain, and it produces better prompts for reasoning, novel drug molecule designs, and radiation treatment plans. This framework sets the stage for advancing next-generation AI systems.
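The core idea, stripped of the real library, looks something like this toy sketch: treat natural-language feedback as a "gradient" and apply it to update a text variable. In actual TextGrad, both the critique and the update step are LLM calls; here a rule-based critic stands in so the sketch runs offline, and all function names are illustrative, not the TextGrad API.

```python
def critic(answer: str) -> str:
    """Return a textual 'gradient': what should change about the answer."""
    feedback = []
    if "because" not in answer:
        feedback.append("add a justification introduced by 'because'")
    if len(answer.split()) < 8:
        feedback.append("expand the answer to at least 8 words")
    return "; ".join(feedback) if feedback else "OK"

def apply_feedback(answer: str, feedback: str) -> str:
    """Stand-in for the LLM update step: edit the text per the feedback."""
    if "because" in feedback:
        answer += " because the premise entails the conclusion"
    if "expand" in feedback:
        answer += " in this specific case"
    return answer

answer = "The claim is true"
for _ in range(5):                       # optimization loop, like optimizer steps
    fb = critic(answer)                  # "backward": compute textual gradient
    if fb == "OK":
        break
    answer = apply_feedback(answer, fb)  # "step": update the text variable
```

The PyTorch-like feel comes from exactly this loop structure: a forward pass, a backward pass that produces (textual) gradients, and an optimizer step that applies them.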
Depth Anything V2
Depth Anything V2 improves on its predecessor (a SOTA monocular depth estimation model) by focusing on three main practices: using synthetic images instead of labeled real images, increasing the capacity of the teacher model, and utilizing large-scale pseudo-labeled real images for student model training. These enhancements result in depth predictions that are finer and more robust. Compared to models based on Stable Diffusion, Depth Anything V2 is over 10 times faster and more accurate. The models range from 25M to 1.3B parameters, and they show strong generalization capabilities that are further refined with metric depth labels. Additionally, a new evaluation benchmark with precise annotations and diverse scenes has been created to support future research and address current test set limitations.
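The teacher–student pseudo-labeling recipe described above can be sketched in a few lines (illustrative only, not the Depth Anything V2 code): a high-capacity teacher trained on synthetic data labels a large pool of unlabeled inputs, and a smaller student is fit to those pseudo-labels. A toy linear "depth" function stands in for the teacher here.

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher(x):
    # Stand-in for a large teacher model trained on synthetic (image, depth) pairs.
    return 2.0 * x + 1.0

unlabeled = rng.uniform(0, 10, size=500)  # stand-in for large-scale unlabeled real images
pseudo_depth = teacher(unlabeled)         # teacher produces pseudo-labels

# Student: fit a small model to the pseudo-labeled data (least squares).
A = np.stack([unlabeled, np.ones_like(unlabeled)], axis=1)
w, b = np.linalg.lstsq(A, pseudo_depth, rcond=None)[0]
```

The design point is that the student never sees noisy human labels at all; its supervision comes entirely from the teacher, which is why teacher capacity and pseudo-label scale matter so much in the paper.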
Code
NumPy 2.0.0
NumPy is an essential package for scientific computing in Python. For those who don’t know it, NumPy offers a robust N-dimensional array object, advanced broadcasting functions, and tools for integrating with C/C++ and Fortran code. NumPy 2.0.0 is the first major release since 2006. It is the result of 11 months of development since the last feature release and is the work of 212 contributors spread over 1078 pull requests. It contains a large number of exciting new features as well as changes to both the Python and C APIs.
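For readers new to the library, here is a quick taste of the two features mentioned above, the N-dimensional array and broadcasting, where arrays of shapes (3, 1) and (1, 4) combine elementwise into a (3, 4) result:

```python
import numpy as np

col = np.arange(3).reshape(3, 1)   # shape (3, 1): [[0], [1], [2]]
row = np.arange(4).reshape(1, 4)   # shape (1, 4): [[0, 1, 2, 3]]
table = col * row                  # broadcast to shape (3, 4): a times table
```

Broadcasting stretches the size-1 axes of each operand to match the other, so no explicit loops or tiling are needed.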
Highly realistic talking head video generation
"Halo" is a project that focuses on hierarchical audio-driven visual synthesis for portrait image animation. It takes a source image and driving audio as inputs to generate animated portraits. The framework employs various pretrained models including denoising UNet, face locator, and motion modules to achieve realistic animations. The output is a video file that visually syncs the input image with the audio.
Enterprise-add: Number addition in C using gradient descent
And finally, something funny! This project uses a simple machine learning model written in pure C99 to predict the sum of two numbers, a + b. Despite its humorous intentions, it’s actually a pretty good resource that demonstrates basic ML operations.
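The joke can be reproduced in a few lines: learn f(a, b) = w1·a + w2·b by gradient descent on squared error, and watch w1 and w2 crawl toward 1. (This is a Python sketch of the same idea; the linked project implements it in C99.)

```python
# Training data: pairs (a, b); the target is simply a + b.
data = [(1.0, 2.0), (3.0, 5.0), (-2.0, 4.0), (0.5, -1.5)]

w1, w2 = 0.0, 0.0   # "model parameters" for pred = w1*a + w2*b
lr = 0.01           # learning rate

for _ in range(1000):
    for a, b in data:
        pred = w1 * a + w2 * b
        err = pred - (a + b)   # residual against the true sum
        w1 -= lr * err * a     # gradient of squared error w.r.t. w1 (factor 2 folded into lr)
        w2 -= lr * err * b     # gradient of squared error w.r.t. w2
```

After training, both weights are essentially 1.0, so the "model" adds numbers correctly, which is exactly the gag the repo is built around.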