Friday June 21, 2024

Claude 3.5 Sonnet launches with superior intelligence and affordability, Octomind ditches LangChain for modular AI agents, and torchtune offers a streamlined PyTorch-native LLM fine-tuning library.

News

Claude 3.5 Sonnet

Claude 3.5 Sonnet has launched, offering significant improvements in intelligence, speed, and cost-efficiency. This model surpasses Claude 3 Opus across various evaluations, including graduate-level reasoning, undergraduate-level knowledge, and coding proficiency. Claude 3.5 Sonnet is available for free on Claude.ai and the Claude iOS app, with higher rate limits for subscribers, and to developers via the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI.

Octomind no longer uses LangChain for building AI agents

Octomind uses AI agents backed by multiple LLMs to create and fix end-to-end tests in Playwright. The team initially built on the LangChain framework but ran into problems with its high-level abstractions, which complicated their code and hindered productivity. LangChain's rigid design forced them to dig into its internals, making maintenance difficult. By switching to modular building blocks, Octomind significantly simplified its codebase.
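The post itself does not include code, but the shift it describes can be sketched as replacing framework chains with plain, composable functions. Every name below is illustrative, not Octomind's actual code:

```python
# A minimal sketch of the "modular building blocks" approach Octomind
# moved to: each step is a plain function that can be composed,
# unit-tested, and swapped independently. All names are illustrative.

def build_prompt(failing_test: str, error_log: str) -> str:
    """Assemble a prompt for a test-fixing agent from raw inputs."""
    return (
        "Fix this Playwright test.\n"
        f"Test:\n{failing_test}\n"
        f"Error:\n{error_log}"
    )

def fix_test(failing_test: str, error_log: str, llm) -> str:
    """The whole pipeline is ordinary function composition: build the
    prompt, then call whichever LLM client was passed in."""
    return llm(build_prompt(failing_test, error_log))

# Swapping the model or the prompt requires no framework internals --
# a stub LLM is enough to test the pipeline end to end:
stub_llm = lambda prompt: "await page.click('#btn', { timeout: 30000 })"
print(fix_test("await page.click('#btn')", "TimeoutError", stub_llm))
```

Because each step is an ordinary function, there are no framework internals to reverse-engineer when a prompt or model needs to change.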

Optimizing AI Inference at Character.ai

Character.AI is working towards AGI by building efficient inference systems for LLMs. Innovations in its model architecture and inference stack let it serve over 20,000 queries per second, roughly 20% of Google Search's request volume. Key techniques include Multi-Query Attention, Hybrid Attention Horizons, and Cross Layer KV-sharing, which significantly reduce GPU memory bottlenecks. A stateful caching system achieves a 95% cache rate, and int8 quantization is used for both training and serving to further cut costs. Together, these advances have reduced serving costs by a factor of 33 since late 2022.
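The memory impact of these attention techniques comes down to simple arithmetic on the KV cache. The sketch below uses illustrative model dimensions, not Character.AI's actual configuration, and the resulting ratios are only meant to show the shape of the savings:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_elem, cached_layers=None):
    """Size of the KV cache: 2 tensors (K and V) per cached layer.
    cached_layers < n_layers models cross-layer KV-sharing, where
    groups of layers reuse one set of KV tensors."""
    if cached_layers is None:
        cached_layers = n_layers
    return 2 * cached_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Illustrative dimensions (not Character.AI's real config):
layers, heads, dim, seq = 32, 32, 128, 4096

mha = kv_cache_bytes(layers, heads, dim, seq, 2)   # full multi-head, fp16
mqa = kv_cache_bytes(layers, 1, dim, seq, 2)       # multi-query: 1 KV head
mqa_shared = kv_cache_bytes(layers, 1, dim, seq, 1, cached_layers=8)
# ^ MQA + int8 cache + KV tensors shared across groups of layers

print(f"MHA fp16 cache:            {mha / 2**20:8.1f} MiB")
print(f"MQA fp16 cache:            {mqa / 2**20:8.1f} MiB")
print(f"MQA + int8 + layer-shared: {mqa_shared / 2**20:8.1f} MiB")
```

With these toy numbers, multi-query attention alone shrinks the cache 32x, and layer sharing plus int8 compounds that further, which is why a large stateful cache becomes affordable.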

Research

RAR-B: Reasoning as Retrieval Benchmark

Retrievers struggle with reasoning problems they were not specifically trained for, though current decoder-based models show promise for reasoning-level language understanding. Instruction-aware IR models perform better on reasoning tasks when instructions are omitted at inference time, indicating a behavioral gap between retrievers and LLMs. Fine-tuning re-ranker models on reasoning tasks yields state-of-the-art performance, outperforming bi-encoders. The Reasoning as Retrieval Benchmark (RAR-b) is introduced to evaluate retrievers' reasoning capabilities, providing a comprehensive suite of tasks and settings.
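The bi-encoder setting RAR-b evaluates can be sketched in a few lines: queries and documents are embedded independently and ranked by cosine similarity, so any reasoning has to already be baked into the embeddings. The toy vectors below are made up purely for illustration:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query_vec, doc_vecs, top_k=1):
    """Bi-encoder retrieval: rank documents by similarity to the query
    embedding alone -- no joint query-document reasoning happens here."""
    scored = sorted(enumerate(doc_vecs),
                    key=lambda iv: cosine(query_vec, iv[1]),
                    reverse=True)
    return [i for i, _ in scored[:top_k]]

# Toy embeddings: document 1 is closest to the query in this made-up space.
query = [0.9, 0.1, 0.0]
docs = [[0.0, 1.0, 0.0], [0.8, 0.2, 0.1], [0.1, 0.1, 1.0]]
print(retrieve(query, docs))  # [1]
```

A cross-encoder re-ranker, by contrast, scores each query-document pair jointly, which is one reason fine-tuned re-rankers close more of the reasoning gap than bi-encoders in the paper's results.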

Lossless Visualization of 4D Compositional Data on a 2D Canvas

The simplex projection method visualizes 4D compositional data losslessly on a 2D canvas, extending earlier techniques that were limited to 3D compositional data. It applies to individual data points, point clouds, and continuous probability density functions on simplices, and comes with rigorous proofs that the approach extends to compositional data of any finite dimensionality.

AppealMod: Induce Friction to Reduce Moderator Workload of Handling User Appeals

AppealMod, designed through a collaborative process with Reddit moderators, aims to assist with user ban appeals by incorporating friction into the appeals process. This system prompts users to provide more information before their appeals are reviewed, making it easier for moderators to manage their workload and reduce exposure to toxic content. A randomized field experiment in a large Reddit community showed that moderators processed only 30% of initial appeals and less than 10% of toxically worded ones while maintaining a similar approval rate compared to the control group. AppealMod effectively reduces workload and exposure to negativity, preserving moderators' desire for direct engagement and decision-making autonomy.

A Survey of LLMs for Financial Applications: Progress, Prospects and Challenges

This survey reviews the potential of LLMs in revolutionizing financial tasks by enhancing traditional practices and fostering innovation. It categorizes existing literature into key areas like linguistic tasks, sentiment analysis, financial time series, financial reasoning, and agent-based modeling. The survey delves into methodologies including textual and knowledge-based analysis, forecasting, data augmentation, planning, decision support, and simulations, while providing a comprehensive collection of datasets, model assets, and codes. It finally outlines future research challenges and opportunities, underscoring specific aspects crucial for advancing LLM adoption in finance.

Code

Eidos – Offline Alternative to Notion

Eidos is a browser-based framework for managing personal data with PWA and offline support. It integrates LLM-powered features such as translation and summarization, even when offline. Eidos is highly extensible through extensions and developer-friendly, offering an API, an SDK, and SQLite as the standard backing store for all tables.
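Standardizing on SQLite means user data lives in ordinary SQL tables that any tool can read. The snippet below illustrates that storage model with Python's stdlib sqlite3; the schema is invented for illustration and is not Eidos's actual schema or API:

```python
import sqlite3

# Illustrative only: a note-taking table in SQLite, the storage model
# Eidos standardizes on. Eidos's real schema and API differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE notes (
           id      INTEGER PRIMARY KEY,
           title   TEXT NOT NULL,
           body    TEXT,
           updated TEXT DEFAULT CURRENT_TIMESTAMP
       )"""
)
conn.execute("INSERT INTO notes (title, body) VALUES (?, ?)",
             ("offline test", "works without a network"))
conn.commit()

# Because it is plain SQLite, ordinary SQL queries the data:
rows = conn.execute("SELECT title, body FROM notes").fetchall()
print(rows)  # [('offline test', 'works without a network')]
conn.close()
```

The practical benefit of this design choice is portability: data created offline in the browser remains queryable by any SQLite-compatible tool.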

Axolotl: A tool to fine-tune AI models

Axolotl is a versatile tool designed for fine-tuning various AI models, supporting Hugging Face models such as Llama, Pythia, Falcon, and MPT. It supports multiple training modes, including full fine-tuning, LoRA, QLoRA, ReLoRA, and GPTQ, and lets users customize setups via YAML files or the CLI. It handles different dataset formats and integrates advanced optimizations such as xFormers, Flash Attention, RoPE scaling, and multipacking, supporting single- and multi-GPU setups via FSDP or DeepSpeed. It also facilitates easy logging of results and checkpoints to wandb or mlflow.
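Axolotl drives everything from a YAML config. The sketch below assembles a minimal LoRA-style config in Python and checks a few fields before a run; the key names mirror Axolotl's example configs from memory and may differ between versions, so treat them as illustrative rather than authoritative:

```python
# Illustrative sketch of an Axolotl-style LoRA config assembled in Python.
# Key names are assumptions based on Axolotl's example YAMLs and may not
# match your installed version -- check the project's docs before use.
config = {
    "base_model": "huggyllama/llama-7b",
    "adapter": "lora",
    "lora_r": 8,
    "lora_alpha": 16,
    "micro_batch_size": 2,
    "num_epochs": 3,
}

def to_yaml(cfg: dict) -> str:
    """Render a flat dict of scalars as simple YAML."""
    return "\n".join(f"{k}: {v}" for k, v in cfg.items())

def validate(cfg: dict) -> None:
    """Fail fast if required fields are missing before launching a run."""
    for key in ("base_model", "adapter"):
        if key not in cfg:
            raise ValueError(f"missing required field: {key}")

validate(config)
print(to_yaml(config))
```

Generating configs programmatically like this is one way to sweep over ranks or learning rates without hand-editing YAML files.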

Torchtune: A Native-PyTorch Library for LLM Fine-Tuning

torchtune is a PyTorch-native library that simplifies fine-tuning and experimenting with LLMs. It offers modular building blocks, training recipes for techniques like LoRA and QLoRA, and YAML configurations for various workflows. Key integrations include the Hugging Face Hub, EleutherAI's LM Eval Harness, PyTorch FSDP, and more. The library supports a range of models, including Llama3 and Llama2, with fine-tuning recipes for both distributed and single-device setups. It emphasizes simplicity, extensibility, and correctness, providing well-tested components and extensive configurations for memory-efficient setups.
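The memory-efficiency argument behind the LoRA and QLoRA recipes is plain arithmetic: a full-rank weight update is replaced by two low-rank factors. The sketch below uses illustrative dimensions, not any specific torchtune config:

```python
def lora_trainable_params(d_in, d_out, rank):
    """LoRA replaces a dense d_in x d_out weight update with factors
    A (d_in x rank) and B (rank x d_out), so only rank * (d_in + d_out)
    parameters are trained while the base weight stays frozen."""
    return rank * (d_in + d_out)

d = 4096            # hidden size of an illustrative transformer layer
full = d * d        # parameters in one full-rank weight update
lora = lora_trainable_params(d, d, rank=8)

print(f"full update: {full:,} params")
print(f"LoRA r=8:    {lora:,} params ({full / lora:.0f}x fewer)")
```

QLoRA pushes this further by also storing the frozen base weights in 4-bit precision, which is what makes single-device fine-tuning of 7B-class models practical.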