Sunday — June 16, 2024
Amazon bypasses GitHub limits for fast data scraping, Nvidia’s Nemotron-4 340B rivals GPT-4 in synthetic data generation, and TextGrad boosts AI system optimization with textual feedback.
News
Perplexity AI is lying about their user agent
Robb Knight set up server-side measures to block AI bots like Perplexity from accessing their content, using robots.txt rules and an nginx configuration that returns a 403 status. Despite these blocks, Perplexity still managed to summarize a post. Further investigation revealed that Perplexity was fetching pages with a generic Chrome user-agent string, likely via headless browsers, thereby bypassing the blocks. Robb confirmed the non-compliance by checking server logs. Next steps might involve a GDPR request…
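A minimal sketch of the kind of nginx rule described, rejecting requests by User-Agent (the bot names are illustrative; as the post shows, a crawler presenting a generic Chrome string sails past exactly this kind of check):

```nginx
# Return 403 to requests whose User-Agent claims to be an AI crawler.
# Adjust the bot names to the agents you want to block.
if ($http_user_agent ~* (PerplexityBot|GPTBot|CCBot)) {
    return 403;
}
```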
Making my local LLM voice assistant faster and scalable with RAG
Optimizing a local LLM-based smart home assistant means addressing the bottleneck in the "prefill" stage of LLM inference, whose latency grows quadratically with context size. Retrieval Augmented Generation (RAG) mitigates this by embedding the full smart home state and dynamically retrieving only the relevant parts, streamlining prompt length. This approach not only speeds up response time but also allows for scalability. The implementation involved creating a RAG API to manage embeddings, using Ollama with the mxbai-embed-large embedding model, and dynamically generating in-context examples. The result is shorter, more relevant prompts for the LLM and significantly reduced latency.
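The retrieve-only-what's-relevant step boils down to embedding plus similarity ranking. A toy sketch: the bag-of-words `embed` below is a stand-in for a real embedding model such as mxbai-embed-large served by Ollama, and the example state lines are invented.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; the real assistant would call an
    # embedding model (e.g. mxbai-embed-large via Ollama) here.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, state_lines: list[str], k: int = 2) -> list[str]:
    # Rank the full smart-home state by similarity to the query and keep
    # only the top-k lines, shrinking the prompt the LLM must prefill.
    q = embed(query)
    ranked = sorted(state_lines, key=lambda s: cosine(q, embed(s)), reverse=True)
    return ranked[:k]

state = [
    "light.kitchen is on, brightness 80%",
    "light.bedroom is off",
    "thermostat.living_room set to 21C",
    "lock.front_door is locked",
]
print(retrieve("set the kitchen light brightness", state, k=2))
```

Instead of pasting all four state lines into every prompt, only the two most relevant ones are included, which is what keeps prefill latency flat as the home grows.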
Amazon has a way to scrape GitHub and feed its AI model
Amazon's Artificial General Intelligence (AGI) Group aims to accelerate the collection of coding metadata from GitHub by implementing a strategy where employees create multiple GitHub accounts to bypass the platform’s data scraping limits. This tactic addresses GitHub's restriction of 5,000 requests per hour per account, enabling Amazon to quickly gather the necessary data for AI training. The initiative reportedly adheres to internal legal and security guidelines, although it raises ethical questions about data privacy and consent. The metadata procured from GitHub is crucial for enhancing Amazon’s AI models, which drive various technological innovations across its services.
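For context, the 5,000-requests-per-hour quota is reported back to every client in GitHub's X-RateLimit-* response headers, and a compliant client backs off until the reset time rather than rotating accounts. A minimal sketch (the header values are illustrative):

```python
import time

def seconds_until_reset(headers: dict) -> int:
    # GitHub's REST API reports quota state in X-RateLimit-* headers:
    # 5,000 requests/hour for an authenticated account.
    remaining = int(headers["X-RateLimit-Remaining"])
    reset_at = int(headers["X-RateLimit-Reset"])  # Unix epoch seconds
    if remaining > 0:
        return 0          # quota left; no need to wait
    return max(0, reset_at - int(time.time()))

# Example headers as a REST client would see them on an exhausted quota:
sample = {"X-RateLimit-Limit": "5000",
          "X-RateLimit-Remaining": "0",
          "X-RateLimit-Reset": str(int(time.time()) + 90)}
print(seconds_until_reset(sample))
```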
'Nemotron-4 340B' model redefines synthetic data generation, rivals GPT-4
Nvidia has introduced Nemotron-4 340B, a powerful family of models designed for synthetic data generation that promises to significantly impact the training of large language models (LLMs) across various industries. The suite includes base, instruct, and reward models, and boasts a training dataset of 9 trillion tokens, a 4,000-token context window, and support for over 50 natural languages and 40 programming languages. Nemotron-4 340B is positioned to outshine competitors such as Mistral’s Mixtral-8x22B, Anthropic’s Claude-Sonnet, Meta’s Llama3-70B, Qwen-2, and even GPT-4. Its commercially friendly license facilitates widespread adoption, helping businesses create domain-specific LLMs without extensive real-world datasets. This development could lead to advancements in sectors such as healthcare, finance, manufacturing, and retail, while also raising questions about data privacy, security, and ethics.
Research
Automatic Bug Detection in LLM-Powered Text-Based Games Using LLMs
Advancements in LLMs have enhanced interactive game design by allowing dynamic plotlines and player-NPC interactions. However, issues like hallucinations and logical inconsistencies persist. This paper proposes a systematic LLM-based method to automatically detect bugs in player game logs without needing extra data like post-play surveys. Applied to the text-based game DejaBoom!, the method effectively identifies inherent LLM-related bugs, outperforming unstructured bug-catching approaches and addressing a gap in automated detection of logical and design flaws.
Can language models serve as text-based world simulators?
LLMs like GPT-4 have potential as text-based world simulators to predict how actions change world states, which can save on the complexity of manually building virtual environments. A new benchmark called ByteSized32-State-Prediction was developed for assessing this ability using a dataset of text game state transitions and tasks. Results show that although GPT-4 performs well, it remains inconsistent as a world simulator without additional improvements. This work provides insights into the strengths and limitations of current LLMs and introduces a novel benchmark for future evaluations.
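A benchmark of this shape ultimately compares predicted next states against gold ones. The sketch below uses a simplified whole-state exact-match metric and invented example transitions; the actual ByteSized32-State-Prediction scoring is more fine-grained.

```python
def state_accuracy(pred_states, gold_states):
    # Fraction of transitions where the model's predicted next state
    # matches the gold next state exactly (simplified whole-state match).
    hits = sum(p == g for p, g in zip(pred_states, gold_states))
    return hits / len(gold_states)

# Each state is a dict of object properties; action "turn on lamp" should
# flip only the lamp. The second prediction wrongly also closes the door.
gold = [{"door": "open", "lamp": "off"}, {"door": "open", "lamp": "on"}]
pred = [{"door": "open", "lamp": "off"}, {"door": "closed", "lamp": "on"}]
print(state_accuracy(pred, gold))  # 0.5
```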
TextGrad: Automatic "Differentiation" via Text
TextGrad is a new framework designed to automatically optimize compound AI systems by leveraging textual feedback from LLMs. It draws an analogy to the transformative impact of backpropagation in neural networks, aiming to make optimization smoother and more automatic. TextGrad uses textual suggestions from LLMs to enhance variables in computation graphs, supporting tasks from coding to molecular design. It's user-friendly, compatible with PyTorch, and requires minimal tuning. Remarkably, it boosts GPT-4o's zero-shot accuracy in Google-Proof Question Answering from 51% to 55%, enhances LeetCode-Hard coding solutions by 20%, and excels in prompt improvement, drug molecule design, and radiotherapy planning. This paves the way for advancing AI systems development.
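The feedback-as-gradient loop can be caricatured in a few lines. This is a toy sketch of the idea only, with a hard-coded critic and rewrite rule standing in for the LLM calls; it is not the actual TextGrad API.

```python
# Stub "LLM" that returns a textual gradient (a critique) for a variable;
# in TextGrad proper this is a real LLM call on a computation graph node.
def critic(variable: str) -> str:
    if "step by step" not in variable:
        return "add an instruction to reason step by step"
    return ""  # empty critique: nothing left to improve

# The "optimizer step": rewrite the variable using the critique.
def apply_feedback(variable: str, feedback: str) -> str:
    if "step by step" in feedback:
        return variable + " Think step by step."
    return variable

prompt = "Answer the question."
for _ in range(3):                         # optimization loop
    grad = critic(prompt)                  # backward pass: textual gradient
    if not grad:
        break
    prompt = apply_feedback(prompt, grad)  # optimizer.step()
print(prompt)
```

The analogy to backpropagation is that the critique flows backward through the system and each variable (here, a prompt) is updated in the direction the feedback suggests.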
Code
CrewAI Meets Suno AI
"Melody Agents" is a crewAI app that generates songs for a given topic and music genre using the Suno AI API. It has three main agents: the Web Researcher Agent, which searches the web for topic-related info; the Lyrics Creator Agent, which crafts lyrics from that info tailored to the specified genre; and the Song Generator Agent, which uses the Suno API to create and download two candidate songs from the generated lyrics.
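The three-agent handoff amounts to a sequential pipeline. The functions below are stubs standing in for the LLM-backed crewAI agents and the Suno API call, just to show the data flow:

```python
# Stub agents; in the real app each is a crewAI Agent backed by an LLM,
# and song_generator calls the Suno AI API.
def web_researcher(topic: str) -> str:
    return f"key facts about {topic}"

def lyrics_creator(research: str, genre: str) -> str:
    return f"[{genre} lyrics drawing on: {research}]"

def song_generator(lyrics: str) -> list[str]:
    # Would submit the lyrics to Suno and download two candidate tracks.
    return [f"song_1 from {lyrics}", f"song_2 from {lyrics}"]

def melody_pipeline(topic: str, genre: str) -> list[str]:
    research = web_researcher(topic)            # Web Researcher Agent
    lyrics = lyrics_creator(research, genre)    # Lyrics Creator Agent
    return song_generator(lyrics)               # Song Generator Agent

print(melody_pipeline("ocean exploration", "synthwave"))
```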
Open Source Version of AI "Math Notes" from Apple WWDC
AI Math Notes is an interactive drawing app where you can sketch out mathematical equations, and the app uses a multimodal LLM to calculate and show the result right on the canvas. Built with Python, Tkinter, and PIL, the app has a simple interface with features like clearing the canvas, undoing the last action, and calculating equations with a single button press. Future versions aim to auto-detect the equals sign for better accuracy.
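The planned equals-sign auto-detection might look like the hypothetical sketch below: check whether the recognized string ends in "=", and if so evaluate it safely with an AST walker rather than `eval` (the actual app delegates the math itself to the multimodal LLM):

```python
import ast
import operator as op

# Safe arithmetic evaluator; eval() on recognized handwriting would be unsafe.
OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv}

def calc(node):
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](calc(node.left), calc(node.right))
    raise ValueError("unsupported expression")

def maybe_solve(recognized: str):
    # Auto-detect a trailing "=" in the recognized equation and return the
    # computed result; return None if the user isn't asking for an answer.
    text = recognized.strip()
    if text.endswith("="):
        return calc(ast.parse(text[:-1], mode="eval").body)
    return None

print(maybe_solve("12*(3+4)="))  # 84
```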
Summarize Trending GH Repos in Terminal or Browser
GitHub Trending Summarizer fetches trending GitHub repositories and summarizes their details using GPT-4. It displays these summaries either in the terminal with rich formatting, including clickable links, or renders them as HTML with a sepia theme for viewing in your browser.
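Clickable links in a terminal typically use the OSC 8 escape sequence, the mechanism rich's `[link=…]` markup relies on in supporting terminals. A minimal stdlib-only sketch:

```python
def terminal_link(url: str, label: str) -> str:
    # OSC 8 hyperlink: ESC ] 8 ; ; URL ST  label  ESC ] 8 ; ; ST
    # Supporting terminals render `label` as a clickable link.
    return f"\x1b]8;;{url}\x1b\\{label}\x1b]8;;\x1b\\"

line = terminal_link("https://github.com/trending", "Trending repos")
print(line)
```

Terminals without OSC 8 support simply print the label text, so the output degrades gracefully.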