Thursday — November 20, 2025
Europe scales back GDPR and AI laws, ChunkBack emulates LLM APIs for cost-free testing, and CudaForge uses LLM agents to optimize CUDA kernels.
News
Europe is scaling back GDPR and relaxing AI laws
Under pressure from the tech industry, the European Commission has proposed changes to weaken the GDPR and the AI Act to improve global competitiveness. The proposal would legally permit the use of personal data for training AI models and extend the grace period for rules governing high-risk AI systems. These changes aim to reduce red tape for startups and centralize AI oversight within the bloc's AI Office.
Meta Segment Anything Model 3
Meta's SAM 3 is a unified, promptable model that advances object segmentation and tracking in both images and videos. It builds upon SAM 2 by introducing open-vocabulary capabilities, enabling segmentation via text descriptions and exemplar-based visual prompts. The new model achieves state-of-the-art performance on segmentation benchmarks, leveraging a powerful perception encoder backbone to handle a variety of prompt types.
Building more with GPT-5.1-Codex-Max
GPT-5.1-Codex-Max is a new agentic coding model featuring a novel "compaction" process, allowing it to operate coherently across multiple context windows. This enables project-scale tasks and long-running agent loops by preserving context over millions of tokens. The model demonstrates improved performance on benchmarks like SWE-bench with significantly greater token efficiency and is now the default model in Codex.
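A minimal sketch of what such a compaction loop might look like. The token accounting, budget, and `summarize()` stub here are illustrative assumptions, not OpenAI's actual mechanism:

```python
# Toy compaction loop: when the transcript exceeds the context budget,
# fold the oldest half of the history into a single summary message so
# the agent can keep working far beyond one context window.
# CONTEXT_LIMIT, tokens(), and summarize() are illustrative stand-ins.

CONTEXT_LIMIT = 100  # toy token budget

def tokens(msg: str) -> int:
    return len(msg.split())  # crude whitespace "tokenizer"

def summarize(msgs: list[str]) -> str:
    # Stand-in for an LLM call that condenses prior turns.
    return f"SUMMARY({len(msgs)} turns)"

def compact(history: list[str]) -> list[str]:
    while sum(tokens(m) for m in history) > CONTEXT_LIMIT and len(history) > 1:
        half = max(1, len(history) // 2)
        history = [summarize(history[:half])] + history[half:]
    return history

history = [f"turn {i}: " + "word " * 10 for i in range(20)]
history = compact(history)
# Recent turns survive verbatim; older ones live on only in the summary.
```

The key property is that the most recent turns stay intact while older context is progressively condensed rather than truncated.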
AI is a front for consolidation of resources and power
The author argues that AI is a catastrophically overhyped technology, creating a bubble where its practical utility is limited to narrow tasks rather than the promised large-scale automation. The essay speculates that the AGI narrative is a pretext for a massive consolidation of physical resources—land, water, and energy—for data centers. This infrastructure grab creates a durable form of power that will outlast the AI hype, potentially shifting influence from governments to the private entities controlling these assets.
Measuring the impact of AI scams on the elderly
Researchers conducted an end-to-end study on the effectiveness of LLM-generated phishing scams targeting the elderly. Using simple jailbreaks, they prompted various frontier models to generate phishing emails, which achieved an 11% phish rate among 108 participants. The study found that Meta's models and Google's Gemini were more susceptible to these jailbreaks than ChatGPT and Claude, and the research was cited in a US Senate hearing request.
Research
Slicing Is All You Need: Towards a Universal One-Sided Distributed MatMul
This paper introduces a universal one-sided algorithm for distributed matrix multiplication that supports all combinations of data partitionings and replication factors, eliminating the need for multiple specialized algorithms or costly data redistribution. The method uses slicing via index arithmetic to compute the set of local matrix multiplications. Implemented in a C++ PGAS framework with direct GPU-to-GPU communication, the algorithm achieves performance competitive with PyTorch DTensor.
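The slicing idea can be illustrated with a toy 2D partitioning in plain Python: each "rank" owns one tile of C and derives, purely from index arithmetic, which row slice of A and column slice of B it must fetch. The one-sided gets are simulated by direct slicing; this is a sketch, not the paper's C++ PGAS implementation:

```python
# Toy slicing-based distributed matmul: A is partitioned into block rows,
# B into block columns, and the rank owning C tile (i, j) computes the
# global index ranges it needs from (i, j) alone.

def matmul(A, B):
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[r][t] * B[t][c] for t in range(k)) for c in range(m)]
            for r in range(n)]

def tile_matmul(A, B, p_rows, p_cols):
    n, m = len(A), len(B[0])
    rh, ch = n // p_rows, m // p_cols  # tile height / width
    C = [[0] * m for _ in range(n)]
    for i in range(p_rows):           # each (i, j) pair is one "rank"
        for j in range(p_cols):
            rows = range(i * rh, (i + 1) * rh)  # slice of A this rank needs
            cols = range(j * ch, (j + 1) * ch)  # slice of B this rank needs
            for r in rows:
                for c in cols:
                    C[r][c] = sum(A[r][t] * B[t][c] for t in range(len(B)))
    return C

A = [[r * 4 + c for c in range(4)] for r in range(4)]
B = [[(r + 1) * (c + 1) for c in range(4)] for r in range(4)]
assert tile_matmul(A, B, 2, 2) == matmul(A, B)
```

The point of the universal algorithm is that this index arithmetic generalizes to arbitrary partitionings and replication factors, with no redistribution step.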
An Agent Framework with Hardware Feedback for CUDA Kernel Optimization
CudaForge is a training-free, multi-agent workflow that uses two LLM agents, a Coder and a Judge, to automatically generate and optimize CUDA kernels. The agents iteratively refine code by incorporating hardware feedback from tools like Nsight Compute, mimicking an expert's workflow. This approach achieves a 1.68x average speedup over PyTorch baselines with high correctness, while demonstrating strong generalization across various GPUs and base models at a significantly lower computational and API cost than existing methods.
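The Coder/Judge loop can be sketched abstractly. Here the "kernel" is reduced to a single block-size parameter, the profiler is a stub standing in for Nsight Compute, and the Judge simply reads the feedback to propose the next candidate; all names and the cost model are illustrative:

```python
def profile(block_size: int) -> dict:
    # Stub for hardware feedback (e.g. Nsight Compute metrics):
    # pretend runtime is minimized at block_size 128.
    runtime = abs(block_size - 128) / 128 + 1.0
    hint = "raise" if block_size < 128 else "lower"
    return {"runtime": runtime, "occupancy_hint": hint}

def judge(feedback: dict, block_size: int) -> int:
    # Judge: turn the profiler's hint into a concrete suggestion.
    return block_size * 2 if feedback["occupancy_hint"] == "raise" else block_size // 2

def optimize(block_size: int, iters: int = 6) -> tuple[int, float]:
    best, best_rt = block_size, profile(block_size)["runtime"]
    for _ in range(iters):
        fb = profile(block_size)
        block_size = judge(fb, block_size)   # "Coder" emits the new kernel
        rt = profile(block_size)["runtime"]  # re-profile the candidate
        if rt < best_rt:
            best, best_rt = block_size, rt
    return best, best_rt

best, best_rt = optimize(16)
```

In the real system, both roles are LLM calls and the feedback is actual profiler output, but the iterate-profile-refine structure is the same.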
Semi-Supervised Preference Optimization with Limited Feedback
This work introduces Semi-Supervised Preference Optimization (SSPO), a method to reduce the reliance on expensive labeled data for LLM alignment. SSPO learns from a small set of labeled preference pairs alongside a large pool of unpaired data. The core contribution is a theoretical proof for an optimal reward threshold that enables principled pseudo-labeling of the unpaired data. This allows the model to distill latent preferences, achieving strong alignment with drastically reduced data requirements; for instance, SSPO with 1% of the UltraFeedback dataset outperforms baselines trained on 10%.
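A toy illustration of the pseudo-labeling step: the reward model is a stub, and the threshold is simply the midpoint between the labeled chosen/rejected means, standing in for the paper's theoretically derived optimum:

```python
# Stub reward model (illustrative): longer, "polite" answers score higher.
def reward(text: str) -> float:
    return len(text) + (5.0 if "please" in text else 0.0)

# Small labeled set of (chosen, rejected) preference pairs.
labeled = [("thanks, please see the detailed answer", "no"),
           ("here is a careful explanation, please read", "meh")]

chosen_mean = sum(reward(c) for c, _ in labeled) / len(labeled)
rejected_mean = sum(reward(r) for _, r in labeled) / len(labeled)
tau = (chosen_mean + rejected_mean) / 2  # midpoint stand-in for the optimal threshold

# Large unpaired pool: pseudo-label each response against the threshold.
pool = ["a long and helpful reply, please enjoy", "nah", "ok",
        "another thorough, please-quality response"]
pseudo = [(x, reward(x) >= tau) for x in pool]
```

The pseudo-labeled pool then joins the labeled pairs for preference optimization, which is how SSPO stretches 1% of the labels to beat 10% baselines.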
Accelerating Finite Element Analysis Using VarQITE
A hybrid quantum/classical method uses VarQITE for graph partitioning to accelerate large-scale FEA simulations by reducing "fill-in" during sparse linear system solves. Integrated into Ansys's LS-DYNA software and tested on IonQ hardware, the approach demonstrated up to a 12% wall-clock time improvement on industrial problems with meshes up to six million elements. The work also introduces a classical heuristic to refine solutions from the quantum hardware, showing the potential of NISQ-era computing for complex simulation workflows.
What do you think about the Huxley-Gödel Machine?
This work identifies a "Metaproductivity-Performance Mismatch" in self-improving coding agents, where current benchmark scores are poor indicators of future improvement potential. The authors introduce the Huxley-Gödel Machine (HGM), which guides its self-modification search using CMP, a proposed metric that estimates an agent's potential by aggregating the performance of its descendants. HGM outperforms prior methods on benchmarks like SWE-bench with greater efficiency and demonstrates strong transfer learning, ultimately achieving human-level performance on SWE-bench Lite.
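The aggregation idea behind CMP can be sketched on a toy tree of agent variants: a node's estimated potential is the mean benchmark score of its descendants, so a mediocre node with strong children outranks a strong node whose lineage degrades. The tree, scores, and plain-mean aggregate are illustrative, not the paper's exact estimator:

```python
# tree: node -> children; score: node -> benchmark score of that variant
tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"],
        "a1": [], "a2": [], "b1": []}
score = {"root": 0.30, "a": 0.35, "b": 0.50, "a1": 0.60, "a2": 0.55, "b1": 0.20}

def descendants(node):
    out = []
    for child in tree[node]:
        out.append(child)
        out.extend(descendants(child))
    return out

def cmp_estimate(node):
    """Estimated potential: mean benchmark score of all descendants."""
    ds = descendants(node)
    return sum(score[d] for d in ds) / len(ds) if ds else score[node]

# Among expandable (non-leaf) nodes, "b" scores best itself (0.50) but its
# line degrades, while "a" (0.35) has strong children, so CMP picks "a".
candidates = [n for n in tree if tree[n]]
best = max(candidates, key=cmp_estimate)
```

This is the mismatch the paper names: ranking by `score` alone would expand "b", the node least likely to improve.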
Code
Show HN: Outline Driven Development – New AI-Assisted Coding Paradigm
Outline-Driven Development is a paradigm for controlling LLM code agents by using a structured outline as a single source of truth. This versioned outline acts as a deterministic scaffold, defining architecture, interfaces, and constraints before code generation. The system validates all LLM outputs against this contract, halting or replaying steps upon deviation to ensure a reproducible and traceable workflow that relies on a specific stack of CLI tools.
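One concrete way such contract validation could work in Python: parse the generated code with `ast` and check that every interface the outline declares is present with the expected signature. The outline schema here is a made-up illustration, not the project's actual format:

```python
import ast

# Hypothetical outline fragment: required functions and their parameters.
outline = {"load_config": ["path"], "run_pipeline": ["config", "dry_run"]}

def validate(source: str, outline: dict) -> list[str]:
    """Return a list of contract violations (empty means the code conforms)."""
    found = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            found[node.name] = [a.arg for a in node.args.args]
    errors = []
    for name, params in outline.items():
        if name not in found:
            errors.append(f"missing function: {name}")
        elif found[name] != params:
            errors.append(f"signature drift in {name}: {found[name]}")
    return errors

good = "def load_config(path): ...\ndef run_pipeline(config, dry_run): ..."
bad = "def load_config(path): ...\ndef run_pipeline(cfg): ..."
assert validate(good, outline) == []
assert validate(bad, outline) != []
```

On a non-empty error list, the workflow would halt or replay the generation step rather than accept the deviation.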
Show HN: Allein - Markdown editor with AI autocompletion, completely offline
Allein is an open-source, local-first Markdown editor that provides GitHub Copilot-style features by integrating with local LLMs via Ollama. It offers context-aware autocompletion and writing improvements, running entirely offline without requiring an account. The tool recommends specific models like Qwen2.5-Coder for autocompletion and Gemma 3 for text refinement.
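Autocompletion against a local Ollama instance boils down to a small JSON POST. This sketch only builds the request; the endpoint and fields follow Ollama's documented `/api/generate` API, but how Allein itself frames the prompt is an assumption:

```python
import json

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port

def build_completion_request(prefix: str, model: str = "qwen2.5-coder") -> bytes:
    """Build the JSON body for completing the document text so far."""
    return json.dumps({
        "model": model,
        "prompt": f"Continue this Markdown document:\n{prefix}",
        "stream": False,  # one response object instead of chunked lines
    }).encode()

body = json.loads(build_completion_request("# Notes\nToday I "))
# Sending it is a plain POST to OLLAMA_URL, e.g. via urllib.request;
# the completion comes back in the "response" field.
```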
Show HN: ChunkBack – A Fake LLM API server for testing apps without paying
ChunkBack is a self-hosted mock server that emulates the APIs for OpenAI, Anthropic, and Gemini, allowing developers to test LLM applications with deterministic responses without incurring API costs. It uses a custom prompt language, CBPL, to precisely control the mocked output's content, chunking, and latency. The tool integrates into existing applications by changing the provider's base API URL.
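Because the mock speaks the providers' wire formats, switching a client over is just pointing the base URL at ChunkBack. This sketch builds an OpenAI-style chat request where only the base differs; the local port (and the CBPL prompt syntax, not shown) are assumptions:

```python
import json

REAL_BASE = "https://api.openai.com/v1"
MOCK_BASE = "http://localhost:4000/v1"  # assumed ChunkBack port

def chat_request(base_url: str, content: str) -> tuple[str, bytes]:
    """Return (endpoint, body) for an OpenAI-compatible chat completion."""
    endpoint = f"{base_url}/chat/completions"
    body = json.dumps({
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": content}],
    }).encode()
    return endpoint, body

real_ep, real_body = chat_request(REAL_BASE, "hello")
mock_ep, mock_body = chat_request(MOCK_BASE, "hello")
# Same payload either way; only the host changes.
assert real_body == mock_body
```

SDKs that accept a `base_url` parameter make the same swap in one constructor argument.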
Skald: Open-source RAG platform
Skald is an open-source, plug-and-play RAG platform that provides a production-ready system via an API. It offers deep configuration of components like vector search, reranking, models, and query rewriting. The platform includes built-in evaluation tools and can be self-hosted, with support for local LLMs and embedding models to operate without third-party dependencies.
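The components Skald exposes for configuration — vector search, reranking, query handling — compose into a standard retrieve-then-rerank pipeline. A toy pure-Python version, where bag-of-words vectors and term overlap stand in for real embedding and reranker models (none of this is Skald's actual API):

```python
from collections import Counter
import math

docs = ["skald is an open source rag platform",
        "vector search finds nearest neighbours",
        "rerankers reorder retrieved passages"]

def embed(text: str) -> Counter:
    return Counter(text.split())  # bag-of-words stand-in for an embedding model

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def rerank(query: str, hits: list[str]) -> list[str]:
    # Stand-in reranker: exact-term overlap with the query.
    q = set(query.split())
    return sorted(hits, key=lambda d: len(q & set(d.split())), reverse=True)

hits = rerank("rag platform", retrieve("rag platform"))
```

In a production system each stage (plus query rewriting before `retrieve`) is a swappable model, which is the knob surface the platform advertises.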
DeepSeek Linear-Programming-Based Load Balancer
LPLB is a parallel load balancer for MoE models that uses linear programming to address dynamic, per-batch load imbalance during training. It extends EPLB by creating expert replicas based on static topologies and solving an LP problem to optimally redistribute tokens across an EP group for each batch. The system features an embedded IPM solver using NVIDIA's cuSolverDx and can integrate with DeepEP for efficient communication, though its solver latency (~100 µs) and focus on token count over computational cost are current limitations.
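The objective is easy to state even without the LP machinery: given per-expert token counts and replica placements, split each replicated expert's tokens to minimize the maximum per-GPU load. A greedy waterfilling stand-in for the LP (the real system solves a linear program on-GPU with cuSolverDx; the topology here is invented):

```python
# Two GPUs; expert "e0" is hot and has a replica on GPU 1 (invented topology).
tokens = {"e0": 90, "e1": 10, "e2": 20}
placement = {"e0": [0, 1], "e1": [0], "e2": [1]}  # expert -> GPUs holding it

def balance(tokens, placement, n_gpus=2):
    load = [0] * n_gpus
    # Unreplicated experts have no routing freedom: assign them first.
    for e, gpus in placement.items():
        if len(gpus) == 1:
            load[gpus[0]] += tokens[e]
    # Greedily route each replicated expert's tokens to the lightest replica.
    for e, gpus in placement.items():
        if len(gpus) > 1:
            for _ in range(tokens[e]):  # token-by-token waterfilling
                tgt = min(gpus, key=lambda g: load[g])
                load[tgt] += 1
    return load

# Without the replica, GPU 0 would carry 100 tokens; with it, load evens out.
assert balance(tokens, placement) == [60, 60]
```

The LP formulation solves the same redistribution jointly and exactly per batch, at the cost of the ~100 µs solver latency noted above.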