Friday, March 6, 2026

GPT-5.4 introduces native computer-use capabilities, GLiNER2 enables CPU-based information extraction, and researchers map Dyson sphere signatures on the H-R diagram.

Interested in AI engineering? Let's talk

News

GPT-5.4

GPT-5.4 and GPT-5.4 Pro introduce native computer-use capabilities, achieving a 75.0% success rate on OSWorld-Verified, and support context windows up to 1M tokens. The models feature a new "tool search" mechanism that reduces token usage by 47% in complex workflows and allow for mid-response steering in ChatGPT. They outperform GPT-5.2 across professional reasoning, coding, and visual perception benchmarks while maintaining high token efficiency.

The L in "LLM" Stands for Lying

The author critiques the current LLM-driven software development landscape, arguing that "vibe-coding" produces inauthentic "slop" that increases technical liability and devalues engineering craftsmanship. By framing LLM output as a form of forgery, the text highlights the systemic lack of source attribution and the resulting degradation of open-source contributions. To move beyond mere "citation role-play," the author suggests that future AI development must prioritize auditable forward passes and verifiable sourcing to ensure intellectual integrity.

Relicensing with AI-Assisted Rewrite

The maintainers of the chardet library used Claude Code to rewrite their LGPL codebase to relicense it under MIT, bypassing traditional "clean room" protocols. This approach faces legal challenges regarding whether LLM-generated output constitutes a derivative work of the input source or if it lacks copyrightability entirely under recent judicial rulings. If upheld, AI-assisted rewrites could effectively undermine copyleft protections by allowing automated license conversion.

A GitHub Issue Title Compromised 4k Developer Machines

The "Clinejection" attack compromised 4,000 developer machines by using prompt injection in a GitHub issue title to exploit an AI-powered triage bot. This allowed an attacker to achieve arbitrary code execution within a CI environment, poison the GitHub Actions cache, and exfiltrate npm release tokens to publish a malicious version of the Cline CLI. The incident highlights a critical "confused deputy" vulnerability where autonomous agents process unsanitized natural language inputs while maintaining privileged access to secrets and deployment pipelines.

Research

Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory

Memex is an indexed experience memory mechanism designed to overcome LLM context window limitations in long-horizon tasks by offloading full-fidelity interactions to an external database. Unlike lossy summarization, it maintains a compact working context of structured summaries and stable indices that the agent can dereference to retrieve exact evidence as needed. Optimized via the MemexRL framework, this approach enables agents to manage memory effectively, improving task success while significantly reducing active context usage.
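The index-and-dereference pattern described above can be sketched in a few lines (class and method names here are hypothetical, not the paper's actual interface): full-fidelity interactions go to an external store, while the agent's working context holds only compact summaries keyed by stable indices it can dereference later.

```python
import sqlite3

class ExperienceMemory:
    """Illustrative sketch of indexed experience memory: raw interactions
    are offloaded to an external database; the working context keeps only
    (index, summary) pairs that stay small and stable."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE episodes (idx INTEGER PRIMARY KEY, content TEXT)")
        self.working_context = []  # compact summaries with stable indices

    def record(self, content: str, summary: str) -> int:
        # Persist the full interaction; only the summary enters the context.
        cur = self.db.execute(
            "INSERT INTO episodes (content) VALUES (?)", (content,))
        idx = cur.lastrowid
        self.working_context.append({"idx": idx, "summary": summary})
        return idx

    def dereference(self, idx: int) -> str:
        # Retrieve exact evidence on demand rather than trusting a lossy summary.
        row = self.db.execute(
            "SELECT content FROM episodes WHERE idx = ?", (idx,)).fetchone()
        return row[0]
```

The point of the design is that summaries can be wrong or incomplete without losing information: the agent can always pull the exact original back into context by index.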

Dyson spheres on H-R diagram

This study models the radiative signatures of Dyson spheres around white dwarfs and M-dwarfs, establishing a $T \propto R_D^{-1/2}$ scaling law for equilibrium temperatures. By mapping these megastructures onto the H-R diagram, the research identifies specific infrared flux constraints and observational signatures. These findings provide a quantitative framework for detecting techno-signatures in low-luminosity stellar systems using infrared surveys.
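The scaling law follows from radiative equilibrium: a shell that absorbs the star's luminosity $L$ and re-radiates it as a blackbody from a surface $\propto R_D^2$ must satisfy $L = 4\pi R_D^2 \sigma T^4$, giving $T \propto R_D^{-1/2}$. A minimal numerical check (assuming outward-only re-radiation; the prefactor differs under other assumptions, but the scaling does not):

```python
import math

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

def equilibrium_temp(luminosity_w: float, radius_m: float) -> float:
    """Equilibrium temperature of a shell of radius radius_m absorbing
    luminosity_w and re-radiating as a blackbody: L = 4*pi*R^2*sigma*T^4."""
    return (luminosity_w / (4 * math.pi * radius_m ** 2 * SIGMA)) ** 0.25
```

For a Sun-like luminosity (3.828e26 W) at 1 AU this gives roughly 394 K, and doubling the shell radius lowers the temperature by exactly a factor of $\sqrt{2}$, consistent with the $T \propto R_D^{-1/2}$ law.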

Generative Linguistics, LLMs, and the Social Nature of Scientific Success

The author argues that generative linguistics faces a crisis due to the success of LLMs, which some claim refutes Chomskyan approaches. While Chesi advocates for increased formal and empirical rigor to save the field, the author contends that generativists must also expand their social ambitions and engage external stakeholders to match the current impact of LLM research.

Trojan Source: Invisible Vulnerabilities

"Trojan Source" attacks exploit Unicode encoding subtleties to create source code that appears different to human reviewers than to compilers, enabling the insertion of invisible vulnerabilities across multiple programming languages. This technique poses significant supply-chain risks by decoupling the logical token order from the visual display. The authors propose compiler-level defenses and detail a coordinated industry-wide disclosure to mitigate these threats.

V1: Unifying Generation and Self-Verification for Parallel Reasoners (ArXiv)

The $V_1$ framework optimizes test-time scaling for reasoning tasks by replacing traditional scalar scoring with more robust pairwise self-verification. It introduces $V_1$-Infer, an uncertainty-guided tournament ranking algorithm for efficient compute allocation, and $V_1$-PairRL, which jointly trains a model as both generator and pairwise verifier. Benchmarks in math and code generation show $V_1$ significantly outperforms pointwise verification and standard RL, achieving higher Pass@1 scores with greater efficiency.
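The pairwise idea can be illustrated with a simple single-elimination tournament, where a comparator stands in for the model's pairwise self-verification call (this sketch omits the uncertainty-guided scheduling that $V_1$-Infer adds on top):

```python
from typing import Callable, List

def tournament_select(candidates: List[str],
                      prefer: Callable[[str, str], str]) -> str:
    """Single-elimination tournament over candidate solutions.
    prefer(a, b) returns whichever of a, b the verifier judges better;
    each round halves the pool until one candidate remains."""
    pool = list(candidates)
    while len(pool) > 1:
        nxt = []
        for i in range(0, len(pool) - 1, 2):
            nxt.append(prefer(pool[i], pool[i + 1]))
        if len(pool) % 2:        # odd candidate out gets a bye
            nxt.append(pool[-1])
        pool = nxt
    return pool[0]
```

Compared with scoring each candidate on an absolute scale, pairwise comparison only asks the model to judge relative quality, which is the robustness advantage the paper claims over pointwise verification.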

Code

GLiNER2: Unified Schema-Based Information Extraction

GLiNER2 is a unified, schema-based information extraction model that performs NER, Text Classification, Structured Data Extraction, and Relation Extraction within a single 205M parameter model. Optimized for efficient CPU-based inference, it offers 100% local processing without requiring a GPU. The framework supports multi-task schema composition, custom model training, and parameter-efficient fine-tuning via LoRA adapters, positioning it as a versatile and privacy-focused alternative to LLM-based solutions for diverse IE tasks.
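To make "multi-task schema composition" concrete, a composed schema covering all four task types might take a shape like the following. This is purely illustrative of the idea, not GLiNER2's actual API:

```python
# Illustrative only: a plausible shape for a composed multi-task schema,
# not GLiNER2's real interface.
schema = {
    "entities": ["person", "organization", "location"],           # NER
    "classification": {"sentiment": ["positive", "negative"]},    # text classification
    "structure": {"product": {"name": "str", "price": "float"}},  # structured extraction
    "relations": [("person", "works_for", "organization")],       # relation extraction
}

def tasks(schema: dict) -> set:
    """The task types a composed schema requests in a single forward pass."""
    return set(schema)
```

The appeal of the unified design is that one 205M-parameter model services all four task types from a single declarative schema, rather than routing each task to a separate model or LLM prompt.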

Keep large tool output out of LLM context: 3x accuracy, 95% fewer tokens

Sift is a reliability gateway for MCP and CLI tool outputs that manages large JSON payloads by persisting them as SQLite artifacts. It optimizes LLM context usage by returning compact schema references instead of raw data, enabling agents to perform precise retrieval through Python-based queries. Benchmarks show this approach can reduce input tokens by over 95% while significantly improving factual accuracy through stable schemas, automated pagination, and built-in secret redaction.
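The core pattern, reduced to a sketch (class and method names hypothetical, not Sift's actual API): persist the large payload as an artifact, hand the model only an ID plus a compact schema, and let it fetch specific fields on demand.

```python
import json
import sqlite3
import uuid

class ArtifactStore:
    """Sketch of the gateway pattern: large tool output is persisted as a
    SQLite artifact; the LLM context receives only a compact reference."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE artifacts (id TEXT PRIMARY KEY, body TEXT)")

    def store(self, payload: dict) -> dict:
        art_id = uuid.uuid4().hex
        self.db.execute("INSERT INTO artifacts VALUES (?, ?)",
                        (art_id, json.dumps(payload)))
        # Return top-level keys and types -- a stable schema -- not the raw data.
        return {"artifact_id": art_id,
                "schema": {k: type(v).__name__ for k, v in payload.items()}}

    def fetch(self, art_id: str, key: str):
        # Precise retrieval of one field, instead of replaying the whole payload.
        body, = self.db.execute("SELECT body FROM artifacts WHERE id = ?",
                                (art_id,)).fetchone()
        return json.loads(body)[key]
```

The token savings come from the asymmetry: the schema reference is a few dozen tokens regardless of whether the underlying payload is kilobytes or megabytes.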

Webmcp-react – React hooks that turn your website into an MCP server

webmcp-react provides React hooks to expose typed tools to AI agents via the navigator.modelContext API, implementing the emerging WebMCP standard. It features Zod-first type inference, a built-in polyfill for non-native environments, and SSR compatibility. The library enables dynamic tool registration and includes a Chrome extension to bridge browser-based tools to desktop MCP clients like Claude and Cursor.

Pre-execution verification for LLM-generated agentic workflows

workflow-verify is a framework for pre-execution validation of LLM-generated agentic workflows using a structured Workflow AST. It performs static analysis on type flows, schemas, side effects, and guard conditions to ensure safety before transpiling to Python, TypeScript, or Temporal. The library also enables an automated self-correction loop and integrates with MCP to facilitate reliable, iterative workflow generation.
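As a minimal illustration of what static type-flow analysis over a workflow representation involves (the structure below is hypothetical, not workflow-verify's actual AST), each step declares input and output types, and the checker verifies adjacent steps agree before anything executes:

```python
def check_type_flow(steps):
    """Sketch of pre-execution type-flow validation over a linear workflow:
    every step's declared input type must match the previous step's output
    type. Returns a list of human-readable errors; empty means it passes."""
    errors = []
    for prev, cur in zip(steps, steps[1:]):
        if prev["out"] != cur["in"]:
            errors.append(
                f"{prev['name']} -> {cur['name']}: "
                f"expected {cur['in']}, got {prev['out']}")
    return errors
```

Feeding such errors back to the LLM is exactly the kind of automated self-correction loop the framework describes: the workflow is regenerated until the static checks pass, and only then transpiled and run.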

Mnemora – Serverless memory DB for AI agents (no LLM in your CRUD path)

Mnemora is an open-source serverless memory database for AI agents that provides working, semantic, episodic, and procedural memory through a single API. It features sub-10ms read latency and removes LLMs from the CRUD path, offering native integrations with LangGraph, LangChain, and CrewAI. The architecture is fully self-hostable on AWS via CDK, utilizing DynamoDB, Aurora pgvector, and Bedrock Titan for embeddings.
