August 24, 2025
Plain-Text Context Files for LLMs
What They Are, Why They Exist, Where They Break
Plain-text context files are the new config surface for AI. A few kilobytes of Markdown can change how a coding agent behaves, how a web crawler assembles context, and how your IDE chooses files.
This post maps some of the landscape, explains why so many tools settled on “just use a text file,” and calls out the sharp edges.
What we’re talking about
You've probably encountered these files when using AI-powered IDEs. The concept was essentially invented by those tools, and it generally takes the form of repo-level files that are automatically loaded and presented to the LLM.
Examples include:
AGENTS.md — OpenAI’s neutral “README for agents.” Tools look here for project-specific guidance.
CLAUDE.md — Claude Code auto-loads this file at session start; commonly used for run/test instructions, coding conventions, and local quirks.
GEMINI.md — Gemini Code Assist discovers these files (hierarchically) and treats them as agent memory.
Copilot instructions — GitHub Copilot supports repo-scoped instruction files like .github/copilot-instructions.md.
Other IDE vendor rule files — Cursor .cursor/rules (MDC format; .cursorrules is the legacy single-file form), Windsurf .windsurfrules, Amazon Q Markdown rules in .amazonq/rules.
AGENTS.md is probably the best illustration of this:
    # AGENTS.md

    ## Project
    TypeScript monorepo. pnpm. Turbo for tasks. Next.js app, FastAPI service.

    ## Stack & commands
    - Install: pnpm i
    - Dev: pnpm dev:all
    - Test: pnpm test -w

    ## Conventions
    - TS strict; no any
    - API errors: RFC 9457
    - Commit style: Conventional Commits

    ## Tasks the agent may do
    - Add endpoints in /apps/api with tests
    - Touch only files it just created when refactoring

    ## Guardrails
    - Never commit .env
    - Don’t bump Node version
It's also worth noting /llms.txt here, although it serves a slightly different purpose: it is a Markdown index that points models to the right pages of a site. GitBook and others now generate it automatically. Variants include llms-full.txt (the entire docs rendered to Markdown).
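To make the "Markdown index" idea concrete, here is a minimal sketch of a generator. It assumes the commonly described llms.txt layout (an H1 title, a blockquote summary, then H2 sections containing link lists); the `DocLink` type, the function name, and the example URL are hypothetical and only there for illustration.

```python
from dataclasses import dataclass


@dataclass
class DocLink:
    """One entry in the index (hypothetical helper type)."""
    title: str
    url: str
    note: str = ""


def render_llms_txt(site_title: str, summary: str,
                    sections: dict[str, list[DocLink]]) -> str:
    """Render a Markdown index: H1 title, blockquote summary, H2 link lists."""
    lines = [f"# {site_title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for link in links:
            # The note after the colon is optional context for the model.
            suffix = f": {link.note}" if link.note else ""
            lines.append(f"- [{link.title}]({link.url}){suffix}")
        lines.append("")
    # Normalise to exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"
```

Calling `render_llms_txt("Example Docs", "Docs for a hypothetical project.", {"Guides": [DocLink("Quickstart", "https://example.com/quickstart.md")]})` returns the text you would serve at /llms.txt. A docs build could call something like this from CI so the index never lags the pages.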
Why plain text won
Zero-ops & portable. A Markdown file commits to Git, diffs cleanly and works offline (useful if you're using local coding agents). Many new IDEs auto-discover these files by name.
Markdown as a schema. The general shape of all these formats is the same: sections of text the model can rely on. Formalisations like AGENTS.md are useful but don't really constrain you in any way.
Zero learning curve. It’s just text in the repo. No new DSL or plugin config, no schema to memorise. Edit in any editor, review in PRs, search with grep/rg, and copy between vendors unchanged. CI can lint size and required sections with the tools you already use.
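As a rough illustration of the CI lint idea, the sketch below checks a context file's length and looks for required headings. The function name and the section names are assumptions, and the 300-line default is only a placeholder budget. It also treats any line starting with `#` as a heading, so headings inside fenced code would count too.

```python
import re

# Matches Markdown ATX headings such as "## Guardrails" (one per line).
HEADING_RE = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def lint_context_file(text: str, required_sections: list[str],
                      max_lines: int = 300) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    line_count = len(text.splitlines())
    if line_count > max_lines:
        problems.append(f"too long: {line_count} lines (limit {max_lines})")

    # Compare headings case-insensitively so "guardrails" matches "Guardrails".
    headings = {m.group(1).lower() for m in HEADING_RE.finditer(text)}
    for section in required_sections:
        if section.lower() not in headings:
            problems.append(f"missing section: {section}")

    return problems
```

In CI you would read AGENTS.md, call `lint_context_file(text, ["Project", "Guardrails"])`, and fail the job if the returned list is non-empty.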
The sharp edges
Duplication & drift. Teams end up with CLAUDE.md, GEMINI.md, AGENTS.md, .cursorrules, and .amazonq/rules that all say the same thing, then drift. Even Google’s Gemini CLI exposes a setting to accept multiple context filenames, which is a tell.
Ambiguous precedence. Multiple files across directories make it unclear what actually got loaded. Some tools do hierarchical search, but specifics vary.
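One way to take the guesswork out of precedence is to compute the load order yourself and print it. The sketch below walks from a working directory up to the repo root and returns every matching file, most general first. This is one plausible rule (nearest file is applied last), not a description of how any particular tool resolves files; the function name and default filename are assumptions.

```python
from pathlib import Path


def discover_context_files(start_dir: Path, repo_root: Path,
                           filename: str = "AGENTS.md") -> list[Path]:
    """Return matching files ordered from repo_root down to start_dir."""
    start_dir = start_dir.resolve()
    repo_root = repo_root.resolve()
    if start_dir != repo_root and repo_root not in start_dir.parents:
        raise ValueError(f"{start_dir} is not inside {repo_root}")

    found = []
    current = start_dir
    while True:
        candidate = current / filename
        if candidate.is_file():
            found.append(candidate)
        if current == repo_root:
            break
        current = current.parent

    # Collected nearest-first; reverse so the most general file comes first.
    found.reverse()
    return found
```

Logging this list at the start of an agent session (or in a pre-commit check) at least tells you which files would be in play under that rule, which you can then compare against what your tool documents.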
Context window pressure. Shoving giant files into prompts burns tokens and can drown out local context. You need discipline and evals, and both are often missing.
Security & governance. Markdown looks harmless, but you can leak secrets, internal URLs, or compliance steps that bypass review. Treat these like code. Copilot’s repo-level instructions and Amazon Q’s rules should go through PRs.
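"Treat these like code" can include a cheap automated check before review. The sketch below flags lines in a context file that look like credentials or internal URLs. The patterns are illustrative assumptions, not a complete rule set; a dedicated secret scanner would be the sturdier choice in practice.

```python
import re

# Illustrative patterns only (assumptions, far from exhaustive).
SUSPICIOUS_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "inline credential": re.compile(
        r"(?i)\b(?:api[_-]?key|token|password|secret)\s*[:=]\s*\S+"
    ),
    "internal URL": re.compile(r"https?://[^\s)]*\.(?:internal|corp|local)\b"),
}


def scan_context_file(text: str) -> list[tuple[int, str]]:
    """Return (line_number, label) pairs for every suspicious line."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SUSPICIOUS_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_no, label))
    return findings
```

Run it over every context file in a PR and surface the findings as review comments. False positives are expected, so it works better as a prompt for a human reviewer than as a hard gate.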
A simple decision guide
Scenario | Recommendation |
---|---|
Single IDE/agent, small team | One repo file the tool auto-loads (vendor-specific: CLAUDE.md or GEMINI.md). Keep it < 300 lines. |
Multiple agents/tools | Adopt AGENTS.md as the canonical source and generate vendor files as needed. Keep vendor files thin. |
Docs on the web | Serve /llms.txt. Consider /llms-full.txt if your audience is AI agents, not humans. Automate generation in CI. |
Dynamic data, tools | Move context & tools behind MCP and treat context files as pointers to these tools. |
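For the "multiple agents/tools" row, here is a minimal sketch of generating vendor files from a canonical AGENTS.md and detecting drift. It simply copies the canonical text under a "generated" header; writing a short pointer instead is an equally reasonable choice if your tools follow references. The function names and header wording are assumptions, and the file list uses only filenames mentioned in this post.

```python
from pathlib import Path

# Vendor files to keep in sync with AGENTS.md (adjust to the tools you use).
VENDOR_FILES = ("CLAUDE.md", "GEMINI.md", ".github/copilot-instructions.md")
HEADER = "<!-- Generated from AGENTS.md. Edit AGENTS.md instead. -->\n\n"


def render_vendor_files(canonical_text: str) -> dict[str, str]:
    """Map each vendor filename to the content it should contain."""
    return {name: HEADER + canonical_text for name in VENDOR_FILES}


def write_vendor_files(repo_root: Path) -> None:
    """Regenerate every vendor file from repo_root/AGENTS.md."""
    canonical = (repo_root / "AGENTS.md").read_text(encoding="utf-8")
    for name, content in render_vendor_files(canonical).items():
        path = repo_root / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")


def stale_vendor_files(repo_root: Path) -> list[str]:
    """List vendor files that are missing or differ from the generated form."""
    canonical = (repo_root / "AGENTS.md").read_text(encoding="utf-8")
    stale = []
    for name, content in render_vendor_files(canonical).items():
        path = repo_root / name
        if not path.is_file() or path.read_text(encoding="utf-8") != content:
            stale.append(name)
    return stale
```

A CI job can call `stale_vendor_files(Path("."))` and fail when the list is non-empty, which turns silent drift into a visible build error.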