I've been building a tool that changes how LLM coding agents explore codebases, and I wanted to share it along with some early observations.
Typically, Claude Code globs directories, greps for patterns, and reads files with minimal guidance. It works much the way you'd learn to navigate a city by walking every street. You'd eventually build a mental map, but Claude never does, at least not one that persists across different contexts.
The Recursive Language Models paper from Zhang, Kraska, and Khattab at MIT CSAIL introduced a cleaner framing. Instead of cramming everything into context, the model gets a searchable environment. The model can then query just for what it needs and can drill deeper where needed.
coderlm is my implementation of that idea for codebases. A Rust server indexes a project with tree-sitter, builds a symbol table with cross-references, and exposes an API. The agent queries for structure, symbols, implementations, callers, and grep results — getting back exactly the code it needs instead of scanning for it.
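To make the "symbol table with cross-references" idea concrete, here is a minimal sketch. It is not coderlm's code: it swaps tree-sitter for Python's standard-library `ast` module, handles a single in-memory module, and only tracks calls made by bare name. `SOURCE` and `build_index` are made-up names for illustration.

```python
import ast
from collections import defaultdict

# Illustrative module text; a real indexer would read files from disk.
SOURCE = '''
def load(path):
    return open(path).read()

def parse(path):
    text = load(path)
    return text.split()

def main():
    parse("a.txt")
    load("b.txt")
'''

def build_index(source: str):
    """Return (symbols, callers) for one module's source text."""
    tree = ast.parse(source)
    symbols = {}                # function name -> (first line, last line)
    callers = defaultdict(set)  # callee name -> names of functions that call it
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            symbols[node.name] = (node.lineno, node.end_lineno)
            # Record every bare-name call inside this function body.
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    callers[inner.func.id].add(node.name)
    return symbols, dict(callers)

symbols, callers = build_index(SOURCE)
print(sorted(symbols))
print(sorted(callers["load"]))
```

Once the table exists, "who calls `load`?" becomes a dictionary lookup instead of a scan over every file.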
The agent workflow looks like:
1. `init` — register the project, get the top-level structure
2. `structure` — drill into specific directories
3. `search` — find symbols by name across the codebase
4. `impl` — retrieve the exact source of a function or class
5. `callers` — find everything that calls a given symbol
6. `grep` — fall back to text search when you need it
This replaces the glob/grep/read cycle with index-backed lookups. The server currently supports Rust, Python, TypeScript, JavaScript, and Go for symbol parsing, though all file types show up in the tree and are searchable via grep.
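As a rough illustration of that workflow, the sketch below runs the six steps against a tiny hand-written index. Everything here is hypothetical: the function names mirror the step names above, but the data layout, signatures, and return shapes are assumptions and not coderlm's actual API.

```python
import re

# Hypothetical, hand-written index standing in for what a server would build.
FILES = {
    "src/db/conn.py": "def connect(url):\n    return url\n",
    "src/api/users.py": (
        "from db.conn import connect\n\n"
        "def list_users():\n    return connect('db://users')\n"
    ),
    "README.md": "Call connect() before anything else.\n",
}
# symbol name -> (file, first line, last line), 1-based and inclusive
SYMBOLS = {
    "connect": ("src/db/conn.py", 1, 2),
    "list_users": ("src/api/users.py", 3, 4),
}
CALLERS = {"connect": ["list_users"]}

def init():
    """Step 1: top-level entries of the project."""
    return sorted({path.split("/")[0] for path in FILES})

def structure(prefix):
    """Step 2: files under one directory."""
    root = prefix.rstrip("/") + "/"
    return sorted(path for path in FILES if path.startswith(root))

def search(name):
    """Step 3: symbols whose name contains `name`."""
    return sorted(symbol for symbol in SYMBOLS if name in symbol)

def impl(symbol):
    """Step 4: exact source lines of one symbol."""
    path, start, end = SYMBOLS[symbol]
    return "\n".join(FILES[path].splitlines()[start - 1:end])

def callers(symbol):
    """Step 5: functions that call `symbol`."""
    return CALLERS.get(symbol, [])

def grep(pattern):
    """Step 6: plain text search across every file, indexed or not."""
    rx = re.compile(pattern)
    return [(path, number, line)
            for path in sorted(FILES)
            for number, line in enumerate(FILES[path].splitlines(), 1)
            if rx.search(line)]

print(init())
print(structure("src"))
print(search("conn"))
print(impl("connect"))
print(callers("connect"))
print(grep(r"connect\("))
```

Only the last step touches raw text; the first five are lookups into structures built ahead of time, which is the point of the design.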
It ships as a Claude Code plugin with hooks that guide the agent to use indexed lookups instead of native file tools, plus a Python CLI wrapper with zero dependencies.
For anecdotal results, I ran the same prompt against a codebase to "explore and identify opportunities to clarify the existing structure".
Using coderlm, Claude was able to generate a plan in about 3 minutes. The coderlm-enabled instance found a genuine bug (duplicated code with identical names), orphaned code for cleanup, mismatched naming conventions crossing module boundaries, and overlapping vocabulary. These are all semantic issues which clearly benefit from the tree-sitter-centric approach.
Using the native tools, Claude identified file clutter in the root of the project, out-of-date references, and a migration timestamp collision. These findings are more consistent with a methodical walk of the filesystem and took about 8 minutes to produce.
In this one run, the indexed approach did better at catching semantic issues than the native tools, and it finished faster (about 3 minutes versus 8).
I've spent some effort streamlining the installation process, but it isn't turnkey yet. You'll need the Rust toolchain to build the server, which runs as a separate process. Installing the plugin from a Claude marketplace is possible, but the skill isn't being added to your .claude yet, so there are some manual steps before Claude can use it.
Claude continues to demonstrate significant resistance to using CodeRLM in exploration tasks. Typically you will need to explicitly direct Claude to use it.
---
Repo: github.com/JaredStewart/coderlm
Paper: Recursive Language Models https://arxiv.org/abs/2512.24601 — Zhang, Kraska, Khattab (MIT CSAIL, 2025)
Inspired by: https://github.com/brainqub3/claude_code_RLM
Hey, are you planning to update docs for end users of your CLI? I was an Aider user who switched to Opencode but I want to experiment with token and time-efficient agents, and I'm assuming OpenHands is one.
Aider's repo-map concept is great! Thanks for sharing, I'd not been aware of it. Using tree-sitter to give the LLM structural awareness is the right foundation IMO. The key difference is how that information gets to the model.
Aider builds a static map, with some importance ranking, and then stuffs the most relevant part into the context window upfront. That's smart - but it is still the model receiving a fixed snapshot before it starts working.
What the RLM paper crystallized for me is that the agent could query the structure interactively as it works. A live index exposed through an API lets the agent decide what to look at, how deep to go, and when it has enough. When I watch it work it's not one or two lookups but many, each informed by what the previous revealed. The recursive exploration pattern is the core difference.
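A toy version of that pattern, where each lookup decides the next one, might look like the following. The callers table is invented, `callers` stands in for a single index query, and a fixed breadth-first policy with a lookup budget replaces the model's own judgment about where to look and when to stop.

```python
from collections import deque

# Hypothetical callers index: symbol -> functions that call it.
CALLERS = {
    "write_row": ["save_user", "save_order"],
    "save_user": ["signup_handler"],
    "save_order": ["checkout_handler"],
    "signup_handler": [],
    "checkout_handler": [],
}

def callers(symbol):
    """Stand-in for one query against the index."""
    return CALLERS.get(symbol, [])

def explore(start, lookup, budget=10):
    """Follow callers outward from `start`, one lookup at a time.

    Returns (entry_points, lookups_used). Stops when nothing is left
    to visit or when the lookup budget is spent.
    """
    seen, queue = {start}, deque([start])
    entry_points, used = [], 0
    while queue and used < budget:
        symbol = queue.popleft()
        found = lookup(symbol)
        used += 1
        if not found:
            # Nothing calls this symbol, so treat it as an entry point.
            entry_points.append(symbol)
        for caller in found:
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return entry_points, used

print(explore("write_row", callers))
print(explore("write_row", callers, budget=2))
```

The contrast with a fixed snapshot is that the set of symbols visited is not known up front; it falls out of what earlier lookups returned.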
Aider actually prompts the model to say if it needs to see additional files. Whenever the model mentions file names, aider asks the user if they should be added to context.
As well, any files or symbols mentioned by the model are noted. They influence the repomap ranking algorithm, so subsequent requests have even more relevant repository context.
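One way to picture mentions feeding back into ranking is a personalized PageRank, where the restart step jumps to mentioned files instead of to a uniformly random one. The sketch below is a generic pure-Python version written for illustration, not Aider's code; the file names, the edge list, and the `rank` function are all invented.

```python
def rank(edges, mentioned, damping=0.85, iters=50):
    """Order files by personalized PageRank computed with power iteration.

    edges: dict mapping a file to the list of files it references.
    mentioned: set of files the model has mentioned; restarts jump to them.
    """
    nodes = sorted(set(edges) | {t for targets in edges.values() for t in targets})
    # Restart distribution: uniform over mentioned files, or over all files if none.
    favored = [n for n in nodes if n in mentioned] or nodes
    restart = {n: (1 / len(favored) if n in favored else 0.0) for n in nodes}
    score = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - damping) * restart[n] for n in nodes}
        for src in nodes:
            outs = edges.get(src, [])
            if outs:
                for dst in outs:
                    nxt[dst] += damping * score[src] / len(outs)
            else:
                # File with no outgoing references: return its score to the restart set.
                for n in nodes:
                    nxt[n] += damping * score[src] * restart[n]
        score = nxt
    return sorted(nodes, key=lambda n: -score[n])

EDGES = {
    "app.py": ["db.py", "auth.py"],
    "auth.py": ["db.py"],
    "db.py": [],
    "cli.py": ["report.py"],
    "report.py": [],
}

print(rank(EDGES, mentioned=set()))
print(rank(EDGES, mentioned={"cli.py"}))
```

With no mentions the most-referenced file comes out on top; once `cli.py` is mentioned, it and the file it references move to the front.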
This is designed as a sort of implicit search and ranking flow. The blog article doesn’t get into any of this detail, but much of this has been around and working well since 2023.
I see, so the context adapts as the LLM interacts with the codebase across requests?
That's a clever implicit flow for ranking.
The difference in my approach is that exploration is happening within a single task, autonomously. The agent traces through structure, symbols, implementations, callers in many sequential lookups without human interaction. New files are automatically picked up with filesystem watching, but the core value is that the LLM can navigate the code base the same way that I might.
> That's smart - but it is still…
> That's a clever… The difference in my approach…
Are you using an LLM to help you write these replies, or are you just picking up their stylistic phrasings the way expressions go viral at an office till everyone is saying them?
As an LLM, you wouldn't consider that you're replying confidently and dismissively while clearly having no personal experience with the CLI coding agent that not only started it all but for a year (eternity in this space) was so far ahead of upstarts (especially the VSCode forks family) it was like a secret weapon. And still is in many ways thanks to its long lead and being the carefully curated labor of a thoughtful mind.
As a dev seeking to improve on SOTA, having no awareness of the progenitor and the techniques one must do better than seems like a blind spot worth digging into before dismissing. Aider's benchmarks on practical applicability of model advancements vs. regressions in code editing observably drove both OpenAI and Anthropic to pay closer attention and improve SOTA for everyone.
Aider was onto something, and you are onto something, pushing forward the 'semantic' understanding. It's worth absorbing everything Paul documented and blogged, and spending some time in Aider to enrich a feel of what Claude Code chose to do the same or differently, which ideas may be better, and what could be done next to go further.
Aider's repomap is a great idea. I remember participating in the discussion back then.
The unfortunate thing for Python, which the repomap mentions, and for untyped/duck-typed languages in general, is that function signatures do not mean a lot.
When it comes to Rust, it's a totally different story: function and method signatures convey a lot of important information. As a general rule, in every LLM query I include at most one function/method implementation, and everything else is function/method signatures.
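A small sketch of that rule, one full implementation plus signature-only stubs for everything else, is below. It uses Python's standard `ast` module on type-annotated Python rather than Rust, purely to keep the example short and runnable (Python 3.9+ for `ast.unparse`); `SOURCE` and `build_context` are illustrative names.

```python
import ast

# Illustrative module; the type hints are what make the stubs informative.
SOURCE = '''
def fetch(url: str, timeout: float = 5.0) -> bytes:
    data = b"payload"
    return data

def decode(raw: bytes) -> str:
    return raw.decode("utf-8")

def handler(url: str) -> str:
    return decode(fetch(url))
'''

def build_context(source: str, focus: str) -> str:
    """Full source for `focus`, signature-only stubs for other top-level functions."""
    parts = []
    for node in ast.parse(source).body:
        if not isinstance(node, ast.FunctionDef):
            continue
        if node.name == focus:
            # Keep the one implementation under discussion verbatim.
            parts.append(ast.get_source_segment(source, node))
        else:
            returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
            parts.append(f"def {node.name}({ast.unparse(node.args)}){returns}: ...")
    return "\n\n".join(parts)

context = build_context(SOURCE, focus="handler")
print(context)
```

The bodies of `fetch` and `decode` never reach the prompt, so the token cost grows with the number of signatures rather than with the size of the files.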
By not mindlessly giving LLMs whole files and implementations, I have never used more than 200,000 tokens/day, counting input and output. That comes to about 30 queries for a whole day of programming, and costs less than a dollar per day no matter which model I use.
Anyway, having the agent build the repomap doesn't sound like such a great idea. Agents are horribly inefficient. It is better to build the repomap deterministically using something like ast-grep, and then let the agent read the resulting repomap.
Typed languages definitely provide richer signal in their signatures - and my experience has been that I get more reliable generations from those languages.
On the efficiency point, the agent isn't doing any expensive exploration here. A standalone server builds and maintains the index; the agent only queries it. So it's closer to the deterministic approach implemented in aider (at least in a conceptual sense), with the added benefit that the LLM can execute targeted queries in a recursive manner.
I am planning to add similar concepts to Yek. Either tree-sitter or ast-grep. Your work here and Aider's work would be my guiding prior art. Thank you for sharing!
https://github.com/mohsen1/yek
I just looked and it was posted a number of times with 0 discussion
https://news.ycombinator.com/item?id=38062493
https://news.ycombinator.com/item?id=41411187
https://news.ycombinator.com/item?id=40231527
https://news.ycombinator.com/item?id=39993459
https://news.ycombinator.com/item?id=41393767
https://news.ycombinator.com/item?id=39391946