Hi HN! Ever wish you could just point your AI assistant at your terminal and say 'what's wrong with this output?' That's why I built iterm-mcp. It lets MCP clients like Claude Desktop directly interact with your iTerm2 terminal - reading logs, running commands, using REPLs, and helping debug issues. Want to explore data or debug using a REPL? The AI can start the REPL, run commands, and help interpret the results.
This is an MCP server that integrates with Claude Desktop, LibreChat, and other Model Context Protocol compatible clients.
https://github.com/ferrislucas/iterm-mcp
Note: Independent project, not officially affiliated with iTerm2
## Features
*Efficient Token Use:* iterm-mcp gives the model the ability to inspect only the output it's interested in. Typically that's just the last few lines, even for long-running commands.
*Natural Integration:* You share iTerm with the model. You can ask questions about what's on the screen, or delegate a task to the model and watch as it performs each step.
*Full Terminal Control and REPL Support:* The model can start and interact with REPLs, and can send control characters like ctrl-c and ctrl-z.
*Easy on the Dependencies:* iterm-mcp is built with minimal dependencies and is runnable via npx. It's designed to be easy to add to Claude Desktop and other MCP clients. It should just work.
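For example, a typical Claude Desktop entry looks something like this (a sketch based on the standard MCP server config format; check the repo README for the exact invocation):

```json
{
  "mcpServers": {
    "iterm-mcp": {
      "command": "npx",
      "args": ["-y", "iterm-mcp"]
    }
  }
}
```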
## Real-World Example: Debugging Sidekiq Jobs
I needed to debug a Sidekiq job with complex arguments. The arguments were partially obfuscated in the logs. I asked Claude: "open rails console, show me arguments for the latest XYZ job". The model:
1. Launched the Rails console
2. Retrieved the job details
3. Displayed the arguments I was looking for
## Architectural Journey
This project had a couple of interesting constraints around command execution:
### 1. Token Efficiency Challenge
I wanted to constrain token use as much as possible. I didn't want to send the entire output of a long-running command to the model, but there's no great way to know which parts of the output matter for what the model is doing. Sampling could be used here, but it's not well supported yet.
*Solution:* I arrived at a pull-based approach. The command from the model is sent to the terminal, and the model is told how many lines of output were generated. The model can then choose to retrieve as many lines of the buffer as it thinks are relevant.
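As a rough sketch of that flow (the function names and AppleScript plumbing here are illustrative assumptions, not the actual iterm-mcp internals):

```typescript
// Illustrative sketch of the pull-based flow described above; names and the
// AppleScript plumbing are assumptions, not the actual iterm-mcp API.
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Read the visible contents of the active iTerm2 session via AppleScript.
async function sessionContents(): Promise<string[]> {
  const script =
    'tell application "iTerm2" to tell current session of current window to get contents';
  const { stdout } = await run("osascript", ["-e", script]);
  return stdout.split("\n");
}

// "Write" tool: send the command, then report only how many new lines appeared.
// (Quoting/escaping of the command string is omitted for brevity.)
async function writeToTerminal(command: string): Promise<{ linesOfOutput: number }> {
  const before = (await sessionContents()).length;
  const script = `tell application "iTerm2" to tell current session of current window to write text "${command}"`;
  await run("osascript", ["-e", script]);
  // In practice you'd wait for the command to settle first (see section 2).
  const after = (await sessionContents()).length;
  return { linesOfOutput: after - before };
}

// "Read" tool: the model pulls only the tail it cares about.
async function readTerminalOutput(lines: number): Promise<string> {
  const buffer = await sessionContents();
  return buffer.slice(-lines).join("\n");
}
```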
### 2. Long-Running Process Support
I wanted to support long-running processes. It turns out that when you run `brew install ffmpeg`, it takes a while, and it's not always clear when the job is done. In early proofs of concept, the model would assume the command had completed successfully and start sending additional commands to the terminal before the first one had finished.
*Solution:* iTerm provides a way to ask whether the terminal is waiting for user input, but I found it produced false positives in certain situations. For example, a long-running command would sometimes cause iTerm to report that the terminal was waiting for input when the command was in fact still running. Instead, I inspect the processes associated with the terminal and wait until the most interesting of those processes settles to low resource usage, which turned out to be a fair indicator that a long-running command has finished and the terminal is ready for input.
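Roughly, the idea looks like this (a sketch that assumes the session's TTY is known, e.g. via iTerm2's AppleScript `tty` property, and that CPU usage is sampled with `ps`; the real implementation may differ):

```typescript
// Sketch of the "wait until the terminal's work settles" heuristic.
// Assumes we already know the session's TTY (e.g. "ttys003") and samples
// the CPU usage of the processes attached to it with `ps`.
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Total %CPU of all processes on the given TTY.
async function cpuOnTty(tty: string): Promise<number> {
  const { stdout } = await run("ps", ["-o", "%cpu=", "-t", tty.replace("/dev/", "")]);
  return stdout
    .trim()
    .split("\n")
    .reduce((sum, line) => sum + (parseFloat(line) || 0), 0);
}

// Consider the command "done" once CPU on the TTY stays low for a few
// consecutive samples, which is a fair (not perfect) readiness signal.
async function waitUntilSettled(tty: string, threshold = 1.0, quietSamples = 3): Promise<void> {
  let quiet = 0;
  while (quiet < quietSamples) {
    quiet = (await cpuOnTty(tty)) < threshold ? quiet + 1 : 0;
    await new Promise((resolve) => setTimeout(resolve, 250));
  }
}
```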
## Requirements
* iTerm2 must be running
* Node version 18 or greater
## Safety Considerations
* The user is responsible for using the tool safely.
* No built-in restrictions: iterm-mcp makes no attempt to evaluate the safety of commands that are executed.
* Models can behave in unexpected ways. The user is expected to monitor activity and abort when appropriate.
* For multi-step tasks, you may need to interrupt the model if it goes off track. Start with smaller, focused tasks until you're familiar with how the model behaves.
Hi, thanks for the comment! wcgw looks really cool - nice job with it!
> I wonder if there's really a need for separate write to terminal and read output functions? I was hoping that write command itself would execute and return the output of the command, saving back and forth latency.
I traded back-and-forth latency for lower token use. I didn't want to return gobs of output from `brew install ffmpeg` when the model really only needs to see the last line of output in order to know what to do next.
> The way I solved it is by setting a special PS1 prompt. So as soon as I get that prompt I know the task is done. I wonder if a similar thing can be done in your mcp?
What you suggested with changing the prompt is a good idea, but it breaks down in certain scenarios, particularly if the user is using a REPL. Part of my goal here is to avoid modifying the shell prompt or introducing visual indicators for the AI, because I don't want the user to have to work around the AI. I want the AI to help as requested, as if it's sitting at your keyboard, without introducing friction or any unwanted change to the user's workflow.
It's important to me that this works with REPLs and other interactive CLI utilities. If that weren't a design concern then I'd definitely explore the approach you suggested.
> Why not tail it?
I think you might mean using the `tail` command? See my other comments about not wanting to change the user's workflow. I don't want to get between the user and their commands in any way; that's what drove my design decisions.
Would you mind elaborating if I misunderstood what you meant?