I built LocalGPT over 4 nights as a Rust reimagining of the OpenClaw assistant pattern (markdown-based persistent memory, autonomous heartbeat tasks, skills system).
It compiles to a single ~27MB binary — no Node.js, Docker, or Python required.
Key features:
- Persistent memory via markdown files (MEMORY, HEARTBEAT, SOUL) — compatible with OpenClaw's format
- Full-text search (SQLite FTS5) + semantic search (local embeddings, no API key needed)
- Autonomous heartbeat runner that checks tasks on a configurable interval
- CLI + web interface + desktop GUI
- Multi-provider: Anthropic, OpenAI, Ollama, etc.
- Apache 2.0
Install: `cargo install localgpt`
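For the curious: the full-text half of the search is plain SQLite FTS5 over the markdown memory. A stripped-down sketch of the idea (simplified, not the exact code in the repo; assumes an SQLite build with FTS5 enabled, and the table/file contents are just illustrative):

```rust
// Minimal FTS5-backed memory search sketch (illustrative only).
use rusqlite::{Connection, Result};

fn main() -> Result<()> {
    let conn = Connection::open("memory.db")?;
    conn.execute_batch(
        "CREATE VIRTUAL TABLE IF NOT EXISTS memory_fts USING fts5(path, content);",
    )?;
    // Index a markdown memory file.
    conn.execute(
        "INSERT INTO memory_fts (path, content) VALUES (?1, ?2)",
        ("MEMORY.md", "2026-02-01: switched heartbeat interval to 30 minutes"),
    )?;
    // Query it back, best matches first, with a highlighted snippet.
    let mut stmt = conn.prepare(
        "SELECT path, snippet(memory_fts, 1, '[', ']', '…', 8)
         FROM memory_fts WHERE memory_fts MATCH ?1 ORDER BY rank",
    )?;
    let hits = stmt.query_map(["heartbeat"], |row| {
        Ok((row.get::<_, String>(0)?, row.get::<_, String>(1)?))
    })?;
    for hit in hits {
        let (path, snippet) = hit?;
        println!("{path}: {snippet}");
    }
    Ok(())
}
```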
I use it daily as a knowledge accumulator, research assistant, and autonomous task runner for my side projects. The memory compounds — every session makes the next one better.
GitHub: https://github.com/localgpt-app/localgpt Website: https://localgpt.app
Would love feedback on the architecture or feature ideas.
Yes, this is not local-first; the name is bad.
Horrible. Just because you have code that doesn't run in a browser doesn't mean you have something that's local. This goes double when the code requires API calls. If your net goes down, this stuff does nothing.
Not to mention that you can actually have something that IS local AND runs in a browser :D
In a world where IT doesn't mean anything, crypto doesn't mean anything, AI doesn't mean anything, AGI doesn't mean anything, End-to-end encryption doesn't mean anything, why should local-first mean anything? We must unite against the tyranny of distinction.
It absolutely can be pointed to any standard endpoint, either cloud or local.
It’s far better for most users to be able to specify an inference server (even on localhost in some cases) because the ecosystem of specialized inference servers and models is a constantly evolving target.
If you write this kind of software and try to integrate your own inference engine instead of focusing on your agentic tooling, you will not only be reinventing the wheel but probably also disadvantaging your users. Ollama, vLLM, Hugging Face, and others are devoting their focus to the servers; there is no reason to sacrifice the front-end tooling effort to duplicate their work.
Besides that, most users will not be able to run the better models on their daily driver, and will have a separate machine for inference, run inference in a private or rented cloud, or even go over a public API.
It is not local first. Local is not the primary use case. The name is misleading to the point I almost didn't click because I do not run local models.
I think the author is using local-first as in “your files stay local, and the framework is compatible with on-prem infra”. Aside from not storing your docs and data with a cloud service though, it’s very usable with cloud inference providers, so I can see your point.
Maybe the author should have specified that capability, even though it seems redundant, since local-first implies local capability but also cloud compatibility; otherwise it would just be "local" or "local-only".
It's called "LocalGPT". It's a bad name.
To be precise, it’s exactly as local first as OpenClaw (i.e. probably not unless you have an unusually powerful GPU).
Yes but OpenClaw (which is a terrible name for other reasons) doesn't have "local" in the name and so is not misleading.
Just as misleading. A lot of their marketing push, or at least the ClawBros, pitch it as running locally on your Mac Mini.
To be fair, you do keep significantly more control of your own data from a data portability perspective! A MEMORY.md file presents almost zero lock-in compared to some SaaS offering.
Privacy-wise, of course, the inference provider sees everything.
To be clear: keeping a local copy of some data provides no control over how the remote system treats that data once it's sent.
I mean, at least OpenClaw is funny in the sense that a D port could finish the roundabout by calling itself "OpenClawD"...
Confused me at first: when I saw the mention of local plus the single-file thing in the GitHub repo, I assumed they were going to have llamafile bundled and went looking through to see what model they were using by default.
You absolutely do not have to use a third-party LLM. You can point it at any OpenAI/Anthropic-compatible endpoint. It can even be on localhost.
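Concretely, "OpenAI-compatible" just means the standard chat-completions wire format, so the same request works against a cloud provider or a local server. A rough sketch (Ollama's default port shown; the endpoint, model name, and prompt are placeholders for whatever you actually run):

```rust
// Illustrative call to a local OpenAI-compatible endpoint.
// Needs reqwest (features "blocking", "json") and serde_json.
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let resp: serde_json::Value = reqwest::blocking::Client::new()
        .post("http://localhost:11434/v1/chat/completions")
        .json(&json!({
            "model": "qwen2.5-coder:14b",
            "messages": [{ "role": "user", "content": "Summarize MEMORY.md" }]
        }))
        .send()?
        .json()?;
    println!("{}", resp["choices"][0]["message"]["content"]);
    Ok(())
}
```

Swap the URL for a cloud provider's base URL (plus an API key header) and nothing else changes.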
Ah true, missed that! Still a bit cumbersome and lazy IMO; I'm a fan of just shipping with that capability out of the box (Hugging Face's Candle is fantastic for downloading/syncing/running models locally).
In a local setup you still usually want to split the machine that runs inference from the client that uses it; there are often non-trivial resources involved (Chromium, compilation, databases, etc.) that you don't want polluting the inference machine.
Ah come on, lazy? As long as it works with the runtime you want to use, instead of hardcoding their own solution, it should work fine. If you want to use Candle and have to implement new architectures with it to be able to use it, you still can: just expose it over HTTP.
I think one of the major problems with the current incarnation of AI solutions is that they're extremely brittle and hacked-together. It's a fun exciting time, especially for us technical people, but normies just want stuff to "work."
Even copy-pasting an API key is probably too much of a hurdle for regular folks, let alone running a local Ollama server in a Docker container.
Unlike in image/video gen, at least with LLMs the "best" solution available isn't a graph/node-based interface with an ecosystem of hundreds of hacky, undocumented custom nodes that break every few days, and way-too-complex workflows made up of a spaghetti of two dozen nodes with numerous parameters each, half of which have no discernible effect on output quality, while tweaking the rest is entirely trial and error.
That's not the best solution for image or video (or audio, or 3D) any more than it is for LLMs (which it also supports).
OTOH, it's the most flexible and the most likely to have some support for whatever you are doing for a lot of those, especially if you are combining multiple of them in the same process.
Yes, "best" is subjective and that’s why I put it in quotes. But in the community it’s definitely seen as something users should and do "upgrade" to from less intimidating but less flexible tools if they want the most power, and most importantly, support for bleeding-edge models. I rarely use Comfy myself, FWIW.
> but normies just want stuff to "work."
Where in the world are you getting that this project is for "normies"? Installation steps are terminal instructions and it's a CLI, clearly meant for technical people already.
If you think copying-pasting an API key is too much, don't you think cloning a git repository, installing the Rust compiler and compiling the project might be too much and hit those normies in the face sooner than the API key?
> but I'm not really sure about calling it "local-first" as it's still reliant on an `ANTHROPIC_API_KEY`.
See here:
https://github.com/localgpt-app/localgpt/blob/main/src%2Fage...
What reasonably comparable model can be run locally on, say, 16GB of video memory, compared to Opus 4.6? As far as I know Kimi (while good) needs serious GPUs: an RTX 6000 Ada at minimum, more likely an H100 or H200.
Devstral¹ offers very good models that can be run locally.
They are among the top open models and surpass some closed ones.
I've been using Devstral, Codestral and Le Chat exclusively for three months now, all from Mistral's hosted versions: agentic, as completion, and for day-to-day stuff. It's not perfect, but neither is any other model or product, so good enough for me. Less anecdotal are the various benchmarks that put them surprisingly high in the rankings.
¹https://mistral.ai/news/devstral
Nothing will come close to Opus 4.6 here. You will be able to fit a distilled 20B to 30B model on your GPU. gpt-oss-20b is quite good in my testing locally on a MacBook Pro (M2 Pro, 32GB).
The bigger downside, when you compare it to Opus or any other hosted model, is the limited context. You might be able to achieve around 30k; hosted models often have 128k or more. Opus 4.6 has 200k as standard and 1M in API beta mode.
There are local models with larger context, but the memory requirements explode pretty quickly so you need to lower parameter count or resort to heavy quantization. Some local inference platforms allow you to place the KV cache in system memory (while still otherwise using GPU). Then you can just use swap to allow for even very long contexts, but this slows inference down quite a bit. (The write load on KV cache is just appending a KV vector per inferred token, so it's quite compatible with swap. You won't be wearing out the underlying storage all that much.)
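To put rough numbers on how fast it explodes, here's a back-of-the-envelope calculation with a hypothetical model shape (roughly a 30B-class GQA model: 48 layers, 8 KV heads, head_dim 128, fp16 cache):

```rust
// Back-of-the-envelope KV-cache size for an assumed model shape.
fn main() {
    let (layers, kv_heads, head_dim, bytes) = (48u64, 8u64, 128u64, 2u64);
    let per_token = 2 * layers * kv_heads * head_dim * bytes; // K and V
    for ctx in [8_192u64, 32_768, 131_072] {
        let gib = (per_token * ctx) as f64 / f64::from(1u32 << 30);
        println!("{ctx:>7} tokens -> {gib:.1} GiB of KV cache");
    }
}
```

With those assumptions that's roughly 1.5 GiB at 8k context, 6 GiB at 32k, and 24 GiB at 128k, on top of the weights themselves, which is why quantized KV caches or spilling the cache to system RAM become attractive.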
I made something similar to this project, and tested it against a few 3B and 8B models (Qwen and Ministral, both the instruction and the reasoning variants). I was pleasantly surprised by how fast and accurate these small models have become. I can ask it things like "check out this repo and build it", and with a Ralph strategy eventually it will succeed, despite the small context size.
Nothing close to Opus is available in open weights. That said, do all your tasks need the power of Opus?
The problem is that having to actively decide when to use Opus defeats much of the purpose.
You could try letting a model decide, but given my experience with at least OpenAI’s “auto” model router, I’d rather not.
I also don't like having to think about it, and if it were free, I would not bother even though keeping up a decent local alternative is a good defensive move regardless.
But let's face it: for most people Opus comes at a significant financial cost per token if used more than very casually, so using it for rather trivial or iterative tasks that nevertheless burn through a lot of tokens is something to avoid.
What does Anthropic bring to this project that a local LLM cannot, e.g. Qwen3 Coder Next?
I'm playing with a local-first OpenClaw and Qwen3 Coder Next running on my LAN. Just starting out, but it looks promising.
On what sort of hardware/RAM? I've been trying Ollama and opencode with various local models on 16GB of RAM, but the speed and accuracy/behaviour just aren't good enough yet.
> Say what you will, but AI really does feel like living in the future.
Love it or hate it, the amount of money being put into AI really is our generation's equivalent of the Apollo program. Over the next few years, more than 100 gigawatt-scale data centres are planned to come online.
At least it's a better use than money going into the military industry.
The Apollo program was peanuts in comparison:
https://www.wsj.com/tech/ai/ai-spending-tech-companies-compa...
https://www.reuters.com/graphics/USA-ECONOMY/AI-INVESTMENT/g...
Most of these AI companies are part of the military industry. So the money is still going there at the end of the day.
What makes you think AI investment isn't a proxy for military advantage? Did you miss the saber rattling of anti-regulation lobbying, that we cannot pause or blink or apply rules to the AI industry because then China would overtake us?
You know they will never come online. A lot of it is letters of intent with nothing promised, mostly to juice share prices through circular deals.
LoL, don't worry, they are getting their dose of the snake oil too.
IMHO it doesn't make sense, financially or resource-wise, to run locally, given the five-figure upfront cost to get an LLM running slower than what I can get for 20 USD/month.
If I'm running a business with some number of employees to make use of it, and confidentiality is worth something, sure. But am I really going to rely on anything less than the frontier models for automating critical tasks? Or roll my own on-prem IT to support it when Amazon Bedrock will do it for me?
That’s probably true only as long as subscription prices are kept artificially low. Once the $20 becomes $200 (or the fast-mode inference quotas for cheap subs become unusably small), the equation may change.
This field is far more competitive than I expected it to be. I thought the barrier to entry was so high that only big tech could seriously join the race, because of costs, training data, etc.
But there's fierce competition from new or small players (DeepSeek, Mistral, etc.), many even open source. And I'm convinced they'll keep prices low.
A company like OpenAI can only increase subscription prices 10x once they've locked in enough clients, have a monopoly or oligopoly, or switching costs are many multiples of that.
So currently the irony seems to be that the larger the AI company, the bigger the loss it's running at. Size seems to have a negative impact on the business. But the smaller operators also prevent the big players from raising prices to levels at which they make money.
It starts making a lot of sense if you can run the AI workloads overnight on leaner infrastructure rather than insist on real-time response.
The usage limits on most 20 USD/month subs are becoming quite restrictive though. API pricing is more indicative of true cost.
> but AI really does feel like living in the future.
Got the same feeling when I put on the HoloLens for the first time, but look what we have now.