AI & ML News

Deep dive into Mentat coding assistant

Following up from playing with the Aider coding assistant, I've been using Mentat to code for the past few weeks.

Comparing with Aider

The way it works is similar. It runs in the terminal with a textual UI. It even lets you pick a light or dark color scheme. Coding sessions involve adding relevant files for context. In the chatbox, you state your wishes and it comes back with an action plan involving code changes. Approve them and the changes are made. The experience as a whole is close to Aider, except Aider makes a git commit for every change (which you can opt out of); Mentat leaves version management to you. The quality of how you phrase your wishes determines the quality of the work. You have to be pretty verbose about it. What comes back is a function of your choice of LLM. I won't attribute smartness to the coding assistants, but I would credit them with a superior development experience, if anything. No matter which one you use, you still have to talk to them like a clueless intern.

Context limit

Out of the box, Mentat supports a meager list of LLMs compared to Aider (that might or might not change). I didn't let that be a problem: I hooked it up to a coding LLM on Together.ai. But it didn't matter; I ran into the context window limit right off the bat. Granted, some of my files are production-length, but I didn't include that many of them. I didn't even get the chance to make it do something clever yet. I was determined to make this work; only the context limit was getting in the way. The solution isn't just to use an LLM with a larger context window. There's always an upper limit; you'd just end up hitting that constantly.

Built my own RAG

I heard RAG is the answer. So I built a middleware that sits between Mentat and the LLM. This is an OpenAI-compatible REST API (http://localhost:<port>/chat/completions) running locally, all housed in one Python file. I call it Broken Sword for easy reference. As far as Mentat is concerned, Broken Sword is an actual LLM service. Within Broken Sword I capture Mentat's requests, massage the inputs, send them to any LLM I want, and return the response in an OpenAI-compatible shape. Doing this, I get to see the elaborate directives given by Mentat; that is what prompt engineering looks like. Just by doing this I've enabled Mentat to use any LLM available to mankind. I proceeded to use Google Gemini 1.5 to power Broken Sword, mostly because it has the right balance of quality and cost. This alone does not solve the context window limit though; it is no more than a glorified pipe. Rather than forwarding Mentat's inputs verbatim, the huge amount of context can be chunked and stored in a vector database, and only the chunks relevant to the current request get sent along. If I understand it right, large chunks of text get turned into high-dimensional vectors of numbers, and similarity search over those vectors picks out the handful of chunks worth including, which is far smaller than shipping the original texts wholesale. I made all that work using LangChain (it has the series of processes abstracted away), with a dash of Flask for the simple API (sketches of both follow below). It felt like cheating when I don't yet know how this magic works, but I wanted to hack things fast. I know they say you don't really need LangChain and I believe them, but some day, man, some day.

It works

When I was done, Mentat ended up working like it's supposed to. I made it write unit tests; they came out in a style consistent with the existing ones. I made it write a GitHub Actions workflow; the result was sensible. It was gratifying when it worked. Knowing I made it work through Broken Sword is doubly satisfying.
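To make the proxy half concrete, here's a minimal sketch of the shape Broken Sword takes. This isn't my actual file: the upstream URL, model name, and environment variable names are placeholders, and a real version would also have to handle the streaming responses an OpenAI client can ask for.

```python
# broken_sword.py -- minimal sketch of an OpenAI-compatible proxy.
# UPSTREAM_URL, UPSTREAM_MODEL, and the env var names are placeholders.
import os
import time
import uuid

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

UPSTREAM_URL = os.environ.get("UPSTREAM_URL", "https://api.together.xyz/v1/chat/completions")
UPSTREAM_KEY = os.environ["UPSTREAM_API_KEY"]
UPSTREAM_MODEL = os.environ.get("UPSTREAM_MODEL", "some-coding-model")


@app.route("/chat/completions", methods=["POST"])
def chat_completions():
    body = request.get_json()
    messages = body["messages"]  # Mentat's elaborate directives land here.

    # Massage the inputs if needed, then forward to whichever backend we like.
    upstream = requests.post(
        UPSTREAM_URL,
        headers={"Authorization": f"Bearer {UPSTREAM_KEY}"},
        json={"model": UPSTREAM_MODEL, "messages": messages},
        timeout=120,
    )
    answer = upstream.json()["choices"][0]["message"]["content"]

    # Hand back a response shaped the way OpenAI shapes it,
    # so Mentat never notices it isn't talking to OpenAI.
    return jsonify({
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": body.get("model", UPSTREAM_MODEL),
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": answer},
            "finish_reason": "stop",
        }],
        "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
    })


if __name__ == "__main__":
    app.run(port=8000)
```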
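And the retrieval half, assuming LangChain with Chroma and Gemini embeddings. The chunk size, the number of retrieved chunks, and the embedding model name are illustrative rather than tuned values.

```python
# Sketch of the RAG side: chunk, embed, and retrieve only what's relevant.
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=200)


def build_store(file_contexts: list[str]) -> Chroma:
    """Chunk the file contents Mentat sent and index them in Chroma."""
    chunks = splitter.split_text("\n\n".join(file_contexts))
    return Chroma.from_texts(texts=chunks, embedding=embeddings)


def shrink_context(store: Chroma, user_request: str, k: int = 6) -> str:
    """Keep only the chunks most similar to the current request."""
    docs = store.similarity_search(user_request, k=k)
    return "\n\n".join(doc.page_content for doc in docs)
```

Inside the /chat/completions handler, the oversized file context in Mentat's messages gets swapped for the output of shrink_context before the request goes upstream; that substitution is what keeps every call under the context limit.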
Which got me wondering: why does Mentat not use RAG or a vector database like I just did? It felt almost trivial to do so. I took a browse through the Mentat codebase, and indeed Chroma DB is used (the same vector DB I use). So maybe they are doing RAG somehow, but not in ways that matter to me.

But it's clunky

As I put Mentat to work more and more, the clunkiness became apparent. It would crash from time to time. Sometimes because the LLM didn't come back with something it likes, but most of the time for reasons unknown to me. Graceful failure isn't its strength. There were times when Mentat would crash right after I made a request. Upon relaunching and re-including the relevant files, I repeated the same request (good thing there's chat history to make this easy) and everything worked out.

Mixture of hand-coding

One question I was hoping to answer in this adventure is the right mixture of using a coding assistant this way and directly editing files when solving one problem. That is, should all coding be done from the coding assistant alone, if possible? Or are we expected to have a code editor ready on the next screen? In my case half of my screen is for Mentat, the other half for emacs. I expected Mentat to grant me most of what I want, but not perfectly, and I would make minor adjustments by hand to the same files in emacs. If the Mentat style of coding assistant has a future, I wonder if that's the way it should be.
dev.to