I Used AI on a Production App. Now I Trust It Less and Use It More.

TL;DR

Software engineering is not dead.
You won’t get far with vibe coding if you have no coding background.
AI coding agents are here to stay, and for good.

I’ve been doing AI-assisted programming for a few months, but mostly on smaller projects (e.g., my blog framework) and basic stuff here and there. Until last month, I had not built any real project with it. Over the last 30 or so days, I built a full production-ready application with Claude Code (Opus 4.6 high) (CC), and I’m sharing my experience in this blog post.

Project Context

First things first, some context about the application I built. It’s basically an AI agent built primarily on local models, because the customer has data compliance concerns about using a cloud-hosted model. The inference has to be local. In addition, it needs tools for web search, document analysis, extracting text from PDFs, and so on. Basically, a hybrid RAG pipeline, but domain specific.

The setup is: Docling for PDF processing, PageIndex, Qdrant, BM25, and Ollama to run the model locally for now. When I deploy it, I’ll be using vLLM. All of the harness/agent code is in TypeScript. The app is an Electron desktop application. The primary goal of the application is to let the customer ask domain-specific questions, and the model should never try to answer from its own training. It should find answers through the local documents available via RAG. That’s the entire flow, and all of it needs to run locally. No data should ever leave the system.
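As an illustration only (not the app's actual code), here is a minimal sketch of how a BM25 result list and a Qdrant vector-search result list could be merged into one ranking. The merge strategy is an assumption: reciprocal rank fusion is one common choice, and the `Ranked` type, the function name, and the document ids are all made up for the example.

```typescript
// Hypothetical shape of one retrieval hit; only the chunk/document id matters for fusion.
type Ranked = { id: string };

/**
 * Reciprocal rank fusion (assumed merge strategy, not confirmed by the app):
 * every list contributes 1 / (k + rank) to an id's score, with rank starting at 1.
 * k = 60 is a commonly used default.
 */
function reciprocalRankFusion(
  lists: Ranked[][],
  k = 60,
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((hit, index) => {
      const rank = index + 1;
      scores.set(hit.id, (scores.get(hit.id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Made-up ids; in a real pipeline these would come from BM25 and from Qdrant.
const bm25Hits: Ranked[] = [{ id: "doc-a" }, { id: "doc-b" }, { id: "doc-c" }];
const vectorHits: Ranked[] = [{ id: "doc-b" }, { id: "doc-d" }, { id: "doc-a" }];
const fused = reciprocalRankFusion([bm25Hits, vectorHits]);
```

Documents that rank well in both lists (here `doc-b` and `doc-a`) end up ahead of documents that appear in only one, which is the point of combining keyword and vector search.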

Workflow

I started working with CC. First of all, I went into plan mode, gave it all the context I wrote above, and we worked on the initial design: how it should work end to end, and how all the components should come together. Everything was discussed with CC, and we came up with a decent spec document. Then I asked CC to divide the implementation into phases and tasks.

The first phase was to implement a very basic wrapper around the local model to ask questions through the API. The second phase was to use Docling as part of the backend server to analyze and extract text from PDFs. The third phase was to introduce BM25, chunking, vector search, and PageIndex for in-depth document analysis; each of these was its own sub-phase. The fourth phase was to download all the relevant documents from a server and run Docling and PageIndex on them to build the local data trees. And finally, the fifth phase was end-to-end testing and validation.

Findings

Now, here are some key findings from working with CC on this production-ready application continuously for several days. Of course, I slept at night, but for the rest of the day I was pretty much chatting with CC.

Where the Agent Accelerated Everything

First, the good parts. Looking at the entire flow, the application would have taken me at least three to four weeks just to reach an initial MVP, but CC did it in three to four hours (of course, with the right nudges and supervision). The agent’s ability to bootstrap, put together the boilerplate, set up the project correctly, and get the initial positive flow working is absolutely amazing. It saves so much time.

The Mistakes I Had to Catch

Firstly, when you give a clear goal to CC, it just does that thing. But the goal has to be abundantly clear, because it will not come back and ask you questions. It will just make assumptions and act on them, so you have to close those gaps yourself up front.

Secondly, in several cases I saw CC go down unnecessary rabbit holes. To give an example, we were trying to figure out how to enable thinking on the local model, and CC went on to edit the model’s manifest file and pass the parameters in several different ways. Ultimately it gave up and came back with a final answer that we should just leave it as it was. Yes, really.

Then I just looked up the issue on the internet. I asked CC, “Why can’t we just use a direct API call with the Ollama API instead of going through the OpenAI API wrapper?” That way we can send the required configuration parameters in the request. It said, “Yes, that’s a good idea. Let’s do that.”
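For illustration, here is a rough sketch of what calling Ollama's native `/api/chat` endpoint directly can look like. Several things are assumptions: that the parameter in question was the `think` flag (which recent Ollama versions accept for thinking-capable models), the model name, the `temperature` option, and the helper names. The HTTP transport is injected as a hypothetical `PostJson` function so the sketch does not depend on any particular client.

```typescript
// Minimal types for Ollama's native /api/chat endpoint (only the fields used here).
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface OllamaChatRequest {
  model: string;
  messages: ChatMessage[];
  stream: boolean;
  think?: boolean; // assumption: this is the setting the wrapper made hard to pass
  options?: Record<string, number>; // e.g. temperature
}

// Hypothetical transport: POSTs a JSON body and resolves to the parsed JSON response.
type PostJson = (url: string, body: unknown) => Promise<unknown>;

// Builds the request body; "my-local-model" style names are placeholders.
function buildChatRequest(
  model: string,
  question: string,
  think: boolean,
): OllamaChatRequest {
  return {
    model,
    messages: [{ role: "user", content: question }],
    stream: false, // ask for one complete JSON response instead of a stream
    think,
    options: { temperature: 0 },
  };
}

// Sends the request to the native endpoint and returns the answer text.
async function askLocalModel(
  post: PostJson,
  req: OllamaChatRequest,
  baseUrl = "http://localhost:11434", // Ollama's default local address
): Promise<string> {
  const data = (await post(`${baseUrl}/api/chat`, req)) as {
    message?: { content?: string };
  };
  return data.message?.content ?? "";
}
```

In Node 18+ or Electron, `post` could be a thin wrapper around `fetch` that sets `Content-Type: application/json` and parses the response body. The design point is simply that the native request body is under your control, so model-specific settings do not have to survive a compatibility layer.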

Another example: it kept making three to four LLM calls per request when one would have been enough. When I looked into it and asked, “Why are we doing these three extra calls?” it said, “I thought those were needed, but they are clearly not needed.” The model makes fundamental mistakes like these, and you cannot catch them if you do not have a coding background.

One more interesting thing that happened during testing was that there were three services that had to be started to test the entire flow. CC did not start two of them initially; it said those are not needed for this test flow. After the test run, it came back with, (paraphrasing) “Oh, those errors are there because those services are down.” I literally had to tell it, “You lazy bot, you need to run all the services and test it end to end,” and then it did that.
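One way to guard against this kind of skipped setup is a preflight check that refuses to start an end-to-end run unless every required service responds. The sketch below is hypothetical: the service list shape, the function names, and the injected `Probe` function are assumptions, and how a probe decides that a service is "up" is left to the caller.

```typescript
// A probe resolves to true when the service at `url` answers.
type Probe = (url: string) => Promise<boolean>;

// Hypothetical description of one required service.
type Service = { name: string; url: string };

// Probes all services in parallel and returns the names of those that are down.
// A probe that throws (e.g. connection refused) is treated as "down".
async function findDownServices(
  services: Service[],
  probe: Probe,
): Promise<string[]> {
  const results = await Promise.all(
    services.map((svc) => probe(svc.url).catch(() => false)),
  );
  return services.filter((_svc, i) => !results[i]).map((svc) => svc.name);
}

// Fails fast with a clear message instead of letting the test run produce
// confusing downstream errors.
async function assertAllServicesUp(
  services: Service[],
  probe: Probe,
): Promise<void> {
  const down = await findDownServices(services, probe);
  if (down.length > 0) {
    throw new Error(
      `Refusing to run end-to-end tests; services down: ${down.join(", ")}`,
    );
  }
}
```

Calling `assertAllServicesUp` at the top of the test script turns "those errors are there because those services are down" into an immediate, explicit failure.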

And, finally, it took us three to four iterations of code review, code clean-up, and security audit before CC came back and said there were no other issues to be found. It kept finding more issues in every iteration of code review, and when I asked it, “Why did you not find these issues before?” it always came back with some excuse.

The Result

Finally, we did end up producing an app that looks decent as a local Electron application. It works. Of course, I had to ask the model to test everything multiple times and to look for security and resiliency issues, because in the beginning it missed all of that completely. It implemented a very basic positive flow in the first iteration, and after all the back and forth, we built an app that works with real customer data.

There are several things that we need to ask the models to fix explicitly in order to get a decent outcome. If we are not asking about code clean up, security, resiliency, testing, and end-to-end flow, the model is just going to produce crap.

Key Insights

So these three days of intense AI-assisted coding confirm one thing for me, at least for now in May 2026: vibe coding is for people who have a developer background. If you are doing it and you have never written code before, you are only going to get something that may seem to work, but it could be insecure and may not hold up in real usage scenarios.

If you have a coding background, these AI coding agents are superpowers. You don’t have to write any syntax anymore. All you have to do is think clearly, ask the right questions, and have the right vision for what you are building.

PS: This post was first published as an X article, here.