AI Coding Agents in Practice
Claude, Cursor, Kimi, Google Antigravity, OpenAI Codex: the list seemingly goes on forever. They are all good right off the bat; they create things that compile and launch within a session. But 30,000 lines down? I went on a quest to just use them and find out. No science here, just vibes.
My Progression
Cursor
I started my vibe-coding adventure like many others, with Cursor. It’s a known environment (VS Code) and I felt like I was actually doing something. In the beginning, I was coding alongside the thing. Cursor is very usable as a pair programmer; even in full vibes mode, it tells you what it’s doing, and you actually feel like you might learn something through osmosis.
To be honest, the journey with Cursor was great. It gave me good feedback, and it resisted doing stupid things just because I said so. It implemented a good structure, but those were still the early days of my apps; they were small in scope.
Cursor also helped a bunch with my move from Windows to Linux. It rewrote my Docker containers as Podman quadlets, and it helped me debug audio issues. With some guidance, it’s a very competent systems admin.
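For the unfamiliar: a quadlet is just a systemd-style unit file that Podman turns into a service. A minimal sketch of the kind of file this conversion produces (the image, port, and paths here are placeholders, not my actual setup):

```ini
# ~/.config/containers/systemd/myapp.container (rootless user quadlet)
[Unit]
Description=Example container, formerly a docker-compose service

[Container]
# Placeholder image, port mapping, and volume for illustration
Image=docker.io/library/nginx:latest
PublishPort=8080:80
Volume=%h/myapp/data:/usr/share/nginx/html:Z

[Service]
Restart=always

[Install]
WantedBy=default.target
```

After a `systemctl --user daemon-reload`, Podman generates the service and it starts like any other systemd unit.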
This was fine for a while. I used Cursor every day for a few months, but then Cursor started getting real limited in terms of output. I suddenly reached my monthly limit only a few days into January. So I wanted to look for options.
To be fair, by their usage graph, I had used over $200 of API calls in a single day.
A Note About System Rules
You should know: no matter how fancy these tools make their rules, commands, planning modes, agent modes, etc. sound, it’s all just text in → text out.
- Planning mode is just a rule saying: “You are in planning mode.”
- No matter how many fucking system rules you create, they will eventually get ignored.
- You MUST oversee what these agents do. They will happily create functions with 400+ points of “Cognitive Complexity” in SonarQube (the default max is 15), with so many nested `if` statements you have no idea how you even got there.
- They will happily build all sorts of stupid systems that you asked for, without giving feedback on whether there’s a better way to do it in the first place. For example: if you ask one to filter results from an API, it will just filter on your client instead of actually updating the API call to filter on the server (see the sketch after this list).
- If you’re integrating with external APIs, it’s pretty much up to you to figure those out.
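To make the filtering point concrete, here’s a minimal TypeScript sketch; the `/api/orders` endpoint and its `status` query parameter are made up for illustration. The first version is what an agent tends to write; the second is what you actually wanted:

```typescript
type Order = { id: number; status: "open" | "closed" };

// What the agent happily writes: fetch everything, filter on the client.
// The entire dataset crosses the wire on every call.
async function getOpenOrdersClientSide(): Promise<Order[]> {
  const res = await fetch("/api/orders");
  const orders: Order[] = await res.json();
  return orders.filter((o) => o.status === "open");
}

// What you actually wanted: push the filter into the API call,
// so the server only returns matching rows.
async function getOpenOrdersServerSide(): Promise<Order[]> {
  const res = await fetch("/api/orders?status=open");
  return res.json();
}
```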
OpenAI
OpenAI Codex 5.2 was next up. I chose to run it in opencode, and it’s probably the best coding agent I’ve tried, for about an hour. Then the $20 plan ran out for the session.
256K context is a lot, but these coding agents do not care about your wallet. They only care about output. So they will fill your entire context and ship your whole codebase with every interaction. They only start trimming it once you hit the ceiling, and then it just fills right back up.
So it’s up to you to manage context. And as your project grows in complexity, the costs grow with it, quite dramatically, because your coding AI agents do not give a shit about writing small functions.
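A quick back-of-the-envelope, with hypothetical pricing just for scale: at $10 per million input tokens, resending a full 256K context on every turn costs about 256,000 × 10 / 1,000,000 ≈ $2.56 per message. A hundred messages in a working day and you’re past $250, which lines up with that $200 Cursor day.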
Self-Hosted LLM
Just don’t. I’m not sure what kind of beast you’d have to be running to get a usable context size for your model. My 9070 XT can answer questions about coding, but you cannot vibe code on it. Don’t try.
Kimi
To cut costs, I thought I’d go Chinese. Since the code is already generated in a sense, I don’t mind the Chinese seeing it; they will scrape it anyway. And if they can do it for less $$$, I’m all in.
I had heard good things about Kimi, and people are constantly asking for it to be included in Cursor, so I thought I should check it out.
You can argue with the Kimi bot to get a reduced first month. I used OpenAI to argue with Kimi… which eventually got my cost down to $3.50 for the first month. Fun feature.
Well, too bad. Kimi is actually terrible.
It’s not a good model for coding. I’m not sure what these people were smoking when they made up their benchmarks, but even a standard TypeScript project with Vue.js and Bun is not something Kimi can handle.
It constantly:
- Fails to parse input
- Makes TypeScript mistakes
- Eventually gives up, triggers a git revert, and then tries again, until it fucks up badly enough again, and git reverts once more.
Just wasting my tokens on absolutely nothing.
Basically, it produced nothing of value, while the others shipped features. Kimi shipped build errors and git headaches.
Claude Code
Back to America, with Claude Code. Claude has its own coding agent (and seems to be locked to it for now), so I ran that instead of OpenAI’s.
But they both perform remarkably similarly. Claude is also pretty smart, implementing new features without issues, but again, with projects that have 3,000+ lines of code spread across several files, you just run out of money real fast. It probably took an hour with the tool before it cut me off again.
Future
If I had to spend my money somewhere, I would probably spend it on Cursor; it’s the only tool where I felt like I had a semblance of control.
At the moment, I have four paid subscriptions that I have to rotate between, very similar to our TV-watching habits, but this is more expensive, totaling $100 monthly. I should probably stop making new projects just for the sake of it…
So What Do I Think?
- Coding Agents are very cool
- Cursor is the only one where I feel I learn something and am kept in the loop of what’s going on with my project
- CLI agents seem optimized to just code and give very sparse feedback
- 💸 They drain your wallet
- It’s going to be real scary once they actually charge what they need to keep this profitable
It’s very fun to just have an idea and make it come to life in 30 minutes.
But things get weird when you push it too far. Have a clear scope, and don’t include features just for the sake of adding features. Otherwise it might not be good for your wallet or your mental health.