I run LLM inference infrastructure at work — the actual GPU-level stack, scheduling, model serving — and I use a coding assistant every day on top of that. That combination gives me an unusual vantage point: I know roughly what's happening underneath, and I still find the output surprising in both directions. Sometimes surprisingly useful. More often, surprisingly confident while being wrong.
This isn't a take about whether AI is good or bad for the industry. It's what I've actually observed, working with it as infrastructure I maintain and as a tool I use.
Where it genuinely helps
The clearest wins are in work that's structurally obvious but tedious to type. Database migrations, configuration boilerplate, test scaffolding — these have a recognizable shape that most codebases follow, and a language model is very good at reproducing that shape. When I need a new migration with five columns and the right foreign key constraints, I describe it and review the output. It's usually correct. It took ten seconds instead of three minutes.
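As a concrete sketch of what that "recognizable shape" looks like, here is a minimal migration expressed as raw SQL run through Python's sqlite3. The table and column names are invented for illustration, not from any real project; the point is that this is the kind of structurally obvious output that takes seconds to review.

```python
import sqlite3

# Hypothetical migration sketch: invented schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE teams (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Five columns, one foreign key constraint -- the shape described above.
conn.execute("""
    CREATE TABLE players (
        id            INTEGER PRIMARY KEY,
        team_id       INTEGER NOT NULL REFERENCES teams(id),
        name          TEXT NOT NULL,
        position      TEXT,
        jersey_number INTEGER
    )
""")
```

There is nothing clever here, which is exactly why delegating it works: verifying it takes one read.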
Regex is another one I'll admit to openly. I know what pattern I need to match. I don't want to count capture groups and escape characters by hand. AI generates a first draft; I verify it against real examples. This is the right division: AI does the mechanical transcription, I do the verification.
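That division of labor fits in a few lines of Python. The pattern below stands in for a hypothetical assistant draft (matching version strings like "v1.2.3" is an invented example, not a pattern from the text), and the example lists are the verification step.

```python
import re

# Hypothetical assistant draft: match a version string like "v1.2.3".
pattern = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")

# Verification against real examples -- the human half of the division.
should_match = ["v1.2.3", "v10.0.1"]
should_not_match = ["1.2.3", "v1.2", "v1.2.3-rc1"]

for s in should_match:
    assert pattern.match(s), f"expected match: {s}"
for s in should_not_match:
    assert not pattern.match(s), f"expected no match: {s}"
```

If a draft fails the examples, you fix the examples' expectations or reject the draft; either way, the mechanical transcription was never the part you had to trust.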
Reading unfamiliar code has also been genuinely useful. When I'm dropped into a codebase I've never seen — a client's legacy system, a library I'm integrating — asking for an explanation saves time. Not because I couldn't read it myself, but because a 30-second summary lets me decide whether I need to read it carefully or can move on. For a codebase with no documentation and unclear naming conventions, this is real.
The last category is translation between contexts I know well. Here's this Go struct — give me the equivalent Eloquent model. I know both sides. The AI bridges the mechanical distance. I review the output, catch the edge cases it missed, and move on. The time savings are real and the risk stays low, because I can verify every line.
Where it lies to you
The most dangerous failure isn't the obvious one. It's the confident, plausible-looking answer that falls apart when you actually run it.
I've had AI generate code where the function signatures are right, the variable names make sense, the overall structure is exactly what I asked for — and the actual logic is a stub. Not a stub that says // TODO: implement this. A stub that looks complete. A loop that iterates but never accumulates. A conditional that branches but returns the same value either way. The code compiles. It might even pass shallow tests. You find it when the output is wrong and you trace back to discover a function that was doing nothing the whole time.
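A minimal sketch of that failure mode, with invented names: the function below has the right signature, a sensible variable, a loop that runs, and it never accumulates anything.

```python
# Invented example of a "looks complete" stub: the shape is right,
# but the loop body discards its result instead of accumulating it.
def total_latency(samples):
    total = 0.0
    for s in samples:
        total + s  # plausible-looking expression; result is thrown away
    return total

# It compiles and runs; only checking real output exposes it:
total_latency([1.0, 2.0])  # returns 0.0, not 3.0
```

A linter may flag the dead expression, and a test on actual values certainly will — which is part of why I trust tests over reading generated code cold.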
This happens because the model is completing a pattern, not implementing an algorithm. The shape of finished code and the substance of finished code look similar at the token level. The model is optimizing for the former.
The second category is hallucinated APIs. I've seen AI suggest library methods that don't exist, written with the right naming convention and the right parameter shape for that library — just not actually present in it. If you don't already know the library well, you might spend twenty minutes debugging before you think to check the documentation. The model has no representation of "I'm not sure this method exists." It generates the most plausible-sounding completion and presents it with the same confidence as something it knows.
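When a suggested method looks even slightly unfamiliar, one cheap check beats twenty minutes of debugging: confirm the attribute exists before building on it. A Python sketch, where json.dumps_pretty is an invented stand-in for the kind of plausible-sounding, nonexistent method described above.

```python
import json

# Real method: exists, safe to build on.
assert hasattr(json, "dumps")

# Hallucination stand-in: right naming convention for the library,
# plausible parameter shape -- but the method is not actually there.
assert not hasattr(json, "dumps_pretty")
```

For a dynamic language this is a one-line REPL check; for a compiled one, the compiler does it for you, which is one underrated argument for letting AI loose on typed codebases first.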
Architecture advice is where I'm most skeptical. AI can describe patterns it has seen — CQRS, event sourcing, service layers. It can generate code that looks like those patterns. What it cannot do is reason about your specific system: the team size, the operational overhead, the existing coupling, the way this service will be queried at 3am when traffic spikes. I've seen it suggest approaches that would work fine in isolation and create serious problems in context. Race conditions it didn't model. Transaction boundaries it assumed away. Scaling assumptions that don't hold for the actual data volume.
The real problem isn't that AI gives wrong answers. It's that it gives wrong answers in the same tone as right ones.
How I actually use it
I've settled into a workflow that's specific about what I delegate and what I don't.
Boilerplate, test scaffolding, and translation tasks — I use AI freely. The risk is low, the verification is fast, and I know the domain well enough to spot errors immediately. For anything touching security, authorization, or financial logic — I write it myself, or I write the structure and treat the AI output as a draft I'll likely rewrite. This isn't caution for its own sake. It's that I've seen too many confident wrong answers in exactly these areas to trust a fast first draft without thorough review.
The biggest change in my workflow has been asking for explanation alongside code. When AI generates something non-trivial, I ask it to explain what the code does. If the explanation is wrong or vague, the code is suspect. This catches the stub problem more reliably than reading the code alone — I'm forcing the model to produce two representations of the same thing. When they don't match, I know something is off.
I've also stopped using AI for anything specific to the infrastructure I maintain. For the LLM stack — the actual configuration, the optimization parameters, the integration details — the models consistently produce answers that look like documentation for a slightly different version of the software. Sometimes software that doesn't exist. This is a specific failure mode: the models have seen too little real-world usage of these tools to give reliable answers. The irony of LLMs being bad at answering questions about LLM deployment is not lost on me.
One honest thought
The developers who get the most value from AI tools are the ones who already know what good output looks like. You need enough domain knowledge to verify the answer — which means the tools amplify existing competence more than they replace missing competence.
If you're using AI to learn a new area, use it to ask questions and explore concepts. Don't use it to generate answers you're going to ship without fully understanding. The confident wrong answer is much more expensive than no answer at all — it doesn't just fail, it fails in a way that looks like it should work, which costs you the time you would have spent thinking clearly in the first place.
That's the thing worth remembering. Not that AI is unreliable — you can work around unreliability. It's that the unreliability is invisible until it isn't.
If you're building something that involves AI integration and want someone who has actually run this stuff in production — I'm available for freelance work.