I am working on a project that uses LLM to pull certain pieces of information fr...

systemerror · 2025-08-21T16:55:53 1755795353

The big issue with LLMs is that they’re usually right — like 90% of the time — but that last 10% is tough to fix. A 10% failure rate might sound small, but at scale, it's significant — especially when it includes false positives. You end up either having to live with some bad results, build something to automatically catch mistakes, or have a person double-check everything if you want to bring that error rate down.
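To put rough numbers on "at scale", here is a minimal sketch of the arithmetic. The item counts and the 80% catch rate are made-up illustration values, and `expected_errors` and `residual_error_rate` are hypothetical helper names, not part of any real system in this thread.

```python
def expected_errors(n_items: int, error_rate: float) -> float:
    """Expected number of wrong results when processing n_items."""
    return n_items * error_rate


def residual_error_rate(error_rate: float, catch_rate: float) -> float:
    """Error rate left over after a checker (automated or human)
    catches a fraction `catch_rate` of the mistakes."""
    return error_rate * (1.0 - catch_rate)


if __name__ == "__main__":
    # Assumed volumes, purely for illustration.
    for n in (1_000, 100_000):
        print(f"{n} items at 10% failure: ~{expected_errors(n, 0.10):.0f} bad results")

    # Assumed: a review step that catches 80% of mistakes.
    remaining = residual_error_rate(0.10, 0.80)
    print(f"residual error rate after review: {remaining:.1%}")
```

The point is only that the absolute number of bad results grows linearly with volume, so the same 10% can be tolerable for a thousand items and painful for a hundred thousand.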

f3b5 · 2025-08-21T18:01:26 1755799286

Depending on the use case, a 10% failure rate can be quite acceptable. That applies to non-critical applications, of course, such as top-of-funnel sales automation. In practice, for simple uses like labeling data at scale, I'm actually reaching 95-99% accuracy at my startup.

spogbiper · 2025-08-21T16:59:18 1755795558

yes, the entire design relies on a human to check everything. basically it presents what it thinks should be done, and why. the human then agrees or does not. a lot of work goes into streamlining this, but ultimately it's still human controlled

wredcoll · 2025-08-21T17:05:35 1755795935

At the risk of being obvious, this seems set up for failure in the same way expecting a human to catch an automated car's mistakes is. Although I assume mistakes here probably don't matter very much.

LPisGood · 2025-08-21T17:15:05 1755796505

This reminds me of the issue with the old Windows access control system.

If those prompts pop up constantly asking for elevated privileges, this is actually worse because it trains people to just reflexively allow elevation.

spogbiper · 2025-08-21T17:58:27 1755799107

yes, mistakes are not a huge problem. they will become evident farther down the process, and they already happen now with the human-only system. worst case is the LLM fails and they just have to do the manual work that they are doing now

whatever1 · 2025-08-21T17:03:25 1755795805

All of the AI projects promise that they just need some fine-tuning to go from PoC to an actual workable product. Nobody has been able to fine-tune them.

Sorry this is some bull. Either it works or it doesn’t.

LPisGood · 2025-08-21T17:11:50 1755796310

> its going to save the clerical staff hundreds of hours per year

How many hundreds of hours is your team spending to get there? What is the ROI on this vs investing that money elsewhere?

spogbiper · 2025-08-21T18:03:51 1755799431

Can't speak to the financial benefit over other investment. Total dev/testing time looks to be fairly small in comparison to time saved in even one year, although with different salaries etc I cannot be too certain on the money ratio. Ultimately not my direct concern, but those making decisions are very happy with results so far and looking for additional processes to apply this type of system to.

kjkjadksj · 2025-08-21T16:54:06 1755795246

Isn’t that something you can do with non-AI tooling to 100% accuracy?

spogbiper · 2025-08-21T17:01:46 1755795706

in some similar cases yes, and this client has tried to accomplish that for literally decades without success. i don't want to be too detailed for reasons, but basically they cannot standardize the input to the point where anything non-AI has been able to parse it very well.

beepbooptheory · 2025-08-21T16:49:10 1755794950

How will you know in practice which 5% is wrong?

spogbiper · 2025-08-21T16:52:24 1755795144

the system presents a summary that a human has to approve, with everything laid out to make that as easy as possible, links to all the sources etc
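A rough sketch of what such an approve-or-reject flow could look like, purely as an assumption about the shape of the data. The `Proposal` class, its fields, and the `review` and `needs_manual_work` helpers are hypothetical and not taken from the actual system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Proposal:
    """One thing the LLM thinks should be done, plus its justification."""
    field_name: str
    value: str
    reason: str
    sources: List[str] = field(default_factory=list)  # links shown to the reviewer
    approved: Optional[bool] = None  # None means not yet reviewed


def review(proposal: Proposal, human_agrees: bool) -> Proposal:
    """Record the human decision; nothing is accepted without it."""
    proposal.approved = human_agrees
    return proposal


def needs_manual_work(proposals: List[Proposal]) -> List[Proposal]:
    """Rejected or unreviewed items fall back to the existing manual process."""
    return [p for p in proposals if p.approved is not True]


if __name__ == "__main__":
    # Invented example values, for illustration only.
    a = Proposal("invoice_total", "1,204.50", "matches line-item sum", ["doc-1#p3"])
    b = Proposal("due_date", "2025-09-01", "stated in header", ["doc-1#p1"])
    review(a, human_agrees=True)
    review(b, human_agrees=False)
    print([p.field_name for p in needs_manual_work([a, b])])
```

Keeping `approved` as a three-state value makes "not yet looked at" distinct from "rejected", so an unreviewed item is never treated as accepted.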