Plausibility is not reliability

This week's three pieces on decision-grade data made an argument I had wanted to make for some time, but it needed more than one article to unfold properly.

Tuesday set the standard.

Wednesday detailed the six criteria.

Thursday turned the camera ninety degrees and examined how the system fails when those criteria aren't in place.

What I noticed only after publishing all three is that they share an axis I never explicitly named:

The relationship between plausibility and governance.

A dataset becomes useful for action not when it's clean, but when its plausibility is controlled.

In ungoverned operating models, plausibility becomes the default proxy for reliability.

What looks like the right number is treated as the right number, and the system has no mechanism to detect the difference.

The dangerous data is not the data that fails the dashboard. It is the data that passes because the dashboard has no test beyond plausibility checks.

That's the layer beneath the week's argument, and the reason decision-grade is a governance category before it's a data category.

The rest of this synthesis traces how the three pieces of the week build toward that point and why the test isn't about how clean the data is but about whether the operating model can distinguish plausible from reliable.

The problem the week revealed

In most operations under stress, the failure isn't due to a lack of information.

It's the gap between information and decision-making.

Planners can see volatility climbing.

They can see OTIF sliding.

They can see expenses trending in the wrong direction.

What they can't agree on is which signal to trust enough to act on.

The default move, waiting for the data to clean itself up, is rational for any individual planner but catastrophic for the operating model.

By the time the data aligns, the backlog has shifted.

The reflex to wait isn't laziness or analytical weakness.

It's a structural artifact.

In the absence of a governing threshold, audit-grade becomes the default standard, with no clock.

So decisions wait.

And the cost compounds quietly while everyone stays defensible.

Two failure modes that look like one

This week's most useful observation, I think, is that what we call "bad data" is actually two very different problems with very different costs.

The first is data that's obviously wrong.

It gets caught.

People question it.

It triggers escalation.

The system has reflexes for it.

The second is data that's almost right.

It looks like the number you'd expect.

It passes through governance because it doesn't trigger the reflex to doubt.

It quietly justifies a decision that should have been questioned.

AI amplifies this asymmetry.

Models are optimized to produce outputs that are plausible, coherent, well-formed, and the right shape.

They are not optimized to flag those that are subtly off.

The most expensive data in operations isn't wrong. It's plausible enough to act on before anyone questions it.

This is where the conversation about AI data quality mislocates the problem.

The real question isn't whether the data is clean.

The real question is whether the system has any mechanism to catch plausible-but-wrong before it becomes a decision.

What "good enough" actually means

The right standard isn't perfection.

It's decision-grade.

Decision-grade is a threshold defined by the decision, not a purity level defined by the data.

The same dataset can be decision-grade for one operating model and inadequate for another because the operating model determines the threshold, the owner, and the cadence at which data turns into action.
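
To make that concrete, here is a minimal sketch in Python. It assumes the threshold can be reduced to a freshness limit and a completeness limit; the class names, the two example decisions, and every number are hypothetical, invented for illustration rather than taken from a real operation.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Snapshot:
    """State of one dataset at the moment a decision is due."""
    age: timedelta        # time since the last refresh
    completeness: float   # share of expected records present, 0.0 to 1.0


@dataclass(frozen=True)
class DecisionThreshold:
    """What one decision requires of the data (all values hypothetical)."""
    decision: str
    owner: str
    cadence: timedelta        # how often the decision is made
    max_age: timedelta        # freshness threshold
    min_completeness: float   # completeness threshold


def is_decision_grade(snapshot: Snapshot, threshold: DecisionThreshold) -> bool:
    """The data passes only relative to a specific decision's threshold."""
    return (
        snapshot.age <= threshold.max_age
        and snapshot.completeness >= threshold.min_completeness
    )


# One dataset, refreshed six hours ago, 92% complete.
snapshot = Snapshot(age=timedelta(hours=6), completeness=0.92)

weekly_replan = DecisionThreshold(
    decision="weekly capacity replan", owner="planning lead",
    cadence=timedelta(days=7), max_age=timedelta(hours=24), min_completeness=0.90,
)
same_day_expedite = DecisionThreshold(
    decision="same-day expedite", owner="logistics manager",
    cadence=timedelta(days=1), max_age=timedelta(hours=2), min_completeness=0.98,
)

print(is_decision_grade(snapshot, weekly_replan))      # True
print(is_decision_grade(snapshot, same_day_expedite))  # False
```

Nothing about the data changes between the two calls. Only the decision does, and with it the answer.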

This shifts the nature of the investment.

Most companies that decide to improve data quality invest in the technical layer:

ETL refinement.

Master data.

Single-source-of-truth architectures.

Those investments are real.

They don't, by themselves, produce decision-grade data.

Data quality and decision quality are different categories of problem.

The cheapest way to move data from available to decision-grade is rarely additional cleansing. It's governance.

The governance layer underneath

In this week's long-form article, I outlined six criteria for decision-grade data.

Freshness and completeness thresholds are familiar territory.

They're what most operations measure when they talk about data quality.

The other four, source reliability, variance signal, decision linkage, and audit trail, are governance criteria.

These four governance criteria catch the plausible-but-wrong.

Source reliability asks whether the data's source has operational control over the reality the data is supposed to describe.

Variance signal asks whether the data can distinguish real exceptions from normal noise.

Decision linkage ties every piece of data to a specific decision, owner, threshold, and cadence.

An audit trail records the history of decisions made under similar conditions, so thresholds get calibrated over time rather than argued from scratch each cycle.
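
Of the four, variance signal is the easiest to sketch in code. The version below is one common approach, a simple control limit of the mean plus or minus k standard deviations. It is my stand-in for illustration, not a prescribed method; the function name, the default k of 3, and the OTIF figures are all assumptions.

```python
from statistics import mean, stdev


def is_real_exception(history: list[float], latest: float, k: float = 3.0) -> bool:
    """Return True when `latest` falls outside mean +/- k standard deviations of `history`."""
    if len(history) < 2:
        raise ValueError("need at least two historical points to estimate noise")
    center = mean(history)
    spread = stdev(history)  # sample standard deviation of the baseline
    return abs(latest - center) > k * spread


# Hypothetical weekly OTIF percentages, invented for illustration.
otif_history = [94.0, 95.0, 93.0, 96.0, 94.0, 95.0, 93.0, 96.0]

print(is_real_exception(otif_history, 93.5))  # False: within normal noise
print(is_real_exception(otif_history, 88.0))  # True: outside the control limit
```

A check like this only separates noise from exceptions. A number that is plausible but wrong will usually sit inside the limits, which is why the other three criteria still have to do their part.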

Together, these four are the test the data either passes or fails.

They're also the reason most data quality projects don't produce decision quality.

They invest in the technical layer and leave the governance layer untouched.

An immaculate data layer feeding an ungoverned operating model produces immaculate dashboards and slow decision-making.

What this week made explicit

The axis I noticed only after publishing, plausibility versus governance, is what makes the six criteria more than a checklist.

Without the four governance criteria, plausibility becomes the default test.

Does this number look right?

If yes, it gets acted on.

If no, it gets escalated.

The system has no third path.

No mechanism to say:

"This number looks right, but I'm not authorized to trust it for this decision."

The four governance criteria establish that third path.

They turn plausibility into something the system tests rather than something it relies on.
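
The three paths can be sketched as a small routing function. This is a minimal illustration, assuming each governance criterion can be reduced to a yes/no check; the names Route, GovernanceChecks, and route are hypothetical, and a real operating model would carry far more context than four booleans.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    ACT = "act"
    ESCALATE = "escalate"
    HOLD = "hold"  # the third path: looks right, not authorized for this decision


@dataclass(frozen=True)
class GovernanceChecks:
    """Outcome of the four governance criteria for one number and one decision."""
    source_reliable: bool     # source has operational control over what it reports
    variance_tested: bool     # value was checked against normal noise
    decision_linked: bool     # decision, owner, threshold, and cadence are defined
    audit_trail_exists: bool  # comparable past decisions are on record

    def failed(self) -> list[str]:
        """Names of the criteria that did not pass, in declaration order."""
        return [name for name, ok in vars(self).items() if not ok]


def route(looks_right: bool, checks: GovernanceChecks) -> tuple[Route, list[str]]:
    """Plausibility alone can trigger escalation, but never action."""
    if not looks_right:
        return Route.ESCALATE, []
    failed = checks.failed()
    if failed:
        return Route.HOLD, failed
    return Route.ACT, []


checks = GovernanceChecks(
    source_reliable=True, variance_tested=True,
    decision_linked=False, audit_trail_exists=True,
)
decision, reasons = route(looks_right=True, checks=checks)
print(decision.value, reasons)  # hold ['decision_linked']
```

The design choice that matters is the order of the tests. Looking right is necessary to avoid escalation, but it is never sufficient to reach ACT.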

That's the operating model that uses AI well: not the one with cleaner data, but the one that doesn't confuse plausibility with reliability.

Perfect data often becomes the most sophisticated alibi for inaction.

Decision-grade data is a harder standard because it forces the operating model to commit.

Once committed, the data either supports the commitment or it doesn't.

The argument progresses from data to governance.

That is where the leverage has always been.

Next edition

The next edition examines what separates teams that use AI well from those that don't, and why the differentiator turns out to be something other than the prompt.

If decision-grade data is the standard for input, the next question is who's qualified to interpret the output.

The answer is less about prompting and more about a harder-to-teach skill.

This newsletter goes deeper into a single argument every other Wednesday.

The Tuesday and Thursday posts on LinkedIn are where the smaller, sharper observations land first.

If you want the operating-level diagnostics that don't fit a long-form article, that's where I publish them weekly: https://www.linkedin.com/in/psegala/
