The AI capability visibility gap

Why HR doesn’t have an AI tools problem but a judgement problem it cannot yet see.

In a recent session with a Global Leadership team, I opened with a simple line:

HR does not have an AI tools problem.
HR has a capability visibility problem.

The room went quiet, not because the line was controversial, but because it was uncomfortably accurate.


Across enterprise and scaling People functions, AI adoption is now measurable. Dashboards exist, licences are activated, prompts are being shared, and usage is climbing.

But something more important remains largely invisible.

Judgement.


The metrics we can see

Most People Leaders are measuring AI adoption through proxies such as:

  • Tool usage

  • Licences activated

  • Prompts shared

  • “Time saved” claims

  • “Best practice” examples

These metrics are understandable: they are easy to capture, and they fit neatly into activity trackers, dashboards, and especially board slides. They create the impression of momentum.


But they measure activity, not capability.

They tell you whether AI is being used. 


They do not tell you whether AI is being used well, and in scale-ups and enterprise environments, that distinction matters.


What actually compounds 

The capabilities that shape long-term performance look very different:

  • Judgement under uncertainty

  • Confidence to challenge outputs

  • Decision quality with AI in the loop

  • Consistency across the team

  • Knowing when not to use AI


These do not show up on a usage report.

They show up in moments of friction, escalations, legal reviews, and, most importantly, in employee trust; and by the time they become visible, the cost of misalignment has already compounded.


Example 1: Confident use without calibrated judgement

A People Leader uses generative AI to draft a performance feedback summary; it reads well, it is structured, and it saves time.


A less capable user sends it directly to the employee.


A more AI-literate user pauses.

They ask:

  • Has the tool removed nuance that matters legally?

  • Has it softened language that required clarity?

  • Has it introduced phrasing misaligned with our culture?


The difference between these two users is not technical skill. 

It is calibrated judgement.


The real test of capability is not “Can you generate something quickly?” It is “Can you detect when something sounds right but is subtly wrong?”

That pause, that interrogation of output, that refinement. That is capability.


Example 2: High individual variance

In the same leadership session, I introduced another pattern I’m seeing, one that expands on the scenario in Example 1 above.


Imagine two People Leaders drafting performance feedback summaries with AI assistance, with no shared reasoning standards and no explicit boundaries on AI use. The result is zero clarity on what “good AI-assisted work” looks like.


This is not a tools problem. It is a capability alignment problem, and the risk remains invisible until someone challenges the validity of the output. 


An employee challenges the performance feedback as generic. Legal flags inconsistencies in tone across summaries. Trust erodes because the language feels impersonal or templated.


Multiply that across geographies, business units, and managers, and you do not have a productivity gain; you have a cultural fracture forming.


The absence of shared judgement standards is rarely visible at the beginning. It becomes painfully visible later.


Example 3: Measuring activity, not decision quality

Company-wide reporting of AI usage often includes statements like:

  • “We saved four hours.”

  • “We automated X process.”

  • “We drafted policies faster.”


These statements sound impressive, but they do not answer the questions that matter:

  • Did decision quality improve or erode?

  • Did clarity increase?

  • Did bias reduce or amplify?

  • Did trust in the output increase or decrease?


Speed without improved judgement is not maturity; it is just acceleration.


The uncomfortable truth is that proxy metrics make transformation feel safer than it is, because they create visibility around activity while obscuring variance in judgement.


How do I define AI capability?

In simple terms, I define AI capability as four things:


  • Knowing when AI is appropriate and when it is not.

  • Being able to explain how AI influenced a decision.

  • Challenging outputs rather than accepting them at face value.

  • Applying AI consistently across similar scenarios.


Notice what is absent: there is nothing here about prompt sophistication, the number of tools used, or how fluent someone sounds when describing AI.


Capability becomes visible when judgement becomes explainable.


That line matters because boards, regulators, and employees are not going to ask:

“How many prompts did you write?”


They will ask:

“Why did you make that decision?”

“How did AI influence it?”

“What safeguards were in place?”


If your leaders cannot answer those questions clearly, you do not have AI maturity; you have AI activity.


The Leadership Shift

For CPOs, CHROs, and global People teams, this is a maturity question.


AI adoption is not a procurement milestone. It is a behavioural shift.

And behavioural shifts cannot be managed purely through licences and learning portals.


The People function sits in a uniquely sensitive position:

  • Performance management

  • Policy interpretation

  • Workforce planning

  • Talent decisions

  • Employee relations

These are judgement-heavy domains, and AI will increasingly sit inside them.


If capability variance remains invisible, risk accumulates quietly.

Not dramatic risk. Not headline risk. Subtle, compounding inconsistency.


In tone. In fairness. In clarity. In employee experience.

That is where the real exposure sits.


Why the capability gap exists

There are three structural reasons the capability gap has developed:


  1. Activity is easy to measure; judgement is not.

  2. Usage feels like momentum; calibration feels slower.

  3. Leaders are rewarded for visible adoption, not for capability refinement.


So teams optimise for what is visible.

Licences. Training completion. Prompt libraries. Internal AI champions.

None of these are wrong; they are simply insufficient. Without a way to surface judgement quality, organisations are scaling variance.


Three questions for People Leaders

If you are leading a Global People function, start here:


  1. Where do you see evidence of good judgement, not just AI usage?

  2. Where would inconsistent or poor AI use create risk in your organisation?

  3. What capability would you need to see clearly before scaling AI any further?


If those questions feel difficult to answer with evidence rather than anecdotes, you are likely facing a capability visibility gap, and that gap will not close itself.


Your next phase of AI maturity will not be defined by better tools; it will be defined by leaders who can make judgement explainable.

That is the real signal of AI readiness.
