Why AI training is a poor measure of AI competence (and what to measure instead)
- Glenn Martin

- Feb 17
Most organisations now accept that AI training is necessary. What far fewer are clear on is how to tell whether that training actually worked.
Completion rates are high, internal feedback is positive, and there is a perception that collective confidence has gone up. And yet, weeks later, decision quality looks unchanged.
That gap is not accidental. It is structural.
Why AI competence is so hard to measure
There is no globally recognised scale for AI competence, and that is not an oversight; it is a consequence of two realities.
First, the tools are changing so fast that any fixed definition of “AI skill” becomes outdated almost as soon as it is written.
Second, it is unclear what such a scale would even measure. Prompt writing? Tool familiarity? Model logic? Build logic? Governance awareness? Decision judgement?
Most organisations quietly solve this ambiguity by reaching for the nearest available proxy: training completion.
That shortcut is understandable. It is also misleading.
The proxy problem: when training stands in for competence
A common assumption still holds inside leadership teams:
Employee X completed Training Y, therefore Employee X is competent.
This is a category error: confusing exposure with capability.
Training delivers knowledge, context, and a shared baseline; at best, well-designed training can also surface early capability through practical exercises.
What it cannot do is guarantee competence.
Competence only emerges through repeated application, judgement under pressure, critical analysis of outcomes, and adaptation as conditions change.
This distinction matters more for AI than almost any other skill domain.
Why AI competence is different
AI competence cannot be treated like software adoption or tool onboarding.
It is different for three reasons.
First, AI outputs are uncertain but linguistically confident. The system speaks fluently even when it is wrong, and humans are not wired to question confident language.
Second, errors slip unnoticed into downstream work. A flawed summary becomes the basis for a decision; a weak assumption becomes a policy. The mistake rarely announces itself at the point of origin.
Third, the cost of misplaced trust is delayed, not immediately visible. By the time the impact shows up, the source is often forgotten.
On top of this, most AI systems generate outputs without observable reasoning chains, and confidence is decoupled from correctness. The result is that human trust calibrates badly: systems feel authoritative but provide no clear trail of how they reached a conclusion.
This is why AI competence is less about usage and more about critical thinking and judgement.
A cleaner definition of AI competence
Gartner frames this cleanly:
“AI competence exists only insofar as it improves decision quality at scale.”
I like this definition because it cuts through the noise by being concise and memorable.
It avoids tool obsession. It avoids skill inflation. It anchors competence to outcomes that matter.
Under this definition, an organisation is not competent because its people use AI often; it is competent because its people make better decisions when AI is involved.
That also exposes the flaw in how competence is usually measured.
Training is not a measure of competence
Training is not the problem. Misuse of training is.
Two statements can be true at once:
Training is not a measure of competence.
Training is a diagnostic input into a longer capability system.
Most AI training today is treated as an endpoint, and this is where things break.
If training is well-scoped, it should do three things:
Establish shared language and mental models
Reveal capability gaps
Create a safe environment for early testing
What it should not do is certify readiness.
When organisations treat training as proof of competence, they end up assuming that “everyone is trained” means “everyone is capable”. The capability gaps remain. They simply move underground, where they show up later as poor judgement, over-reliance, or silent risk.
Competence as stability under change
Immediate post-training behaviour is a weak signal. People are motivated, attentive, and temporarily cautious.
The real signal appears later.
AI competence shows up as stability under change, not repetition under fixed conditions.
Can people still question outputs when the tool updates? Do they adapt their judgement as prompts, models, or interfaces shift? Do they know when to slow down rather than speed up?
This is why a 30–90 day window matters more than day one enthusiasm.
Competence is not about remembering what was taught; it is about maintaining judgement as the environment changes.
A more honest way to measure AI competence
If training cannot be the proxy, what replaces it?
The answer is not a single metric; it is a sequence.
1. Capability maturity framing
Self-identification creates awareness, not validation. People need a way to locate themselves without being certified prematurely.
2. Practical testing in training
This is where perceived competence meets reality. Exercises reveal gaps that confidence surveys never will.
3. Repeated use with behavioural indicators
Look for observable changes in how people question, verify, and decide. Not outputs produced, but judgement applied.
4. Work redesign and decision audits
Competence must be embedded into workflows. If judgement is optional, it will disappear under pressure.
5. Leadership coaching
Judgement is socially reinforced. If leaders do not model challenge, caution, and override, teams will not sustain them.
This sequence aligns with how capability actually forms inside organisations. It also matches what McKinsey, Gartner, and learning-science research have been saying for years, even if the language differs.
The test that actually matters
If your organisation cannot explain how AI competence shows up in everyday decisions 90 days after training, you are not measuring competence.
You are measuring attendance.



