How we score

Why the main result is the level, how we work with evidence, and what SP Score is for.

Scoring follows one simple rule: real behavior matters more than self-assessment.

If someone says they use AI in an advanced way but cannot describe any repeatable workflow or concrete output, the result reflects that. If someone rates themselves modestly but shows clear examples and finished workflows in the interview, the result can be higher.

#The main result is the level

First we determine the person's level of work with AI. The level describes how far AI has become part of their real work:

experimenting,
regular use,
building personal workflows,
changing a working system.

The level is not a simple sum of points. It is an evaluation of evidence: what the person actually does, how often, with what impact, and what outputs are created.

#SP Score is a supporting number

SP Score does not say "how good someone is". It shows their position inside the level.

That matters mainly in repeated measurement. A person can stay at the same level but move significantly inside it, for example from early L2 to a strong L2 close to L3. The level name alone would not capture that movement.

#What we look at during evaluation

When scoring, we look at several things at once:

| Area | What we care about |
| --- | --- |
| Frequency | Whether AI is an occasional attempt or a normal part of work |
| Depth of work | Whether someone accepts the first answer or iterates and improves the output |
| Tools | Whether they use the right mode for the type of work |
| Outputs | Whether concrete documents, templates, tools, or decisions are created |
| Impact | Whether AI changes one task or an entire repeated workflow |

No single item is enough on its own. For example, using an advanced tool does not automatically mean a higher level. What matters is what the person does with it.

#Why two people at the same level can have different numbers

Two people can both be L2 Builder because both build repeatable workflows with AI. One can still have an SP Score of 60 and the other 75, because the evidence is not equally strong.

The numbers are not a standalone grade. They show how developed and convincing the same level looks in practice.

An L2 Builder with a lower score typically has a first working template, assistant, or workflow. They use it mainly for themselves, the impact is described in qualitative terms rather than measured, and the workflow is still stabilizing.

An L2 Builder with a higher score shows the same level in a more developed way. The workflow is used repeatedly, it produces concrete artifacts, the person can explain why it works, they choose more suitable tools, and there is visible impact on time, quality, or other people's work.

That does not mean the first person is "worse" or that the second person received points for using more tools. It means the second person is closer to the L3 boundary inside L2. That is why an individual report reads the number together with the level, profile, and evidence.

#What can move the number up or down

The score starts from the determined level. Then adjustments move it within that level based on how convincing the evidence is.

| What we see in the evidence | How it affects the score reading |
| --- | --- |
| A clear repeatable workflow | The number tends to move up, because it is more than a one-off attempt |
| Concrete artifacts | The number tends to move up when they have real impact |
| More suitable tools | The number moves up only when the tools change the quality of work |
| Weak or generic evidence | The number stays lower, even if someone says they use AI often |
| Ad hoc use without a stable output | The number can stay lower even with high frequency |

Adjustments are not reward points. Their main job is to prevent the same number from meaning completely different realities for two people.
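The idea of a number that starts from the level and only moves inside it can be sketched roughly as follows. This is an illustration, not the actual SP Score formula: the band boundaries, the starting point, and the adjustment sizes are hypothetical values chosen so the example reproduces the 60 and 75 mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    low: int   # lowest score that still reads as this level
    base: int  # starting point before adjustments
    high: int  # highest score that still reads as this level


# Hypothetical bands, invented for this example only.
BANDS = {
    "L2": Band(low=50, base=65, high=79),
    "L3": Band(low=80, base=90, high=100),
}


def score_within_level(level: str, adjustments: list[int]) -> int:
    """Start from the level's base and keep the result inside its band."""
    band = BANDS[level]
    raw = band.base + sum(adjustments)
    return max(band.low, min(band.high, raw))


weaker_evidence = score_within_level("L2", [-5])        # 60
stronger_evidence = score_within_level("L2", [+5, +5])  # 75
capped = score_within_level("L2", [+10, +10, +10])      # 79, still L2
```

In this sketch the level is fixed before the number is computed, so no amount of adjustment turns an L2 into an L3. Moving to the next level would require re-evaluating the evidence for the level itself.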

#Why evidence is not counted twice

If building personal repeatable workflows is what places someone at L2, we do not add the same fact again as a bonus to the score. Otherwise the number would overstate what the evidence shows.

The score inside a level should show what goes beyond a typical representative of that level: output quality, breadth of use, stability of habits, or impact on other people.
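One way to picture the no-double-counting rule is to remove the evidence that already justified the level before any adjustment is added. The sketch below assumes evidence can be represented as simple labels. The labels and point values are hypothetical and serve only to show the mechanism.

```python
# Hypothetical evidence labels and adjustment sizes, for illustration only.
ADJUSTMENTS = {
    "repeatable_workflow": 5,
    "output_quality": 5,
    "impact_on_others": 10,
}


def total_adjustment(observed: set[str], level_defining: set[str]) -> int:
    """Sum adjustments only for evidence that did not already set the level."""
    counted = observed - level_defining
    return sum(ADJUSTMENTS.get(label, 0) for label in counted)


observed = {"repeatable_workflow", "impact_on_others"}

# The repeatable workflow is what placed the person at L2, so it is excluded.
without_double_counting = total_adjustment(observed, {"repeatable_workflow"})

# Counting it again would inflate the number.
with_double_counting = total_adjustment(observed, set())
```

Here `without_double_counting` is 10 and `with_double_counting` is 15. The difference is exactly the fact that was already used to determine the level.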

#What re-assessment is for

Repeated measurement is often more useful than the first result. It shows what changed after a program, coaching, or several months of personal practice.

Example:

First measurement:  L2 Builder, early L2
Repeat measurement: L2 Builder, strong L2 close to L3

The level is the same, but the progress is meaningful. This is where SP Score is most useful.
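A comparison of two measurements could be sketched like this. The `Measurement` type, the `describe_change` helper, and the scores 55 and 76 are assumptions made for the example and are not taken from a real report. The level is compared first because it is the main result. The score is read only when the level is unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    level: int  # e.g. 2 for L2
    score: int  # SP Score (hypothetical values below)


def describe_change(first: Measurement, repeat: Measurement) -> str:
    """Report a level change first; otherwise report movement inside the level."""
    if repeat.level != first.level:
        return f"level changed: L{first.level} -> L{repeat.level}"
    delta = repeat.score - first.score
    if delta == 0:
        return f"same level L{first.level}, no movement"
    return f"same level L{first.level}, score moved by {delta:+d}"


first = Measurement(level=2, score=55)   # early L2
repeat = Measurement(level=2, score=76)  # strong L2 close to L3
summary = describe_change(first, repeat)
```

With these example values `summary` reads "same level L2, score moved by +21", which is the kind of movement the level name alone would hide.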

#What scoring does not do

It does not assess personality or job performance.
It does not reduce a person to one number.
It does not determine the level backwards from the score alone.
It does not punish beginners. AI Explorer is a legitimate starting phase.

If the number and concrete behavior point in different directions, behavior wins. The score is a tool, not the main truth.