AI Work Levels
The four levels of AI maturity, the typical signals in work, and the difference between using, building, and transforming with AI.
The L0-L3 levels do not measure how many AI tools someone knows. They measure how someone actually works with AI today: whether they are still trying it, use it regularly in everyday work, build repeatable workflows and tools with it, or use it to change a whole way of working, agenda, or operating model.
SP Assessment is therefore not a knowledge test or a personality typology. It is a behavioral picture of current practice: what someone does with AI, how often, with what context, how they iterate, what comes out of it, and whether their work really changes. In the report, this picture connects to the individual result and the next recommendations.
The methodology uses these levels:
| Practical label | Canonical name | In one sentence |
|---|---|---|
| L0 | AI Explorer | Tries AI, but does not yet have a stable working habit. |
| L1 | Operator | Uses AI regularly in everyday work, can provide context and iterate. |
| L2 | Builder | Turns repeated work into workflows, templates, assistants, or tools that can be reused. |
| L3 | Transformer | Changes a whole way of working, agenda, or operating model, not just personal productivity on one task. |
#L0 AI Explorer
An L0 Explorer tries AI and looks for where it can genuinely help. They may already have a few useful outputs, but they do not yet have a stable habit, a clear repeated use case, or their own workflow.
#L0 signals
- uses AI occasionally, irregularly, or only when prompted,
- tries individual questions, drafts, summaries, ideas, or translations,
- often accepts the first usable output without much iteration,
- does not have a saved workflow, template, assistant, or repeated process,
- cannot yet say clearly when to bring AI in and when not to,
- has low confidence in tools, data, or output quality,
- their AI use is more of an experiment than a normal part of work.
#What must be visible in the evidence
One concrete work use case for AI is enough for someone not to be a "zero". For example: they used AI to prepare a first email draft, summarize a document, explain a topic, or suggest an approach.
Without a concrete work example, the evidence remains weak. Someone may know about AI only secondhand, and knowledge of concepts does not determine the level.
#What L0 does not have yet
- regularity,
- productive iteration,
- several work use cases,
- a saved workflow,
- a repeatable artifact,
- a change in how the work itself is done.
#Further growth
Find the first practical situations where AI saves time or improves the output. The goal is not to automate work immediately, but to reduce friction and build a basic habit.
#Typical score
0-24 SP Score.
Zero does not mean a "bad result". It means we do not have meaningful evidence of practical work use of AI. An active Explorer with a concrete work use case should receive an activation floor of roughly 8-10.
#L1 Operator
An L1 Operator uses AI regularly inside existing work. AI helps with research, writing, analysis, preparing materials, summarizing, checking, or decision-making, but the basic workflow has not fundamentally changed yet.
#L1 signals
- uses AI regularly, often daily or weekly,
- has 2-3 concrete work use cases,
- can provide context to AI,
- does several rounds of iteration,
- adjusts the prompt based on the output,
- checks important outputs,
- uses AI for real work deliverables,
- work becomes faster or better, but the workflow stays the same.
#What must be visible in the evidence
L1 starts where AI is no longer just experimentation. The person uses it repeatedly at work and can describe concrete situations:
- "I use AI to prepare materials for meetings."
- "I have documents summarized and then iterate on the result."
- "I give AI context, ask for a first draft, then refine the output."
- "I use it for emails, research, translations, and checking."
One narrow use case can be enough for a weak L1 if it is repeated, work-relevant, and the person demonstrably iterates.
#What L1 does not have yet
- a persistent AI-enabled artifact,
- their own template or assistant used repeatedly,
- a workflow with clear inputs, steps, and outputs,
- a tool, script, or dashboard,
- a change in the operating model of work.
Frequent AI use by itself is not enough for L2. If someone starts from scratch every time, it is still L1.
#Further growth
Systematize what already works: save workflows, create templates, assistants, prompt patterns, or the first simple workflow.
#Typical score
25-45 SP Score.
A strong L1 can have a higher number than a weak L2 because score and level bands overlap. For the level itself, behavioral evidence matters more than the number.
#L2 Builder
An L2 Builder no longer uses AI only for individual tasks. They build repeatable work systems from it: workflows, templates, assistants, scripts, mini-tools, dashboards, prompt libraries, or custom GPTs.
#L2 signals
- has the L1 foundation: regular AI use, context, iteration, and real outputs,
- creates artifacts that persist between sessions,
- turns repeated work into a repeatable process,
- uses their own templates, assistants, prompt patterns, scripts, or tools,
- can describe inputs, steps, outputs, and when they use the workflow,
- works with documents, data, spreadsheets, CRM, internal materials, or other real context,
- improves and saves workflows,
- can verify output quality,
- starts thinking in systems, not just individual tasks.
#What must be visible in the evidence
L2 is based on reuse. It is not enough to say "I use AI often". There must be something the person uses again:
- a saved prompt pattern with clear inputs and usage,
- their own assistant or custom GPT,
- a template for repeated output,
- a workflow for processing documents or data,
- a script, dashboard, prototype, or mini-tool,
- a process someone else could also use.
It is not about the number of artifacts. One good workflow can be enough if it solves repeated work and the person truly uses it.
#What L2 is not yet
- merely storing old materials,
- one prompt without real reuse,
- frequent chatting with AI without a persistent workflow,
- one app or automation that has not changed a broader work cycle,
- personal acceleration of one task without a repeatable system.
#Further growth
Move from personal tools to shared systems: team workflows, documentation, governance, impact measurement, quality standards, and scaling.
#Typical score
45-78 SP Score.
A chat-based Builder usually scores in the lower or middle part of the band. A code-based Builder who works with tools, data, workflows, and measurable impact can score high, sometimes near the L3 boundary.
The level is not determined by the number alone. The proof matters: whether AI creates a repeatable work system.
#L3 Transformer
An L3 Transformer has changed how work actually gets done. They do not have to change a whole company. It is enough that they have fundamentally rebuilt their own key agenda, work cycle, team process, or operating model around AI.
#L3 signals
- can describe a clear "before vs. now" for a whole repeated agenda,
- AI is part of the working system, not just a helper for individual tasks,
- the system is the default way of working for a whole class of tasks,
- works through roles, steps, rules, context, and checks,
- the impact affects rhythm, capacity, quality, or decision-making across the whole agenda,
- the system can be used by another person or team, or is designed that way,
- there is impact evidence: time saved, higher output volume, better quality, sharing, adoption, or process change.
#What must be visible in the evidence
L3 requires a "before vs. now" change. The person must be able to describe how the work used to be done, how it is done now, and what changed because of AI.
Examples of L3 evidence:
- "We used to prepare the report manually for several hours. Now we have an AI pipeline that creates the first version from data, a human checks it, and the team makes decisions from it."
- "AI is not just a tool at the beginning. It is built into the whole process from input through analysis to output and review."
- "I use this system for the whole agenda, not for one isolated task."
- "The rhythm of work changed: what used to be monthly is now weekly; what used to be manual now has a clear workflow."
#What L3 is not yet
- a management position by itself,
- use of agents without a change in work,
- one high-quality assistant,
- one automation,
- a personal tool without impact on the whole agenda,
- high SP Score without behavioral evidence.
A Builder builds tools. A Transformer has changed the way work is done.
#Further growth
Improve the operating model: measure impact, address reliability, documentation, governance, sharing, and adoption, and look for other areas where AI can change work capacity.
#Typical score
72-96 SP Score.
L3 is not a title for managers. An individual contributor can be L3 if AI fundamentally changed how their key work is produced. Conversely, a manager is not automatically L3 if they have not actually rebuilt a working system with AI.
#Signals We Collect
SP Assessment combines the Spark questionnaire and the Deep interview. The questionnaire gives the declared profile skeleton. The Deep interview validates real behavior and has priority when determining the level.
#1. Frequency of use
We look at whether someone uses AI rarely, sometimes, most days, or several times a day.
Frequency helps, but does not determine the level by itself. Someone can use AI daily and still be L1 if they start from scratch every time. L2 begins only when a repeatable system appears.
#2. Work use cases
We look for concrete work situations:
- what task the person was solving,
- how they involved AI,
- what context they gave,
- how many rounds they iterated,
- what output was created,
- whether they used the output,
- what they would do differently next time.
A vague answer like "I use ChatGPT" is not enough. We need a work example.
#3. AI work enablers
Four enablers show the quality of work with AI:
| Enabler | What we look for |
|---|---|
| AI-first reflex | Whether AI is the first step, or only a rescue option when someone gets stuck. |
| Experimentation | Whether they try AI on new task types and tools. |
| Iteration | Whether they stay in the conversation, refine instructions, and push the output beyond the first draft. |
| Verification | Whether they check AI outputs even when those outputs look professional, especially for important work. |
Iteration is an entry habit. Treating the first output as finished limits the quality of most AI work.
#4. Modes of working with AI
We look at what someone uses AI for and how deeply:
| Mode | What it includes |
|---|---|
| Conversational | Chat, brainstorming, analysis, research, summaries, translations, dialogue with AI. |
| Creative | Writing, presentations, visuals, video, audio, content, graphics. |
| Coding / Builder | Websites, scripts, apps, prototypes, internal tools, dashboards. |
| Orchestration | Multi-step workflows, automations, assistants, pipelines, systems. |
Modes are not levels. Someone can be strong in Conversational mode and still be L1. Coding or Orchestration can support L2/L3, but they do not determine the level by themselves.
#5. Artifacts
An artifact is something that persists:
- a template,
- a prompt pattern,
- a personal assistant,
- a custom GPT,
- a workflow,
- a script,
- a dashboard,
- a mini-tool,
- a documented process,
- a pipeline.
For L2 and L3, an artifact is a critical signal. For L0, the absence of artifacts is not a problem because an Explorer is not expected to have them yet.
#6. Tool sophistication
We look at what tool stack someone uses:
- one company chatbot,
- a standard chat model,
- several AI tools,
- specialized tools,
- coding/IDE tools,
- no-code or automation tools,
- an advanced stack with data, vibecoding, and pipelines.
Tools are not the goal. What matters is whether they fit the work and lead to a real output.
#7. Before-after change
This signal separates L2 from L3.
We ask:
- How was the work done before AI?
- How is it done now?
- What changed in rhythm, quality, capacity, or decision-making?
- Is the new system used repeatedly?
- Does the change affect a whole agenda, or only one task?
Without a "before vs. now" change across a whole repeated agenda, the person remains L2 even if they have strong tools.
#Short Scoring Logic
SP Score is a numeric index of AI maturity on a 0-100 range. It is not the main result. The main result is the level.
The most accurate interpretation:
SP Score = position inside the level.
Calculation in methodology v3:
SP Score = level base + sum of 6 modifiers
#Level base
| Level | Base |
|---|---|
| L0 AI Explorer | 0 |
| L1 Operator | 30 |
| L2 Builder | 50 |
| L3 Transformer | 75 |
#Six modifiers
| Modifier | Range | What it measures |
|---|---|---|
| level_confidence | -3 to +3 | Strength of evidence for the level. |
| mode_quality | -3 to +8 | Depth of work: chat, code, workflow, pipeline, measurable impact. |
| tool_sophistication | -2 to +6 | Sophistication of the tool stack. |
| artifact_evidence | -2 to +6 | Impact and quality of artifacts, not their number. |
| behavioral_enablers | -2 to +3 | Observed behavior: AI-first, iteration, experimentation, verification. |
| self_report_alignment | -2 to +3 | Alignment or mismatch between questionnaire and Deep evidence. |
The questionnaire should not raise the level by itself. If someone declares high maturity in the questionnaire but the Deep interview does not confirm it, Deep evidence decides.
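The formula and the two tables above can be sketched in a few lines of Python. This is an illustrative sketch, not the methodology's actual implementation: the function name `sp_score`, the choice to treat missing modifiers as 0, and the clamping of the result to the 0-100 range are assumptions made for the example. Only the base values and modifier ranges come from the tables.

```python
from __future__ import annotations

# Level bases, taken from the "Level base" table.
LEVEL_BASE = {"L0": 0, "L1": 30, "L2": 50, "L3": 75}

# Allowed (min, max) per modifier, taken from the "Six modifiers" table.
MODIFIER_RANGES = {
    "level_confidence": (-3, 3),
    "mode_quality": (-3, 8),
    "tool_sophistication": (-2, 6),
    "artifact_evidence": (-2, 6),
    "behavioral_enablers": (-2, 3),
    "self_report_alignment": (-2, 3),
}


def sp_score(level: str, modifiers: dict[str, int]) -> int:
    """Return level base + sum of the six modifiers (hypothetical helper)."""
    unknown = set(modifiers) - set(MODIFIER_RANGES)
    if unknown:
        raise ValueError(f"unknown modifiers: {sorted(unknown)}")

    total = LEVEL_BASE[level]
    for name, (low, high) in MODIFIER_RANGES.items():
        # Assumption: a modifier that is not supplied counts as 0.
        value = modifiers.get(name, 0)
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}..{high}")
        total += value

    # Assumption: the result is clamped to the 0-100 range of the index.
    return max(0, min(100, total))


example = sp_score("L1", {
    "level_confidence": 2,
    "mode_quality": 3,
    "tool_sophistication": 1,
    "artifact_evidence": 0,
    "behavioral_enablers": 2,
    "self_report_alignment": 1,
})
print(example)  # 30 + 9 = 39
```

Note that the level is an input here, not an output: the score is computed only after the level has been decided from behavioral evidence.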
#Typical bands
| Level | Typical SP Score |
|---|---|
| L0 AI Explorer | 0-24 |
| L1 Operator | 25-45 |
| L2 Builder | 45-78 |
| L3 Transformer | 72-96 |
The bands overlap intentionally. An exceptional L1 can have a higher score than a weak L2. A strong L2 can be numerically close to L3. The level is therefore not determined mechanically from the score.
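A small sketch makes the overlap concrete. The band boundaries are copied from the table above. The helper name `levels_for_score` is hypothetical and exists only to show that some scores fall into two typical bands, so a score alone cannot be mapped back to a single level.

```python
# Typical SP Score bands per level, copied from the table above.
TYPICAL_BANDS = {
    "L0": (0, 24),
    "L1": (25, 45),
    "L2": (45, 78),
    "L3": (72, 96),
}


def levels_for_score(score: int) -> list[str]:
    """Return every level whose typical band contains the score."""
    return [
        level
        for level, (low, high) in TYPICAL_BANDS.items()
        if low <= score <= high
    ]


print(levels_for_score(30))  # ['L1']
print(levels_for_score(45))  # ['L1', 'L2']
print(levels_for_score(75))  # ['L2', 'L3']
```

For a score such as 75, only the behavioral evidence can tell whether the person is a strong Builder or a Transformer.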
#Decision Rules Between Levels
#L0 -> L1
Someone is L1 if they use AI regularly, have concrete work use cases, can provide context, and iterate.
They remain L0 if they use AI only occasionally, without a stable use case and without productive iteration.
#L1 -> L2
Someone is L2 if they create persistent artifacts or repeatable workflows.
They remain L1 if they use AI regularly but start from scratch every time.
#L2 -> L3
Someone is L3 if they changed the operating model of a whole repeated agenda or work cycle.
They remain L2 if they have strong tools, assistants, or automations, but those still change only individual tasks or personal productivity.
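The three decision rules can be read as a cascade. The sketch below is a deliberate simplification: in practice each signal is a judgment made from Deep interview evidence, not a yes/no flag. The `Evidence` class, its field names, and the assumption that each level requires the foundation of the one below it are illustrative, not part of the methodology.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical yes/no summary of the interview evidence."""
    regular_use: bool               # uses AI regularly at work
    concrete_use_cases: bool        # can describe concrete work use cases
    provides_context: bool          # gives AI relevant context
    iterates: bool                  # refines output over several rounds
    persistent_artifacts: bool      # reusable workflows, templates, tools
    changed_operating_model: bool   # "before vs. now" across a whole agenda


def decide_level(e: Evidence) -> str:
    """Apply the L0 -> L1 -> L2 -> L3 rules in order."""
    has_l1 = (
        e.regular_use
        and e.concrete_use_cases
        and e.provides_context
        and e.iterates
    )
    if not has_l1:
        return "L0"
    if not e.persistent_artifacts:
        return "L1"  # regular use, but starts from scratch every time
    if not e.changed_operating_model:
        return "L2"  # strong tools, but only tasks or personal productivity
    return "L3"


daily_user = Evidence(True, True, True, True, False, False)
print(decide_level(daily_user))  # L1
```

The `daily_user` example reflects the rule that frequent use without a persistent artifact stays at L1.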
#What the Levels Are Not
- L0 is not failure. It is a starting phase.
- L1 is not "just a beginner". A strong L1 can do very high-quality work with AI.
- L2 is not everyone who uses AI often. L2 requires reuse and artifacts.
- L3 is not a manager. L3 is a changed operating model.
- SP Score is not a grade for a person. It is a baseline for benchmarking, re-assessment, and tracking movement.
#Summary
SP Assessment can describe four AI levels through concrete behavioral signals. It looks at frequency of use, work use cases, enablers, modes of AI work, tools, artifacts, repeatability, and the "before vs. now" change.
The level is determined by the quality of evidence, not by the number alone. SP Score helps show the position inside the level and measure progress over time.