How Scoring Works

Every PromptScore is calculated from five weighted dimensions that together measure how effectively someone uses AI to accomplish a task. No black boxes: here is exactly what we measure and why.

The methodology draws on research into AI-assisted productivity from Harvard Business School and Wharton, as well as enterprise prompting benchmarks.

Score Scale

  • S (95-100): Exceptional
  • A (80-94): Strong Hire
  • B (65-79): Hire
  • C (50-64): Consider
  • D (35-49): Below Average
  • F (0-34): Not Ready
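For illustration, here is a minimal sketch of mapping a numeric PromptScore to its letter grade. The thresholds come straight from the scale above; the function itself is ours, not the product's API.

```typescript
// Illustrative helper: maps a 0-100 PromptScore to its letter grade.
// Thresholds mirror the scale above.
function letterGrade(score: number): string {
  if (score >= 95) return "S"; // Exceptional
  if (score >= 80) return "A"; // Strong Hire
  if (score >= 65) return "B"; // Hire
  if (score >= 50) return "C"; // Consider
  if (score >= 35) return "D"; // Below Average
  return "F"; // Not Ready
}
```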

Dimension Weights

  • Prompt Quality: 25%
  • Efficiency: 25%
  • Speed: 15%
  • Response Quality: 20%
  • Iteration Intelligence: 15%

Weights are calibrated so that efficient, high-quality first prompts score highest. This reflects real-world productivity — the best AI users get great results fast.
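As a concrete sketch, the composite can be read as a simple weighted average. The weights below are exactly those in the table; the type and function names are illustrative.

```typescript
// Per-dimension scores, each on a 0-100 scale (names are illustrative).
interface DimensionScores {
  promptQuality: number;         // weight 25%
  efficiency: number;            // weight 25%
  speed: number;                 // weight 15%
  responseQuality: number;       // weight 20%
  iterationIntelligence: number; // weight 15%
}

// Weighted composite; the weights sum to 1.0.
function compositeScore(d: DimensionScores): number {
  return (
    0.25 * d.promptQuality +
    0.25 * d.efficiency +
    0.15 * d.speed +
    0.2 * d.responseQuality +
    0.15 * d.iterationIntelligence
  );
}
```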

The Five Dimensions

Prompt Quality (25%)

How well-constructed are your prompts? We analyze clarity, specificity, structure, formatting instructions, constraints, and context-setting.

High Score Looks Like

  • Clear, structured instructions with numbered steps
  • Explicit constraints (word count, tone, what to avoid)
  • Role/persona setting for the AI
  • Audience awareness baked into the prompt

Low Score Looks Like

  • Vague, one-line prompts with no structure
  • No constraints or formatting guidance
  • Missing context about who the output is for
  • Copy-pasting the same prompt repeatedly

Anti-Gaming

We analyze linguistic patterns, not just length. A 500-word prompt full of filler scores lower than a precise 100-word prompt with clear structure.

Efficiency (25%)

How economically do you use your resources? Measured by attempts used vs. allowed and tokens consumed vs. budget.

High Score Looks Like

  • Achieving the goal in 1-2 attempts
  • Using less than 50% of the token budget
  • Getting it right the first time

Low Score Looks Like

  • Using all available attempts
  • Burning through the entire token budget
  • Repeating similar prompts without meaningful changes

Anti-Gaming

Using fewer attempts only helps if the output quality is good. A single bad prompt scores lower than two well-crafted iterations.
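The exact formula isn't published here, but a rough sketch of the idea, assuming attempt economy and token economy are weighted equally and gated by output quality, might look like this:

```typescript
// Illustrative Efficiency sub-score (0-100). The 50/50 split between
// attempt economy and token economy, and the quality gate, are
// assumptions for the sketch, not the production formula.
function efficiencyScore(
  attemptsUsed: number,
  attemptsAllowed: number,
  tokensUsed: number,
  tokenBudget: number,
  bestResponseQuality: number // 0-100
): number {
  // 1.0 when the goal is met on the first attempt, 0.0 when all are used.
  const attemptEconomy =
    1 - (attemptsUsed - 1) / Math.max(1, attemptsAllowed - 1);
  // 1.0 when no tokens are used, 0.0 when the budget is exhausted.
  const tokenEconomy = 1 - Math.min(1, tokensUsed / tokenBudget);
  const raw = 100 * (0.5 * attemptEconomy + 0.5 * tokenEconomy);
  // Anti-gaming: economy only counts if the output is actually good.
  return raw * (bestResponseQuality / 100);
}
```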

Speed (15%)

How quickly do you complete the task? Faster completion (with quality maintained) indicates confidence and fluency with AI tools.

High Score Looks Like

  • Completing in 20-50% of the allotted time
  • Quick, decisive prompting without long pauses
  • Finishing with significant time remaining

Low Score Looks Like

  • Using 90-100% of available time
  • Long pauses suggesting uncertainty
  • Running out the clock

Anti-Gaming

Completing in under 15% of the allotted time triggers a review flag. Suspiciously fast completions are capped to prevent gaming.
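In code, that guard might look like the following sketch; the cap value of 60 and the return shape are assumptions for illustration.

```typescript
// Illustrative anti-gaming guard for the Speed dimension: completions
// under 15% of the allotted time are flagged and their speed score is
// capped (the cap value of 60 is an assumption, not the real constant).
function speedScoreWithGuard(
  elapsedSeconds: number,
  allottedSeconds: number,
  rawSpeedScore: number
): { score: number; flaggedForReview: boolean } {
  const fractionUsed = elapsedSeconds / allottedSeconds;
  if (fractionUsed < 0.15) {
    return { score: Math.min(rawSpeedScore, 60), flaggedForReview: true };
  }
  return { score: rawSpeedScore, flaggedForReview: false };
}
```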

Response Quality (20%)

How good is the AI output you elicited? We evaluate the final response against the task requirements, expected keywords, structure, and constraints.

High Score Looks Like

  • Response covers all required elements
  • Proper structure (headings, lists, sections as needed)
  • Matches the expected tone and audience
  • Contains relevant domain-specific content

Low Score Looks Like

  • Response misses key requirements
  • No structure or formatting
  • Wrong tone for the audience
  • Generic output that could apply to any task

Anti-Gaming

We evaluate the best (final) response, not just the first. This rewards smart iteration — improving your output across attempts.

Iteration Intelligence (15%)

When you iterate, do you improve? We track whether subsequent prompts build on AI feedback, introduce new requirements, and produce better results.

High Score Looks Like

  • Each prompt meaningfully different from the last
  • Referencing AI output ('change X to Y', 'instead of...')
  • Introducing new vocabulary and requirements
  • Responses improving in quality across attempts

Low Score Looks Like

  • Repeating the same prompt verbatim
  • Random changes without clear direction
  • No reference to what the AI previously produced
  • Response quality staying flat or declining

Anti-Gaming

Single-attempt completions receive a neutral score (60) for this dimension — you're not penalized for getting it right the first time.
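Here is a sketch of that rule, with an invented improvement heuristic for the multi-attempt case (the neutral 60 comes from above; everything else is an assumption):

```typescript
// Illustrative Iteration Intelligence sub-score. A single attempt gets
// the neutral 60 described above; the multi-attempt heuristic below
// (rewarding improvement from first to best attempt) is invented for
// this sketch.
function iterationScore(attemptQuality: number[]): number {
  if (attemptQuality.length <= 1) return 60; // neutral: no penalty
  const first = attemptQuality[0];
  const best = Math.max(...attemptQuality);
  const improvement = Math.max(0, best - first);
  return Math.min(100, 60 + improvement);
}
```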

Custom Scoring Criteria

Employers can add custom criteria on top of the five standard dimensions. When custom criteria are used, the final score is an even blend: the standard dimensions contribute 50% and the custom criteria contribute 50%.
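The blend itself is straightforward; a minimal sketch, assuming both inputs are on a 0-100 scale:

```typescript
// Final score: an even 50/50 blend when a custom-criteria score exists,
// otherwise just the standard composite. Names are illustrative.
function finalScore(standardComposite: number, customScore?: number): number {
  return customScore === undefined
    ? standardComposite
    : 0.5 * standardComposite + 0.5 * customScore;
}
```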

  • Keyword: must-include and must-not-include terms in the output
  • Tone: professional, casual, technical, or creative tone matching
  • Length: word count within a specified min/max range
  • Rubric: free-form criteria matched against response content
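To make the criterion types concrete, here are minimal sketches of checks for two of them. The function shapes are ours; the real evaluators are more nuanced.

```typescript
// Illustrative Keyword criterion: every must-include term appears and
// no must-not-include term does (case-insensitive substring match).
function keywordCheck(
  output: string,
  mustInclude: string[],
  mustNotInclude: string[]
): boolean {
  const text = output.toLowerCase();
  return (
    mustInclude.every((term) => text.includes(term.toLowerCase())) &&
    mustNotInclude.every((term) => !text.includes(term.toLowerCase()))
  );
}

// Illustrative Length criterion: word count within a min/max range.
function lengthCheck(output: string, minWords: number, maxWords: number): boolean {
  const words = output.trim().split(/\s+/).filter(Boolean).length;
  return words >= minWords && words <= maxWords;
}
```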

See it in action

Try a free demo assessment and get your PromptScore with a full breakdown.