How Scoring Works
Every PromptScore is calculated from five weighted dimensions that together measure how effectively someone uses AI to accomplish a task. No black boxes: here is exactly what we measure and why.
Our methodology is based on research into AI-assisted productivity from Harvard Business School, Wharton, and enterprise prompting benchmarks.
Score Scale
Dimension Weights
Weights are calibrated so that efficient, high-quality first prompts score highest. This reflects real-world productivity — the best AI users get great results fast.
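Mechanically, a weighted blend like this can be sketched as a weighted average of per-dimension scores. The weight values below are illustrative placeholders, not PromptScore's actual calibration, which is not stated in numbers here.

```python
# Hypothetical weights for illustration only -- NOT the product's published calibration.
HYPOTHETICAL_WEIGHTS = {
    "prompt_quality": 0.30,
    "efficiency": 0.20,
    "speed": 0.15,
    "response_quality": 0.25,
    "iteration_intelligence": 0.10,
}

def prompt_score(dimension_scores: dict[str, float]) -> float:
    """Weighted average of the five dimension scores, each on a 0-100 scale."""
    assert abs(sum(HYPOTHETICAL_WEIGHTS.values()) - 1.0) < 1e-9
    return sum(w * dimension_scores[d] for d, w in HYPOTHETICAL_WEIGHTS.items())

# Example: strong prompts and responses outweigh a middling speed score.
score = prompt_score({
    "prompt_quality": 90, "efficiency": 80, "speed": 70,
    "response_quality": 85, "iteration_intelligence": 60,
})
```

Because the weights sum to 1, the blended score stays on the same 0-100 scale as the individual dimensions.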
The Five Dimensions
Prompt Quality
How well-constructed are your prompts? We analyze clarity, specificity, structure, formatting instructions, constraints, and context-setting.
High Score Looks Like
- Clear, structured instructions with numbered steps
- Explicit constraints (word count, tone, what to avoid)
- Role/persona setting for the AI
- Audience awareness baked into the prompt
Low Score Looks Like
- Vague, one-line prompts with no structure
- No constraints or formatting guidance
- Missing context about who the output is for
- Copy-pasting the same prompt repeatedly
Anti-Gaming
We analyze linguistic patterns, not just length. A 500-word prompt full of filler scores lower than a precise 100-word prompt with clear structure.
Efficiency
How economically do you use your resources? Measured by attempts used vs. allowed and tokens consumed vs. budget.
High Score Looks Like
- Achieving the goal in 1-2 attempts
- Using less than 50% of the token budget
- Getting it right the first time
Low Score Looks Like
- Using all available attempts
- Burning through the entire token budget
- Repeating similar prompts without meaningful changes
Anti-Gaming
Using fewer attempts only helps if the output quality is good. A single bad prompt scores lower than two well-crafted iterations.
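The two ratios this dimension describes, attempts used versus allowed and tokens consumed versus budget, can be sketched as follows. The linear curve and the equal split between the two sub-scores are assumptions for illustration.

```python
def efficiency_score(attempts_used: int, attempts_allowed: int,
                     tokens_used: int, token_budget: int) -> int:
    """Illustrative sketch: lower consumption of attempts and tokens
    yields a higher score. The linear falloff and 50/50 split between
    the two ratios are assumptions, not the product's actual curve."""
    attempt_ratio = attempts_used / attempts_allowed
    token_ratio = tokens_used / token_budget
    # Average the two consumption ratios, then invert: using less scores more.
    return round(100 * (1 - (attempt_ratio + token_ratio) / 2))

# Example: one of three attempts, 400 of a 2000-token budget.
score = efficiency_score(1, 3, 400, 2000)
```

Note that this sketch does not yet apply the anti-gaming rule above; in practice a low-attempt run with poor output quality would be discounted.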
Speed
How quickly do you complete the task? Faster completion (with quality maintained) indicates confidence and fluency with AI tools.
High Score Looks Like
- Completing in 20-50% of the allotted time
- Quick, decisive prompting without long pauses
- Finishing with significant time remaining
Low Score Looks Like
- Using 90-100% of available time
- Long pauses suggesting uncertainty
- Running out the clock
Anti-Gaming
Completing in under 15% of the time triggers a review flag. Suspiciously fast completions are capped to prevent gaming.
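The 15% review flag described above can be sketched as a cap on the speed score. The linear time curve and the capped value are assumptions for illustration; only the 15% threshold comes from the text.

```python
FAST_FLAG_FRACTION = 0.15  # from the text: under 15% of allotted time triggers review

def speed_score(elapsed_seconds: float, allotted_seconds: float) -> tuple[int, bool]:
    """Illustrative sketch: faster completion scores higher, but suspiciously
    fast completions are capped and flagged for review. The cap value (75)
    and the linear curve are assumptions."""
    fraction = elapsed_seconds / allotted_seconds
    if fraction < FAST_FLAG_FRACTION:
        return 75, True  # capped, flagged for human review
    # Linear falloff from 100 (instant) down to 0 (full time used).
    return round(100 * (1 - fraction)), False

# Example: finishing at 30% of the allotted time, comfortably above the flag line.
score, flagged = speed_score(30, 100)
```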
Response Quality
How good is the AI output you elicited? We evaluate the final response against the task requirements, expected keywords, structure, and constraints.
High Score Looks Like
- Response covers all required elements
- Proper structure (headings, lists, sections as needed)
- Matches the expected tone and audience
- Contains relevant domain-specific content
Low Score Looks Like
- Response misses key requirements
- No structure or formatting
- Wrong tone for the audience
- Generic output that could apply to any task
Anti-Gaming
We evaluate the best (final) response, not just the first. This rewards smart iteration — improving your output across attempts.
Iteration Intelligence
When you iterate, do you improve? We track whether subsequent prompts build on AI feedback, introduce new requirements, and produce better results.
High Score Looks Like
- Each prompt meaningfully different from the last
- Referencing AI output ('change X to Y', 'instead of...')
- Introducing new vocabulary and requirements
- Responses improving in quality across attempts
Low Score Looks Like
- Repeating the same prompt verbatim
- Random changes without clear direction
- No reference to what the AI previously produced
- Response quality staying flat or declining
Anti-Gaming
Single-attempt completions receive a neutral score (60) for this dimension — you're not penalized for getting it right the first time.
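The neutral-60 rule and the reward for meaningfully different prompts can be sketched as follows. Only the single-attempt score of 60 comes from the text; the word-overlap heuristic is an illustrative stand-in for the real linguistic analysis.

```python
NEUTRAL_SINGLE_ATTEMPT = 60  # from the text: single attempts are not penalized

def iteration_score(prompts: list[str]) -> int:
    """Illustrative sketch: neutral score for a single attempt; otherwise
    reward prompts that introduce new vocabulary between attempts. The
    word-overlap heuristic is an assumption, not the product's analysis."""
    if len(prompts) < 2:
        return NEUTRAL_SINGLE_ATTEMPT
    novelty = []
    for prev, cur in zip(prompts, prompts[1:]):
        prev_words = set(prev.lower().split())
        cur_words = set(cur.lower().split())
        # Fraction of the new prompt's words not seen in the previous one.
        novelty.append(len(cur_words - prev_words) / max(len(cur_words), 1))
    return round(100 * sum(novelty) / len(novelty))

# Example: repeating a prompt verbatim earns nothing for this dimension.
repeat_score = iteration_score(["write a poem", "write a poem"])
```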
Custom Scoring Criteria
Employers can add custom criteria on top of the five standard dimensions. When custom criteria are used, the final score blends the standard dimensions (50%) with the custom criteria (50%).
- Keyword: must-include and must-not-include terms in the output
- Tone: professional, casual, technical, or creative tone matching
- Length: word count within a specified min/max range
- Rubric: free-form criteria matched against response content
See it in action
Try a free demo assessment and get your PromptScore with a full breakdown.