# Rubrics
Rubrics are defined as `assertions` entries and support both binary checklist grading and score-range (analytic) grading.
## Basic Usage

The simplest form: list plain strings in `assertions`, and each one becomes a required criterion:

```yaml
tests:
  - id: quicksort-explain
    criteria: Explain how quicksort works
    input: Explain quicksort algorithm
    assertions:
      - Mentions divide-and-conquer approach
      - Explains partition step
      - States time complexity
```

All strings are collected into a single rubrics evaluator automatically.
## Full form for advanced options

Use `type: rubrics` explicitly when you need weights, `required` flags, or score ranges:

```yaml
tests:
  - id: quicksort-explain
    criteria: Explain how quicksort works
    input: Explain quicksort algorithm
    assertions:
      - type: rubrics
        criteria:
          - Mentions divide-and-conquer approach
          - Explains partition step
          - States time complexity
```

## Checklist Mode
For fine-grained control, use rubric objects with weights and requirements:
```yaml
assertions:
  - type: rubrics
    criteria:
      - id: core-concept
        outcome: Explains divide-and-conquer
        weight: 2.0
        required: true
      - id: partition
        outcome: Describes partition step
        weight: 1.5
      - id: complexity
        outcome: States O(n log n) average time
        weight: 1.0
```

## Rubric Object Fields
| Field | Default | Description |
|---|---|---|
| `id` | Auto-generated | Unique identifier for the criterion |
| `outcome` | — | Description of what to check |
| `weight` | `1.0` | Relative importance for scoring |
| `required` | `false` | If `true`, failing this criterion fails the entire eval |
| `required_min_score` | — | Minimum score threshold (score-range mode) |
| `score_ranges` | — | Score range definitions (analytic mode) |
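The threshold and range fields can be combined. A hypothetical criterion using `required_min_score` together with `score_ranges` might look like this (field names come from the table above; the specific ranges and threshold value are illustrative, not a prescribed recipe):

```yaml
criteria:
  - id: accuracy
    outcome: Provides correct answer
    weight: 2.0
    required_min_score: 5   # illustrative: fail the eval if this criterion scores below 5
    score_ranges:
      0: Completely wrong
      5: Mostly correct with minor issues
      10: Perfectly accurate and complete
```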
## Score-Range Mode (Analytic)

For quality gradients instead of binary pass/fail, use score ranges:

```yaml
assertions:
  - type: rubrics
    criteria:
      - id: accuracy
        outcome: Provides correct answer
        weight: 2.0
        score_ranges:
          0: Completely wrong
          3: Partially correct with major errors
          5: Mostly correct with minor issues
          7: Correct with minor omissions
          10: Perfectly accurate and complete
```

Each criterion is scored 0–10 by the LLM grader with granular feedback.
## Scoring

### Checklist Mode

```
score = sum(satisfied_weights) / sum(total_weights)
```

### Score-Range Mode

```
score = sum(criterion_score / 10 * weight) / sum(total_weights)
```

## Verdicts
Section titled “Verdicts”| Verdict | Score |
|---|---|
pass | ≥ 0.8 |
borderline | ≥ 0.6 |
fail | < 0.6 |
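The two scoring formulas and the verdict thresholds can be sketched in Python. This is a minimal illustration of the arithmetic only; the function names and dict fields are hypothetical, not part of the config format or the framework's API:

```python
def checklist_score(criteria):
    """Checklist mode: satisfied weight as a fraction of total weight."""
    total = sum(c.get("weight", 1.0) for c in criteria)
    satisfied = sum(c.get("weight", 1.0) for c in criteria if c["satisfied"])
    return satisfied / total

def analytic_score(criteria):
    """Score-range mode: each 0-10 grade is normalized to 0-1, then weighted."""
    total = sum(c.get("weight", 1.0) for c in criteria)
    weighted = sum(c["score"] / 10 * c.get("weight", 1.0) for c in criteria)
    return weighted / total

def verdict(score):
    """Map a 0-1 score to the verdict table above."""
    if score >= 0.8:
        return "pass"
    if score >= 0.6:
        return "borderline"
    return "fail"

# Checklist example from above: weights 2.0 and 1.5 satisfied, 1.0 missed.
s = checklist_score([
    {"weight": 2.0, "satisfied": True},
    {"weight": 1.5, "satisfied": True},
    {"weight": 1.0, "satisfied": False},
])
print(round(s, 2), verdict(s))  # 0.78 borderline
```

Note that a criterion marked `required: true` fails the whole eval on its own, regardless of the weighted score.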
## Authoring Rubrics

Write rubric criteria directly in `assertions`. If you want help choosing between plain assertions, deterministic evaluators, and rubric- or LLM-based grading, use the `agentv-eval-writer` skill. Let the criteria drive the choice of evaluator rather than applying one fixed recipe.
## Combining with Other Evaluators

Rubrics work alongside code and LLM graders:

```yaml
tests:
  - id: code-quality
    criteria: Generates correct, clean Python code
    input: Write a fibonacci function
    assertions:
      - type: rubrics
        criteria:
          - Returns correct values for n=0,1,2,10
          - Uses meaningful variable names
          - Includes docstring
      - name: syntax_check
        type: code-grader
        command: [./validators/check_python.py]
```