# Rubrics
Rubrics are defined as `assertions` entries and support both binary checklist grading and score-range (analytic) grading.
## Basic Usage

The simplest form: list plain strings in `assertions`, and each one becomes a required criterion:

```yaml
tests:
  - id: quicksort-explain
    criteria: Explain how quicksort works
    input: Explain quicksort algorithm
    assertions:
      - Mentions divide-and-conquer approach
      - Explains partition step
      - States time complexity
```

All strings are collected into a single rubrics evaluator automatically.
## Full form for advanced options

Use `type: rubrics` explicitly when you need weights, `required` flags, or score ranges:

```yaml
tests:
  - id: quicksort-explain
    criteria: Explain how quicksort works
    input: Explain quicksort algorithm
    assertions:
      - type: rubrics
        criteria:
          - Mentions divide-and-conquer approach
          - Explains partition step
          - States time complexity
```

## Checklist Mode
For fine-grained control, use rubric objects with weights and requirements:
```yaml
assertions:
  - type: rubrics
    criteria:
      - id: core-concept
        outcome: Explains divide-and-conquer
        weight: 2.0
        required: true
      - id: partition
        outcome: Describes partition step
        weight: 1.5
      - id: complexity
        outcome: States O(n log n) average time
        weight: 1.0
```

## Rubric Object Fields
| Field | Default | Description |
|---|---|---|
| `id` | Auto-generated | Unique identifier for the criterion |
| `outcome` | — | Description of what to check |
| `weight` | `1.0` | Relative importance for scoring |
| `required` | `false` | If `true`, failing this criterion fails the entire eval |
| `required_min_score` | — | Minimum score threshold (score-range mode) |
| `score_ranges` | — | Score range definitions (analytic mode) |
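The threshold and range fields can be combined. A hypothetical criterion using `required_min_score` together with `score_ranges` might look like this (field names come from the table above; the specific ranges and threshold value are illustrative, not a prescribed recipe):

```yaml
criteria:
  - id: accuracy
    outcome: Provides correct answer
    weight: 2.0
    required_min_score: 5   # illustrative: fail the eval if this criterion scores below 5
    score_ranges:
      0: Completely wrong
      5: Mostly correct with minor issues
      10: Perfectly accurate and complete
```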
## Score-Range Mode (Analytic)

For quality gradients instead of binary pass/fail, use score ranges:

```yaml
assertions:
  - type: rubrics
    criteria:
      - id: accuracy
        outcome: Provides correct answer
        weight: 2.0
        score_ranges:
          0: Completely wrong
          3: Partially correct with major errors
          5: Mostly correct with minor issues
          7: Correct with minor omissions
          10: Perfectly accurate and complete
```

Each criterion is scored 0–10 by the LLM grader with granular feedback.
## Scoring

### Checklist Mode

```
score = sum(satisfied_weights) / sum(total_weights)
```

### Score-Range Mode

```
score = sum(criterion_score / 10 * weight) / sum(total_weights)
```

## Verdicts
Section titled “Verdicts”| Verdict | Score |
|---|---|
pass | ≥ 0.8 |
borderline | ≥ 0.6 |
fail | < 0.6 |
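The two scoring formulas and the verdict thresholds can be sketched in Python. This is a minimal illustration of the arithmetic only; the function names and dict fields are hypothetical, not part of the config format or the framework's API:

```python
def checklist_score(criteria):
    """Checklist mode: satisfied weight as a fraction of total weight."""
    total = sum(c.get("weight", 1.0) for c in criteria)
    satisfied = sum(c.get("weight", 1.0) for c in criteria if c["satisfied"])
    return satisfied / total

def analytic_score(criteria):
    """Score-range mode: each 0-10 grade is normalized to 0-1, then weighted."""
    total = sum(c.get("weight", 1.0) for c in criteria)
    weighted = sum(c["score"] / 10 * c.get("weight", 1.0) for c in criteria)
    return weighted / total

def verdict(score):
    """Map a 0-1 score to the verdict table above."""
    if score >= 0.8:
        return "pass"
    if score >= 0.6:
        return "borderline"
    return "fail"

# Checklist example from above: weights 2.0 and 1.5 satisfied, 1.0 missed.
s = checklist_score([
    {"weight": 2.0, "satisfied": True},
    {"weight": 1.5, "satisfied": True},
    {"weight": 1.0, "satisfied": False},
])
print(round(s, 2), verdict(s))  # 0.78 borderline
```

Note that a criterion marked `required: true` fails the whole eval on its own, regardless of the weighted score.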
## Authoring Rubrics

Write rubric criteria directly in `assertions`. If you want help choosing between plain assertions, deterministic evaluators, and rubric- or LLM-based grading, use the `agentv-eval-writer` skill. Let the criteria drive the choice of evaluator rather than applying one fixed recipe.
## Combining with Other Evaluators

Rubrics work alongside code and LLM graders:

```yaml
tests:
  - id: code-quality
    criteria: Generates correct, clean Python code
    input: Write a fibonacci function
    assertions:
      - type: rubrics
        criteria:
          - Returns correct values for n=0,1,2,10
          - Uses meaningful variable names
          - Includes docstring
      - name: syntax_check
        type: code-grader
        command: [./validators/check_python.py]
```