Evolution AI

Controlled experiments on skill configurations with statistical winner selection and zero-downtime deployment.

How Evolution Works

Create Variants

Define alternative configurations for a skill (different prompts, thresholds, or scripts).

Run Experiment

Traffic is split between the variants, and metrics are collected for each one.
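One common way to split traffic is to hash a stable request or user identifier so that the same caller always lands on the same variant. The sketch below illustrates that idea only; the function name `assign_variant`, its parameters, and the choice of SHA-256 are assumptions for illustration, not Evolution AI's documented mechanism.

```python
import hashlib
from typing import Optional, Sequence


def assign_variant(
    request_id: str,
    variants: Sequence[str],
    weights: Optional[Sequence[float]] = None,
) -> str:
    """Deterministically map a request ID to one variant (hypothetical helper)."""
    if not variants:
        raise ValueError("at least one variant is required")
    if weights is None:
        weights = [1.0] * len(variants)  # default to an even split
    if len(weights) != len(variants):
        raise ValueError("weights must match variants in length")

    total = float(sum(weights))
    # Hash the ID and map the first 8 bytes to a point in [0, 1).
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    point = int.from_bytes(digest[:8], "big") / 2**64

    # Walk the cumulative weight distribution until the point is covered.
    cumulative = 0.0
    for name, weight in zip(variants, weights):
        cumulative += weight / total
        if point < cumulative:
            return name
    return variants[-1]  # guard against floating-point rounding
```

Because the assignment depends only on the identifier, repeated calls with the same `request_id` return the same variant, which keeps each caller's experience consistent for the duration of an experiment.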

Statistical Analysis

Once the sample size is sufficient, a winner is selected at the required confidence level.
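The source does not say which statistical test is used. As a minimal sketch, a two-sided two-proportion z-test on success rate is one standard option. The function name, the per-variant reading of `min_sample_size`, and the mapping of `confidence_threshold` to a significance level of `1 - confidence_threshold` are all assumptions.

```python
import math
from typing import Optional


def compare_success_rates(
    successes_a: int,
    total_a: int,
    successes_b: int,
    total_b: int,
    min_sample_size: int = 50,
    confidence_threshold: float = 0.95,
) -> Optional[str]:
    """Return "a" or "b" if one variant wins, else None (hypothetical helper)."""
    # Assumption: the minimum sample size applies to each variant separately.
    if min(total_a, total_b) < min_sample_size:
        return None

    rate_a = successes_a / total_a
    rate_b = successes_b / total_b
    pooled = (successes_a + successes_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        return None  # both variants all-success or all-failure: no evidence

    z = (rate_b - rate_a) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    if p_value > 1 - confidence_threshold:
        return None  # difference is not significant at this confidence level
    return "b" if z > 0 else "a"
```

Returning `None` covers both "not enough data yet" and "no significant difference", so an experiment simply keeps running until a winner emerges.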

Deploy Winner

The winning variant becomes the new default. Losing variants are archived for reference.

Metrics Tracked

Metric	Description	Weight
Success Rate	Percentage of successful executions	40%
Execution Time	Average time to complete	20%
Token Usage	Average tokens consumed	20%
User Satisfaction	Implicit signals (retry rate, edits)	20%
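The weights in the table can be combined into a single score per variant. The sketch below assumes each metric has already been normalized to the range 0 to 1 with higher meaning better (so lower execution time and token usage map to higher values); that normalization step, the key names, and the function name are assumptions rather than documented behavior.

```python
from typing import Mapping

# Weights taken from the table above.
WEIGHTS = {
    "success_rate": 0.40,
    "execution_time": 0.20,
    "token_usage": 0.20,
    "user_satisfaction": 0.20,
}


def composite_score(normalized: Mapping[str, float]) -> float:
    """Weighted sum of normalized metrics (hypothetical helper)."""
    missing = WEIGHTS.keys() - normalized.keys()
    if missing:
        raise KeyError(f"missing metrics: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= normalized[name] <= 1.0:
            raise ValueError(f"{name} must be normalized to [0, 1]")
    return sum(WEIGHTS[name] * normalized[name] for name in WEIGHTS)
```

For example, a variant with normalized values of 0.9 for success rate, 0.5 for execution time, 0.5 for token usage, and 0.8 for user satisfaction scores 0.36 + 0.10 + 0.10 + 0.16 = 0.72.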

Evolution Markers

Skills declare evolution markers in their configuration to indicate which aspects are ready for experimentation:

"evolution_config": {
  "enabled": true,
  "markers": {
    "prompt_variations": true,
    "threshold_tuning": true,
    "script_alternatives": false
  },
  "min_sample_size": 50,
  "confidence_threshold": 0.95
}
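To show how such a block might be consumed, the sketch below lists the markers a skill has switched on. It assumes `evolution_config` sits inside a larger JSON object (called a manifest here); the function name and the manifest wrapper are hypothetical.

```python
import json
from typing import List


def enabled_markers(manifest_json: str) -> List[str]:
    """Return the names of enabled evolution markers (hypothetical helper)."""
    manifest = json.loads(manifest_json)
    config = manifest.get("evolution_config", {})
    # A skill with evolution disabled exposes no markers at all.
    if not config.get("enabled", False):
        return []
    markers = config.get("markers", {})
    return sorted(name for name, is_on in markers.items() if is_on)


example = """
{
  "evolution_config": {
    "enabled": true,
    "markers": {
      "prompt_variations": true,
      "threshold_tuning": true,
      "script_alternatives": false
    },
    "min_sample_size": 50,
    "confidence_threshold": 0.95
  }
}
"""
print(enabled_markers(example))  # ['prompt_variations', 'threshold_tuning']
```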

Rollback Protection

Evolution AI rolls back automatically when a new variant performs worse than expected.

Auto-Rollback Triggers

• Success rate drops by more than 20%
• Critical failures detected
• Trust score degradation
• Manual rollback request
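The triggers above can be sketched as a single check that returns every reason a rollback should fire. Several details are assumptions: the source does not say whether the 20% drop is absolute or relative (this sketch uses absolute percentage points), how trust score degradation is measured (any decrease counts here), or what the data structure looks like. The class, field, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VariantHealth:
    """Snapshot of a deployed variant versus its baseline (hypothetical)."""
    baseline_success_rate: float
    current_success_rate: float
    critical_failures: int = 0
    baseline_trust_score: float = 1.0
    current_trust_score: float = 1.0
    manual_rollback_requested: bool = False


def rollback_reasons(health: VariantHealth, max_success_drop: float = 0.20) -> List[str]:
    """Return the names of all triggers that currently apply."""
    reasons = []
    # Assumption: the drop is measured in absolute percentage points.
    drop = health.baseline_success_rate - health.current_success_rate
    if drop > max_success_drop:
        reasons.append("success_rate_drop")
    if health.critical_failures > 0:
        reasons.append("critical_failures")
    # Assumption: any decrease in trust score counts as degradation.
    if health.current_trust_score < health.baseline_trust_score:
        reasons.append("trust_score_degradation")
    if health.manual_rollback_requested:
        reasons.append("manual_request")
    return reasons
```

An empty list means the variant stays deployed; a non-empty list gives the reasons to record alongside the rollback.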

Rollback Process

• Previous version restored immediately
• Metrics preserved for analysis
• Bounty created for investigation
• Team notified