Evolution AI
Controlled experiments on skill configurations with statistical winner selection and zero-downtime deployment.
How Evolution Works
1. Create Variants
Define alternative configurations for a skill: different prompts, thresholds, or scripts.
2. Run Experiment
Traffic is split between the variants, and metrics are collected for each one.
3. Statistical Analysis
Once every variant has reached the minimum sample size, the winner is selected at the configured confidence level.
4. Deploy Winner
The winning variant becomes the new default. Losing variants are archived for reference.
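The experiment and analysis steps above can be sketched as follows. This is a minimal illustration, not Evolution AI's actual implementation: the function names `assign_variant` and `pick_winner` are hypothetical, hash-based assignment is an assumed way to split traffic, and a two-proportion z-test on success counts is an assumed choice of statistical test. The defaults mirror the `min_sample_size` and `confidence_threshold` values shown in the configuration example.

```python
import hashlib
import math


def assign_variant(user_id: str, variants: list[str]) -> str:
    """Deterministically map a user to a variant (assumed splitting scheme)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def pick_winner(stats, min_sample_size=50, confidence_threshold=0.95):
    """Compare exactly two variants with a two-proportion z-test.

    stats maps variant name -> (successes, trials).
    Returns the winning name, or None if no winner can be declared yet.
    """
    (name_a, (succ_a, n_a)), (name_b, (succ_b, n_b)) = stats.items()
    if min(n_a, n_b) < min_sample_size:
        return None  # not enough data for either variant

    p_a, p_b = succ_a / n_a, succ_b / n_b
    pooled = (succ_a + succ_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return None  # identical, degenerate results

    z = (p_a - p_b) / std_err
    # Two-sided confidence: 1 - p_value = erf(|z| / sqrt(2)).
    confidence = math.erf(abs(z) / math.sqrt(2))
    if confidence < confidence_threshold:
        return None
    return name_a if z > 0 else name_b
```

With 60 of 100 successes for one variant and 80 of 100 for the other, the difference clears the 0.95 threshold and the second variant is returned. A 50 versus 52 split does not, so the experiment would keep running.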
Metrics Tracked
| Metric | Description | Weight |
|---|---|---|
| Success Rate | Percentage of successful executions | 40% |
| Execution Time | Average time to complete | 20% |
| Token Usage | Average tokens consumed | 20% |
| User Satisfaction | Implicit signals (retry rate, edits) | 20% |
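One way the weights in the table could combine into a single score is sketched below. The weights come from the table. Everything else is an assumption made for illustration: the name `composite_score`, the requirement that success rate and user satisfaction arrive already scaled to the range 0 to 1, and the normalization of execution time and token usage against a baseline so that lower costs score higher.

```python
# Weights taken from the Metrics Tracked table.
WEIGHTS = {
    "success_rate": 0.40,
    "execution_time": 0.20,
    "token_usage": 0.20,
    "user_satisfaction": 0.20,
}


def cost_score(value: float, baseline: float) -> float:
    """Map a lower-is-better cost into (0, 1): 0.5 at baseline, higher when cheaper.

    This normalization is an assumption, not a documented formula.
    """
    return baseline / (baseline + value)


def composite_score(metrics: dict, baseline: dict) -> float:
    """Weighted sum of the four tracked metrics for one variant."""
    parts = {
        "success_rate": metrics["success_rate"],
        "execution_time": cost_score(metrics["execution_time"], baseline["execution_time"]),
        "token_usage": cost_score(metrics["token_usage"], baseline["token_usage"]),
        "user_satisfaction": metrics["user_satisfaction"],
    }
    return sum(WEIGHTS[name] * value for name, value in parts.items())
```

Because success rate carries twice the weight of any other metric, a variant that is faster but fails more often will usually score lower under this scheme.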
Evolution Markers
Skills emit evolution markers to indicate they're ready for experimentation:
"evolution_config": {
"enabled": true,
"markers": {
"prompt_variations": true,
"threshold_tuning": true,
"script_alternatives": false
},
"min_sample_size": 50,
"confidence_threshold": 0.95
}
Rollback Protection
Evolution AI rolls back automatically when a newly deployed variant performs worse than expected:
Auto-Rollback Triggers
- Success rate drops by more than 20%
- Critical failures detected
- Trust score degradation
- Manual rollback request
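A check over the four triggers listed above might look like the sketch below. The `VariantHealth` fields and the `rollback_reasons` function are hypothetical names. The 20% threshold is read here as a relative drop against the baseline success rate, which is an assumption, since the list does not say whether the drop is relative or measured in percentage points.

```python
from dataclasses import dataclass


@dataclass
class VariantHealth:
    """Hypothetical snapshot of a deployed variant's health."""
    baseline_success_rate: float
    current_success_rate: float
    critical_failures: int = 0
    baseline_trust_score: float = 1.0
    current_trust_score: float = 1.0
    manual_request: bool = False


def rollback_reasons(health: VariantHealth, max_relative_drop: float = 0.20) -> list[str]:
    """Return the names of all triggers that fired; an empty list means keep the variant."""
    reasons = []
    if health.baseline_success_rate > 0:
        # Assumption: the drop is measured relative to the baseline.
        drop = (health.baseline_success_rate - health.current_success_rate) / health.baseline_success_rate
        if drop > max_relative_drop:
            reasons.append("success_rate_drop")
    if health.critical_failures > 0:
        reasons.append("critical_failure")
    if health.current_trust_score < health.baseline_trust_score:
        reasons.append("trust_score_degradation")
    if health.manual_request:
        reasons.append("manual_request")
    return reasons
```

Returning every reason that fired, rather than a single boolean, keeps the cause available for the later investigation.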
Rollback Process
- Previous version restored immediately
- Metrics preserved for analysis
- Bounty created for investigation
- Team notified
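The four rollback steps can be sketched in order as below. The in-memory `registry` layout, the `roll_back` function, the shape of the bounty record, and the `notify` callback are all hypothetical stand-ins. The real system's storage, bounty tracker, and notification channel are not described here.

```python
from datetime import datetime, timezone


def roll_back(registry, skill, reason, notify=print):
    """Restore the previous variant of a skill and open an investigation record."""
    entry = registry[skill]
    failed = entry["active"]

    # 1. Restore the previous version immediately.
    entry["active"] = entry["previous"]
    entry["previous"] = None

    # 2. Preserve metrics: nothing under entry["metrics"] is deleted.
    # 3. Create a bounty (here just a dict) for the investigation.
    bounty = {
        "skill": skill,
        "failed_variant": failed,
        "reason": reason,
        "metrics_snapshot": dict(entry["metrics"].get(failed, {})),
        "opened_at": datetime.now(timezone.utc).isoformat(),
    }

    # 4. Notify the team through whatever channel `notify` wraps.
    notify(f"Rolled back {skill} from {failed} to {entry['active']}: {reason}")
    return bounty
```

Restoring the previous version first means the remaining steps cannot delay recovery if the bounty or notification step fails.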