CatalystRL

Evolution AI

Controlled experiments on skill configurations with statistical winner selection and zero-downtime deployment.

How Evolution Works

1. Create Variants

Define alternative configurations for a skill, such as different prompts, thresholds, or scripts.

2. Run Experiment

Traffic is split between the variants, and metrics are collected for each one.

3. Statistical Analysis

Once the sample size is sufficient, the winner is determined at the configured confidence level.

4. Deploy Winner

The winning variant becomes the new default. Losing variants are archived for reference.

Metrics Tracked

Metric             Description                             Weight
Success Rate       Percentage of successful executions     40%
Execution Time     Average time to complete                20%
Token Usage        Average tokens consumed                 20%
User Satisfaction  Implicit signals (retry rate, edits)    20%
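One way to combine these weights into a single score per variant is a weighted sum over normalized metrics. The sketch below assumes every metric has first been mapped onto a 0 to 1 scale where higher is better. The source does not describe how normalization is done, so `normalize_cost` and its `budget` parameter are hypothetical.

```python
# Weights taken from the table above.
WEIGHTS = {
    "success_rate": 0.40,
    "execution_time": 0.20,
    "token_usage": 0.20,
    "user_satisfaction": 0.20,
}


def normalize_cost(value: float, budget: float) -> float:
    """Map a lower-is-better measurement onto [0, 1].

    Assumed scheme: 1.0 when the cost is zero, 0.0 at or above `budget`.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    return max(0.0, 1.0 - value / budget)


def composite_score(normalized: dict[str, float]) -> float:
    """Weighted sum of normalized metric scores (each expected in [0, 1])."""
    missing = WEIGHTS.keys() - normalized.keys()
    if missing:
        raise KeyError(f"missing metrics: {sorted(missing)}")
    return sum(WEIGHTS[name] * normalized[name] for name in WEIGHTS)


score = composite_score({
    "success_rate": 0.90,
    "execution_time": normalize_cost(2.0, budget=10.0),    # seconds
    "token_usage": normalize_cost(500, budget=2000),       # tokens
    "user_satisfaction": 0.70,
})
print(round(score, 2))
```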

Evolution Markers

Skills emit evolution markers to indicate they're ready for experimentation:

"evolution_config": {
  "enabled": true,
  "markers": {
    "prompt_variations": true,
    "threshold_tuning": true,
    "script_alternatives": false
  },
  "min_sample_size": 50,
  "confidence_threshold": 0.95
}
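A reader of this configuration might check which experiment types a skill has opted into along the following lines. This is a minimal sketch: it assumes the fragment above sits inside a larger JSON object (the surrounding braces are added here), and `enabled_markers` is a hypothetical helper, not part of any documented API.

```python
import json

# The fragment above, wrapped in braces so that it parses as JSON (assumption).
MANIFEST = """
{
  "evolution_config": {
    "enabled": true,
    "markers": {
      "prompt_variations": true,
      "threshold_tuning": true,
      "script_alternatives": false
    },
    "min_sample_size": 50,
    "confidence_threshold": 0.95
  }
}
"""


def enabled_markers(manifest: dict) -> list[str]:
    """Return the marker names a skill has switched on, sorted by name.

    A skill with no config, or with "enabled": false, yields an empty list.
    """
    config = manifest.get("evolution_config", {})
    if not config.get("enabled", False):
        return []
    return sorted(name for name, on in config.get("markers", {}).items() if on)


markers = enabled_markers(json.loads(MANIFEST))
print(markers)
```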

Rollback Protection

Evolution AI rolls back automatically when a new variant performs worse than expected:

Auto-Rollback Triggers

  • Success rate drops >20%
  • Critical failures detected
  • Trust score degradation
  • Manual rollback request

Rollback Process

  • Previous version restored immediately
  • Metrics preserved for analysis
  • Bounty created for investigation
  • Team notified
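The trigger list above could be evaluated with a check like the following. It is a sketch under assumptions: the source does not say whether the 20% drop is relative to the previous version's success rate or measured in percentage points, so a relative drop is assumed here. The function name, its parameters, and the reason labels are hypothetical.

```python
def rollback_reasons(baseline_success_rate: float,
                     current_success_rate: float,
                     critical_failures: int = 0,
                     trust_score_degraded: bool = False,
                     manual_request: bool = False,
                     max_relative_drop: float = 0.20) -> list[str]:
    """Return the triggers that fired; an empty list means no rollback."""
    reasons = []
    if baseline_success_rate > 0:
        # Relative drop against the previous version (assumed interpretation).
        drop = (baseline_success_rate - current_success_rate) / baseline_success_rate
        if drop > max_relative_drop:
            reasons.append("success_rate_drop")
    if critical_failures > 0:
        reasons.append("critical_failures")
    if trust_score_degraded:
        reasons.append("trust_score_degradation")
    if manual_request:
        reasons.append("manual_rollback")
    return reasons


# A fall from 90% to 70% is a relative drop of about 22%, so the trigger fires.
print(rollback_reasons(0.90, 0.70))
```

Returning the list of reasons rather than a bare boolean keeps the cause available for the investigation and notification steps in the rollback process.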