Auto-Tuner Anatomy ①: Engine Overview and Design Philosophy
A comprehensive anatomy of the EXAWin Auto-Tuner architecture. Six learning targets, five-stage data maturity, and the design principle of "not fitting, but making accurate."
This document series dissects the internals of EXAWin's Auto-Tuner engine. We explain the meaning behind every line of code, the rationale for each statistical technique, and why each parameter must remain within its specific range, all in a lecture-style narrative.
By the time you finish this series, you will be able to explain why every Auto-Tuner recommendation is that specific value.
1. What is the Auto-Tuner?
1.1 One-Line Definition
Auto-Tuner = A system that "makes accurate" the Bayesian engine's parameters based on historical project outcomes (Won/Lost)
The key word here is "accurate." It does not raise P(Win); rather, it adjusts parameters so that Won deals have high P(Win) and Lost deals have low P(Win), aligning predictions with reality.
1.2 The Car Analogy
The engine (Bayesian formula) itself doesn't change. What the Auto-Tuner does is adjust the fuel mixture:
| Engine Component | Car Analogy | EXAWin Equivalent |
|---|---|---|
| Ignition threshold | Ignition timing | T – Stage threshold |
| Fuel injection | Injector open time | Impact – Signal weights |
| Acceleration response | Throttle sensitivity | k – Slope (Velocity) |
| Exhaust treatment | Catalytic converter efficiency | Dampening – Duplicate signal attenuation |
| Fuel leak penalty | Leak alarm | Silence Penalty – Activity gap penalty |
1.3 Five Design Principles
① Not fitting, but making accurate
② Preserve the impedance dual-structure
③ Provide recommendation + rationale together
④ Human approval mandatory – no automatic application
⑤ Stored data immutable – simulations are pure computation
Principle ⑤ is particularly important. The Auto-Tuner never modifies the database. When the analysis button is pressed, simulations run in memory, and only when the administrator clicks "Apply" does the database get updated.
2. Six Learning Targets
The Auto-Tuner analyzes and recommends exactly six parameters.
① Signal Lift – Discriminative Power Analysis
"When this signal appears, does the probability of winning actually increase?"
Calculates the Lift = (appearance rate in Won) / (appearance rate in Lost) for each signal. Lift > 1 indicates a positive indicator; Lift < 1 indicates a negative indicator. Validates whether the current classification (Positive/Negative) matches actual discriminative power.
📖 Details: ② Signal Lift Anatomy
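As a concrete illustration, the Lift ratio described above might be computed as in the following sketch. The function name and signature are hypothetical, and the additive smoothing constant is an assumption (the series covers Laplace smoothing in Part ②); the idea is simply that smoothing prevents division by zero when a signal never appears on the Lost side.

```python
def signal_lift(won_with, won_total, lost_with, lost_total, smooth=1.0):
    """Lift = (appearance rate in Won) / (appearance rate in Lost).

    `smooth` applies additive (Laplace-style) smoothing so a signal that
    never appears in Lost projects does not cause division by zero.
    """
    won_rate = (won_with + smooth) / (won_total + 2 * smooth)
    lost_rate = (lost_with + smooth) / (lost_total + 2 * smooth)
    return won_rate / lost_rate
```

For example, a signal seen in 8 of 20 Won projects but only 2 of 20 Lost projects yields a smoothed Lift of (9/22) / (3/22) = 3.0, a clearly positive indicator; flipping the counts yields a Lift below 1, a negative indicator.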
② Impact Score – Optimal Weights
"Is 5.0 really the optimal value for Game Changer?"
Varies each ImpactType's score within a ± range to find the value that maximizes Separation (Won avg P(Win) − Lost avg P(Win)). The search range expands with Phase.
📖 Details: ③ Grid Search Engine Anatomy
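The scan described above can be sketched as a generic grid search. This is an illustrative skeleton, not EXAWin's implementation: the name `grid_search`, the multiplicative ±percentage range, and the step count are all assumptions, and `objective` stands in for a full re-simulation that returns Separation for a candidate value.

```python
def grid_search(current, pct_range, steps, objective):
    """Scan candidates in [current*(1-pct_range), current*(1+pct_range)]
    and return the value that maximizes `objective` (e.g. Separation)."""
    lo, hi = current * (1 - pct_range), current * (1 + pct_range)
    best_val, best_score = current, objective(current)
    for i in range(steps + 1):
        cand = lo + (hi - lo) * i / steps  # evenly spaced grid point
        score = objective(cand)
        if score > best_score:
            best_val, best_score = cand, score
    return best_val, best_score
```

With a Phase 3 range of ±30% around the current Game Changer weight of 5.0, the grid would span 3.5 to 6.5, and the recommendation is whichever candidate yields the highest Separation.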
③ T – Threshold Optimization
"Where should each stage's threshold be placed to best distinguish Won from Lost?"
Finds the T that maximizes the Youden J statistic = Sensitivity + Specificity − 1. If J < 0.20, it means "the data cannot distinguish Won/Lost at this stage," so no recommendation is made.
📖 Details: ④ Threshold · k Anatomy
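The rule above can be sketched as follows. This is a hypothetical helper, not the real code: it tries each observed P(Win) as a candidate T, computes Youden's J, and declines to recommend anything when the best J falls below 0.20.

```python
def best_threshold(won_pwin, lost_pwin, min_j=0.20):
    """Return (T, J) maximizing J = sensitivity + specificity - 1,
    or (None, J) when even the best J is below `min_j`."""
    best_t, best_j = None, -1.0
    for t in sorted(set(won_pwin) | set(lost_pwin)):
        sens = sum(p >= t for p in won_pwin) / len(won_pwin)   # true-positive rate
        spec = sum(p < t for p in lost_pwin) / len(lost_pwin)  # true-negative rate
        j = sens + spec - 1.0
        if j > best_j:
            best_t, best_j = t, j
    if best_j < min_j:
        return None, best_j  # data cannot separate Won/Lost at this stage
    return best_t, best_j
```

Cleanly separated inputs recover the obvious threshold with J = 1, while heavily overlapping inputs return no recommendation at all.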
④ k – Slope (Velocity)
"How sharply should P(Win) react when crossing T?"
Previously, k was set by an empirical formula, 1 + ln(ratio), based on the evidence ratio (α+β); it has since switched to Grid Search-based optimization that directly maximizes Separation. The upper bound is 12, per the theoretical reference.
📖 Details: ④ Threshold · k Anatomy
⑤ Dampening – Duplicate Signal Attenuation
"When three signals appear simultaneously in the same meeting, should they all receive equal weight?"
Compound Score = MAX(signals) + remaining × dampening. If dampening is 0, only the strongest signal counts; if 1, all signals are weighted equally. The current default of 0.25 is optimized via Grid Search.
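The formula is small enough to show directly. This is a sketch under the stated definition (the function name is hypothetical): the strongest signal keeps full weight and every remaining signal is attenuated by the dampening factor.

```python
def compound_score(signal_scores, dampening=0.25):
    """MAX(signals) + dampening * sum(remaining signals).

    dampening=0.0 -> only the strongest signal counts;
    dampening=1.0 -> all signals are weighted equally (plain sum).
    """
    if not signal_scores:
        return 0.0
    s = sorted(signal_scores, reverse=True)
    return s[0] + dampening * sum(s[1:])
```

So three signals of 5.0, 3.0, and 2.0 in the same meeting score 5.0 + 0.25 × (3.0 + 2.0) = 6.25 under the default, rather than the raw sum of 10.0.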
⑥ Silence Penalty – Activity Gap Penalty
"How much penalty should accumulate when the customer hasn't been contacted for an extended period?"
Optimizes the penalty ratio added to β via Grid Search.
3. Five-Stage Data Maturity (Phase)
The Auto-Tuner prevents overfitting when data is scarce by assigning a five-stage confidence level based on min(Won, Lost), the smaller of the two outcome counts.
| Phase | Condition | Emoji | Adjustment Scope | Confidence |
|---|---|---|---|---|
| 1 | min < 5 | ❌ | Analysis impossible | none |
| 2 | min 5–9 | 🔒 | Direction reference only, apply locked | low |
| 3 | min 10–19 | 🟡 | Impact, T, k | moderate |
| 4 | min 20–49 | 🟢 | Impact, T, k, Dampening, Silence | high |
| 5 | min ≥ 50 | 🔵 | All + MCMC posterior | stable |
Why min?
If there are 100 Won projects but only 3 Lost, you cannot claim "this parameter distinguishes Lost well" based on just 3 cases. Statistical significance is always limited by the smaller sample.
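The min-based banding from the table above is a one-liner in practice. A minimal sketch (hypothetical function name, bands taken from the Phase table):

```python
def maturity_phase(won_count, lost_count):
    """Phase is driven by min(won, lost): the smaller class caps
    statistical confidence no matter how large the other class is."""
    m = min(won_count, lost_count)
    if m < 5:
        return 1   # analysis impossible
    if m < 10:
        return 2   # direction reference only, apply locked
    if m < 20:
        return 3   # Impact, T, k
    if m < 50:
        return 4   # + Dampening, Silence
    return 5       # all targets + MCMC posterior
```

Note that 100 Won with only 3 Lost still lands in Phase 1: the 3 Lost cases are the binding constraint.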
What Changes by Phase
As the Phase increases, the Auto-Tuner's behavior progressively expands:
| Behavior | Phase 2 | Phase 3 | Phase 4 | Phase 5 |
|---|---|---|---|---|
| Signal Lift min appearances | 3 | 5 | 8 | 10 |
| Grid Search range | ±20% | ±30% | ±40% | ±50% |
| T/k adjustment | ❌ | ✅ | ✅ | ✅ |
| Dampening/Silence adjustment | ❌ | ❌ | ✅ | ✅ |
| MCMC posterior | ❌ | ❌ | ❌ | ✅ |
| Prior ฮฑ/ฮฒ recommendation | Manual | MoM | MLE | MLE |
4. Core Metric: Separation
The Auto-Tuner's objective function is Separation.
- Separation > 0.40: Excellent (A) – parameters closely reflect reality
- 0.25 – 0.40: Good (B) – room for improvement
- 0.10 – 0.25: Needs Improvement (C)
- < 0.10: Urgent (D) – parameter adjustment required
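Since Separation is simply a difference of means, it can be sketched directly, together with the grading bands above. The function names and the handling of exact boundary values are assumptions for illustration:

```python
def separation(won_pwin, lost_pwin):
    """Objective: mean P(Win) over Won minus mean P(Win) over Lost."""
    return sum(won_pwin) / len(won_pwin) - sum(lost_pwin) / len(lost_pwin)

def grade(sep):
    """Map a Separation value to the A-D bands listed above
    (boundary handling is an assumption)."""
    if sep > 0.40:
        return "A"  # excellent
    if sep >= 0.25:
        return "B"  # good
    if sep >= 0.10:
        return "C"  # needs improvement
    return "D"      # urgent
```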
Limitations of Separation and AUC
Separation only measures the difference in means. It does not account for distribution overlap.
Example:
- Scenario A: Won avg 0.70, Lost avg 0.30 → Separation 0.40 → Excellent!
- Scenario B: Won range [0.20, 0.90], Lost range [0.10, 0.80] → same average difference but heavy overlap
To compensate, ROC AUC is introduced. AUC represents "the probability that a randomly selected Won project has a higher P(Win) than a randomly selected Lost project." Overlap reduces AUC.
📖 Details: ⑤ Statistical Validation Anatomy
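The pairwise definition of AUC quoted above translates directly into code. This is the Mann-Whitney formulation, shown as an O(n·m) sketch for clarity (production code would typically use a rank-based computation); ties count as half a win.

```python
def roc_auc(won_pwin, lost_pwin):
    """Probability that a randomly chosen Won project scores above a
    randomly chosen Lost project; ties contribute 0.5."""
    wins = 0.0
    for w in won_pwin:
        for l in lost_pwin:
            if w > l:
                wins += 1.0
            elif w == l:
                wins += 0.5
    return wins / (len(won_pwin) * len(lost_pwin))
```

Perfectly separated distributions give AUC = 1.0; complete overlap drives it toward 0.5, which is exactly how overlap penalizes AUC even when the mean difference (Separation) looks healthy.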
5. Simulation Engine
The core of the Auto-Tuner is memory-based simulation. Instead of using actual BayesianUpdate records stored in the database, it recalculates from scratch using raw data (activities, signals, Prior).
Why Recalculate?
To try different parameters, you need to calculate "what would P(Win) have been if Impact were 3.0?" This cannot be determined from stored historical results. Only by simulating from scratch with hypothetical parameters can you answer this.
One simulation cycle:
```
α, β ← Prior initial values
for each activity (chronological):
    compute the Compound Score from the activity's signals
    α += SWV × positive Compound
    β += SWV × negative Compound
    β += silence penalty (for activity gaps)
P(Win) = α / (α + β)
```
Repeating this simulation for all Won/Lost projects reveals the separation for those parameters.
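A runnable sketch of one such cycle follows. Everything here is illustrative: the activity dict schema, the `gap_days` field, and the way the silence ratio is applied are assumptions rather than the real EXAWin data model, and SWV is passed in as a plain number.

```python
def simulate_pwin(activities, alpha0, beta0, swv,
                  dampening=0.25, silence_ratio=0.0):
    """Replay one project entirely in memory: no DB access, pure computation.

    `activities` is a chronological list of dicts carrying the positive and
    negative signal scores observed in each activity, plus the gap (in days)
    since the previous activity. Schema and field names are hypothetical.
    """
    alpha, beta = alpha0, beta0
    for act in activities:
        pos = act.get("positive", [])
        neg = act.get("negative", [])
        if pos:  # Compound Score: strongest signal full, rest dampened
            s = sorted(pos, reverse=True)
            alpha += swv * (s[0] + dampening * sum(s[1:]))
        if neg:
            s = sorted(neg, reverse=True)
            beta += swv * (s[0] + dampening * sum(s[1:]))
        # silence penalty accumulates into beta for activity gaps
        beta += silence_ratio * act.get("gap_days", 0)
    return alpha / (alpha + beta)
```

Because the function takes every tunable parameter as an argument, answering "what would P(Win) have been if Impact were 3.0?" is just another call with different inputs, which is what makes the grid searches above possible.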
DB Queries = 0
During simulation, not a single DB query is executed. All data is preloaded into memory during initialization, and only pure computation follows. This is the implementation of Principle ⑤.
6. Document Series Guide
| Part | Title | Content |
|---|---|---|
| ① [Current] | Engine Overview and Design Philosophy | Overall structure, 6 learning targets, Phase, Separation |
| ② | Signal Lift Anatomy | Lift calculation, Laplace smoothing, classification validation |
| ③ | Grid Search Engine Anatomy | Impact optimization, Phase-based ranges, Dampening, Silence |
| ④ | Threshold · k Anatomy | Youden J, T optimization, k Grid Search |
| ⑤ | Statistical Validation Anatomy | AUC, K-fold CV, Prior recommendation |
| ⑥ | MCMC Posterior Anatomy | emcee Ensemble MCMC, model definition, HDI, convergence diagnostics |