Bayesian Application: Parameter Calibration and Automatic Optimization
The calibration principles of Signal Impact (f-coupling), EPR guardrails, and the 6-stage automatic parameter optimization (BAT) system based on data maturity.
In the previous part: Prior α/β Configuration Principles, we covered the configuration and learning roadmap for prior probabilities. This article answers the next question: How are signal sizes (Impact) determined, and how does the system optimize itself?
No matter how sophisticated the engine's formulas are, if the numbers fed into those formulas are wrong, the answers will be wrong. Just as a car engine, however precisely machined, will knock if the fuel's octane rating is off. This article is about the principles of putting the right fuel into EXAWin's Bayesian engine, and the mechanism by which the system adjusts its own fuel as data accumulates.
Chapter 1. A World Without Likelihood: The Pseudo-Count Approach
1.1 An Honest Starting Point
In standard Bayesian inference, updates are performed through the likelihood function P(data | θ). However, sales signals ("the customer response was positive", "the decision maker is leaning our way") are not observations drawn from a mathematically defined probability distribution.
EXAWin resolves this limitation with the pseudo-count approach: α ← α + SWV × Impact for a positive signal (β ← β + SWV × Impact for a negative one).
This declares: "This signal has evidential weight equivalent to SWV × Impact virtual success observations." Since the Beta distribution is valid for all real α, β > 0, the pseudo-count need not be an integer.
This methodology is established in statistics under the name expert elicitation (O'Hagan et al., 2006). When an expert assesses "this evidence is equivalent to N direct observations," adding N as a pseudo-count to α or β is a justified methodology.
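The pseudo-count update can be sketched in a few lines of Python. This is an illustrative sketch, not EXAWin's actual code; the `BetaState` and `apply_signal` names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class BetaState:
    """Beta(alpha, beta) belief over P(Win)."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        # Posterior mean of the Beta distribution = current P(Win) estimate
        return self.alpha / (self.alpha + self.beta)

def apply_signal(state: BetaState, swv: float, impact: float,
                 positive: bool) -> BetaState:
    """Add SWV x Impact virtual observations to alpha (positive signal)
    or to beta (negative signal). The pseudo-count need not be an integer."""
    pseudo = swv * impact
    if positive:
        return BetaState(state.alpha + pseudo, state.beta)
    return BetaState(state.alpha, state.beta + pseudo)

prior = BetaState(2.0, 8.0)  # baseline P(Win) = 20%
after = apply_signal(prior, swv=1.69, impact=1.0, positive=True)
print(round(after.mean, 3))  # → 0.316
```

A Strong Affirmation at Qualification (SWV 1.69, Impact 1.0) thus moves the estimate from 20% to roughly 32%: fractional evidence, cleanly absorbed.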
The key question is: How do we determine that N?
Chapter 2. f-Coupling: Aligning Signal and Prior Scales
2.1 The Danger of Independent Scales
If the Prior (Beta(α₀, β₀), with strength S = α₀ + β₀) and Signal Impact are set independently, a single signal can overwhelm the company's entire historical experience in one instant: an unrealistic scenario.
2.2 Evidence Fraction (f)
The solution is to define Signal Impact as a fixed fraction f of Prior strength S: Impact = f × S.
f represents the ratio: "What percentage of the company's total prior experience does one occurrence of this signal represent as evidence?"
| Signal Type | f | Impact | Interpretation |
|---|---|---|---|
| Game Changer | 0.50 | 5.0 | A single piece of evidence half as strong as total prior experience |
| Strong Affirmation/Negation | 0.10 | 1.0 | 10% of prior experience β clear signal |
| Weak Affirmation/Negation | 0.04 | 0.4 | 4% of prior experience β subtle hint |
| No Signal | 0.01 | 0.1 | Virtually noise level |
2.3 Scale Invariance
The key property of this coupling: as long as f is the same, the P(Win) trajectory is completely identical regardless of Prior strength S. Setting α₀ = p₀S, β₀ = (1 − p₀)S, and Impact = fS, S cancels out of the posterior mean, leaving a function of f alone. Whether S = 10 or S = 100, the learning trajectory is identical. This is the raison d'être of the coupling.
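The cancellation is easy to verify numerically. A minimal sketch (hypothetical function name), replaying positive signals under the coupling Impact = f × S:

```python
def trajectory(p0: float, S: float, fs: list[float]) -> list[float]:
    """Posterior mean of P(Win) after each positive signal,
    with the f-coupling Impact = f * S."""
    alpha, beta = p0 * S, (1.0 - p0) * S   # prior of strength S
    means = []
    for f in fs:
        alpha += f * S                      # signal worth fraction f of S
        means.append(alpha / (alpha + beta))
    return means

fs = [0.10, 0.04, 0.50]                     # Strong, Weak, Game Changer
t_small = trajectory(0.20, 10.0, fs)
t_large = trajectory(0.20, 100.0, fs)
# S cancels out: the two trajectories match to machine precision
assert all(abs(a - b) < 1e-12 for a, b in zip(t_small, t_large))
print([round(m, 3) for m in t_small])       # → [0.273, 0.298, 0.512]
```

Change `S` to any positive value and the printed trajectory does not move; only the fractions f matter.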
2.4 EPR Guardrails
Evidence-Prior Ratio (EPR) is a diagnostic metric measuring how much the maximum evidence from a single meeting can affect the Prior: EPR = (SWV_max × Impact) / S, where SWV_max = 2.61 is the largest stage weight and S = α₀ + β₀ is the Prior strength.
EXAWin enforces EPR upper bounds at the code level:
| Signal Type | EPR Cap | Max Impact | Design Rationale |
|---|---|---|---|
| Game Changer | 2.0 | 7.7 | Intentional override permitted, but cannot exceed 200% |
| Regular signals | 0.5 | 1.9 | Limited to within 50% of Prior |
When a user changes an Impact value in Signal Master, if the cap is exceeded, the save is rejected. This is a safety mechanism preventing inexperienced users from destabilizing the system by setting extreme parameters.
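The save-time guardrail can be sketched as follows. The EPR form used here (SWV_max × Impact / S, with SWV_max = 2.61 and S = 10) is inferred from the table's cap/max-Impact pairs (2.61 × 7.7 / 10 ≈ 2.0; 2.61 × 1.9 / 10 ≈ 0.5); all names and structure are hypothetical, not EXAWin's actual code.

```python
SWV_MAX = 2.61    # largest stage weight (Negotiation)
S_PRIOR = 10.0    # prior strength alpha0 + beta0 = 2 + 8

EPR_CAPS = {"game_changer": 2.0, "regular": 0.5}

def validate_impact(impact: float, signal_type: str) -> None:
    """Reject a Signal Master save whose worst-case single-signal EPR
    exceeds the cap for its type; silently accept otherwise."""
    epr = SWV_MAX * impact / S_PRIOR
    cap = EPR_CAPS[signal_type]
    if epr > cap:
        raise ValueError(
            f"EPR {epr:.2f} exceeds cap {cap} for {signal_type}: save rejected")

validate_impact(7.6, "game_changer")      # EPR ≈ 1.98 ≤ 2.0: accepted
try:
    validate_impact(5.0, "regular")       # EPR ≈ 1.31 > 0.5: rejected
except ValueError as e:
    print(e)
```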
Chapter 3. Decision Impedance: From Probability to Action
3.1 Impedance Formula
Impedance passes P(Win) through a sigmoid gate centered on the stage threshold:
Impedance = 1 / (1 + e^(−k(P(Win) − T)))
- T = Threshold: "The minimum bar a deal must clear at each stage"
- k = Slope: Discrimination sensitivity near the threshold
3.2 Default Parameters by Stage
| Stage | SWV | T | k | Design Intent |
|---|---|---|---|---|
| Discovery | 1.00 | 0.35 | 5 | Low bar, generous exploration |
| Qualification | 1.69 | 0.40 | 7 | Basic verification |
| Solution-Fit | 2.10 | 0.45 | 7 | Confirming fit |
| Proposal | 2.39 | 0.50 | 12 | Cost commitment β sharp discrimination |
| Negotiation | 2.61 | 0.55 | 11 | Final gate, strictest |
T increases as stages progress. "P(Win)=30% at Discovery is fine; keep exploring. But reaching Proposal at P(Win)=40% demands serious reconsideration."
k determines discrimination sensitivity. At Proposal (T = 0.50, k = 12), whether P(Win) is above or below the threshold causes a sharp Impedance split: a design philosophy of no tolerance for ambiguity at the cost commitment stage.
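Assuming the standard logistic form implied by the T/k definitions (and by Chapter 4's remark that at large k "the sigmoid becomes effectively a step function"), Impedance can be sketched as:

```python
import math

def impedance(p_win: float, T: float, k: float) -> float:
    """Sigmoid gate on P(Win): exactly 0.5 at the threshold T,
    sharper separation around T as the slope k grows."""
    return 1.0 / (1.0 + math.exp(-k * (p_win - T)))

# The same P(Win) = 45%, judged by two stages with different bars:
print(round(impedance(0.45, T=0.35, k=5), 3))    # Discovery: above the bar → 0.622
print(round(impedance(0.45, T=0.50, k=12), 3))   # Proposal: below, sharply penalized → 0.354
```

The same probability reads as a comfortable pass at Discovery and a warning at Proposal, which is exactly the stage-graded behavior the parameter table encodes.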
Chapter 4. Auto-Tuner: How the Engine Adjusts Its Own Fuel
4.1 Data Maturity Phase: "Is There Enough Track Record?"
The Auto-Tuner's first question is straightforward: "How many completed projects does this company have?"
Why completed projects? An in-progress deal cannot verify "whether these settings were right or wrong." Like evaluating a coach's tactics at halftime. Only finished matches (Won/Lost) provide grounds for tactical adjustment.
Key: The lesser count of Won and Lost (min) determines the overall confidence tier. Even with 50 Won, if there are only 3 Lost, there's insufficient basis to learn Lost patterns.
| Phase | Grade | min(Won, Lost) | Scope | Learning Confidence |
|---|---|---|---|---|
| Phase 1 | Impossible | < 5 | Analysis impossible | Insufficient data, binomial test power virtually 0 |
| Phase 2 | Minimal | 5 ~ 9 | Display only (Apply locked) | Directional reference only, extreme overfitting risk |
| Phase 3 | Moderate | 10 ~ 19 | Impact, T, k + MCMC | CLT begins operating, MCMC executable (convergence may be unstable) |
| Phase 4 | Good | 20 ~ 49 | Full (Dampening, Silence included) + MCMC | Most parameters with high confidence, meaningful cross-validation |
| Phase 5 | Excellent | 50+ | Full + MCMC stable convergence | Grid Search convergence, maximum MCMC posterior confidence |
Phase-based dynamic adjustments also apply. Grid Search range expands from ±20% at Phase 2 to ±50% at Phase 5, while the Signal Lift minimum-appearance requirement strengthens from 3 at Phase 2 to 10 at Phase 5. As data grows richer, the system explores wider ranges while demanding stricter evidence: an exquisite balance of humility and ambition.
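The tier lookup in the table above reduces to a small function (hypothetical names, boundaries taken from the table):

```python
def maturity_phase(won: int, lost: int) -> tuple[int, str]:
    """Map the lesser of completed Won/Lost counts to a confidence tier."""
    n = min(won, lost)            # the weaker class caps the tier
    if n < 5:
        return 1, "Impossible"
    if n < 10:
        return 2, "Minimal"
    if n < 20:
        return 3, "Moderate"
    if n < 50:
        return 4, "Good"
    return 5, "Excellent"

print(maturity_phase(50, 3))    # → (1, 'Impossible'): 3 Lost caps the tier
print(maturity_phase(10, 15))   # → (3, 'Moderate')
```

Note how 50 wins with only 3 losses still lands in Phase 1: without Lost examples there is nothing to learn Lost patterns from.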
For a complete technical anatomy of the Auto-Tuner, refer to the separate series:
- AT-01. Overall Architecture and Design Philosophy: 6 learning targets, Separation, simulation engine
- AT-02. Signal Lift: Laplace smoothing, dynamic minimum appearances
- AT-03. Grid Search: Impact/Dampening/Silence optimization
- AT-04. T/k Optimization: Youden J, k Grid Search
- AT-05. Statistical Validation: ROC AUC, K-fold, Prior recommendation
- AT-06. MCMC Posterior: emcee Ensemble MCMC, HDI, R̂ convergence diagnostics
4.2 What the Auto-Tuner Does: "A Coach Analyzing Past Game Films"
The best analogy for understanding the Auto-Tuner is a sports coach's game analysis. All explanations from here use this analogy.
Your company has completed 25 projects. 10 won, 15 lost. The Auto-Tuner pulls up these 25 game records and checks one by one: "Were our team's tactics (parameters) optimal?"
① "What's our team's baseline strength?" → Prior Recommendation
Question: "What is our fundamental probability of winning?"
System setting: Current Prior = α 2, β 8 → baseline success rate 20%. But looking at the actual 25 project results, 10 were won for a success rate of 40%. The system says:
"Your team's actual success rate is closer to 40%. With the starting point set at 20%, the first 3 to 4 meetings will just keep repeating 'probability is still low.' Should we raise the starting point?"
However, this value is not automatically applied. The Prior is a strategic judgment by leadership about "our team's baseline capability." "Raising to 40% might breed complacency, and conservatively setting 25% is also a valid strategy": that decision belongs to the manager (leadership), not the coach (system).
② "Which signals were genuinely meaningful?" → Signal Lift
Question: "Among the signals we recorded, which were actually correlated with success?"
The system examines signals across 25 projects:
| Signal | Appeared in Won | Appeared in Lost | Lift | Interpretation |
|---|---|---|---|---|
| "Technical fit confirmed" | 8 of 10 Won | 3 of 15 Lost | 4.0 | ✅ Deals with this signal succeeded 4× more often |
| "Budget secured" | 6 | 4 | 2.3 | Meaningful |
| "Competitor presence confirmed" | 7 | 10 | 1.1 | ❌ No meaningful difference |
Lift > 1: "Deals where this signal appeared actually performed better." Lift ≈ 1: "This signal had no correlation with success."
This information tells the sales team "which signals to pay more attention to."
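The Lift computation reduces to a ratio of appearance rates. A sketch with an optional Laplace-smoothing term in the spirit of AT-02 (the exact smoothing constant used there is not specified here, so it is a parameter):

```python
def signal_lift(won_hits: int, won_total: int,
                lost_hits: int, lost_total: int,
                smooth: float = 0.0) -> float:
    """Ratio of a signal's appearance rate in Won vs Lost deals.
    smooth > 0 applies Laplace smoothing (guards against division by
    zero for rare signals); smooth = 0 reproduces the raw table values."""
    p_won = (won_hits + smooth) / (won_total + 2 * smooth)
    p_lost = (lost_hits + smooth) / (lost_total + 2 * smooth)
    return p_won / p_lost

# "Technical fit confirmed": 8 of 10 Won, 3 of 15 Lost
print(round(signal_lift(8, 10, 3, 15), 1))   # → 4.0
```

With smoothing enabled, a signal that never appeared in Lost deals still yields a finite, conservative Lift instead of a division-by-zero blowup.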
③ "Were the signal weights appropriate?" → Impact Calibration
Question: "We gave Strong Affirmation a score of 1.0. Would 0.7 have been more accurate?"
The system tries Impact values from 0.1 to 10.0 one by one through simulation:
- "Lowering Strong to 0.5 → Won and Lost P(Win) overlap too much (indistinguishable)."
- "Setting Strong to 1.0 → Won avg P(Win)=55%, Lost avg=30%. Clean separation."
- "Raising Strong to 2.0 → Won goes to 70%, but Lost also rises to 45% (overreaction)."
It finds and recommends the Impact value that most cleanly separates Won from Lost. This is the same principle a doctor uses when reading blood test results: "Where should we place the normal/abnormal threshold for the most accurate diagnosis?"
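A toy version of this simulation loop: replay each deal's signal log under a candidate Impact and keep the value with the widest Won/Lost separation. All names and the sample data are hypothetical, and the simple mean-gap metric is only illustrative; on real data the objective would also have to penalize the "overreaction" case where Lost deals drift upward too.

```python
def replay_pwin(signals: list[tuple[bool, float]], impact: float,
                alpha0: float = 2.0, beta0: float = 8.0) -> float:
    """Final posterior mean after replaying a deal's (positive?, SWV)
    signal events with a candidate Impact value."""
    a, b = alpha0, beta0
    for positive, swv in signals:
        if positive:
            a += swv * impact
        else:
            b += swv * impact
    return a / (a + b)

def best_impact(won: list, lost: list, grid: list[float]) -> float:
    """Grid-search the Impact that most widens the gap between the
    average Won P(Win) and the average Lost P(Win)."""
    def separation(imp: float) -> float:
        w = sum(replay_pwin(s, imp) for s in won) / len(won)
        l = sum(replay_pwin(s, imp) for s in lost) / len(lost)
        return w - l
    return max(grid, key=separation)

won = [[(True, 1.0), (True, 1.69)], [(True, 1.0)]]     # toy signal logs
lost = [[(False, 1.0), (True, 1.0)], [(False, 1.69)]]
print(best_impact(won, lost, [0.5, 1.0, 2.0]))
```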
④ "Where do we draw the pass line?" → Threshold Optimization
Question: "T=0.35 at Discovery. Was this too lenient or too strict?"
The system validates against historical data:
- T=0.25 lower: All Won pass, but most Lost also pass → "Passing score too low; failing deals pass too"
- T=0.35 current: 80% of Won pass, 70% of Lost fail → "Appropriate level"
- T=0.50 higher: Nearly all Lost fail, but half of Won also fail → "Passing score too high; good deals get filtered"
Same as setting an exam cutoff. "Set it too low and unqualified students pass; set it too high and decent students fail." The optimal cutoff simultaneously maximizes the proportion of actual top performers among those who pass and actual underperformers among those who fail.
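The cutoff criterion described above is exactly what Youden's J statistic (the cited statistical basis) maximizes: sensitivity plus specificity minus one. A sketch with hypothetical deal data:

```python
def youden_optimal_threshold(won_pwin: list[float], lost_pwin: list[float],
                             grid: list[float]) -> float:
    """Pick the T that maximizes Youden's J = sensitivity + specificity - 1
    (Youden, 1950) over a candidate grid."""
    best_t, best_j = grid[0], float("-inf")
    for t in grid:
        sens = sum(p >= t for p in won_pwin) / len(won_pwin)    # Won that pass
        spec = sum(p < t for p in lost_pwin) / len(lost_pwin)   # Lost that fail
        j = sens + spec - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t

# Hypothetical final P(Win) values from completed deals
won = [0.55, 0.48, 0.62, 0.41]
lost = [0.22, 0.35, 0.30, 0.44]
grid = [round(0.05 * i, 2) for i in range(1, 20)]
print(youden_optimal_threshold(won, lost, grid))   # → 0.4
```

Both failure modes from the bullet list fall out of the same score: a too-low T tanks specificity, a too-high T tanks sensitivity, and J peaks in between.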
⑤ "How sharply should we discriminate?" → Slope Adjustment
Question: "How sensitively should the system react near the threshold (T)?"
Small k: When P(Win) is near T, the system gently says "still ambiguous." Suitable for early stages, which are still exploring.
Large k: If P(Win) drops even slightly below T, the system immediately flags "🔴 danger." Suitable for Proposal and Negotiation: "maybe..." is unacceptable once money is being spent.
The system performs Grid Search across k from 1 to 12 for each stage, finding the value that maximizes impedance separation between Won and Lost. The theoretical upper bound is k = 12; beyond this, the sigmoid becomes effectively a step function, degrading from "discrimination" to "binary chopping."
⑥ "Fine-Tuning" → Dampening and Silence Penalty (Phase 4+ Only)
These two parameters are adjusted only at Phase 4 and above (min 20+ projects). Meaningful fine-tuning requires sufficient data; moving too many parameters simultaneously with too little data leads to the overfitting trap.
Dampening: When 3 signals emerge simultaneously from one meeting, only the strongest signal is reflected at 100%, and the rest at only 25%. The Auto-Tuner validates via Grid Search whether this 25% is optimal, searching from 0% to 100% to find the attenuation rate that maximizes Won/Lost separation.
Silence Penalty: If the customer hasn't been contacted for 2+ weeks, β gradually increases and P(Win) declines. This is the mathematical implementation of the sales maxim "no news is bad news." The system validates whether the penalty magnitude is appropriate against historical data. Starting from the default 30% (Weak Negation Impact × 0.30), it Grid Searches the 0% to 100% range, raising the ratio if more projects became Lost after silence and lowering it if many won despite silence.
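The penalty accrual might look like the sketch below. The exact schedule (per-week accrual after a 2-week grace period, scaled by stage SWV) is an illustrative assumption consistent with the description above, not EXAWin's documented formula:

```python
WEAK_NEGATION_IMPACT = 0.4   # from the signal map (f = 0.04, S = 10)

def silence_penalty(beta: float, weeks_silent: int, swv: float,
                    ratio: float = 0.30) -> float:
    """Grow beta for each silent week past a 2-week grace period.
    Assumed weekly increment: SWV x Weak Negation Impact x ratio,
    where ratio is the knob the Auto-Tuner grid-searches over 0-100%."""
    penalized_weeks = max(0, weeks_silent - 2)
    return beta + penalized_weeks * swv * WEAK_NEGATION_IMPACT * ratio

# 5 weeks of silence = 3 penalized weeks at the default 30% ratio
print(round(silence_penalty(8.0, 5, swv=1.0), 2))   # → 8.36
```

Raising `ratio` makes silence bite harder; the Auto-Tuner's grid search is just trying values of this knob against the Won/Lost record.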
MCMC posterior estimation runs from Phase 3 (min 10+), with most stable convergence at Phase 5 (min 50+). Uncertainty intervals (HDI) are provided for all parameters, not just point estimates.
4.3 "What Changes if Applied?" → Impedance Impact Simulation
Before pressing the recommend button, the administrator's burning question: "If we change this, how do our deals' scores change?"
The Impedance Impact table answers this question:
| Stage | P(Win) | Current Impedance | With Recommendation | Change | Count |
|---|---|---|---|---|---|
| Discovery | 21.5% | 28.4% | 53.5% | ↑ 25.1%p | 15 |
| Qualification | 31.7% | 30.7% | 60.3% | ↑ 29.6%p | 15 |
| Proposal | 46.4% | 40.8% | 74.0% | ↑ 33.4%p | 15 |
"Current Discovery average impedance is 28%, and applying the recommended T/k raises it to 54%." This means the recommended settings better distinguish Won from Lost deals. Larger changes indicate the current settings were further from optimal.
Chapter 5. Complete Signal System Map
5.1 Impact Types
| Order | Type | Direction | Impact | f | Interpretation |
|---|---|---|---|---|---|
| 1 | Game Changer | α increase | 5.0 | 0.50 | Decisive positive single evidence |
| 5 | Strong Affirmation | α increase | 1.0 | 0.10 | Clear positive signal |
| 10 | Weak Affirmation | α increase | 0.4 | 0.04 | Subtle positive hint |
| 15 | No Signal | Neutral | 0.1 | 0.01 | Noise level |
| 20 | Weak Negation | β increase | 0.4 | 0.04 | Subtle negative hint |
| 25 | Strong Negation | β increase | 1.0 | 0.10 | Clear negative signal |
| 30 | Game Changer (Negative) | β increase | 5.0 | 0.50 | Decisive negative single evidence |
Symmetric structure: Positive and Negative are symmetric on the same f-scale. Game Changer Negative corresponds to critical negative signals like "competitor confirmed" or "entire budget eliminated."
5.2 What f-Coupling Guarantees
- Scale invariance: Identical learning trajectories regardless of Prior strength
- EPR guardrails: Code-level blocking of excessive single-signal influence
- Symmetry: Positive and negative on identical scales ("Evidence of equal strength produces effects of equal magnitude")
- Interpretability: Every value explainable as "X% of the Prior"
References
- O'Hagan, A., Buck, C.E., Daneshkhah, A., et al. (2006). Uncertain Judgements: Eliciting Experts' Probabilities. Wiley. (Expert elicitation and pseudo-count methodology.)
- Ibrahim, J.G. & Chen, M.H. (2000). "Power prior distributions for regression models." Statistical Science, 15(1), 46-60. (Evidence discounting.)
- Youden, W.J. (1950). "Index for rating diagnostic tests." Cancer, 3(1), 32-35. (Statistical basis for threshold optimization.)
- Cooper, R.G. (2008). "Perspective: The Stage-Gate Idea-to-Launch Process." JPIM, 25(3). (Stage-Gate decision process.)