DOCUMENTATION

Bayesian Application: Parameter Calibration and Automatic Optimization

The calibration principles of Signal Impact (f-coupling), EPR guardrails, and the 6-stage automatic parameter optimization (BAT) system based on data maturity.

In the previous part: Prior α/β Configuration Principles, we covered the configuration and learning roadmap for prior probabilities. This article answers the next question: How are signal sizes (Impact) determined, and how does the system optimize itself?

No matter how sophisticated the engine's formulas are, if the numbers fed into those formulas are wrong, the answers will be wrong, just as a car engine, however precisely machined, will knock if the fuel's octane rating is off. This article is about the principles of putting the right fuel into EXAWin's Bayesian engine, and the mechanism by which the system adjusts its own fuel as data accumulates.



Chapter 1. A World Without Likelihood: The Pseudo-Count Approach

1.1 An Honest Starting Point

In standard Bayesian inference, updates are performed through the likelihood function P(D | θ). However, sales signals ("the customer response was positive", "the decision maker is leaning our way") are not observations drawn from a mathematically defined probability distribution.

EXAWin resolves this limitation with the pseudo-count approach:

α_new = α_prev + SWV × Impact

This declares: "This signal has evidential weight equivalent to SWV × Impact virtual success observations." Since the Beta distribution is valid for all real α, β > 0, the pseudo-count need not be an integer.

This approach is established in statistics under the name expert elicitation (O'Hagan et al., 2006). When an expert assesses "this evidence is equivalent to N direct observations," adding N as a pseudo-count to α or β is a justified methodology.

The key question is: How do we determine that N?
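In code form, the pseudo-count update is a one-line addition to a Beta posterior. The function name below is an assumption for illustration; the α/β prior and the SWV/Impact values are the running examples used elsewhere in this article.

```python
def apply_signal(alpha: float, beta: float, swv: float, impact: float,
                 positive: bool) -> tuple[float, float]:
    """Add SWV x Impact virtual observations to alpha (success) or beta (failure)."""
    delta = swv * impact
    return (alpha + delta, beta) if positive else (alpha, beta + delta)

# Prior alpha=2, beta=8 (baseline P(Win) = 20%); one Strong Affirmation
# (Impact 1.0) recorded at Qualification (SWV 1.69):
alpha, beta = apply_signal(2.0, 8.0, swv=1.69, impact=1.0, positive=True)
p_win = alpha / (alpha + beta)  # posterior mean rises from 0.20 to ~0.32
```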

Chapter 2. f-Coupling: Aligning Signal and Prior Scales

2.1 The Danger of Independent Scales

If the Prior (α₀ = 2, β₀ = 8, strength S = 10) and Signal Impact are set independently, a single signal can overwhelm the company's entire historical experience in one instant, an unrealistic scenario.

2.2 Evidence Fraction (f)

The solution is to define Signal Impact as a fraction of Prior strength:

Impact_i = f_i × S, where S = α₀ + β₀

f_i represents the ratio: "What percentage of the company's total prior experience does one occurrence of this signal represent as evidence?"

| Signal Type | f | Impact (S=10) | Interpretation |
|---|---|---|---|
| Game Changer | 0.50 | 5.0 | A single piece of evidence half as strong as total prior experience |
| Strong Affirmation/Negation | 0.10 | 1.0 | 10% of prior experience: clear signal |
| Weak Affirmation/Negation | 0.04 | 0.4 | 4% of prior experience: subtle hint |
| No Signal | 0.01 | 0.1 | Virtually noise level |
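The coupling itself is trivial to express; a minimal sketch using the f values above and the article's running prior α₀ = 2, β₀ = 8 (the function and constant names are assumptions):

```python
def impact_from_f(f: float, alpha0: float, beta0: float) -> float:
    """Impact expressed as a fraction f of prior strength S = alpha0 + beta0."""
    return f * (alpha0 + beta0)

F_VALUES = {"Game Changer": 0.50, "Strong": 0.10, "Weak": 0.04, "No Signal": 0.01}
impacts = {name: impact_from_f(f, 2.0, 8.0) for name, f in F_VALUES.items()}
# With S = 10 this reproduces the Impact column: 5.0, 1.0, 0.4, 0.1
```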

2.3 Scale Invariance

The key property of this coupling: as long as f is the same, the P(Win) trajectory is completely identical regardless of Prior strength S. Writing α₀ = rS and β₀ = (1 - r)S, S cancels out of the posterior mean, leaving a function of f only. Whether S = 10 or S = 100, the learning trajectory is identical; this is the raison d'être of the coupling.
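The cancellation is easy to check numerically. A small sketch (the signal sequence and SWV values below are illustrative, not prescribed by the article):

```python
def trajectory(S: float, r: float, signals: list[tuple[float, float, bool]]) -> list[float]:
    """P(Win) after each (swv, f, positive) signal, from alpha0 = r*S, beta0 = (1-r)*S."""
    a, b = r * S, (1 - r) * S
    points = []
    for swv, f, positive in signals:
        delta = swv * f * S  # Impact = f * S, so the pseudo-count scales with S
        if positive:
            a += delta
        else:
            b += delta
        points.append(a / (a + b))
    return points

signals = [(1.0, 0.10, True), (1.69, 0.04, False), (2.39, 0.50, True)]
t10 = trajectory(10.0, 0.2, signals)
t100 = trajectory(100.0, 0.2, signals)
same = all(abs(x - y) < 1e-12 for x, y in zip(t10, t100))  # True: S cancels out
```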

2.4 EPR Guardrails

Evidence-Prior Ratio (EPR) is a diagnostic metric measuring how much the maximum evidence from a single meeting can affect the Prior:

EPR = (SWV_max × Impact) / S

EXAWin enforces EPR upper bounds at the code level:

| Signal Type | EPR Cap | Max Impact (S=10) | Design Rationale |
|---|---|---|---|
| Game Changer | 2.0 | 7.7 | Intentional override permitted, but cannot exceed 200% of the Prior |
| Regular signals | 0.5 | 1.9 | Limited to within 50% of the Prior |

When a user changes an Impact value in Signal Master, if the cap is exceeded, the save is rejected. This is a safety mechanism preventing inexperienced users from destabilizing the system by setting extreme parameters.
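A minimal sketch of such a guardrail check (the function name and structure are assumptions; the caps and SWV_max = 2.61, the largest stage weight, come from this article's tables):

```python
EPR_CAPS = {"game_changer": 2.0, "regular": 0.5}
SWV_MAX = 2.61  # largest stage weight (Negotiation)

def impact_save_allowed(impact: float, signal_class: str, S: float) -> bool:
    """True if the worst-case single-signal EPR stays within the cap."""
    epr = SWV_MAX * impact / S
    return epr <= EPR_CAPS[signal_class]

# With S = 10: Game Changer tops out near 7.7, regular signals near 1.9
ok = impact_save_allowed(7.6, "game_changer", S=10.0)        # within cap
rejected = impact_save_allowed(8.0, "game_changer", S=10.0)  # exceeds cap
```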



Chapter 3. Decision Impedance: From Probability to Action

3.1 Impedance Formula

I = 1 / (1 + exp(-k · (P(Win) - T)))
  • T = Threshold: "the minimum bar a deal must clear at each stage"
  • k = Slope: discrimination sensitivity near the threshold

3.2 Default Parameters by Stage

| Stage | SWV | T | k | Design Intent |
|---|---|---|---|---|
| Discovery | 1.00 | 0.35 | 5 | Low bar, generous exploration |
| Qualification | 1.69 | 0.40 | 7 | Basic verification |
| Solution-Fit | 2.10 | 0.45 | 7 | Confirming fit |
| Proposal | 2.39 | 0.50 | 12 | Cost commitment: sharp discrimination |
| Negotiation | 2.61 | 0.55 | 11 | Final gate, strictest |

T increases as stages progress. "P(Win) = 30% at Discovery is fine; keep exploring. But reaching Proposal at P(Win) = 40% demands serious reconsideration."

k determines discrimination sensitivity. At Proposal (k = 12), whether P(Win) sits above or below T = 0.50 causes a sharp Impedance split, a design philosophy of no tolerance for ambiguity at the cost commitment stage.
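The formula and stage defaults translate directly into code; a minimal sketch mirroring the table in 3.2:

```python
import math

STAGE_PARAMS = {  # stage: (T, k), defaults from section 3.2
    "Discovery": (0.35, 5.0),
    "Qualification": (0.40, 7.0),
    "Solution-Fit": (0.45, 7.0),
    "Proposal": (0.50, 12.0),
    "Negotiation": (0.55, 11.0),
}

def impedance(p_win: float, T: float, k: float) -> float:
    """Logistic mapping from win probability to an action signal in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-k * (p_win - T)))

# At Proposal (k = 12), a 10-point swing around T = 0.50 flips the verdict sharply:
below = impedance(0.45, *STAGE_PARAMS["Proposal"])  # ~0.35
above = impedance(0.55, *STAGE_PARAMS["Proposal"])  # ~0.65
```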



Chapter 4. Auto-Tuner: How the Engine Adjusts Its Own Fuel

4.1 Data Maturity Phase – "Is There Enough Track Record?"

The Auto-Tuner's first question is straightforward: "How many completed projects does this company have?"

Why completed projects? An in-progress deal cannot verify whether its settings were right or wrong, just as a coach's tactics cannot be fairly judged at halftime. Only finished matches (Won/Lost) provide grounds for tactical adjustment.

Key: the smaller of the Won and Lost counts, min(Won, Lost), determines the overall confidence tier. Even with 50 Won, only 3 Lost leaves insufficient basis for learning Lost patterns.

| Phase | Grade | min(Won, Lost) | Scope | Learning Confidence |
|---|---|---|---|---|
| ❌ Phase 1 | Impossible | < 5 | Analysis impossible | Insufficient data, binomial test power virtually 0 |
| 🟠 Phase 2 | Minimal | 5-9 | Display only (Apply locked) | Directional reference only, extreme overfitting risk |
| ✅ Phase 3 | Moderate | 10-19 | Impact, T, k + MCMC | CLT begins operating, MCMC executable (convergence may be unstable) |
| 🟢 Phase 4 | Good | 20-49 | Full (Dampening, Silence included) + MCMC | Most parameters with high confidence, meaningful cross-validation |
| 🔵 Phase 5 | Excellent | 50+ | Full + MCMC stable convergence | Grid Search convergence, maximum MCMC posterior confidence |
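The tier logic reduces to a threshold ladder on min(Won, Lost); a sketch (the function name is an assumption):

```python
def maturity_phase(won: int, lost: int) -> int:
    """Data maturity phase (1-5) from the smaller of the Won and Lost counts."""
    n = min(won, lost)
    if n < 5:
        return 1   # analysis impossible
    if n < 10:
        return 2   # display only, Apply locked
    if n < 20:
        return 3   # Impact, T, k + MCMC
    if n < 50:
        return 4   # full scope incl. Dampening and Silence
    return 5       # full scope, stable MCMC convergence

phase = maturity_phase(50, 3)  # -> 1: three Lost cap the tier despite 50 Won
```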

Phase-based dynamic adjustments also apply. Grid Search range expands from ±20% at Phase 2 to ±50% at Phase 5, while Signal Lift minimum appearances strengthen from 3 at Phase 2 to 10 at Phase 5. As data grows richer, the system explores wider ranges while demanding stricter evidence: an exquisite balance of humility and ambition.

For a complete technical anatomy of the Auto-Tuner, refer to the separate series.

4.2 What the Auto-Tuner Does – "A Coach Analyzing Past Game Films"

The best analogy for understanding the Auto-Tuner is a sports coach's game analysis; all explanations from here on use this analogy.

Your company has completed 25 projects. 10 won, 15 lost. The Auto-Tuner pulls up these 25 game records and checks one by one: "Were our team's tactics (parameters) optimal?"


① "What's our team's baseline strength?" – Prior Recommendation

Question: "What is our fundamental probability of winning?"

System setting: Current Prior = α 2, β 8 → baseline success rate 20%. But looking at the actual 25 project results, 10 were won for a success rate of 40%. The system says:

"Your team's actual success rate is closer to 40%. With the starting point set at 20%, the first 3–4 meetings will just keep repeating 'probability is still low.' Should we raise the starting point?"

However, this value is not automatically applied. The Prior is a strategic judgment by leadership about "our team's baseline capability." "Raising to 40% might breed complacency, and conservatively setting 25% is also a valid strategy": that decision belongs to the manager (leadership), not the coach (system).
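One way to express such a recommendation is to match the observed win rate while keeping the prior strength S fixed. This is a sketch of the idea only; the real engine's recommendation logic may weigh more than the raw win rate, and per the article the result is a suggestion, never auto-applied.

```python
def recommend_prior(won: int, lost: int, S: float) -> tuple[float, float]:
    """Suggest (alpha0, beta0) whose mean equals the observed win rate, at strength S.
    A recommendation only: leadership decides whether to apply it."""
    rate = won / (won + lost)
    return rate * S, (1.0 - rate) * S

alpha0, beta0 = recommend_prior(10, 15, S=10.0)  # -> (4.0, 6.0), baseline 40%
```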


② "Which signals were genuinely meaningful?" – Signal Lift

Question: "Among the signals we recorded, which were actually correlated with success?"

The system examines signals across 25 projects:

| Signal | Appeared in Won | Appeared in Lost | Lift | Interpretation |
|---|---|---|---|---|
| "Technical fit confirmed" | 8 of 10 | 3 of 15 | 4.0 | ✅ Deals with this signal succeeded 4× more often |
| "Budget secured" | 6 of 10 | 4 of 15 | 2.3 | Meaningful |
| "Competitor presence confirmed" | 7 of 10 | 10 of 15 | 1.1 | ❌ No meaningful difference |

Lift > 1: "Deals where this signal appeared actually performed better." Lift ≈ 1: "This signal had no correlation with success."

This information tells the sales team "which signals to pay more attention to."
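Lift is just a ratio of appearance rates; a sketch reproducing the table's numbers (the function name is an assumption):

```python
def signal_lift(won_with: int, won_total: int, lost_with: int, lost_total: int) -> float:
    """How much more often a signal appears in Won deals than in Lost deals."""
    return (won_with / won_total) / (lost_with / lost_total)

lift = signal_lift(8, 10, 3, 15)  # "Technical fit confirmed": 0.8 / 0.2 = 4.0
```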


③ "Were the signal weights appropriate?" – Impact Calibration

Question: "We gave Strong Affirmation a score of 1.0 β€” would 0.7 have been more accurate?"

The system tries Impact values from 0.1 to 10.0 one by one through simulation:

  • "Lowering Strong to 0.5 β€” Won and Lost P(Win) overlap too much (indistinguishable)."
  • "Setting Strong to 1.0 β€” Won avg P(Win)=55%, Lost avg=30%. Clean separation."
  • "Raising Strong to 2.0 β€” Won goes to 70%, but Lost also rises to 45% (overreaction)."

It finds and recommends the Impact value that most cleanly separates Won from Lost. The same principle a doctor uses when looking at blood test results: "Where should we place the normal/abnormal threshold for the most accurate diagnosis?"
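The search mechanics can be sketched with a deliberately toy replay model (each deal reduced to counts of positive and negative signals, all synthetic; the real engine replays full signal histories with stage weights):

```python
def replay_pwin(alpha0: float, beta0: float, pos: int, neg: int, impact: float) -> float:
    """Final P(Win) if each of `pos` positive and `neg` negative signals adds `impact`."""
    return (alpha0 + pos * impact) / (alpha0 + beta0 + (pos + neg) * impact)

def separation(won, lost, impact, alpha0=2.0, beta0=8.0):
    """Gap between mean Won P(Win) and mean Lost P(Win) under a candidate Impact."""
    won_avg = sum(replay_pwin(alpha0, beta0, p, n, impact) for p, n in won) / len(won)
    lost_avg = sum(replay_pwin(alpha0, beta0, p, n, impact) for p, n in lost) / len(lost)
    return won_avg - lost_avg

won = [(5, 1), (6, 2), (4, 1)]   # synthetic (positive, negative) signal counts
lost = [(1, 4), (2, 5), (1, 3)]
grid = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0]
best = max(grid, key=lambda i: separation(won, lost, i))
```

With this toy data the gap keeps widening as Impact grows, so the search picks the largest candidate; in the article's in-product example the Lost average rises too at large Impact (overreaction), which is what gives the real search an interior optimum.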


④ "Where do we draw the pass line?" – Threshold T Optimization

Question: "T=0.35 at Discovery β€” was this too lenient or too strict?"

The system validates against historical data:

  • T=0.25 lower: All Won pass, but most Lost also pass β†’ "Passing score too low β€” failing deals pass too"
  • T=0.35 current: 80% of Won pass, 70% of Lost fail β†’ "Appropriate level"
  • T=0.50 higher: Nearly all Lost fail, but half of Won also fail β†’ "Passing score too high β€” good deals get filtered"

Same as setting an exam cutoff. "Set it too low and unqualified students pass; set it too high and decent students fail." The optimal cutoff simultaneously maximizes the proportion of actual top performers among those who pass and actual underperformers among those who fail.
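This exam-cutoff logic is essentially Youden's J statistic (reference 3): maximize the share of Won passing minus the share of Lost passing. A sketch with synthetic replayed P(Win) lists:

```python
def best_threshold(won_pwin, lost_pwin, grid):
    """Pick T maximizing (fraction of Won >= T) - (fraction of Lost >= T): Youden's J."""
    def j(t):
        tpr = sum(p >= t for p in won_pwin) / len(won_pwin)
        fpr = sum(p >= t for p in lost_pwin) / len(lost_pwin)
        return tpr - fpr
    return max(grid, key=j)

won_p = [0.55, 0.48, 0.62, 0.46, 0.58]   # synthetic P(Win) values of Won deals
lost_p = [0.22, 0.35, 0.30, 0.44, 0.28]  # synthetic P(Win) values of Lost deals
T = best_threshold(won_p, lost_p, [t / 100 for t in range(20, 61, 5)])
```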


⑤ "How sharply should we discriminate?" – Slope k Adjustment

Question: "How sensitively should the system react near the threshold (T)?"

Small k: When P(Win) is near T, the system gently says "still ambiguous." Suitable for early stages that are still exploring.

Large k: If P(Win) drops even slightly below T, the system immediately flags "🔴 danger." Suitable for Proposal and Negotiation; "maybe..." is unacceptable once money is being spent.

The system performs Grid Search across k from 1 to 12 for each stage, finding the value that maximizes impedance separation between Won and Lost. The theoretical upper bound is k = 12; beyond it, the sigmoid becomes effectively a step function, degrading from "discrimination" to "binary chopping."
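The k search mirrors the T search, but scores separation in impedance space rather than pass/fail counts. A sketch with synthetic P(Win) lists and the 1-to-12 grid described above:

```python
import math

def impedance(p: float, T: float, k: float) -> float:
    return 1.0 / (1.0 + math.exp(-k * (p - T)))

def best_k(won_pwin, lost_pwin, T, ks=range(1, 13)):
    """k in 1..12 maximizing the mean impedance gap between Won and Lost."""
    def gap(k):
        w = sum(impedance(p, T, k) for p in won_pwin) / len(won_pwin)
        l = sum(impedance(p, T, k) for p in lost_pwin) / len(lost_pwin)
        return w - l
    return max(ks, key=gap)

# Won cleanly above T and Lost cleanly below: sharper is strictly better,
# so the search runs up to the k = 12 cap.
k = best_k([0.55, 0.60, 0.58], [0.40, 0.35, 0.45], T=0.50)
```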


⑥ "Fine-Tuning" – Dampening and Silence Penalty (Phase 4+ Only)

These two parameters are adjusted only from Phase 4 onward (min 20+ projects). Meaningful fine-tuning requires sufficient data; moving too many parameters simultaneously with too little data leads straight into the overfitting trap.

Dampening: When 3 signals emerge simultaneously from one meeting, only the strongest signal is reflected at 100%, and the rest at only 25%. The Auto-Tuner validates via Grid Search whether this 25% is optimal, searching from 0% to 100% for the attenuation rate that maximizes Won/Lost separation.

Silence Penalty: If the customer hasn't been contacted for 2+ weeks, β gradually increases and P(Win) declines. This is the mathematical implementation of the sales maxim "no news is bad news." The system validates whether the penalty magnitude is appropriate against historical data. Starting from the default 30% (Weak Negation Impact × 0.30), it Grid Searches the 0-100% range, raising the ratio if more projects became Lost after silence and lowering it if many won despite silence.
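A toy sketch of the silence mechanism. The weekly accrual cadence and function shape are assumptions for illustration; the 2-week grace period, the 0.30 default ratio, and the Weak Negation Impact of 0.4 come from this article.

```python
def apply_silence_penalty(alpha: float, beta: float, weeks_silent: int,
                          weak_negation_impact: float = 0.4,
                          ratio: float = 0.30) -> tuple[float, float]:
    """After a 2-week grace period, grow beta a little each silent week
    (weekly cadence assumed; the article only specifies gradual growth)."""
    penalized_weeks = max(0, weeks_silent - 2)
    return alpha, beta + penalized_weeks * ratio * weak_negation_impact

a, b = apply_silence_penalty(4.0, 6.0, weeks_silent=6)
# beta grows by 4 * 0.30 * 0.4 = 0.48, so P(Win) drifts from 0.40 down to ~0.38
```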

MCMC posterior estimation runs from Phase 3 (min 10+), with most stable convergence at Phase 5 (min 50+). Uncertainty intervals (HDI) are provided for all parameters, not just point estimates.

4.3 "What Changes if Applied?" β€” Impedance Impact Simulation

Before pressing the recommend button, the administrator's burning question: "If we change this, how do our deals' scores change?"

The Impedance Impact table answers this question:

| Stage | P(Win) | Current Impedance | With Recommendation | Change | Count |
|---|---|---|---|---|---|
| Discovery | 21.5% | 28.4% | 53.5% | ↑ 25.1%p | 15 |
| Qualification | 31.7% | 30.7% | 60.3% | ↑ 29.6%p | 15 |
| Proposal | 46.4% | 40.8% | 74.0% | ↑ 33.4%p | 15 |

"Current Discovery average impedance is 28%, and applying the recommended T/k raises it to 54%." This means the recommended settings better distinguish Won from Lost deals. Larger changes indicate the current settings were further from optimal.



Chapter 5. Complete Signal System Map

5.1 Impact Types

| Order | Type | Direction | Impact | f | Interpretation |
|---|---|---|---|---|---|
| 1 | Game Changer | α increase | 5.0 | 0.50 | Decisive positive single evidence |
| 5 | Strong Affirmation | α increase | 1.0 | 0.10 | Clear positive signal |
| 10 | Weak Affirmation | α increase | 0.4 | 0.04 | Subtle positive hint |
| 15 | No Signal | Neutral | 0.1 | 0.01 | Noise level |
| 20 | Weak Negation | β increase | 0.4 | 0.04 | Subtle negative hint |
| 25 | Strong Negation | β increase | 1.0 | 0.10 | Clear negative signal |
| 30 | Game Changer (Negative) | β increase | 5.0 | 0.50 | Decisive negative single evidence |

Symmetric structure: Positive and Negative are symmetric on the same f-scale. Game Changer Negative corresponds to critical negative signals like "competitor confirmed" or "entire budget eliminated."

5.2 What f-Coupling Guarantees

  1. Scale invariance: Identical learning trajectories regardless of Prior strength
  2. EPR guardrails: Code-level blocking of excessive single-signal influence
  3. Symmetry: Positive and negative on identical scales; "Evidence of equal strength produces effects of equal magnitude"
  4. Interpretability: Every value explainable as "X% of the Prior"


References

  1. O'Hagan, A., Buck, C. E., Daneshkhah, A., et al. (2006). Uncertain Judgements: Eliciting Experts' Probabilities. Wiley. (Expert elicitation and pseudo-count methodology.)
  2. Ibrahim, J. G. & Chen, M. H. (2000). "Power prior distributions for regression models." Statistical Science, 15(1), 46-60. (Evidence discounting.)
  3. Youden, W. J. (1950). "Index for rating diagnostic tests." Cancer, 3(1), 32-35. (Statistical basis for threshold optimization.)
  4. Cooper, R. G. (2008). "Perspective: The Stage-Gate Idea-to-Launch Process." Journal of Product Innovation Management, 25(3). (Stage-Gate decision process.)