DOCUMENTATION

Bayesian Application: Parameter Calibration and Automatic Optimization

The calibration principles of Signal Impact (f-coupling), EPR guardrails, and the 6-stage automatic parameter optimization (BAT) system based on data maturity.

In the previous part: Prior α/β Configuration Principles, we covered the configuration and learning roadmap for prior probabilities. This article answers the next question: How are signal sizes (Impact) determined, and how does the system optimize itself?

No matter how sophisticated the engine's formulas are, if the numbers fed into those formulas are wrong, the answers will be wrong, just as a car engine, however precisely machined, will knock if the fuel's octane rating is off. This article is about the principles of putting the right fuel into EXAWin's Bayesian engine, and the mechanism by which the system adjusts its own fuel as data accumulates.



Chapter 1. A World Without Likelihood: The Pseudo-Count Approach

1.1 An Honest Starting Point

In standard Bayesian inference, updates are performed through the likelihood function P(D | θ). However, sales signals ("the customer response was positive", "the decision maker is leaning our way") are not observations drawn from a mathematically defined probability distribution.

EXAWin resolves this limitation with the pseudo-count approach:

α_new = α_prev + SWV × Impact

This declares: "This signal has evidential weight equivalent to SWV × Impact virtual success observations." Since the Beta distribution is valid for all real α, β > 0, the pseudo-count need not be an integer.

This approach is established in statistics under the name expert elicitation (O'Hagan et al., 2006). When an expert assesses "this evidence is equivalent to N direct observations," adding N as a pseudo-count to α or β is a justified methodology.

The key question is: How do we determine that N?
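In code form, the pseudo-count update is a one-line addition to a Beta posterior. The function name below is an assumption for illustration; the α/β prior and the SWV/Impact values are the running examples used elsewhere in this article.

```python
def apply_signal(alpha: float, beta: float, swv: float, impact: float,
                 positive: bool) -> tuple[float, float]:
    """Add SWV x Impact virtual observations to alpha (success) or beta (failure)."""
    delta = swv * impact
    return (alpha + delta, beta) if positive else (alpha, beta + delta)

# Prior alpha=2, beta=8 (baseline P(Win) = 20%); one Strong Affirmation
# (Impact 1.0) recorded at Qualification (SWV 1.69):
alpha, beta = apply_signal(2.0, 8.0, swv=1.69, impact=1.0, positive=True)
p_win = alpha / (alpha + beta)  # posterior mean rises from 0.20 to ~0.32
```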

Chapter 2. f-Coupling: Aligning Signal and Prior Scales

2.1 The Danger of Independent Scales

If the Prior (α₀ = 2, β₀ = 8, strength S = 10) and Signal Impact are set independently, a single signal can overwhelm the company's entire historical experience in one instant, an unrealistic scenario.

2.2 Evidence Fraction (f)

The solution is to define Signal Impact as a fraction of Prior strength:

Impact_i = f_i × S, where S = α₀ + β₀

f_i represents the ratio: "What percentage of the company's total prior experience does one occurrence of this signal represent as evidence?"

| Signal Type | f | Impact (S=10) | Interpretation |
|---|---|---|---|
| Game Changer | 0.50 | 5.0 | A single piece of evidence half as strong as total prior experience |
| Strong Affirmation/Negation | 0.10 | 1.0 | 10% of prior experience: clear signal |
| Weak Affirmation/Negation | 0.04 | 0.4 | 4% of prior experience: subtle hint |
| No Signal | 0.01 | 0.1 | Virtually noise level |
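The coupling itself is trivial to express; a minimal sketch using the f values above and the article's running prior α₀ = 2, β₀ = 8 (the function and constant names are assumptions):

```python
def impact_from_f(f: float, alpha0: float, beta0: float) -> float:
    """Impact expressed as a fraction f of prior strength S = alpha0 + beta0."""
    return f * (alpha0 + beta0)

F_VALUES = {"Game Changer": 0.50, "Strong": 0.10, "Weak": 0.04, "No Signal": 0.01}
impacts = {name: impact_from_f(f, 2.0, 8.0) for name, f in F_VALUES.items()}
# With S = 10 this reproduces the Impact column: 5.0, 1.0, 0.4, 0.1
```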

2.3 Scale Invariance

The key property of this coupling: as long as f is the same, the P(Win) trajectory is completely identical regardless of Prior strength S. Writing α₀ = rS and β₀ = (1 - r)S, S cancels out of the posterior mean, leaving a function of f only. Whether S = 10 or S = 100, the learning trajectory is identical; this is the raison d'être of the coupling.
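The cancellation is easy to check numerically. A small sketch (the signal sequence and SWV values below are illustrative, not prescribed by the article):

```python
def trajectory(S: float, r: float, signals: list[tuple[float, float, bool]]) -> list[float]:
    """P(Win) after each (swv, f, positive) signal, from alpha0 = r*S, beta0 = (1-r)*S."""
    a, b = r * S, (1 - r) * S
    points = []
    for swv, f, positive in signals:
        delta = swv * f * S  # Impact = f * S, so the pseudo-count scales with S
        if positive:
            a += delta
        else:
            b += delta
        points.append(a / (a + b))
    return points

signals = [(1.0, 0.10, True), (1.69, 0.04, False), (2.39, 0.50, True)]
t10 = trajectory(10.0, 0.2, signals)
t100 = trajectory(100.0, 0.2, signals)
same = all(abs(x - y) < 1e-12 for x, y in zip(t10, t100))  # True: S cancels out
```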

2.4 EPR Guardrails

Evidence-Prior Ratio (EPR) is a diagnostic metric measuring how much the maximum evidence from a single meeting can affect the Prior:

EPR = (SWV_max × Impact) / S

EXAWin enforces EPR upper bounds at the code level:

| Signal Type | EPR Cap | Max Impact (S=10) | Design Rationale |
|---|---|---|---|
| Game Changer | 2.0 | 7.7 | Intentional override permitted, but cannot exceed 200% of the Prior |
| Regular signals | 0.5 | 1.9 | Limited to within 50% of the Prior |

When a user changes an Impact value in Signal Master, if the cap is exceeded, the save is rejected. This is a safety mechanism preventing inexperienced users from destabilizing the system by setting extreme parameters.
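A minimal sketch of such a guardrail check (the function name and structure are assumptions; the caps and SWV_max = 2.61, the largest stage weight, come from this article's tables):

```python
EPR_CAPS = {"game_changer": 2.0, "regular": 0.5}
SWV_MAX = 2.61  # largest stage weight (Negotiation)

def impact_save_allowed(impact: float, signal_class: str, S: float) -> bool:
    """True if the worst-case single-signal EPR stays within the cap."""
    epr = SWV_MAX * impact / S
    return epr <= EPR_CAPS[signal_class]

# With S = 10: Game Changer tops out near 7.7, regular signals near 1.9
ok = impact_save_allowed(7.6, "game_changer", S=10.0)        # within cap
rejected = impact_save_allowed(8.0, "game_changer", S=10.0)  # exceeds cap
```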



Chapter 3. Decision Impedance: From Probability to Action

3.1 Impedance Formula

I = 1 / (1 + exp(-k · (P(Win) - T)))
  • T = Threshold: "the minimum bar a deal must clear at each stage"
  • k = Slope: discrimination sensitivity near the threshold

3.2 Default Parameters by Stage

| Stage | SWV | T | k | Design Intent |
|---|---|---|---|---|
| Discovery | 1.00 | 0.35 | 5 | Low bar, generous exploration |
| Qualification | 1.69 | 0.40 | 7 | Basic verification |
| Solution-Fit | 2.10 | 0.45 | 7 | Confirming fit |
| Proposal | 2.39 | 0.50 | 12 | Cost commitment: sharp discrimination |
| Negotiation | 2.61 | 0.55 | 11 | Final gate, strictest |

T increases as stages progress. "P(Win) = 30% at Discovery is fine; keep exploring. But reaching Proposal at P(Win) = 40% demands serious reconsideration."

k determines discrimination sensitivity. At Proposal (k = 12), whether P(Win) sits above or below T = 0.50 causes a sharp Impedance split, a design philosophy of no tolerance for ambiguity at the cost commitment stage.
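The formula and stage defaults translate directly into code; a minimal sketch mirroring the table in 3.2:

```python
import math

STAGE_PARAMS = {  # stage: (T, k), defaults from section 3.2
    "Discovery": (0.35, 5.0),
    "Qualification": (0.40, 7.0),
    "Solution-Fit": (0.45, 7.0),
    "Proposal": (0.50, 12.0),
    "Negotiation": (0.55, 11.0),
}

def impedance(p_win: float, T: float, k: float) -> float:
    """Logistic mapping from win probability to an action signal in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-k * (p_win - T)))

# At Proposal (k = 12), a 10-point swing around T = 0.50 flips the verdict sharply:
below = impedance(0.45, *STAGE_PARAMS["Proposal"])  # ~0.35
above = impedance(0.55, *STAGE_PARAMS["Proposal"])  # ~0.65
```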



Chapter 4. Auto-Tuner: How the Engine Adjusts Its Own Fuel

4.1 Data Maturity Phase – "Is There Enough Track Record?"

The Auto-Tuner's first question is straightforward: "How many completed projects does this company have?"

Why completed projects? An in-progress deal cannot verify whether its settings were right or wrong, just as a coach's tactics cannot be fairly judged at halftime. Only finished matches (Won/Lost) provide grounds for tactical adjustment.

Key: the smaller of the Won and Lost counts, min(Won, Lost), determines the overall confidence tier. Even with 50 Won, only 3 Lost leaves insufficient basis for learning Lost patterns.

| Phase | Grade | min(Won, Lost) | Scope | Learning Confidence |
|---|---|---|---|---|
| ❌ Phase 1 | Impossible | < 5 | Analysis impossible | Insufficient data, binomial test power virtually 0 |
| 🟠 Phase 2 | Minimal | 5-9 | Display only (Apply locked) | Directional reference only, extreme overfitting risk |
| ✅ Phase 3 | Moderate | 10-19 | Impact, T, k + MCMC | CLT begins operating, MCMC executable (convergence may be unstable) |
| 🟢 Phase 4 | Good | 20-49 | Full (Dampening, Silence included) + MCMC | Most parameters with high confidence, meaningful cross-validation |
| 🔵 Phase 5 | Excellent | 50+ | Full + MCMC stable convergence | Grid Search convergence, maximum MCMC posterior confidence |
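The tier logic reduces to a threshold ladder on min(Won, Lost); a sketch (the function name is an assumption):

```python
def maturity_phase(won: int, lost: int) -> int:
    """Data maturity phase (1-5) from the smaller of the Won and Lost counts."""
    n = min(won, lost)
    if n < 5:
        return 1   # analysis impossible
    if n < 10:
        return 2   # display only, Apply locked
    if n < 20:
        return 3   # Impact, T, k + MCMC
    if n < 50:
        return 4   # full scope incl. Dampening and Silence
    return 5       # full scope, stable MCMC convergence

phase = maturity_phase(50, 3)  # -> 1: three Lost cap the tier despite 50 Won
```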

Phase-based dynamic adjustments also apply. Grid Search range expands from ±20% at Phase 2 to ±50% at Phase 5, while Signal Lift minimum appearances strengthen from 3 at Phase 2 to 10 at Phase 5. As data grows richer, the system explores wider ranges while demanding stricter evidence: an exquisite balance of humility and ambition.

For a complete technical anatomy of the Auto-Tuner, refer to the separate series.

4.2 What the Auto-Tuner Does – "A Coach Analyzing Past Game Films"

The best analogy for understanding the Auto-Tuner is a sports coach's game analysis; all explanations from here on use this analogy.

Your company has completed 25 projects. 10 won, 15 lost. The Auto-Tuner pulls up these 25 game records and checks one by one: "Were our team's tactics (parameters) optimal?"


① "What's our team's baseline strength?" – Prior Recommendation

Question: "What is our fundamental probability of winning?"

System setting: Current Prior = α 2, β 8 → baseline success rate 20%. But looking at the actual 25 project results, 10 were won for a success rate of 40%. The system says:

"Your team's actual success rate is closer to 40%. With the starting point set at 20%, the first 3–4 meetings will just keep repeating 'probability is still low.' Should we raise the starting point?"

However, this value is not automatically applied. The Prior is a strategic judgment by leadership about "our team's baseline capability." "Raising to 40% might breed complacency, and conservatively setting 25% is also a valid strategy": that decision belongs to the manager (leadership), not the coach (system).
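One way to express such a recommendation is to match the observed win rate while keeping the prior strength S fixed. This is a sketch of the idea only; the real engine's recommendation logic may weigh more than the raw win rate, and per the article the result is a suggestion, never auto-applied.

```python
def recommend_prior(won: int, lost: int, S: float) -> tuple[float, float]:
    """Suggest (alpha0, beta0) whose mean equals the observed win rate, at strength S.
    A recommendation only: leadership decides whether to apply it."""
    rate = won / (won + lost)
    return rate * S, (1.0 - rate) * S

alpha0, beta0 = recommend_prior(10, 15, S=10.0)  # -> (4.0, 6.0), baseline 40%
```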


② "Which signals were genuinely meaningful?" – Signal Lift

Question: "Among the signals we recorded, which were actually correlated with success?"

The system examines signals across 25 projects:

| Signal | Appeared in Won | Appeared in Lost | Lift | Interpretation |
|---|---|---|---|---|
| "Technical fit confirmed" | 8 of 10 | 3 of 15 | 4.0 | ✅ Deals with this signal succeeded 4× more often |
| "Budget secured" | 6 of 10 | 4 of 15 | 2.3 | Meaningful |
| "Competitor presence confirmed" | 7 of 10 | 10 of 15 | 1.1 | ❌ No meaningful difference |

Lift > 1: "Deals where this signal appeared actually performed better." Lift ≈ 1: "This signal had no correlation with success."

This information tells the sales team "which signals to pay more attention to."
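Lift is just a ratio of appearance rates; a sketch reproducing the table's numbers (the function name is an assumption):

```python
def signal_lift(won_with: int, won_total: int, lost_with: int, lost_total: int) -> float:
    """How much more often a signal appears in Won deals than in Lost deals."""
    return (won_with / won_total) / (lost_with / lost_total)

lift = signal_lift(8, 10, 3, 15)  # "Technical fit confirmed": 0.8 / 0.2 = 4.0
```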


③ "Were the signal weights appropriate?" – Impact Calibration

Question: "We gave Strong Affirmation a score of 1.0 β€” would 0.7 have been more accurate?"

The system tries Impact values from 0.1 to 10.0 one by one through simulation:

  • "Lowering Strong to 0.5 β€” Won and Lost P(Win) overlap too much (indistinguishable)."
  • "Setting Strong to 1.0 β€” Won avg P(Win)=55%, Lost avg=30%. Clean separation."
  • "Raising Strong to 2.0 β€” Won goes to 70%, but Lost also rises to 45% (overreaction)."

It finds and recommends the Impact value that most cleanly separates Won from Lost. The same principle a doctor uses when looking at blood test results: "Where should we place the normal/abnormal threshold for the most accurate diagnosis?"
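The search mechanics can be sketched with a deliberately toy replay model (each deal reduced to counts of positive and negative signals, all synthetic; the real engine replays full signal histories with stage weights):

```python
def replay_pwin(alpha0: float, beta0: float, pos: int, neg: int, impact: float) -> float:
    """Final P(Win) if each of `pos` positive and `neg` negative signals adds `impact`."""
    return (alpha0 + pos * impact) / (alpha0 + beta0 + (pos + neg) * impact)

def separation(won, lost, impact, alpha0=2.0, beta0=8.0):
    """Gap between mean Won P(Win) and mean Lost P(Win) under a candidate Impact."""
    won_avg = sum(replay_pwin(alpha0, beta0, p, n, impact) for p, n in won) / len(won)
    lost_avg = sum(replay_pwin(alpha0, beta0, p, n, impact) for p, n in lost) / len(lost)
    return won_avg - lost_avg

won = [(5, 1), (6, 2), (4, 1)]   # synthetic (positive, negative) signal counts
lost = [(1, 4), (2, 5), (1, 3)]
grid = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0]
best = max(grid, key=lambda i: separation(won, lost, i))
```

With this toy data the gap keeps widening as Impact grows, so the search picks the largest candidate; in the article's in-product example the Lost average rises too at large Impact (overreaction), which is what gives the real search an interior optimum.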


④ "Where do we draw the pass line?" – Threshold T Optimization

Question: "T=0.35 at Discovery β€” was this too lenient or too strict?"

The system validates against historical data:

  • T=0.25 lower: All Won pass, but most Lost also pass β†’ "Passing score too low β€” failing deals pass too"
  • T=0.35 current: 80% of Won pass, 70% of Lost fail β†’ "Appropriate level"
  • T=0.50 higher: Nearly all Lost fail, but half of Won also fail β†’ "Passing score too high β€” good deals get filtered"

Same as setting an exam cutoff. "Set it too low and unqualified students pass; set it too high and decent students fail." The optimal cutoff simultaneously maximizes the proportion of actual top performers among those who pass and actual underperformers among those who fail.
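This exam-cutoff logic is essentially Youden's J statistic (reference 3): maximize the share of Won passing minus the share of Lost passing. A sketch with synthetic replayed P(Win) lists:

```python
def best_threshold(won_pwin, lost_pwin, grid):
    """Pick T maximizing (fraction of Won >= T) - (fraction of Lost >= T): Youden's J."""
    def j(t):
        tpr = sum(p >= t for p in won_pwin) / len(won_pwin)
        fpr = sum(p >= t for p in lost_pwin) / len(lost_pwin)
        return tpr - fpr
    return max(grid, key=j)

won_p = [0.55, 0.48, 0.62, 0.46, 0.58]   # synthetic P(Win) values of Won deals
lost_p = [0.22, 0.35, 0.30, 0.44, 0.28]  # synthetic P(Win) values of Lost deals
T = best_threshold(won_p, lost_p, [t / 100 for t in range(20, 61, 5)])
```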


⑤ "How sharply should we discriminate?" – Slope k Adjustment

Question: "How sensitively should the system react near the threshold (T)?"

Small k: When P(Win) is near T, the system gently says "still ambiguous." Suitable for early stages that are still exploring.

Large k: If P(Win) drops even slightly below T, the system immediately flags "🔴 danger." Suitable for Proposal and Negotiation; "maybe..." is unacceptable once money is being spent.

The system performs Grid Search across k from 1 to 12 for each stage, finding the value that maximizes impedance separation between Won and Lost. The theoretical upper bound is k = 12; beyond it, the sigmoid becomes effectively a step function, degrading from "discrimination" to "binary chopping."
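The k search mirrors the T search, but scores separation in impedance space rather than pass/fail counts. A sketch with synthetic P(Win) lists and the 1-to-12 grid described above:

```python
import math

def impedance(p: float, T: float, k: float) -> float:
    return 1.0 / (1.0 + math.exp(-k * (p - T)))

def best_k(won_pwin, lost_pwin, T, ks=range(1, 13)):
    """k in 1..12 maximizing the mean impedance gap between Won and Lost."""
    def gap(k):
        w = sum(impedance(p, T, k) for p in won_pwin) / len(won_pwin)
        l = sum(impedance(p, T, k) for p in lost_pwin) / len(lost_pwin)
        return w - l
    return max(ks, key=gap)

# Won cleanly above T and Lost cleanly below: sharper is strictly better,
# so the search runs up to the k = 12 cap.
k = best_k([0.55, 0.60, 0.58], [0.40, 0.35, 0.45], T=0.50)
```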


⑥ "Fine-Tuning" – Dampening and Silence Penalty (Phase 4+ Only)

These two parameters are adjusted only from Phase 4 onward (min 20+ projects). Meaningful fine-tuning requires sufficient data; moving too many parameters simultaneously with too little data leads straight into the overfitting trap.

Dampening: When 3 signals emerge simultaneously from one meeting, only the strongest signal is reflected at 100%, and the rest at only 25%. The Auto-Tuner validates via Grid Search whether this 25% is optimal, searching from 0% to 100% for the attenuation rate that maximizes Won/Lost separation.

Silence Penalty: If the customer hasn't been contacted for 2+ weeks, β gradually increases and P(Win) declines. This is the mathematical implementation of the sales maxim "no news is bad news." The system validates whether the penalty magnitude is appropriate against historical data. Starting from the default 30% (Weak Negation Impact × 0.30), it Grid Searches the 0-100% range, raising the ratio if more projects became Lost after silence and lowering it if many won despite silence.
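A toy sketch of the silence mechanism. The weekly accrual cadence and function shape are assumptions for illustration; the 2-week grace period, the 0.30 default ratio, and the Weak Negation Impact of 0.4 come from this article.

```python
def apply_silence_penalty(alpha: float, beta: float, weeks_silent: int,
                          weak_negation_impact: float = 0.4,
                          ratio: float = 0.30) -> tuple[float, float]:
    """After a 2-week grace period, grow beta a little each silent week
    (weekly cadence assumed; the article only specifies gradual growth)."""
    penalized_weeks = max(0, weeks_silent - 2)
    return alpha, beta + penalized_weeks * ratio * weak_negation_impact

a, b = apply_silence_penalty(4.0, 6.0, weeks_silent=6)
# beta grows by 4 * 0.30 * 0.4 = 0.48, so P(Win) drifts from 0.40 down to ~0.38
```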

MCMC posterior estimation runs from Phase 3 (min 10+), with most stable convergence at Phase 5 (min 50+). Uncertainty intervals (HDI) are provided for all parameters, not just point estimates.

4.3 "What Changes if Applied?" β€” Impedance Impact Simulation

Before pressing the recommend button, the administrator's burning question: "If we change this, how do our deals' scores change?"

The Impedance Impact table answers this question:

| Stage | P(Win) | Current Impedance | With Recommendation | Change | Count |
|---|---|---|---|---|---|
| Discovery | 21.5% | 28.4% | 53.5% | ↑ 25.1%p | 15 |
| Qualification | 31.7% | 30.7% | 60.3% | ↑ 29.6%p | 15 |
| Proposal | 46.4% | 40.8% | 74.0% | ↑ 33.4%p | 15 |

"Current Discovery average impedance is 28%, and applying the recommended T/k raises it to 54%." This means the recommended settings better distinguish Won from Lost deals. Larger changes indicate the current settings were further from optimal.



Chapter 5. Complete Signal System Map

5.1 Impact Types

| Order | Type | Direction | Impact | f | Interpretation |
|---|---|---|---|---|---|
| 1 | Game Changer | α increase | 5.0 | 0.50 | Decisive positive single evidence |
| 5 | Strong Affirmation | α increase | 1.0 | 0.10 | Clear positive signal |
| 10 | Weak Affirmation | α increase | 0.4 | 0.04 | Subtle positive hint |
| 15 | No Signal | Neutral | 0.1 | 0.01 | Noise level |
| 20 | Weak Negation | β increase | 0.4 | 0.04 | Subtle negative hint |
| 25 | Strong Negation | β increase | 1.0 | 0.10 | Clear negative signal |
| 30 | Game Changer (Negative) | β increase | 5.0 | 0.50 | Decisive negative single evidence |

Symmetric structure: Positive and Negative are symmetric on the same f-scale. Game Changer Negative corresponds to critical negative signals like "competitor confirmed" or "entire budget eliminated."

5.2 What f-Coupling Guarantees

  1. Scale invariance: Identical learning trajectories regardless of Prior strength
  2. EPR guardrails: Code-level blocking of excessive single-signal influence
  3. Symmetry: Positive and negative on identical scales; "Evidence of equal strength produces effects of equal magnitude"
  4. Interpretability: Every value explainable as "X% of the Prior"


References

  1. O'Hagan, A., Buck, C. E., Daneshkhah, A., et al. (2006). Uncertain Judgements: Eliciting Experts' Probabilities. Wiley. (Expert elicitation and pseudo-count methodology.)
  2. Ibrahim, J. G. & Chen, M. H. (2000). "Power prior distributions for regression models." Statistical Science, 15(1), 46-60. (Evidence discounting.)
  3. Youden, W. J. (1950). "Index for rating diagnostic tests." Cancer, 3(1), 32-35. (Statistical basis for threshold optimization.)
  4. Cooper, R. G. (2008). "Perspective: The Stage-Gate Idea-to-Launch Process." Journal of Product Innovation Management, 25(3). (Stage-Gate decision process.)