DOCUMENTATION

Bayesian Application: Prior Alpha, Beta

The principles behind setting the Bayesian engine Prior α/β and the automatic learning roadmap

This article gives EXAWin users an in-depth explanation of the system's core, the Bayesian engine, and in particular of the principles behind configuring the Prior $\alpha$/$\beta$ and the automatic learning roadmap. The topic is too deep for a help screen, yet too important to omit.

Setting Bayesian Prior

You don't need to disassemble and reassemble an engine just to drive a car. But a driver who understands how the engine works drives better. They understand why the accelerator responds the way it does, they don't panic when a warning light comes on, and they intuitively grasp the car's limits and possibilities. The same applies to EXAWin's Bayesian engine. Understanding its workings means you won't blindly follow the numbers the system produces — instead, you'll comprehend why those numbers are what they are, trust the system, and maximize its utility.

This article is a journey toward that understanding.

EXAWin's Bayesian engine uses the NSBI (Normalized Sequential Bayesian Inference) model. This article tells the story of how the engine transforms a company's intangible assets (intuition, experience, market sense) into a 💡 Prior (prior probability) expressed in precise mathematical language, how that translation draws on 70 years of statistical research, and how the result evolves into a self-improving intelligent engine.

Before or after reading this article, if you want a deeper understanding of the overall picture of EXAWin's Bayesian engine, please consult the following series. The EXAWin Bayesian engine operates on the structural premises described in Appendices 1, 2, and 3 below:

  • EXA Bayesian Inference: The Invisible Hand of Sales, 60-Day Gamble — Using a novel format, this vividly depicts how EXAWin operates in the actual sales field. Following a sales team through a 60-day project as they make decisions based on the system's probability predictions, you'll naturally feel why the mathematics in the appendices below is necessary.

Appendices 1-3 below are technical commentaries dissecting the internals of the EXAWin engine featured in the above story:



Prologue: Every Prediction Begins with a Bias

"A prediction without bias does not exist."

This statement may sound provocative, but it is one of the most fundamental consensuses of modern statistics. A single posthumous paper left by 18th-century English Presbyterian minister Thomas Bayes (1701-1761) forever changed how humanity confronts uncertainty. His core insight was remarkably simple: By combining what we already know (prior belief) with what we newly observe (data), we can arrive at better knowledge (posterior belief).

This principle has become the universal framework powering decision engines of modern civilization — from spam filters, autonomous driving, drug development, weather forecasting, to financial risk modeling. And now, to answer the hardest question in the sales field — "Can this project succeed?" — the same principle pulses at the heart of EXAWin.

This article follows a single narrative, from theoretical roots to practical application: how EXAWin's Bayesian engine translates a company's empirical intuition into the language of mathematics, and how it operates as an intelligent system that improves itself as data accumulates. There's no need to fear the formulas. Before every formula, intuition will lead the way.



Chapter 1. Bayes' Theorem: The Most Powerful Learning Formula, 260 Years Old

Every story begins with a single formula. Bayes' Theorem provides the only mathematically consistent answer to the question: "When new evidence appears, how should we update our existing beliefs?"

$$P(\theta \mid D) = \frac{P(D \mid \theta) \cdot P(\theta)}{P(D)}$$

Each element translates into the language of the sales field:

| Mathematical Symbol | Sales Field Language | Meaning |
|---|---|---|
| $P(\theta)$ | Prior | Probability of success estimated from past experience alone, before any data |
| $P(D \mid \theta)$ | Likelihood | How well the currently observed sales activity is explained under this success rate |
| $P(\theta \mid D)$ | Posterior | Updated probability of success after incorporating the evidence |
| $P(D)$ | Evidence | Overall probability of the data across all hypotheses (normalizing constant) |

The philosophical revolution of this formula lies in recognizing $P(\theta)$, the Prior, as a legitimate starting point. Traditional frequentist statistics insisted that "only data can speak" and rejected the involvement of prior knowledge. But real-world decision makers (doctors, judges, investors, and sales leaders) must constantly make judgments even when there isn't a single data point. Bayes' Theorem mathematically embraces this reality.


1.1 Sequential Learning: Yesterday's Conclusion Is Today's Starting Point

The most elegant property of Bayes' Theorem is that sequential updating is possible. Today's Posterior becomes tomorrow's Prior. This recursive structure is the foundation of EXAWin's real-time learning engine.

After observing the first datum $D_1$:

$$P(\theta \mid D_1) = \frac{P(D_1 \mid \theta) \cdot P(\theta)}{P(D_1)}$$

After observing the second datum $D_2$:

$$P(\theta \mid D_1, D_2) = \frac{P(D_2 \mid \theta) \cdot P(\theta \mid D_1)}{P(D_2 \mid D_1)}$$

After observing through the $n$-th datum (this product form, like the second step, assumes the observations are conditionally independent given $\theta$):

$$P(\theta \mid D_1, D_2, \ldots, D_n) \propto P(\theta) \cdot \prod_{i=1}^{n} P(D_i \mid \theta)$$

The implication is clear. EXAWin automatically recalculates the success probability every time a sales activity (meeting, proposal, demo) is recorded. Like an analyst who never sleeps, the system grows by consuming evidence, never discarding a single piece of information from the first meeting to the last negotiation signal.
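The equivalence between updating one observation at a time and applying the product formula once can be checked numerically. The minimal Python sketch below assumes a discrete grid of candidate success rates $\theta$, a uniform prior on that grid, and each recorded outcome reduced to a win (1) or loss (0) with a Bernoulli likelihood; the function names are illustrative and are not part of EXAWin.

```python
import numpy as np

# Candidate success rates θ on a grid (an illustrative discretization).
theta = np.linspace(0.001, 0.999, 999)
prior = np.ones_like(theta) / theta.size  # uniform prior over the grid


def likelihood(outcome: int, theta: np.ndarray) -> np.ndarray:
    """P(D_i | θ) for a single win (1) or loss (0)."""
    return theta if outcome == 1 else 1.0 - theta


def sequential_posterior(prior: np.ndarray, outcomes: list[int]) -> np.ndarray:
    """Bayes' theorem one observation at a time: today's posterior is tomorrow's prior."""
    belief = prior.copy()
    for d in outcomes:
        belief = likelihood(d, theta) * belief
        belief /= belief.sum()  # divide by P(D_i | D_1, ..., D_{i-1})
    return belief


def batch_posterior(prior: np.ndarray, outcomes: list[int]) -> np.ndarray:
    """P(θ | D_1..D_n) ∝ P(θ) ∏ P(D_i | θ), normalized once at the end."""
    unnormalized = prior * np.prod([likelihood(d, theta) for d in outcomes], axis=0)
    return unnormalized / unnormalized.sum()


outcomes = [1, 0, 0, 1, 0]  # hypothetical sequence of wins and losses
seq = sequential_posterior(prior, outcomes)
batch = batch_posterior(prior, outcomes)
print(np.allclose(seq, batch))  # True: both routes give the same posterior
print(theta[np.argmax(seq)])    # posterior mode, close to 2/5
```

Normalizing at every step, as the sequential route does, also keeps the numbers well scaled when many small likelihoods are multiplied together.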



Chapter 2. The Beta Distribution: A Tool for Reasoning About the Probability of Probability

2.1 "Can We Say the Success Rate Is Exactly 32.7%?"

In reality, fixing a project's success rate to a single number is dangerous self-deception. A statement closer to the truth is: "The success rate is probably somewhere between 25% and 40%, with 30% being most likely." The mathematical tool specialized for expressing this uncertainty about the probability itself is the 💡 Beta Distribution.

The Beta distribution was designed to model random variables taking values between 0 and 1, and its shape is completely determined by two shape parameters $\alpha$ and $\beta$. Its probability density function (PDF) is:

$$f(\theta; \alpha, \beta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha, \beta)}$$

Where $B(\alpha, \beta)$ is the Beta function, which normalizes the density so that the total probability equals 1:

$$B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}$$

$\Gamma(\cdot)$ is the Gamma function; for a natural number $n$, $\Gamma(n) = (n-1)!$.

Don't be intimidated by the formula's appearance. The key is that just two numbers, $\alpha$ and $\beta$, can completely describe the "distribution of belief about success."


2.2 Intuitive Interpretation of α and β: Virtual Experiment Records

The most intuitive way to understand $\alpha$ and $\beta$ is as "results of virtual experiments that haven't happened yet."

| Parameter | Business Interpretation | Mathematical Role |
|---|---|---|
| $\alpha$ | Number of virtual trials counted as "successful" | Pushes the distribution rightward (toward higher probability) |
| $\beta$ | Number of virtual trials counted as "failed" | Pushes the distribution leftward (toward lower probability) |
| $\alpha + \beta$ | Total virtual trial count = strength of conviction | Makes the distribution sharper (narrower) |

From this interpretation, the key statistics of the Beta distribution emerge naturally.

Expected Value (Mean), the success rate we believe is "most plausible":

$$E[\theta] = \frac{\alpha}{\alpha + \beta}$$

Variance, how much that belief fluctuates:

$$\text{Var}[\theta] = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$$

Mode, the peak of the distribution and the most likely value (defined when $\alpha, \beta > 1$):

$$\text{Mode}[\theta] = \frac{\alpha - 1}{\alpha + \beta - 2}$$
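The three summary statistics translate directly into code. This is a small illustrative helper (the name `beta_summary` is hypothetical, not an EXAWin function); it returns `None` for the mode when $\alpha \le 1$ or $\beta \le 1$, where the mode formula does not apply.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BetaSummary:
    mean: float
    variance: float
    mode: Optional[float]  # defined only when alpha > 1 and beta > 1


def beta_summary(alpha: float, beta: float) -> BetaSummary:
    """Mean, variance and mode of Beta(alpha, beta) from the closed-form formulas."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    total = alpha + beta
    mean = alpha / total
    variance = alpha * beta / (total ** 2 * (total + 1))
    mode = (alpha - 1) / (total - 2) if alpha > 1 and beta > 1 else None
    return BetaSummary(mean, variance, mode)


print(beta_summary(2, 8))  # mean=0.2, variance≈0.0145, mode=0.125
```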

2.3 Concrete Example: The Moment a Company's Intuition Becomes a Formula

Suppose a company's sales director says:

"Our company typically wins about 1 in 5 projects, and this pattern is fairly consistent."

From this single piece of intuition, EXAWin extracts two numbers:

  • Expected success rate: $E[\theta] = 0.20$ (20%)
  • Strength of conviction: Moderate (not rock-solid, but experience-based)

Setting this as $\alpha = 2$, $\beta = 8$:

$$E[\theta] = \frac{2}{2 + 8} = \frac{2}{10} = 0.20$$

$$\text{Var}[\theta] = \frac{2 \times 8}{(2+8)^2 \times (2+8+1)} = \frac{16}{100 \times 11} = \frac{16}{1100} \approx 0.0145$$

This variance of 0.0145 corresponds to a standard deviation of about 0.12, so one standard deviation either side of the mean spans roughly $\pm 12$ percentage points. "We believe 20%, but it could well be anywhere from about 8% to 32%": that band holds roughly two-thirds of the probability, and it is honest self-awareness projected into a formula.

If the same 20% expected value comes with stronger conviction, say from a company with hundreds of historical data points, it could be set to $\alpha = 20$, $\beta = 80$:

$$E[\theta] = \frac{20}{20 + 80} = 0.20$$

$$\text{Var}[\theta] = \frac{20 \times 80}{(100)^2 \times 101} = \frac{1600}{1010000} \approx 0.0016$$

The variance decreased from 0.0145 to 0.0016, roughly 9 times smaller. The belief is the same "20%", but the latter expresses far more solid conviction. This is the role of $\alpha + \beta$, the "Precision" of the information. The larger it is, the sharper the distribution and the smaller the influence of each new data point. Conversely, the smaller it is, the wider the distribution and the more sensitively the system responds to new evidence.
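The comparison between the two priors can be reproduced with SciPy's frozen `scipy.stats.beta` distribution. The sketch assumes SciPy is installed; it reports each prior's standard deviation, the probability mass inside the mean ± 1 standard deviation band, and an equal-tailed 95% interval, which for Beta(2, 8) is noticeably wider and more right-skewed than the ±12-point band.

```python
from scipy.stats import beta

for a, b in [(2, 8), (20, 80)]:
    prior = beta(a, b)
    mean, sd = prior.mean(), prior.std()
    band = prior.cdf(mean + sd) - prior.cdf(mean - sd)  # mass within ±1 sd of the mean
    lo, hi = prior.ppf([0.025, 0.975])                  # equal-tailed 95% interval
    print(f"Beta({a},{b}): mean={mean:.2f} sd={sd:.3f} "
          f"P(mean±sd)={band:.2f} 95%=[{lo:.3f}, {hi:.3f}]")

# How much sharper is the stronger prior? Ratio of the two variances.
ratio = beta(2, 8).var() / beta(20, 80).var()
print(f"variance ratio: {ratio:.1f}")  # about 9.2
```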



Chapter 3. Conjugate Prior Distribution: Where Mathematical Elegance Meets Practicality

3.1 The Beta-Binomial Conjugate Relationship

The Beta distribution was chosen as the Prior not out of aesthetic preference but because of a concrete mathematical property. Each project outcome (success/failure) is a Bernoulli trial, so the number of successes across $n$ trials follows a Binomial Distribution. And the Beta distribution is the 💡 conjugate prior of the Binomial distribution: the prior and posterior distributions belong to the same family, and updates happen in closed form.

Given $n$ trials (sales activities) with $k$ successes observed, the Binomial likelihood is:

$$P(k \mid \theta, n) = \binom{n}{k} \theta^k (1-\theta)^{n-k}$$

When the Prior is $\text{Beta}(\alpha, \beta)$, the posterior distribution is:

$$P(\theta \mid k, n) \propto \theta^k (1-\theta)^{n-k} \cdot \theta^{\alpha-1} (1-\theta)^{\beta-1} = \theta^{(\alpha + k) - 1} (1-\theta)^{(\beta + n - k) - 1}$$

This is again the form of a Beta distribution:

$$\theta \mid k, n \sim \text{Beta}(\alpha + k, \, \beta + n - k)$$

Savor the elegance of this result. The observed number of successes $k$ is added to $\alpha$, and the number of failures $n-k$ is added to $\beta$, as if actual experience naturally accumulates on top of the virtual past record.
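Conjugacy can be verified numerically: multiplying the Beta prior by the Binomial likelihood on a fine grid and normalizing should reproduce the closed-form Beta(α + k, β + n − k) density. This sketch assumes SciPy; `conjugate_update` is a hypothetical helper name, and the 3-wins-in-5 data are made up for illustration.

```python
import numpy as np
from scipy.stats import beta, binom


def conjugate_update(alpha: float, beta_: float, k: int, n: int) -> tuple[float, float]:
    """Beta(alpha, beta) prior + k successes in n trials -> Beta(alpha + k, beta + n - k)."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    return alpha + k, beta_ + n - k


# Hypothetical example: a Beta(2, 8) prior, then 3 wins in 5 trials.
a_post, b_post = conjugate_update(2, 8, k=3, n=5)
print(a_post, b_post)  # 5 10

# Numerical check: prior x likelihood, normalized on a midpoint grid,
# should match the closed-form posterior density.
theta = np.linspace(0.0005, 0.9995, 1000)
dx = theta[1] - theta[0]
unnormalized = beta(2, 8).pdf(theta) * binom.pmf(3, 5, theta)
numeric = unnormalized / (unnormalized.sum() * dx)
closed_form = beta(a_post, b_post).pdf(theta)
print(np.max(np.abs(numeric - closed_form)))  # tiny: only discretization error remains
```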


3.2 Dynamics of Updating: The Structure of the Posterior Mean

Expanding the expected value of the updated success rate reveals an interesting structure:

$$E[\theta \mid k, n] = \frac{\alpha + k}{\alpha + \beta + n}$$

This can be restructured as a weighted average:

$$E[\theta \mid k, n] = \underbrace{\frac{\alpha + \beta}{\alpha + \beta + n}}_{\text{Prior weight}} \cdot \underbrace{\frac{\alpha}{\alpha + \beta}}_{\text{Prior mean}} + \underbrace{\frac{n}{\alpha + \beta + n}}_{\text{Data weight}} \cdot \underbrace{\frac{k}{n}}_{\text{Observed success rate}}$$

This formula summarizes EXAWin's engine behavior in a single line:

  • When data is scarce (small $n$): Prior weight is large → The company's empirical intuition dominates predictions
  • As data accumulates (larger $n$): Data weight grows → Actually observed success rates dominate predictions
  • In the limit ($n \to \infty$): Posterior mean → Observed success rate $k/n$ → Prior's influence completely vanishes
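The weighted-average form can be checked against the direct formula, and it shows how the weights move as data accumulate. In this sketch the prior is Beta(2, 8) and the observed win rate is a hypothetical 40%; the helper name is illustrative.

```python
def posterior_mean_decomposition(alpha: float, beta: float, k: int, n: int):
    """Split E[θ | k, n] into prior and data contributions (weighted-average form)."""
    total = alpha + beta
    prior_weight = total / (total + n)
    data_weight = n / (total + n)
    prior_mean = alpha / total
    observed_rate = k / n if n > 0 else 0.0  # irrelevant when n == 0 (data weight is 0)
    blended = prior_weight * prior_mean + data_weight * observed_rate
    direct = (alpha + k) / (total + n)       # the closed-form posterior mean
    return prior_weight, data_weight, blended, direct


# Prior says 20%, the field data say 40%: watch the data weight take over.
for n in [0, 10, 100, 1000]:
    k = round(0.4 * n)
    pw, dw, blended, direct = posterior_mean_decomposition(2, 8, k, n)
    print(f"n={n:5d} prior weight={pw:.3f} data weight={dw:.3f} posterior mean={direct:.3f}")
```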

This is the Bayesian system's self-correction ability. Whatever Prior is initially set, as long as it does not rule out the true value, enough data lets the truth reveal itself. In statistics, this property is called 💡 posterior consistency, and it is a mathematically proven theorem.



Chapter 4. The 70-Year Pursuit: Empirical Bayes — When Data Surpasses the Teacher

4.1 Herbert Robbins's Revolutionary Question (1956)

In 1956, mathematical statistician Herbert Robbins of Columbia University published a paper in the proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability that ignited a quiet revolution: "An Empirical Bayes Approach to Statistics."

His question was bold: "Can we estimate the Prior from the data itself, without subjective specification?"

This was an oblique yet profound answer to the decades-long philosophical debate between Bayesians and frequentists. Robbins showed that when multiple related problems exist simultaneously — for example, when a company manages dozens of sales projects at once — individual problem data can be pooled to reverse-estimate structural patterns of the entire population.

This idea is called 💡 Empirical Bayes.


4.2 Efron and Morris's Baseball Story (1975)

Robbins's theory gained attention beyond academia with Bradley Efron and Carl Morris's 1975 paper "Data Analysis Using Stein's Estimator and its Generalizations."

They tackled the problem of predicting the batting averages of 18 players for the rest of the 1970 MLB season from just their first 45 at-bats. Remarkably, "shrinkage" estimates, which pull each player's average toward the group mean, turned out to be closer to the actual averages than each player's own 45-at-bat average.

The mathematical basis for this phenomenon is the 💡 James-Stein Estimator:

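$$\hat{\theta}_i^{JS} = \bar{x} + \left(1 - \frac{(p-3)\sigma^2}{\sum_{j=1}^{p}(x_j - \bar{x})^2}\right) (x_i - \bar{x})$$

Where $x_i$ is the individual (maximum likelihood) estimate for unit $i$, such as a player's early-season average, $\bar{x}$ is the mean of those estimates, $p$ is the number of simultaneously estimated parameters ($p \geq 4$ for this form), and $\sigma^2$ is the sampling variance of each individual estimate. When shrinking toward a fixed target instead of the estimated mean, the factor becomes $(p-2)$ and $p \geq 3$ suffices. For normally distributed estimates with known variance, this estimator has a smaller total mean squared error (MSE), summed over all $p$ parameters, than the individual maximum likelihood estimator (MLE), whatever the true values are. This is Stein's paradox: Charles Stein proved in 1956 that the individual estimator can be beaten, and James and Stein gave this explicit shrinkage estimator in 1961. It remains one of the most counterintuitive yet powerful results in modern statistics.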
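Shrinkage toward the group mean is easy to try on simulated data. The sketch below assumes normally distributed individual estimates with a known sampling variance, uses the $(p-3)$ constant that applies when the target is the estimated grand mean (as in Efron and Morris's baseball analysis), and clips the shrinkage factor at zero (the standard "positive-part" variant). The true values and noise level are hypothetical, and this is not EXAWin's implementation.

```python
import numpy as np


def james_stein_toward_mean(x: np.ndarray, sigma2: float) -> np.ndarray:
    """Positive-part James-Stein estimates, shrinking each x_i toward the grand mean.

    Uses the (p - 3) factor for an estimated shrinkage target; needs p >= 4
    and a known sampling variance sigma2 for each individual estimate.
    """
    x = np.asarray(x, dtype=float)
    p = x.size
    if p < 4:
        raise ValueError("need at least 4 estimates when shrinking toward the mean")
    x_bar = x.mean()
    spread = np.sum((x - x_bar) ** 2)
    if spread == 0.0:
        return np.full_like(x, x_bar)  # all estimates already equal the mean
    factor = max(1.0 - (p - 3) * sigma2 / spread, 0.0)  # positive-part: never overshoot
    return x_bar + factor * (x - x_bar)


# Monte Carlo comparison of total squared error: raw estimates (MLE) vs. shrinkage.
rng = np.random.default_rng(0)
true_theta = np.linspace(-1.0, 1.0, 10)  # hypothetical true values
sigma2 = 1.0
reps = 2000
mse_mle, mse_js = 0.0, 0.0
for _ in range(reps):
    x = true_theta + rng.normal(0.0, np.sqrt(sigma2), size=true_theta.size)
    mse_mle += np.sum((x - true_theta) ** 2) / reps
    mse_js += np.sum((james_stein_toward_mean(x, sigma2) - true_theta) ** 2) / reps
print(f"total MSE  MLE: {mse_mle:.2f}  James-Stein: {mse_js:.2f}")
```

Because the estimator only rescales deviations from $\bar{x}$, the average of the shrunken estimates equals the average of the raw ones; shrinkage redistributes, it does not bias the group level.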

The implication for EXAWin is clear: under the theorem's assumptions, estimating each project's success rate from that project's data alone is, in aggregate, expected to be less accurate than extracting structural patterns from the company's entire project pool and using them to calibrate each project's estimate.


4.3 Hyperparameter Estimation via Method of Moments

One of the core techniques of Empirical Bayes is 💡 Method of Moments for hyperparameter estimation. When dozens of projects have concluded and the distribution of actual success rates is observed, the Beta distribution's parameters can be calculated inversely from that distribution's mean and variance.

Let $\bar{x}$ be the sample mean and $s^2$ the sample variance of observed success rates:

$$\alpha = \bar{x} \left( \frac{\bar{x}(1-\bar{x})}{s^{2}} - 1 \right)$$

$$\beta = (1-\bar{x}) \left( \frac{\bar{x}(1-\bar{x})}{s^{2}} - 1 \right)$$

These estimates are valid only when $s^2 < \bar{x}(1-\bar{x})$, so that both parameters come out positive. The insight this formula reveals lies in the role of the variance ($s^2$).

First, interpreting the structure of $\alpha + \beta$:

$$\alpha + \beta = \frac{\bar{x}(1-\bar{x})}{s^{2}} - 1$$

  • Small variance (all projects show similar success patterns): $\alpha + \beta$ gets large → Strong Prior → System is confident in the company's pattern
  • Large variance (results vary wildly across projects): $\alpha + \beta$ gets small → Weak Prior → System responds more sensitively to each project's individual data
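The moment formulas can be written as a small function. The sketch assumes the observed per-project success rates are already available as numbers between 0 and 1 (the values below are hypothetical) and uses the sample variance with `ddof=1`; the function name is illustrative. It rejects inputs where $s^2 \ge \bar{x}(1-\bar{x})$, because the formulas would then give non-positive $\alpha$ and $\beta$.

```python
import numpy as np


def beta_method_of_moments(success_rates) -> tuple[float, float]:
    """Estimate Beta(alpha, beta) from observed per-project success rates."""
    x = np.asarray(success_rates, dtype=float)
    if x.size < 2:
        raise ValueError("need at least two observed success rates")
    x_bar = x.mean()
    s2 = x.var(ddof=1)  # sample variance
    if s2 <= 0 or s2 >= x_bar * (1 - x_bar):
        raise ValueError("method of moments needs 0 < s^2 < x_bar * (1 - x_bar)")
    strength = x_bar * (1 - x_bar) / s2 - 1  # this is alpha + beta
    return float(x_bar * strength), float((1 - x_bar) * strength)


rates = [0.10, 0.20, 0.30, 0.25, 0.15]  # hypothetical per-project success rates
alpha, beta = beta_method_of_moments(rates)
print(round(alpha, 2), round(beta, 2), round(alpha + beta, 2))  # 4.92 19.68 24.6
```

Halving the spread of these rates would roughly quadruple $\alpha + \beta$, which is the "consistent patterns, stronger Prior" behavior described above.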

This automatic adjustment mechanism is the process of the system listening to the voice of the company's data. "Our company's sales patterns are consistent, so trust past experience more" or "Results vary too much project to project, so respect each deal's field data more" — these strategic judgments are made automatically by the system.


4.4 Precise Estimation via MLE

A statistically more efficient method than moments (capable of more accurate estimation with less data) is 💡 Maximum Likelihood Estimation (MLE).

If $m$ projects each had $n_i$ trials with $k_i$ successes, the marginal likelihood (dropping the binomial coefficients, which do not depend on $\alpha$ and $\beta$) is:

$$L(\alpha, \beta) = \prod_{i=1}^{m} \frac{B(\alpha + k_i, \, \beta + n_i - k_i)}{B(\alpha, \beta)}$$

Where $B(\cdot, \cdot)$ is the Beta function. Finding the $\alpha$, $\beta$ that maximize this likelihood yields the Prior that best fits the data. Expanding the log-likelihood:

$$\ell(\alpha, \beta) = \sum_{i=1}^{m} \left[ \ln B(\alpha + k_i, \beta + n_i - k_i) - \ln B(\alpha, \beta) \right]$$

Differentiating using the digamma function $\psi(\cdot)$:

$$\frac{\partial \ell}{\partial \alpha} = \sum_{i=1}^{m} \left[ \psi(\alpha + k_i) - \psi(\alpha + \beta + n_i) - \psi(\alpha) + \psi(\alpha + \beta) \right] = 0$$

$$\frac{\partial \ell}{\partial \beta} = \sum_{i=1}^{m} \left[ \psi(\beta + n_i - k_i) - \psi(\alpha + \beta + n_i) - \psi(\beta) + \psi(\alpha + \beta) \right] = 0$$

This system of equations has no closed-form solution and must be solved numerically, for example via Newton-Raphson or fixed-point iteration. Once sufficient data have accumulated (Phase 3), EXAWin automatically runs this MLE estimation to refine the Method of Moments estimate.
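The score equations can be solved with a simple fixed-point iteration. The sketch below uses a well-known fixed-point update for Dirichlet-multinomial (Pólya) data (Minka, 2000), specialized to two categories: each step multiplies $\alpha$ (or $\beta$) by the ratio of two digamma sums, so any fixed point is exactly a zero of the score. It is one of the numerical options the text mentions, not necessarily the one EXAWin uses; the function names and project counts are hypothetical, and SciPy is assumed.

```python
import numpy as np
from scipy.special import betaln, digamma


def log_marginal_likelihood(alpha, beta, k, n):
    """ℓ(α, β) from the text (binomial coefficients dropped: constant in α, β)."""
    k, n = np.asarray(k, float), np.asarray(n, float)
    return np.sum(betaln(alpha + k, beta + n - k) - betaln(alpha, beta))


def score(alpha, beta, k, n):
    """∂ℓ/∂α and ∂ℓ/∂β, written with the digamma function ψ."""
    k, n = np.asarray(k, float), np.asarray(n, float)
    common = digamma(alpha + beta + n) - digamma(alpha + beta)
    d_alpha = np.sum(digamma(alpha + k) - digamma(alpha) - common)
    d_beta = np.sum(digamma(beta + n - k) - digamma(beta) - common)
    return d_alpha, d_beta


def fit_beta_binomial_mle(k, n, alpha=1.0, beta=1.0, tol=1e-12, max_iter=100_000):
    """Fixed-point iteration for the Beta-Binomial MLE of (α, β)."""
    k, n = np.asarray(k, float), np.asarray(n, float)
    for _ in range(max_iter):
        denom = np.sum(digamma(alpha + beta + n) - digamma(alpha + beta))
        new_alpha = alpha * np.sum(digamma(alpha + k) - digamma(alpha)) / denom
        new_beta = beta * np.sum(digamma(beta + n - k) - digamma(beta)) / denom
        done = abs(new_alpha - alpha) <= tol * alpha and abs(new_beta - beta) <= tol * beta
        alpha, beta = new_alpha, new_beta
        if done:
            return float(alpha), float(beta)
    raise RuntimeError("no convergence; data may not be over-dispersed enough for a finite MLE")


# Hypothetical history: 15 projects, 10 tracked activities each, k_i successes.
k = [0, 1, 1, 2, 2, 3, 5, 6, 8, 2, 1, 0, 4, 3, 7]
n = [10] * len(k)
alpha_hat, beta_hat = fit_beta_binomial_mle(k, n)
d_a, d_b = score(alpha_hat, beta_hat, k, n)
print(f"alpha={alpha_hat:.3f} beta={beta_hat:.3f} score=({d_a:.1e}, {d_b:.1e})")
```

If the observed counts vary less than a plain Binomial model would predict, the likelihood keeps increasing as $\alpha + \beta$ grows without bound, which is why the function raises instead of returning a value in that case.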



Chapter 5. EXAWin's Intelligent Evolution Roadmap

Data gains vitality as it accumulates. EXAWin divides parameter learning strategy into 5 stages (Phases) according to the company's data maturity. This is not an arbitrary division but a design reflecting the transition from the 💡 law of small numbers to the 💡 law of large numbers.

One key principle: the lesser of the Won and Lost counts, min(Won, Lost), determines the overall confidence tier. Even with 50 Won, if there are only 3 Lost, there is insufficient basis to learn "Lost patterns." A coach who studies only the games the team won and ignores the losses has no business discussing tactics; the same principle applies here.

| Phase | Grade | min(Won, Lost) | Learning Scope | Design Rationale |
|---|---|---|---|---|
| ❌ Phase 1 | Impossible | < 5 | Analysis impossible | Binomial test power virtually 0 at $n < 5$ |
| 🟠 Phase 2 | Minimal | 5 ~ 9 | Display only (Apply locked) | Directional reference only; extreme overfitting risk |
| ✅ Phase 3 | Moderate | 10 ~ 19 | Impact, T, k + MCMC | CLT begins to apply; MCMC executable (convergence may be unstable) |
| 🟢 Phase 4 | Good | 20 ~ 49 | Full (Dampening, Silence included) + MCMC | Most parameters estimated with high confidence |
| 🔵 Phase 5 | Excellent | 50+ | Full + MCMC stable convergence | Grid Search convergence, maximum MCMC posterior confidence |
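The phase thresholds reduce to a simple lookup on min(Won, Lost). A sketch with a hypothetical function name, encoding only the thresholds listed in the phase table (not EXAWin's source code):

```python
def learning_phase(won: int, lost: int) -> int:
    """Map closed-deal counts to Phase 1-5 using min(Won, Lost)."""
    if won < 0 or lost < 0:
        raise ValueError("counts must be non-negative")
    m = min(won, lost)  # the scarcer outcome sets the confidence tier
    if m < 5:
        return 1
    if m < 10:
        return 2
    if m < 20:
        return 3
    if m < 50:
        return 4
    return 5


print(learning_phase(50, 3))   # 1: only 3 Lost, so Lost patterns cannot be learned yet
print(learning_phase(12, 30))  # 3
```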

Below we explain each Phase's statistical rationale and EXAWin's behavior; for complete technical anatomy, refer to the Auto-Tuner anatomy series:


Phase 1. Analysis Impossible (min < 5)

"An engine that hasn't yet opened its eyes"

When Won or Lost is under 5, sample statistics are extremely unstable. 2 successes out of 3 attempts gives a sample success rate of 66.7%, but concluding the company's structural win rate is 67% based on this is like flipping a coin three times, getting two heads, and declaring "this coin has a 67% probability of landing heads."

In this period:

  • Administrators manually set $\alpha$, $\beta$ based on the company's historical success rate and industry benchmarks
  • The system visually displays the meaning of the configured Prior (expected value, confidence interval)
  • No automatic analysis — a period of respecting human judgment

Phase 2. Directional Reference (min 5 ~ 9)

"The engine begins to open its eyes — but its hands are still tied"

When the minimum data requirement (Won ≥ 5 AND Lost ≥ 5) is met, the Auto-Tuner begins displaying analysis results for the first time. Signal Lift, Impact recommendations, and T/k recommendations are computed, but the Apply button remains locked. Estimates at this stage carry extremely high overfitting risk.

At $n = 5$, the 95% confidence interval for a binomial proportion spans roughly $\pm 40$ percentage points. Saying "3 wins in 5 means a 60% success rate" actually means the true value could lie anywhere from about 15% to 95% (exact Clopper-Pearson interval), too wide a net to call it a prediction.
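The 3-out-of-5 interval can be reproduced with the exact (Clopper-Pearson) method, which expresses the bounds as Beta-distribution quantiles. A sketch assuming SciPy; the helper name is illustrative.

```python
from scipy.stats import beta


def clopper_pearson(k: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion k/n."""
    tail = (1.0 - level) / 2.0
    lower = 0.0 if k == 0 else beta.ppf(tail, k, n - k + 1)
    upper = 1.0 if k == n else beta.ppf(1.0 - tail, k + 1, n - k)
    return float(lower), float(upper)


for k, n in [(3, 5), (6, 10), (30, 50)]:
    lo, hi = clopper_pearson(k, n)
    print(f"{k}/{n}: [{lo:.3f}, {hi:.3f}]")  # 3/5 gives roughly [0.147, 0.947]
```

Running it for 6/10 and 30/50 shows how slowly the interval narrows as closed deals accumulate.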


Phase 3. Major Parameter Adjustment (min 10 ~ 19)

"The eye of discrimination opens"

When min(Won, Lost) ≥ 10, the Central Limit Theorem (CLT) begins to make the normal approximation to the sampling distribution of the mean usable, if still rough. From this point, Method of Moments estimation becomes statistically defensible, and the Auto-Tuner unlocks Apply for the three core parameters: Impact, T, and k.

However, fine-tuning parameters like Dampening and Silence Penalty are not yet adjusted. Isolating optimal values for these parameters requires richer data — moving too many parameters simultaneously with too little data leads to the overfitting trap.


Phase 4. Full Parameter Unlock (min 20 ~ 49)

"The engine begins adjusting its own fuel"

When 20+ data points have accumulated on both sides, the law of large numbers begins to take full effect. In this period, EXAWin optimizes the remaining two parameters — Dampening (simultaneous signal attenuation ratio) and Silence Penalty (silence penalty intensity) — via Grid Search.

Additionally, Method of Moments estimates and Grid Search optima are cross-checked using K-fold Cross-Validation. If the gap between training separation and validation separation is large, an overfitting warning is issued; if the gap is small, the recommendation's generalizability is deemed high.


Phase 5. Statistical Stability + MCMC Stable Convergence (min 50+)

"A mature engine's self-governance"

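When 50+ data points have accumulated on both sides, MLE estimates begin to behave as asymptotic theory predicts. MLE is asymptotically efficient: as data grow, its variance approaches the 💡 Cramér-Rao Lower Bound, the smallest variance any unbiased estimator can achieve. For a single parameter, any unbiased estimator $\hat{\alpha}$ satisfies:

$$\text{Var}(\hat{\alpha}) \geq \frac{1}{I(\alpha)}$$

Where $I(\alpha) = -E\left[\frac{\partial^2 \ell}{\partial \alpha^2}\right]$ is the Fisher Information. When $\alpha$ and $\beta$ are estimated jointly, the bound is given by the inverse of the $2 \times 2$ Fisher information matrix.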

In this Phase, EXAWin goes beyond Grid Search point estimates and estimates the posterior distribution of the parameters via 💡 MCMC (Markov Chain Monte Carlo). MCMC already runs from Phase 3, but at Phase 5 the data are sufficient for the most stable convergence. Using Emcee (an Affine-Invariant Ensemble Sampler), it samples the joint posterior of the $(N+2)$-dimensional parameter space, made up of $N$ Impact values plus Dampening and Silence Penalty, and provides an HDI (Highest Density Interval) for each parameter. Here $N$ is the number of Impact types excluding No Signal, 8 in the standard configuration.

While Grid Search finds "the single optimal point," MCMC draws "the uncertainty terrain around that point." When R̂ (convergence diagnostic) is close to 1.0, the chains are deemed converged, and narrow HDI indicates strong data confidence for that parameter.

In this period, EXAWin:

  • Provides Grid Search point estimates + MCMC interval estimates simultaneously
  • Monitors overfitting via K-fold cross-validation
  • Reports overall discriminative power via ROC AUC
  • Provides Prior $\alpha$, $\beta$ recommendations via Method of Moments (human approval mandatory, no auto-application)
  • Leaves the final application decision with humans: the analysis is data-driven, but people decide


Chapter 6. Evidence Maturity: The Process of Adding Weight to Predictions

6.1 Monotonic Accumulation of Evidence

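As the data ($n$) observed across a project's lifecycle increase, the posterior distribution's total parameter sum $(\alpha + k) + (\beta + n - k) = \alpha + \beta + n$ increases monotonically. The posterior variance is:

$$\text{Var}[\theta \mid k, n] = \frac{(\alpha+k)(\beta+n-k)}{(\alpha+\beta+n)^2(\alpha+\beta+n+1)}$$

Writing the posterior mean as $\mu_n = \frac{\alpha+k}{\alpha+\beta+n}$, this equals $\frac{\mu_n(1-\mu_n)}{\alpha+\beta+n+1}$, which can never exceed $\frac{1}{4(\alpha+\beta+n+1)}$. That ceiling drops with every observation. The variance itself usually shrinks as well, although a single surprising observation can widen it briefly (for example, moving from Beta(1, 10) to Beta(2, 10) after an unexpected success). As $n$ increases, the denominator dominates:

$$\text{Var}[\theta \mid k, n] = O\left(\frac{1}{n}\right) \to 0 \quad \text{as} \quad n \to \infty$$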

This is the mathematical sense in which the Bayesian engine is more than a mere calculator: it behaves like a "learning organism" that builds conviction as evidence accumulates.
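The difference between the posterior variance and its ceiling shows up on a short hypothetical sequence. Because $\mu(1-\mu) \le 1/4$, the variance of Beta(a, b) never exceeds $\frac{1}{4(a+b+1)}$. This sketch starts from a pessimistic Beta(1, 10) prior; the first, unexpected win raises the variance, while the ceiling falls at every step. Helper names are illustrative.

```python
def beta_variance(a: float, b: float) -> float:
    """Variance of Beta(a, b)."""
    s = a + b
    return a * b / (s ** 2 * (s + 1))


def variance_ceiling(a: float, b: float) -> float:
    """Upper bound 1 / (4 (a + b + 1)); it shrinks with every observation."""
    return 1.0 / (4.0 * (a + b + 1))


# Hypothetical sequence: win, loss, loss, win, loss, loss after a Beta(1, 10) prior.
a, b = 1.0, 10.0
history = [(a, b)]
for outcome in [1, 0, 0, 1, 0, 0]:
    a, b = (a + 1, b) if outcome == 1 else (a, b + 1)
    history.append((a, b))

for a_i, b_i in history:
    print(f"Beta({a_i:.0f},{b_i:.0f}) var={beta_variance(a_i, b_i):.5f} "
          f"ceiling={variance_ceiling(a_i, b_i):.5f}")
```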


6.2 Three Stages of Growth

| Stage | State | Description |
|---|---|---|
| 🌱 Early Stage | $\alpha + \beta + n < 15$ | Information is scarce; reacts sensitively to small external stimuli. Probability predictions fluctuate across a wide range. |
| 🌿 Growing Stage | $15 \leq \alpha + \beta + n < 50$ | Data direction emerges; predictions begin finding a solid center. Confidence intervals narrow. |
| 🌳 Mature Stage | $\alpha + \beta + n \geq 50$ | Overwhelming evidence secured; maintains "expert-level reliability" unshaken by typical noise. |

The width of the 95% credible interval at each stage is calculated with the Beta distribution's quantile function:

$$\text{CI}_{95\%} = F_{\text{Beta}}^{-1}(0.975; \, \alpha+k, \, \beta+n-k) - F_{\text{Beta}}^{-1}(0.025; \, \alpha+k, \, \beta+n-k)$$

For success rates in the 20% to 50% range, a width of 0.4 (40 percentage points) or more at the Early Stage narrows to roughly 0.2 to 0.3 at the Mature Stage threshold ($\alpha + \beta + n = 50$), and falls below 0.1 (10 percentage points) once $\alpha + \beta + n$ reaches a few hundred. This is the process of data adding the weight of conviction.
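The stage-by-stage widths can be computed directly from Beta quantiles. This sketch assumes SciPy and holds the posterior mean at a hypothetical 20% while the total evidence $\alpha + \beta + n$ grows; the interval is equal-tailed and the helper name is illustrative.

```python
from scipy.stats import beta


def credible_width(a: float, b: float, level: float = 0.95) -> float:
    """Width of the equal-tailed credible interval of Beta(a, b)."""
    tail = (1.0 - level) / 2.0
    lower, upper = beta.ppf([tail, 1.0 - tail], a, b)
    return float(upper - lower)


# Posterior mean fixed at 20%, total evidence alpha + beta + n increasing.
for total in [10, 15, 50, 100, 300]:
    a, b = 0.2 * total, 0.8 * total
    print(f"alpha+beta+n={total:4d} width={credible_width(a, b):.3f}")
```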



Chapter 7. Theoretical Guarantees: Why This System Can Be Trusted

7.1 Posterior Consistency Theorem

The most frequent criticism of Bayesian inference is "Isn't the Prior subjective?" The mathematical answer to this critique is 💡 posterior consistency.

Established by the work of Doob (1949), Schwartz (1965), and Ghosh and Ramamoorthi (2003), this theorem states that for every open neighborhood UU of the true value θ0\theta_0:

$$P(\theta \in U \mid D_1, D_2, \ldots, D_n) \xrightarrow{a.s.} 1 \quad \text{as} \quad n \to \infty$$

In other words, wherever the Prior is set, as long as it assigns positive probability to every neighborhood of the true value (any Beta prior with $\alpha, \beta > 0$ does), the posterior concentrates around the truth with probability 1 given sufficient data.

This means that whether you initially set 20% or 50%, the conclusions reached from the two starting points draw steadily closer as completed projects accumulate, with the remaining gap shrinking roughly in proportion to $1/n$. The Prior is a starting point, not a destination. Because of this mathematical guarantee, EXAWin can accept a company's initial intuitive settings without fear.
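Prior wash-out can be seen by feeding two different starting points the same evidence. In this sketch both priors, Beta(2, 8) at 20% and Beta(5, 5) at 50%, have $\alpha + \beta = 10$, so the gap between their posterior means is exactly $3/(10+n)$ whatever the observed wins; the 30% win rate is hypothetical.

```python
def posterior_mean(a: float, b: float, k: int, n: int) -> float:
    """Posterior mean of Beta(a + k, b + n - k)."""
    return (a + k) / (a + b + n)


# Same evidence (a 30% observed win rate) fed to two different starting beliefs.
for n in [0, 10, 30, 100, 300, 1000]:
    k = round(0.3 * n)
    m_low = posterior_mean(2, 8, k, n)   # started at 20%
    m_high = posterior_mean(5, 5, k, n)  # started at 50%
    print(f"n={n:4d}  from 20%: {m_low:.3f}  from 50%: {m_high:.3f}  gap: {abs(m_low - m_high):.4f}")
```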


7.2 Optimality Guarantee: Superior Even in Frequentist Risk

Shrinkage estimators of this kind are justified not only within the Bayesian framework but also from a frequentist perspective. The core result is due to Stein (1956) and James and Stein (1961), and Efron and Morris (1975) put it to work as an empirical Bayes method: when estimating $p \geq 3$ normal means with known variance under squared-error loss, the frequentist risk of the James-Stein estimator (shrinking toward a fixed target) is lower than that of the MLE:

$$R(\theta, \hat{\theta}^{JS}) = E\left[\|\hat{\theta}^{JS} - \theta\|^2\right] < E\left[\|\hat{\theta}^{MLE} - \theta\|^2\right] = R(\theta, \hat{\theta}^{MLE})$$

This inequality holds for every value of $\theta$, which means the MLE is inadmissible: another estimator beats it everywhere. For a company managing three or more projects simultaneously, this suggests that pooled empirical Bayes estimation should, in aggregate, outperform estimating each project in isolation, to the extent the theorem's assumptions approximately hold.



Epilogue: The Moment the Weight of Conviction Changes

In retrospect, EXAWin's Bayesian Prior configuration is not simply "a screen for entering two numbers."

It is the act of recording — in the most rigorous mathematical language — a company's unique intuition and empirical win rate, cultivated over decades in the market: intangible but real assets. And upon that record, each day's sales activities accumulate as new evidence, while the system respects the initial human settings yet ultimately corrects itself through the voice of data.

The conviction that this process is logically, theoretically, and empirically justified did not come from the brilliant intuition of one or two geniuses. It rests on some 260 years of mathematical results, proven line by line by the collective intelligence of the global academic community: Bayes's posthumous paper in 1763, Robbins's empirical Bayes and Stein's paradox in 1956, James and Stein's estimator in 1961, Efron and Morris's shrinkage analysis in 1975, and Ghosh and Ramamoorthi's synthesis of posterior consistency theory in 2003.

The moment users configure EXAWin's α\alpha and β\beta, they quietly shake hands with this 260-year intellectual heritage.



References

  1. Bayes, T. (1763). "An Essay towards Solving a Problem in the Doctrine of Chances." Philosophical Transactions of the Royal Society of London, 53, 370-418.

  2. Robbins, H. (1956). "An Empirical Bayes Approach to Statistics." Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1, 157-163.

  3. Stein, C. (1956). "Inadmissibility of the Usual Estimator for the Mean of a Multivariate Normal Distribution." Proceedings of the Third Berkeley Symposium, 1, 197-206.

  4. James, W. and Stein, C. (1961). "Estimation with Quadratic Loss." Proceedings of the Fourth Berkeley Symposium, 1, 361-379.

  5. Efron, B. and Morris, C. (1975). "Data Analysis Using Stein's Estimator and its Generalizations." Journal of the American Statistical Association, 70(350), 311-319.

  6. Casella, G. (1985). "An Introduction to Empirical Bayes Data Analysis." The American Statistician, 39(2), 83-87.

  7. Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., and Rubin, D.B. (2013). Bayesian Data Analysis. 3rd Edition, CRC Press.

  8. Ghosh, J.K. and Ramamoorthi, R.V. (2003). Bayesian Nonparametrics. Springer Series in Statistics.