Replacing Intuition with Mathematical Language

This article explains the mathematical principles behind the Bayesian engine covered in the BA02 episode and why it works. The goal is to estimate the probability of sales success in an uncertain business environment. At its core, the engine derives decision-making indicators by combining the Beta distribution, which quantifies past experience, with the Binomial distribution, which captures real-time signals from the field. Because the Beta distribution is a conjugate prior for the Binomial likelihood, the update has a closed form and needs no complex calculation, which keeps the system fast and computationally cheap. The model also uses recursive estimation: it updates its judgment each time a new data point arrives. Together, these properties show how mathematical modeling can turn vague intuition into reliable, data-driven insight.

In the fog of business, the sales directors, managers, and executives who must make decisions are always looking for an answer to one question: "What is our probability of winning in this situation right now?" The 'Bayesian Engine,' the heart of the Exa system, translates this abstract judgment into the most precise language available: mathematics.

In this article, we analyze the mathematical pillars that support this engine's architecture in sales and similar settings, and explain why it is the 'optimal solution' in an enterprise environment.

Bayesian models based on MCMC or deep learning are great assets for solving high-dimensional, complex problems. Nevertheless, in a specific domain such as inferring sales success probability, the mathematical efficiency and clarity of the Beta-Binomial model are its strongest advantages, and saying so is part of being technically objective.

Note: Exa's AI engine applies the Bayesian mathematics appropriate to each situation. Because those situations vary, the engine draws on a wide range of Bayesian methods, along with AI technologies already proven in the field, such as ML, DL, RL, and LLM, according to business needs. This article covers only the mathematics used in the sales episode.

With that context in mind, and with respect for the purpose each technology serves, I will describe why the techniques used in this episode are the 'gold standard' in this field.

1. Quantification of Experience: 'Beta Distribution' as a Prior Distribution

All Bayesian inference starts from 'what we believe to start with': the subjective judgment, intuition, and beliefs of individual stakeholders, or researched and established empirical data for the domain. In this type of scenario, the model stores the initial state of the business, or its accumulated experience, in a vessel called the Beta distribution.

1.1 Mathematical Definition

The Beta distribution is a continuous probability distribution defined on the interval from 0 to 1, which makes it well suited to modeling probability values. Its probability density function is given below. (Details of the Beta distribution are explained in a separate article dissecting the Beta distribution.)

f(x; \alpha, \beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}

Here, the denominator B(α, β) is the Beta function, a normalization constant that makes the total probability integrate to 1. The shape of the distribution is driven by the two parameters α and β.

α (Alpha): Strength of accumulated evidence for success
β (Beta): Strength of accumulated evidence for risk or failure
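As a minimal sketch of the definition above, the density can be evaluated with the Python standard library alone. The function names `beta_pdf` and `integrate_pdf` are illustrative and not part of the Exa engine; the Beta function is computed through `math.lgamma` for numerical stability.

```python
import math

def beta_pdf(x: float, alpha: float, beta: float) -> float:
    """Density f(x; alpha, beta) of the Beta distribution on (0, 1)."""
    if not 0.0 < x < 1.0:
        return 0.0
    # log B(alpha, beta) = lgamma(alpha) + lgamma(beta) - lgamma(alpha + beta)
    log_b = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    log_pdf = (alpha - 1) * math.log(x) + (beta - 1) * math.log(1 - x) - log_b
    return math.exp(log_pdf)

def integrate_pdf(alpha: float, beta: float, steps: int = 10_000) -> float:
    """Midpoint-rule check that the density integrates to about 1."""
    width = 1.0 / steps
    return sum(beta_pdf((i + 0.5) * width, alpha, beta) for i in range(steps)) * width

# Illustrative parameters: alpha=2 (evidence for success), beta=8 (evidence for failure).
print(beta_pdf(0.2, 2, 8))   # density near the 20% success rate
print(integrate_pdf(2, 8))   # should be very close to 1
```

Dividing by B(α, β) is what makes the second printed value come out at 1; without it the curve would have the right shape but the wrong scale.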

1.2 Interpretation

Let's look at the structure of the numerator $x^{\alpha-1}(1-x)^{\beta-1}$ in the formula. As α increases, the center of the distribution moves toward 1 (success), and as β increases, it moves toward 0 (failure).

At the beginning of a business, based on market statistics, we can assign values like α=2, β=8. This shapes the 'prior empirical knowledge' that "so far, 2 out of 10 attempts were successful" into a mathematical curve.

The expected success probability is the mean of the distribution, α/(α+β). This lets us express prior experience, knowledge, or domain intuition as a number, such as "the success rate, defect rate, or response rate is 20%." Here, 2 and 8 also represent the strength of belief: the larger the numbers, the stronger the belief. For example, 20 and 80 give the same 20% success rate as 2 and 8, but the belief is much stronger, so it takes more new evidence to move it.
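The difference between (2, 8) and (20, 80) can be made concrete with a small sketch. It uses the standard mean and variance formulas of the Beta distribution; the helper names are illustrative.

```python
import math

def beta_mean(alpha: float, beta: float) -> float:
    """Expected success probability: alpha / (alpha + beta)."""
    return alpha / (alpha + beta)

def beta_std(alpha: float, beta: float) -> float:
    """Standard deviation of Beta(alpha, beta); smaller means a stronger belief."""
    total = alpha + beta
    variance = alpha * beta / (total ** 2 * (total + 1))
    return math.sqrt(variance)

for alpha, beta in [(2, 8), (20, 80)]:
    print(f"alpha={alpha}, beta={beta}: "
          f"mean={beta_mean(alpha, beta):.2f}, std={beta_std(alpha, beta):.3f}")
# alpha=2, beta=8: mean=0.20, std=0.121
# alpha=20, beta=80: mean=0.20, std=0.040
```

Both priors say "20%," but the second one is roughly three times narrower, which is what "stronger belief" means numerically.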

α and β are hyperparameters that we assign ourselves (or measure from past performance data) so that we can model prior knowledge. These values are adjusted by the Bayesian engine to actual values as data (evidence) accumulates. This is the starting point of the process of tracking how well subjective probability matches actual data.

In other words, the model does not start from zero data. It starts as an intelligence that already holds experience.

2. Signals from the Field: 'Binomial Distribution' as a Likelihood Function

Events occurring in the sales field (meetings, quote requests, etc.) eventually result in discrete outcomes: a 'successful signal' or a 'non-successful signal.' The tool that captures this is the Binomial distribution.

2.1 Mathematical Definition

The probability of exactly k successes in n independent trials, each with success probability p, is as follows:

P(X=k) = {n \choose k} p^k (1-p)^{n-k}

This formula turns the 'fact (Evidence)' reported from the field into a numerical value (Likelihood). The term $p^k(1-p)^{n-k}$ measures how well an assumed probability p agrees with the observed k successes in n trials.
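As a minimal sketch of the Binomial formula, the snippet below evaluates the likelihood of the same observation under two candidate values of p. The observation (3 successes in 5 trials) and the candidate probabilities are made-up illustration values.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for n independent trials with success probability p."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical field data: 3 positive signals out of 5 interactions.
k, n = 3, 5
for p in (0.2, 0.6):
    print(f"p={p}: likelihood={binomial_pmf(k, n, p):.4f}")
# p=0.2: likelihood=0.0512
# p=0.6: likelihood=0.3456
```

The observation is far more plausible under p = 0.6 than under p = 0.2, which is exactly the comparison the likelihood function encodes.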

2.2 Weight of Evidence (WoE)

Why do some signals have high weights while others have low weights?

The Bayesian model used in this episode applies the concept of Weight of Evidence (WoE) to the evidence fed into the Binomial likelihood. WoE was used by Alan Turing in wartime cryptanalysis and is closely related to the logarithmic measures of Claude Shannon's Information Theory.

WoE is the logarithm of the likelihood ratio: the probability of a signal appearing in the 'success' group divided by the probability of it appearing in the 'failure' group. "Mentioning a competitor at the final contract negotiation stage" is fatal because the information gain from that signal is much larger at that stage than at the initial stage.

The use of log-scale weights is the result of mathematically reflecting this 'density of information.'
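The log-likelihood-ratio definition can be sketched as follows. The signal frequencies are invented for illustration, the natural logarithm is an arbitrary choice of base, and the article does not specify how the engine feeds these weights into the update, so this shows only the weight itself.

```python
import math

def weight_of_evidence(p_signal_given_success: float,
                       p_signal_given_failure: float) -> float:
    """Natural log of the likelihood ratio for one signal."""
    if p_signal_given_success <= 0 or p_signal_given_failure <= 0:
        raise ValueError("both probabilities must be positive")
    return math.log(p_signal_given_success / p_signal_given_failure)

# Hypothetical frequencies of "customer mentions a competitor":
# early stage: seen in 30% of won deals and 40% of lost deals
# final stage: seen in 5% of won deals and 50% of lost deals
early = weight_of_evidence(0.30, 0.40)
final = weight_of_evidence(0.05, 0.50)
print(f"early stage WoE: {early:.3f}")  # -0.288
print(f"final stage WoE: {final:.3f}")  # -2.303
```

With these assumed numbers the same signal carries about eight times more negative weight at the final stage, which matches the intuition described above.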

2.3 Interpretation

Weighted by WoE, the Binomial likelihood quantifies the 'fact (Evidence)' from the field. The term $p^k(1-p)^{n-k}$ measures how well our assumed probability p matches the actual result k. The system treats the result of every step entered by the salesperson as a binomial trial, replacing rough impressions of each interaction with refined mathematical signals.

3. Combination of Knowledge: The Magic of the 'Conjugate Prior'

The pinnacle of the Bayesian engine lies in the update process that creates 'tomorrow's certainty' by adding 'today's signal' to 'yesterday's knowledge.'

3.1 Mathematical Combination (Posterior Update)

By Bayes' theorem, the Posterior probability is calculated as follows:

P(p|Data) \propto P(Data|p) \times P(p)

Here, when the Beta distribution (Prior: prior knowledge, subjective belief) and the Binomial distribution (Likelihood: evidence data) are combined, a remarkable mathematical harmony emerges. The full derivation will be explained in a separate article dissecting the Beta distribution, but the resulting formula below can be verified in standard textbooks on Bayesian statistics.

P(p|k) = \frac{p^{(\alpha+k)-1}(1-p)^{(\beta+n-k)-1}}{B(\alpha+k, \beta+n-k)}

Looking at the result, the posterior distribution is also a Beta distribution, with parameters α' = α + k and β' = β + (n-k). It has the same functional form as the prior Beta distribution.
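For readers who want the intermediate step, the following sketch shows why the product of prior and likelihood keeps the Beta form. It uses only the Beta density and the Binomial likelihood already defined in this article; factors that do not depend on p are absorbed into the proportionality sign.

```latex
\begin{aligned}
P(p \mid k) &\propto \binom{n}{k} p^{k}(1-p)^{n-k} \times \frac{p^{\alpha-1}(1-p)^{\beta-1}}{B(\alpha, \beta)} \\
            &\propto p^{(\alpha+k)-1}(1-p)^{(\beta+n-k)-1}
\end{aligned}
```

Because the right-hand side must integrate to 1 over p in [0, 1], its normalizing constant is B(α+k, β+n-k), which gives the posterior formula stated above. For example, a prior of α=2, β=8 followed by 3 successes in 5 trials yields a Beta distribution with α'=5, β'=10.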

3.2 The Elegance of the Analytical Solution

This is the power of the conjugate prior: combining the Beta distribution (prior knowledge) with the Binomial distribution (evidence data) yields a posterior that is again a Beta distribution. The update is completed by simply adding the observed counts to the existing parameters, with no complex integration. In computer science terms, each update is a constant-time operation, O(1). This is why there is almost no server load even when processing thousands or tens of thousands of orders in real time, and it is the basis for the proposition "The calculation is as light as a feather, but the result is as heavy as a rock."
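A minimal sketch of the constant-time update looks like this. The class name `BetaState` and its methods are hypothetical and are not the Exa engine's actual interface; the point is only that the update is two additions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BetaState:
    """Current belief about the success probability, stored as two numbers."""
    alpha: float
    beta: float

    def update(self, successes: int, trials: int) -> "BetaState":
        """Conjugate update: alpha' = alpha + k, beta' = beta + (n - k)."""
        if not 0 <= successes <= trials:
            raise ValueError("successes must satisfy 0 <= successes <= trials")
        return BetaState(self.alpha + successes,
                         self.beta + (trials - successes))

    @property
    def mean(self) -> float:
        """Expected success probability under the current belief."""
        return self.alpha / (self.alpha + self.beta)

prior = BetaState(2, 8)                         # "2 out of 10" prior experience
posterior = prior.update(successes=3, trials=5)
print(posterior, round(posterior.mean, 3))
# BetaState(alpha=5, beta=10) 0.333
```

No integration or sampling appears anywhere in the update, so its cost does not depend on how much history preceded it.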

4. Technical Justification: Why the 'Beta-Binomial Model' for This Problem?

Bayesian deep learning and MCMC (Markov Chain Monte Carlo) are core assets of modern data science. However, every tool has a setting in which it performs best.

For example, when the Exa Bayesian engine calculates the on-time delivery probability of a Purchase Order (PO), an MCMC simulation model is very effective. MCMC handles large-scale batch calculations and can reflect not only typical on-time delivery data but also 'outlier' data such as delivery delays.

Ultimately, what matters most is the flexibility to select and apply the right model for the variables at play in the field.

4.1 Roles of MCMC and Deep Learning Bayesian

MCMC is excellent for approximating high-dimensional probability distributions in which thousands of variables are intertwined. Bayesian deep learning is well suited to extracting complex patterns from unstructured data (images, voice, etc.). Both are powerful approaches that reach an answer through large numbers of simulations and samples.

A(x^*, x_t) = \min \left( 1, \frac{P(x^*)g(x_t|x^*)}{P(x_t)g(x^*|x_t)} \right)

(Metropolis-Hastings acceptance probability for an MCMC sample, where P is the target density and g is the proposal distribution. A chain can require tens of thousands of iterations.)
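To illustrate the cost difference, here is a minimal Metropolis-Hastings sketch that samples a Beta posterior the slow way and compares the result with the closed-form mean. It assumes a symmetric random-walk proposal, so the g terms in the acceptance formula cancel. The step size, iteration count, burn-in fraction, and the Beta(5, 10) target are illustrative choices, not settings from the Exa engine.

```python
import random

def unnormalized_posterior(p: float, alpha: float, beta: float) -> float:
    """Beta kernel p^(alpha-1) * (1-p)^(beta-1); zero outside (0, 1)."""
    if not 0.0 < p < 1.0:
        return 0.0
    return p ** (alpha - 1) * (1 - p) ** (beta - 1)

def metropolis_hastings(alpha: float, beta: float, iterations: int = 20_000,
                        step: float = 0.3, seed: int = 0) -> list:
    """Random-walk Metropolis sampler; returns samples after a 10% burn-in."""
    rng = random.Random(seed)
    current = 0.5
    samples = []
    for _ in range(iterations):
        proposal = current + rng.uniform(-step, step)
        # Symmetric proposal: acceptance reduces to min(1, P(x*) / P(x_t)).
        ratio = (unnormalized_posterior(proposal, alpha, beta)
                 / unnormalized_posterior(current, alpha, beta))
        if rng.random() < min(1.0, ratio):
            current = proposal
        samples.append(current)
    return samples[iterations // 10:]

samples = metropolis_hastings(5, 10)
mcmc_mean = sum(samples) / len(samples)
analytic_mean = 5 / (5 + 10)
print(f"MCMC estimate: {mcmc_mean:.3f}, analytic mean: {analytic_mean:.3f}")
```

The sampler needs thousands of density evaluations to approximate a number that the conjugate formula gives with a single division, which is the trade-off this section describes.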

4.2 Unique Strengths of the Beta-Binomial Model

On the other hand, in domains with a clear binary outcome of 'success or failure,' such as sales success rate prediction, the analytical solution provided by the Beta-Binomial model becomes the 'gold standard.'

Real-time: Immediate response is possible without heavy sampling.
Explainability: It can clearly explain why the probability changed through the increase or decrease of α and β.

We would use deep learning and MCMC for more complex problems, but here, where rapid business decision-making is required, we chose the clearest and most elegant method.

5. Revolution of Architecture: Recursive Bayesian Estimation

In an era of exploding data, reloading 'all past data' every time is inefficient. The engine of this model adopts a Recursive architecture that focuses on the 'essence of information.'

This is the deepest root of the model:

All past meeting logs are already compressed into just two numbers, α and β, which describe the current state (the posterior distribution produced by combining prior knowledge with data evidence). For this model, the compression loses nothing that the update needs. When a new signal comes in, the system simply adds it to the current state instead of rummaging through past logs.
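The equivalence between processing signals one at a time and processing the full log at once can be sketched as follows. The signal sequence and the function name `recursive_update` are illustrative, and each signal is treated here as a plain unweighted success or failure.

```python
def recursive_update(state: tuple, success: bool) -> tuple:
    """Fold one binary signal into the current (alpha, beta) state."""
    alpha, beta = state
    return (alpha + 1, beta) if success else (alpha, beta + 1)

# Hypothetical stream of field signals: True = positive, False = negative.
signals = [True, False, False, True, True]

state = (2, 8)  # prior
for signal in signals:
    state = recursive_update(state, signal)
    alpha, beta = state
    print(f"signal={signal!s:5}  alpha={alpha}  beta={beta}  "
          f"mean={alpha / (alpha + beta):.3f}")

# Batch update over the full log gives the same final state.
wins = sum(signals)
batch_state = (2 + wins, 8 + len(signals) - wins)
print(state == batch_state)  # True
```

Because the running state equals the batch result, the past log never has to be reloaded, and the printed trace doubles as an explanation of why the probability moved at each step.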

The Principle Behind NASA's Orbit Correction and Self-Driving Cars' Real-Time Location Correction

This approach shares its mathematical lineage with the Kalman Filter, which tracked spacecraft position in NASA's Apollo program. Both infer the current state in real time as data arrives sequentially.
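To make the parallel visible, here is a deliberately simplified scalar Kalman update for a constant, unknown value with no process noise. It is an analogy only, not part of the sales model, and the measurements and variances are made-up numbers.

```python
def kalman_update(estimate: float, variance: float,
                  measurement: float, measurement_variance: float) -> tuple:
    """One recursive update of a scalar Gaussian belief (static state)."""
    gain = variance / (variance + measurement_variance)
    new_estimate = estimate + gain * (measurement - estimate)
    new_variance = (1 - gain) * variance
    return new_estimate, new_variance

estimate, variance = 0.0, 1.0        # prior belief
history = [(estimate, variance)]
for z in [1.0, 1.2, 0.8]:            # hypothetical noisy measurements
    estimate, variance = kalman_update(estimate, variance, z, 1.0)
    history.append((estimate, variance))
    print(f"measurement={z:.1f}  estimate={estimate:.3f}  variance={variance:.3f}")
```

As with α and β, the whole measurement history is carried in two numbers (estimate and variance), and each new observation is folded in without revisiting earlier ones.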

Traditional statistics starts its analysis "after all data is gathered," but recursive Bayesian estimation makes a judgment "as soon as information occurs." This makes it a rigorous way to manage uncertainty in ERP environments where real-time performance is vital.

When Mathematics Becomes a Tool for Business

Through [Appendix Part 1], we have seen the mathematical order hidden beneath the massive iceberg of the Bayesian engine.

The Beta distribution is a vessel that holds your experience.
The Binomial distribution is a filter that accepts hot signals from the field.
And through the blessing of the Conjugate Prior, the system derives the most accurate conviction in the lightest way.

This is not a simple statistical tool. It is a 'Decision Compass' that tracks and guides your business as precisely as a spacecraft's orbit is tracked.

[Next Teaser: Part 2]

Why does the probability drop on days of 'Silence' when no data comes in?

Next time, we look inside the 'Paradox of Silence and Log Weights' from the perspective of Information Theory.

BA02.[Appendix 1] The Bayesian Engine: Mathematical Alchemy for Managing Uncertainty
