Auto-Tuner Anatomy ②: Signal Lift — Is This Signal Really Meaningful?
Dissecting how Signal Lift measures the real-world discriminative power of each signal. Lift formula, Laplace smoothing, dynamic minimum appearance thresholds, and mismatch detection explained at the code level.
In the previous part: ① Engine Overview, we explored the Auto-Tuner's overall structure. This part answers the first question: "Is this signal really more common in Won projects?"
1. What is Lift?
1.1 Core Question
Signal Master classifies signals as Positive (Game Changer, Strong Affirmation, etc.) or Negative (Strong Negation, Weak Negation, etc.) based on domain expertise. But is this trust justified?
If "technical fit confirmed" appears equally in Won and Lost projects, it provides no discriminative power — even if it's classified as a Strong Affirmation.
Signal Lift quantifies this by measuring actual discriminative power from historical data.
1.2 Formula
Lift(s) = P(s|Won) / P(s|Lost)

- P(s|Won) = how frequently signal s appeared in Won projects
- P(s|Lost) = how frequently signal s appeared in Lost projects
This is mathematically equivalent to a Bayes Factor:
"Given that this signal was observed, how much stronger is the evidence for Won compared to Lost?" — this is the exact question a Bayes Factor answers.
1.3 Interpretation
| Lift | Jeffreys' Scale | Interpretation |
|---|---|---|
| > 10 | Decisive | Overwhelmingly associated with Won |
| 3 ~ 10 | Strong | Strongly associated with Won |
| 1 ~ 3 | Moderate | Slight association with Won |
| ≈ 1 | None | No discriminative power |
| < 1 | Reverse | Actually appears more in Lost |
2. Concrete Example
2.1 Data
A company has 10 Won and 15 Lost completed projects. For each signal, we count the number of projects where it appeared:
| Signal | Won (10) | Lost (15) | P(s|Won) | P(s|Lost) | Lift |
|---|---|---|---|---|---|
| Technical Fit Confirmed | 8 | 3 | 0.80 | 0.20 | 4.00 |
| Budget Secured | 6 | 4 | 0.60 | 0.27 | 2.25 |
| Competitor Presence | 7 | 10 | 0.70 | 0.67 | 1.05 |
| Decision Maker Absent | 2 | 9 | 0.20 | 0.60 | 0.33 |
2.2 Interpretation
- Technical Fit Confirmed (4.00): Projects with this signal won 4× more often. This signal is genuinely meaningful.
- Budget Secured (2.25): Meaningful, but not decisive.
- Competitor Presence (1.05): Lift ≈ 1 → No discriminative power. It cannot differentiate Won from Lost.
- Decision Maker Absent (0.33): Appears 3× more in Lost. If currently classified as Positive, this is a classification error.
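The table's Lift column can be recomputed from the raw counts (unsmoothed, purely for illustration; the variable names are mine):

```ruby
# Recompute the example table's Lift values from raw counts (no smoothing).
WON_TOTAL  = 10
LOST_TOTAL = 15

signals = {
  'Technical Fit Confirmed' => [8, 3],
  'Budget Secured'          => [6, 4],
  'Competitor Presence'     => [7, 10],
  'Decision Maker Absent'   => [2, 9]
}

signals.each do |name, (won, lost)|
  lift = (won.to_f / WON_TOTAL) / (lost.to_f / LOST_TOTAL)
  puts format('%-25s lift = %.2f', name, lift)
end
```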
3. Laplace Smoothing — Preventing Division by Zero
3.1 The Problem
If "Game Changer" appeared in 5 Won projects but 0 Lost projects:
3.2 The Solution: Laplace Smoothing
# Laplace (add-one) smoothing: one virtual success and one virtual failure.
def smoothed_rate(count, total)
  (count + 1.0) / (total + 2.0)
end
Adding 1 to the numerator and 2 to the denominator:
| Value | Before Smoothing | After Smoothing |
|---|---|---|
| Won rate | 5/10 = 0.500 | 6/12 = 0.500 |
| Lost rate | 0/15 = 0.000 | 1/17 = 0.059 |
| Lift | ∞ | 0.500/0.059 = 8.50 |
Smoothing prevents infinite values while having minimal impact when sufficient data exists. With n=1000, the +1 and +2 cause only a 0.1% difference.
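Plugging the 5-Won / 0-Lost example into `smoothed_rate` reproduces the table above:

```ruby
# Laplace smoothing: one virtual success and one virtual failure.
def smoothed_rate(count, total)
  (count + 1.0) / (total + 2.0)
end

won_rate  = smoothed_rate(5, 10)  # 6/12 = 0.5
lost_rate = smoothed_rate(0, 15)  # 1/17 ≈ 0.059
lift = won_rate / lost_rate       # ≈ 8.5 instead of ∞
```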
3.3 Why "+1" and "+2"?
This comes from the uniform prior (Beta(1,1)) of Bayesian inference. Viewing it as "adding one virtual success and one virtual failure" naturally connects to the Bayesian framework. It is the most widely used smoothing method, also known as "add-one smoothing."
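The Beta(1,1) connection can be checked numerically: the posterior mean after k successes in n trials under a uniform prior is (k+1)/(n+2), which is exactly `smoothed_rate`. The function name below is mine, for illustration:

```ruby
# Posterior mean of Beta(1 + k, 1 + (n - k)): a uniform Beta(1,1) prior
# updated with k successes out of n trials. Equals (k + 1) / (n + 2).
def beta_posterior_mean(k, n)
  a = 1 + k        # prior alpha + observed successes
  b = 1 + (n - k)  # prior beta + observed failures
  a.to_f / (a + b) # mean of Beta(a, b)
end

beta_posterior_mean(5, 10)  # 6/12 = 0.5, identical to smoothed_rate(5, 10)
```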
4. Dynamic Minimum Appearance
4.1 The Problem
If signal "X" appeared in only 1 Won project and 0 Lost projects, Lift = ∞ (after smoothing, approximately 8.5). But the sample size of 1 cannot guarantee statistical validity. This might be entirely coincidental.
4.2 Phase-Based Thresholds
Auto-Tuner requires a minimum number of appearances:
# phase => minimum number of appearances required
SIGNAL_MIN_APPEARANCES = {
  2 => 3,
  3 => 5,
  4 => 8,
  5 => 10
}.freeze
| Phase | Min Appearances | Rationale |
|---|---|---|
| 2 | 3 | Even with sparse data, at least 3 observations |
| 3 | 5 | Basic statistical test possible |
| 4 | 8 | Sufficient for pattern convergence |
| 5 | 10 | Stringent standard |
Signals that fall below the minimum threshold receive a Lift of nil and are excluded from GridSearch targeting.
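Putting the threshold and the smoothed Lift together, the exclusion logic could be sketched as follows. Only `SIGNAL_MIN_APPEARANCES` comes from the article; `lift_for` and its signature are illustrative assumptions:

```ruby
# phase => minimum number of appearances required (from the article)
SIGNAL_MIN_APPEARANCES = { 2 => 3, 3 => 5, 4 => 8, 5 => 10 }.freeze

# Laplace smoothing, as in Section 3.
def smoothed_rate(count, total)
  (count + 1.0) / (total + 2.0)
end

# Returns the smoothed Lift, or nil when the signal appeared too rarely
# for the current phase (nil signals are skipped by GridSearch).
def lift_for(phase, won_count, won_total, lost_count, lost_total)
  return nil if won_count + lost_count < SIGNAL_MIN_APPEARANCES.fetch(phase)

  smoothed_rate(won_count, won_total) / smoothed_rate(lost_count, lost_total)
end

lift_for(4, 1, 10, 0, 15)  # nil — only 1 appearance, phase 4 requires 8
lift_for(2, 8, 10, 3, 15)  # ≈ 3.19 (smoothed)
```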
4.3 Why Phase-Dependent?
With scarce data, you must reference whatever information is available (even Lift from 3 appearances is better than nothing). With abundant data, stricter criteria can be demanded. This represents the adaptive balance between Data Humility and Ambition.
5. Classification Mismatch Detection
5.1 What is a Mismatch?
If signal "Positive Meeting Atmosphere" is classified as Moderate Affirmation (α increases), but in reality it appeared in 3 Won and 9 Lost projects:
Lift < 1 yet classified as Positive → This is a mismatch.
5.2 Why Does This Happen?
- Signal name vs. actual usage gap: what the sales team records as "meeting atmosphere was positive" may not match the real-world context the classification assumed
- Environmental change: A signal that was once meaningful becomes meaningless due to changes in market conditions
- Insufficient sample: May resolve naturally as more data accumulates
5.3 Mismatch Report
def detect_mismatches(lift_results)
  mismatches = []

  lift_results.each do |signal_name, data|
    # Skip signals excluded by the minimum-appearance threshold (Lift is nil),
    # otherwise the comparison below would raise on nil.
    next if data[:lift].nil? || data[:total_appearances] < min_appearances

    expected_positive = data[:impact_type].positive?
    actual_positive   = data[:lift] > 1.0
    next if expected_positive == actual_positive

    mismatches << {
      signal: signal_name,
      classified_as: expected_positive ? 'Positive' : 'Negative',
      actual_lift: data[:lift],
      interpretation: expected_positive ?
        'Classified as positive but Lift < 1 — appears more in Lost' :
        'Classified as negative but Lift > 1 — appears more in Won'
    }
  end

  mismatches
end
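A self-contained toy run of the same idea. This is a deliberately simplified variant: the impact type is reduced to a symbol, the threshold is a fixed constant, and all names and sample numbers are illustrative:

```ruby
# Simplified, self-contained sketch of mismatch detection.
# :positive / :negative stand in for the full impact-type objects.
MIN_APPEARANCES = 5

def detect_mismatches(lift_results)
  lift_results.filter_map do |signal, data|
    next if data[:lift].nil? || data[:total_appearances] < MIN_APPEARANCES

    expected_positive = data[:impact] == :positive
    actual_positive   = data[:lift] > 1.0
    next if expected_positive == actual_positive  # classification matches reality

    { signal: signal, classified_as: data[:impact], actual_lift: data[:lift] }
  end
end

results = {
  'Positive Meeting Atmosphere' => { impact: :positive, lift: 0.50, total_appearances: 12 },
  'Technical Fit Confirmed'     => { impact: :positive, lift: 4.00, total_appearances: 11 }
}

detect_mismatches(results)
# → [{ signal: 'Positive Meeting Atmosphere', classified_as: :positive, actual_lift: 0.5 }]
```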
5.4 What Does the User Do?
Mismatches appear as warning alerts in the Auto-Tuner report. The administrator decides whether to:
- Reclassify the signal — e.g., change from Moderate Affirmation to Moderate Negative
- Maintain current classification — if the mismatch is believed to be temporary (due to data insufficiency)
- Remove the signal — retire it from the system if not meaningful
6. Signal Lift Summary
Lift > 1 Lift ≈ 1 Lift < 1
┌───────────┐ ┌───────────┐ ┌───────────┐
│ Appears │ │ Equally │ │ Appears │
│ more in │ │ in both │ │ more in │
│ Won │ │ │ │ Lost │
└─────┬─────┘ └─────┬─────┘ └─────┬─────┘
│ │ │
✅ As Expected ⚠️ No Power ❌ Mismatch?
Signal Lift is the Auto-Tuner's first task. Before optimizing parameters, you must first confirm whether each signal is truly meaningful. Only then can the subsequent Grid Search, T optimization, and MCMC analysis deliver reliable results.
Next: ③ Grid Search Engine Anatomy — Phase-based dynamic ranges, simulations, and the mathematics of Compound Score.