flightrisk.eval

class flightrisk.eval.metrics.RiskMetrics(auc, pr_auc, brier, ece, decile_lift)

Bases: object

Metrics emitted by the risk track.

Parameters:
  • auc (float) – Area under the ROC curve.

  • pr_auc (float) – Average precision (area under the precision-recall curve).

  • brier (float) – Brier score (lower is better).

  • ece (float) – Expected calibration error.

  • decile_lift (float) – Lift in the top decile relative to the overall churn rate.

auc: float
pr_auc: float
brier: float
ece: float
decile_lift: float
as_dict()

Return the metrics as a plain dict for MLflow logging.

Returns:

Mapping from metric name to value.

Return type:

Mapping[str, float]
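A minimal sketch of how such a bundle could be laid out, assuming a frozen dataclass (the source only states that the base is object). RiskMetricsSketch is a hypothetical stand-in and the numbers are illustrative, not real results.

```python
from dataclasses import asdict, dataclass
from typing import Mapping


@dataclass(frozen=True)
class RiskMetricsSketch:
    """Hypothetical stand-in mirroring the documented RiskMetrics fields."""

    auc: float
    pr_auc: float
    brier: float
    ece: float
    decile_lift: float

    def as_dict(self) -> Mapping[str, float]:
        # asdict() walks the dataclass fields in declaration order.
        return asdict(self)


# Illustrative values only.
metrics = RiskMetricsSketch(auc=0.81, pr_auc=0.42, brier=0.11, ece=0.03, decile_lift=3.2)
flat = metrics.as_dict()
```

A flat mapping of this shape can be passed as-is to a logger that expects name-to-float pairs.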

flightrisk.eval.metrics.expected_calibration_error(y_true, y_prob, *, n_bins=20)

Compute the expected calibration error with equal-width probability bins.

Parameters:
  • y_true (ndarray) – Binary ground truth (0/1).

  • y_prob (ndarray) – Predicted probabilities in [0, 1].

  • n_bins (int) – Number of equal-width probability bins.

Returns:

ECE in [0, 1].

Raises:

ValueError – If shapes mismatch or n_bins < 1.

Return type:

float
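The computation described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: ece_sketch is a hypothetical name, lists stand in for ndarrays to keep the block dependency-free, and a probability of exactly 1.0 is assumed to fall in the last bin.

```python
from typing import Sequence


def ece_sketch(y_true: Sequence[int], y_prob: Sequence[float], *, n_bins: int = 20) -> float:
    """Expected calibration error with equal-width bins (hypothetical sketch)."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    if n_bins < 1:
        raise ValueError("n_bins must be >= 1")

    counts = [0] * n_bins
    sum_prob = [0.0] * n_bins
    sum_true = [0.0] * n_bins
    for y, p in zip(y_true, y_prob):
        # p == 1.0 would index one past the end, so fold it into the last bin.
        b = min(int(p * n_bins), n_bins - 1)
        counts[b] += 1
        sum_prob[b] += p
        sum_true[b] += y

    n = len(y_true)
    # (count / n) * |empirical rate - mean predicted| simplifies to
    # |sum_true - sum_prob| / n for each non-empty bin.
    return sum(abs(sum_true[b] - sum_prob[b]) / n for b in range(n_bins) if counts[b])
```

With this definition a perfectly calibrated bin contributes zero, and the result stays in [0, 1] because each bin's gap is at most its share of the rows.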

flightrisk.eval.metrics.lift_at_decile(y_true, y_prob, *, decile=1)

Compute the lift of the top-ranked slice (decile=1 is the top 10%) relative to the base rate.

Parameters:
  • y_true (ndarray) – Binary ground truth.

  • y_prob (ndarray) – Predicted probabilities.

  • decile (int) – Decile rank, 1 for the top 10%, 2 for the top 20%, etc.

Returns:

Ratio of in-decile churn rate to overall churn rate.

Raises:

ValueError – If decile falls outside [1, 10].

Return type:

float
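A rough sketch of the ratio described above, assuming the slice is cumulative and its size rounds up to at least one row. Both choices are assumptions, as is the name lift_sketch. A base rate of zero is not handled here.

```python
from typing import Sequence


def lift_sketch(y_true: Sequence[int], y_prob: Sequence[float], *, decile: int = 1) -> float:
    """Churn rate in the top slice divided by the overall rate (hypothetical sketch)."""
    if not 1 <= decile <= 10:
        raise ValueError("decile must be in [1, 10]")
    n = len(y_true)
    # Rank rows by descending score; the stable sort keeps input order on ties.
    order = sorted(range(n), key=lambda i: -y_prob[i])
    # Assumption: integer ceiling of n * decile / 10, never smaller than one row.
    k = max(1, (n * decile + 9) // 10)
    top_rate = sum(y_true[i] for i in order[:k]) / k
    base_rate = sum(y_true) / n
    return top_rate / base_rate
```

With decile=10 the slice is the whole population, so the lift is 1 by construction.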

flightrisk.eval.metrics.risk_metrics(y_true, y_prob, *, n_calibration_bins=20)

Compute the standard suite of risk metrics in one call.

Parameters:
  • y_true (ndarray) – Binary ground truth.

  • y_prob (ndarray) – Predicted probabilities.

  • n_calibration_bins (int) – Number of bins for ECE.

Returns:

A RiskMetrics bundle.

Return type:

RiskMetrics
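Two of the bundled quantities have short textbook definitions that are easy to check by hand. The sketch below uses hypothetical helper names and plain lists; it is not the library's code, which may well delegate to an existing metrics package.

```python
from typing import Sequence


def brier_sketch(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc_sketch(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """ROC AUC as the chance that a random positive outranks a random negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of rows, so it suits spot checks rather than large evaluation sets.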

flightrisk.eval.metrics.calibration_table(y_true, y_prob, *, n_bins=20)

Return a per-bin calibration table for plotting reliability diagrams.

Parameters:
  • y_true (ndarray) – Binary ground truth.

  • y_prob (ndarray) – Predicted probabilities.

  • n_bins (int) – Number of equal-width bins.

Returns:

Frame with columns bin, count, mean_predicted, empirical_rate.

Return type:

DataFrame
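To illustrate the four documented columns, the sketch below builds the same table as a list of dicts rather than a DataFrame. calibration_rows_sketch is a hypothetical name, and keeping empty bins with NaN statistics is an assumption about behaviour the source does not specify.

```python
from typing import Dict, List, Sequence, Union

Row = Dict[str, Union[int, float]]


def calibration_rows_sketch(
    y_true: Sequence[int], y_prob: Sequence[float], *, n_bins: int = 20
) -> List[Row]:
    """One row per equal-width bin with the documented column names."""
    rows: List[Row] = []
    for b in range(n_bins):
        # A probability of exactly 1.0 is folded into the last bin.
        members = [
            (y, p)
            for y, p in zip(y_true, y_prob)
            if min(int(p * n_bins), n_bins - 1) == b
        ]
        count = len(members)
        rows.append(
            {
                "bin": b,
                "count": count,
                # Assumption: empty bins are kept and reported as NaN.
                "mean_predicted": sum(p for _, p in members) / count if count else float("nan"),
                "empirical_rate": sum(y for y, _ in members) / count if count else float("nan"),
            }
        )
    return rows
```

Plotting empirical_rate against mean_predicted for the non-empty bins gives the reliability diagram; points on the diagonal indicate good calibration.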

class flightrisk.eval.survival_metrics.SurvivalMetrics(c_index, time_dependent_auc_mean, integrated_brier)

Bases: object

Metrics emitted by the survival track.

Parameters:
  • c_index (float) – Harrell’s concordance on censored data.

  • time_dependent_auc_mean (float) – Mean of time-dependent AUC across horizons.

  • integrated_brier (float) – Integrated Brier score across horizons.

c_index: float
time_dependent_auc_mean: float
integrated_brier: float
as_dict()

Return the metrics as a plain dict for MLflow logging.

NaN entries are dropped so that MLflow’s metric store accepts the call even when an IPCW-based metric could not be computed (e.g. under degenerate censoring).

Returns:

Mapping from metric name to finite value.

Return type:

Mapping[str, float]
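The NaN filtering can be sketched as follows. SurvivalMetricsSketch is a hypothetical stand-in, and using math.isnan as the filter is an assumption; the library may test for finiteness instead.

```python
import math
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class SurvivalMetricsSketch:
    """Hypothetical stand-in mirroring the documented SurvivalMetrics fields."""

    c_index: float
    time_dependent_auc_mean: float
    integrated_brier: float

    def as_dict(self) -> Dict[str, float]:
        # Keep only entries that are not NaN so a metric store never sees them.
        return {name: value for name, value in asdict(self).items() if not math.isnan(value)}


# Example: the two IPCW-based metrics were not computable for this run.
partial = SurvivalMetricsSketch(
    c_index=0.7,
    time_dependent_auc_mean=float("nan"),
    integrated_brier=float("nan"),
)
logged = partial.as_dict()
```

Here logged contains only c_index, so callers that need all three metrics should check for missing keys.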

flightrisk.eval.survival_metrics.survival_metrics(*, train_durations, train_events, test_durations, test_events, risk_scores, survival_at_horizons, horizons)

Compute the survival metric suite.

Parameters:
  • train_durations (ndarray) – Train follow-up times.

  • train_events (ndarray) – Train 0/1 event indicators.

  • test_durations (ndarray) – Test follow-up times.

  • test_events (ndarray) – Test 0/1 event indicators.

  • risk_scores (ndarray) – Per-row risk score on test (higher = earlier event).

  • survival_at_horizons (ndarray) – (n_samples, len(horizons)) matrix of S(t).

  • horizons (ndarray) – 1-D array of horizons matching survival_at_horizons.

Returns:

A SurvivalMetrics bundle.

Return type:

SurvivalMetrics
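For orientation, Harrell's concordance can be written as a pairwise count over comparable pairs. The sketch below is a quadratic-time illustration with a hypothetical name; it skips pairs with tied durations, which is a simplifying assumption, and says nothing about how the library computes its value.

```python
from typing import Sequence


def harrell_c_sketch(
    durations: Sequence[float], events: Sequence[int], risk_scores: Sequence[float]
) -> float:
    """Share of comparable pairs where the earlier event has the higher risk score."""
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        if not events[i]:
            # A censored row cannot be the earlier member of a comparable pair.
            continue
        for j in range(n):
            if durations[i] < durations[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        return float("nan")  # e.g. every row is censored
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking, and 1.0 means every comparable pair is ordered correctly under the "higher = earlier event" convention.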

class flightrisk.eval.uplift_metrics.UpliftMetrics(qini, auuc, uplift_at_k)

Bases: object

Metrics emitted by the uplift track.

Parameters:
  • qini (float) – Qini coefficient (area between Qini curve and random line).

  • auuc (float) – Area under the uplift curve.

  • uplift_at_k (Mapping[int, float]) – Mapping from k (percent) to estimated uplift in the top-k% slice.

qini: float
auuc: float
uplift_at_k: Mapping[int, float]
as_dict()

Return the metrics as a flat dict for MLflow logging.

Returns:

Mapping of scalar metrics, with one entry per k.

Return type:

Mapping[str, float]
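Flattening the nested uplift_at_k mapping might look like the sketch below. The class name and the "uplift_at_<k>" key pattern are assumptions made for illustration; the source does not state the actual key names.

```python
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass(frozen=True)
class UpliftMetricsSketch:
    """Hypothetical stand-in mirroring the documented UpliftMetrics fields."""

    qini: float
    auuc: float
    uplift_at_k: Mapping[int, float]

    def as_dict(self) -> Dict[str, float]:
        flat: Dict[str, float] = {"qini": self.qini, "auuc": self.auuc}
        for k, value in sorted(self.uplift_at_k.items()):
            # Assumed key pattern; one scalar entry per k.
            flat[f"uplift_at_{k}"] = value
        return flat


# Illustrative values only.
flat_uplift = UpliftMetricsSketch(qini=0.05, auuc=0.02, uplift_at_k={20: 0.03, 10: 0.04}).as_dict()
```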

flightrisk.eval.uplift_metrics.qini_curve(uplift, treatment, outcome)

Build the Qini curve in cumulative form.

Parameters:
  • uplift (ndarray) – Per-row uplift score.

  • treatment (ndarray) – 0/1 treatment indicator.

  • outcome (ndarray) – 0/1 outcome.

Returns:

Frame with columns rank, cum_treated_outcomes, cum_control_outcomes, cum_treated_n, cum_control_n, qini.

Return type:

DataFrame
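A cumulative Qini curve with the documented columns can be sketched as below, using a list of dicts in place of a DataFrame. The qini formula is the common one (cumulative treated outcomes minus control outcomes rescaled to the treated count) and is assumed here rather than taken from the source; qini_rows_sketch is a hypothetical name.

```python
from typing import Dict, List, Sequence


def qini_rows_sketch(
    uplift: Sequence[float], treatment: Sequence[int], outcome: Sequence[int]
) -> List[Dict[str, float]]:
    """Cumulative counts and Qini value at each rank, highest uplift first."""
    order = sorted(range(len(uplift)), key=lambda i: -uplift[i])
    treated_y = control_y = treated_n = control_n = 0
    rows: List[Dict[str, float]] = []
    for rank, i in enumerate(order, start=1):
        if treatment[i]:
            treated_n += 1
            treated_y += outcome[i]
        else:
            control_n += 1
            control_y += outcome[i]
        # Rescale control outcomes to the treated group size; 0 until a control row appears.
        scaled_control = control_y * treated_n / control_n if control_n else 0.0
        rows.append(
            {
                "rank": rank,
                "cum_treated_outcomes": treated_y,
                "cum_control_outcomes": control_y,
                "cum_treated_n": treated_n,
                "cum_control_n": control_n,
                "qini": treated_y - scaled_control,
            }
        )
    return rows
```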

flightrisk.eval.uplift_metrics.qini_coefficient(uplift, treatment, outcome)

Return the Qini coefficient against the random-targeting baseline.

Parameters:
  • uplift (ndarray) – Per-row uplift score.

  • treatment (ndarray) – 0/1 treatment indicator.

  • outcome (ndarray) – 0/1 outcome.

Returns:

Float Qini coefficient.

Return type:

float
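One way to read "against the random-targeting baseline" is as the average gap between the Qini curve and a straight line that reaches the same end point. The sketch below takes that reading; the normalisation by the number of rows and the function name are assumptions, and the library's scaling may differ.

```python
from typing import Sequence


def qini_coefficient_sketch(qini_values: Sequence[float]) -> float:
    """Mean gap between a Qini curve and a linear random-targeting line."""
    n = len(qini_values)
    final = qini_values[-1]
    # Random targeting is assumed to accumulate the final Qini value linearly with rank.
    gaps = [q - final * (rank / n) for rank, q in enumerate(qini_values, start=1)]
    return sum(gaps) / n
```

A curve that coincides with the straight line scores zero; a curve that rises early scores above zero.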

flightrisk.eval.uplift_metrics.auuc(uplift, treatment, outcome)

Compute the area under the uplift curve.

Parameters:
  • uplift (ndarray) – Per-row uplift score.

  • treatment (ndarray) – 0/1 treatment indicator.

  • outcome (ndarray) – 0/1 outcome.

Returns:

AUUC normalised by the number of observations.

Return type:

float

flightrisk.eval.uplift_metrics.uplift_at_k(uplift, treatment, outcome, *, k_percent)

Estimate the uplift in the top-k% ranked slice.

Parameters:
  • uplift (ndarray) – Per-row uplift score.

  • treatment (ndarray) – 0/1 treatment indicator.

  • outcome (ndarray) – 0/1 outcome.

  • k_percent (float) – Top-percentile cutoff in (0, 100].

Returns:

Difference between treated outcome rate and control outcome rate within the top slice; nan if either arm is empty in the slice.

Raises:

ValueError – If k_percent is outside (0, 100].

Return type:

float
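The documented behaviour, including the NaN return and the range check, can be sketched as follows. uplift_at_k_sketch is a hypothetical name, and rounding the slice size up to at least one row is an assumption.

```python
import math
from typing import Sequence


def uplift_at_k_sketch(
    uplift: Sequence[float],
    treatment: Sequence[int],
    outcome: Sequence[int],
    *,
    k_percent: float,
) -> float:
    """Treated minus control outcome rate within the top-k% slice."""
    if not 0 < k_percent <= 100:
        raise ValueError("k_percent must be in (0, 100]")
    n = len(uplift)
    order = sorted(range(n), key=lambda i: -uplift[i])
    # Assumption: the slice size rounds up and is never smaller than one row.
    k = max(1, math.ceil(n * k_percent / 100))
    top = order[:k]
    treated = [outcome[i] for i in top if treatment[i]]
    control = [outcome[i] for i in top if not treatment[i]]
    if not treated or not control:
        return float("nan")  # one arm is missing from the slice
    return sum(treated) / len(treated) - sum(control) / len(control)
```

Small slices often contain only one arm, so NaN results at low k_percent are expected on small test sets.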

flightrisk.eval.uplift_metrics.uplift_metrics(*, uplift, treatment, outcome, k_percentiles=(10, 20, 30))

Compute the standard uplift metric suite in one call.

Parameters:
  • uplift (ndarray) – Per-row uplift score.

  • treatment (ndarray) – 0/1 treatment indicator.

  • outcome (ndarray) – 0/1 outcome.

  • k_percentiles (tuple[int, ...]) – Top-percentiles for uplift_at_k.

Returns:

An UpliftMetrics bundle.

Return type:

UpliftMetrics

class flightrisk.eval.simulator.SimulatorConfig(cost_per_treated=5.0, revenue_per_retained=50.0, bootstrap_iters=1000, alpha=0.05, random_state=1337)

Bases: object

Configuration for the campaign ROI simulator.

Parameters:
  • cost_per_treated (float) – Marginal cost of treating one customer.

  • revenue_per_retained (float) – Revenue captured by retaining one customer.

  • bootstrap_iters (int) – Number of bootstrap resamples for confidence bands.

  • alpha (float) – Two-sided significance level for the confidence interval.

  • random_state (int) – Seed used by the bootstrap resampler.

cost_per_treated: float = 5.0
revenue_per_retained: float = 50.0
bootstrap_iters: int = 1000
alpha: float = 0.05
random_state: int = 1337
class flightrisk.eval.simulator.SimulationResult(budget, n_targeted, expected_retained, expected_revenue, ci_lower, ci_upper, policy)

Bases: object

Outcome of a budgeted targeting policy simulation.

Parameters:
  • budget (float) – Budget cap in dollars.

  • n_targeted (int) – Number of customers actually treated under the policy.

  • expected_retained (float) – Mean retained customers across bootstrap samples.

  • expected_revenue (float) – Mean retained revenue net of treatment cost.

  • ci_lower (float) – Lower bound of the bootstrap CI for revenue.

  • ci_upper (float) – Upper bound of the bootstrap CI for revenue.

  • policy (str) – Policy identifier: "risk", "uplift", or "random".

budget: float
n_targeted: int
expected_retained: float
expected_revenue: float
ci_lower: float
ci_upper: float
policy: str
as_dict()

Return the result as a flat dict suitable for MLflow.

Returns:

Mapping of fields to values.

Return type:

Mapping[str, float | int | str]

flightrisk.eval.simulator.simulate_risk_policy(*, risk_scores, treatment_lift, base_outcome, budget, config=None)

Simulate a top-by-P(churn) campaign at the given budget.

The policy treats the top-k customers by risk_scores where k is determined by budget / cost_per_treated. Each treated customer’s retention probability is increased by treatment_lift (clipped to [0, 1 - base_outcome]); each untreated customer is assumed to retain at base_outcome.

Parameters:
  • risk_scores (ndarray) – Per-row predicted churn probability (Track A output).

  • treatment_lift (ndarray) – Per-row uplift in retention if treated (e.g. RCT average lift; can be a constant repeated).

  • base_outcome (ndarray) – Baseline retention probability per customer.

  • budget (float) – Total dollars available.

  • config (SimulatorConfig | None) – Optional simulator configuration.

Returns:

A SimulationResult with bootstrapped CI bands.

Return type:

SimulationResult
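The policy described above can be sketched end to end in plain Python. Everything not stated in the source is an assumption: the function name, flooring k and capping it at the population size, resampling customers with replacement, a percentile interval for the CI, and returning a dict instead of a SimulationResult. The keyword defaults copy the documented SimulatorConfig defaults.

```python
import random
from typing import Dict, Sequence, Union


def simulate_risk_policy_sketch(
    risk_scores: Sequence[float],
    treatment_lift: Sequence[float],
    base_outcome: Sequence[float],
    *,
    budget: float,
    cost_per_treated: float = 5.0,
    revenue_per_retained: float = 50.0,
    bootstrap_iters: int = 1000,
    alpha: float = 0.05,
    random_state: int = 1337,
) -> Dict[str, Union[float, int, str]]:
    n = len(risk_scores)
    # Assumption: k is floored and capped at the population size.
    k = min(n, int(budget // cost_per_treated))
    order = sorted(range(n), key=lambda i: -risk_scores[i])
    treated = set(order[:k])

    retain, value = [], []
    for i in range(n):
        p, cost = base_outcome[i], 0.0
        if i in treated:
            # Clip the lift to [0, 1 - base_outcome] as described above.
            p += min(max(treatment_lift[i], 0.0), 1.0 - base_outcome[i])
            cost = cost_per_treated
        retain.append(p)
        value.append(revenue_per_retained * p - cost)

    # Bootstrap over customers: resample rows with replacement and re-total.
    rng = random.Random(random_state)
    retained_totals, revenue_totals = [], []
    for _ in range(bootstrap_iters):
        idx = [rng.randrange(n) for _ in range(n)]
        retained_totals.append(sum(retain[i] for i in idx))
        revenue_totals.append(sum(value[i] for i in idx))
    revenue_totals.sort()
    lower = revenue_totals[int((alpha / 2) * (bootstrap_iters - 1))]
    upper = revenue_totals[int((1 - alpha / 2) * (bootstrap_iters - 1))]
    return {
        "budget": budget,
        "n_targeted": k,
        "expected_retained": sum(retained_totals) / bootstrap_iters,
        "expected_revenue": sum(revenue_totals) / bootstrap_iters,
        "ci_lower": lower,
        "ci_upper": upper,
        "policy": "risk",
    }
```

Fixing random_state makes repeated runs return identical bands, which keeps comparisons across budgets reproducible.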

flightrisk.eval.simulator.simulate_uplift_policy(*, uplift_scores, treatment_lift, base_outcome, budget, config=None)

Simulate a top-by-uplift campaign at the given budget.

The policy treats the top-k customers by uplift_scores (Track C output).

Parameters:
  • uplift_scores (ndarray) – Per-row uplift estimate.

  • treatment_lift (ndarray) – Per-row uplift in retention if treated.

  • base_outcome (ndarray) – Baseline retention probability per customer.

  • budget (float) – Total dollars available.

  • config (SimulatorConfig | None) – Optional simulator configuration.

Returns:

A SimulationResult with bootstrapped CI bands.

Return type:

SimulationResult

flightrisk.eval.simulator.simulate_random_policy(*, n, treatment_lift, base_outcome, budget, config=None)

Simulate a uniformly random targeting policy at the given budget.

Parameters:
  • n (int) – Population size.

  • treatment_lift (ndarray) – Per-row uplift in retention if treated.

  • base_outcome (ndarray) – Baseline retention probability per customer.

  • budget (float) – Total dollars available.

  • config (SimulatorConfig | None) – Optional simulator configuration.

Returns:

A SimulationResult with bootstrapped CI bands.

Return type:

SimulationResult

flightrisk.eval.simulator.compare_policies(*, risk_scores, uplift_scores, treatment_lift, base_outcome, budgets, config=None)

Simulate risk, uplift, and random policies across multiple budgets.

Parameters:
  • risk_scores (ndarray) – Per-row Track A scores.

  • uplift_scores (ndarray) – Per-row Track C scores.

  • treatment_lift (ndarray) – Per-row retention lift if treated.

  • base_outcome (ndarray) – Baseline retention probability per customer.

  • budgets (tuple[float, ...]) – Budgets to evaluate.

  • config (SimulatorConfig | None) – Optional simulator configuration.

Returns:

Tidy frame ready for plotting.

Return type:

DataFrame
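A "tidy" result here means one row per (budget, policy) pair. The sketch below illustrates that layout with a simplified, bootstrap-free gain calculation. The function names, the column names, and the incremental_retained measure are all hypothetical; the real frame's columns are not listed in the source.

```python
import random
from typing import Dict, List, Sequence, Union


def expected_gain(
    order: Sequence[int],
    treatment_lift: Sequence[float],
    base_outcome: Sequence[float],
    k: int,
) -> float:
    """Sum of clipped retention lifts over the first k customers in `order`."""
    return sum(
        min(max(treatment_lift[i], 0.0), 1.0 - base_outcome[i]) for i in order[:k]
    )


def compare_policies_sketch(
    risk_scores: Sequence[float],
    uplift_scores: Sequence[float],
    treatment_lift: Sequence[float],
    base_outcome: Sequence[float],
    budgets: Sequence[float],
    *,
    cost_per_treated: float = 5.0,
    random_state: int = 1337,
) -> List[Dict[str, Union[float, int, str]]]:
    n = len(risk_scores)
    rng = random.Random(random_state)
    # Each policy is just a different ordering of the same customers.
    orders = {
        "risk": sorted(range(n), key=lambda i: -risk_scores[i]),
        "uplift": sorted(range(n), key=lambda i: -uplift_scores[i]),
        "random": rng.sample(range(n), n),
    }
    rows: List[Dict[str, Union[float, int, str]]] = []
    for budget in budgets:
        k = min(n, int(budget // cost_per_treated))  # flooring is an assumption
        for policy, order in orders.items():
            rows.append(
                {
                    "budget": budget,
                    "policy": policy,
                    "n_targeted": k,
                    "incremental_retained": expected_gain(order, treatment_lift, base_outcome, k),
                }
            )
    return rows
```

With one row per pair, a plotting library can draw one line per policy by grouping on the policy column and using budget as the x-axis.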

flightrisk.eval.plots.save_roi_chart(comparison, *, output)

Render the headline ROI chart comparing policies across budgets.

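Parameters:
  • comparison – Policy comparison to plot, e.g. the frame returned by compare_policies.

  • output – Destination path for the rendered chart.
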
Returns:

The path written.

Return type:

Path