flightrisk.eval

class flightrisk.eval.metrics.RiskMetrics(auc, pr_auc, brier, ece, decile_lift)

Bases: object

Metrics emitted by the risk track.

Parameters:

auc (float) – Area under the ROC curve.
pr_auc (float) – Average precision (area under the precision-recall curve).
brier (float) – Brier score (lower is better).
ece (float) – Expected calibration error.
decile_lift (float) – Lift in the top decile relative to the overall churn rate.

auc: float

pr_auc: float

brier: float

ece: float

decile_lift: float

as_dict()

Return the metrics as a plain dict for MLflow logging.

Returns:

Mapping from metric name to value.

Return type:

Mapping[str, float]
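
As a rough illustration of the bundle-to-dict pattern described above, the sketch below uses a hypothetical stand-in class, RiskMetricsSketch, that mirrors the documented fields. The frozen dataclass, the key names (assumed equal to the field names), and the numeric values are all assumptions made for illustration; none of them is taken from the flightrisk source.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class RiskMetricsSketch:
    """Hypothetical stand-in mirroring the documented RiskMetrics fields."""

    auc: float
    pr_auc: float
    brier: float
    ece: float
    decile_lift: float

    def as_dict(self) -> Dict[str, float]:
        # Assumption: dict keys are simply the field names.
        return asdict(self)


# Made-up values, used only to show the shape of the result.
metrics = RiskMetricsSketch(auc=0.81, pr_auc=0.42, brier=0.11, ece=0.03, decile_lift=3.2)
flat = metrics.as_dict()
```

A flat dict of floats like this is the shape that metric loggers generally expect, which is presumably why the bundle exposes it.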

flightrisk.eval.metrics.expected_calibration_error(y_true, y_prob, *, n_bins=20)

Compute the expected calibration error with equal-width probability bins.

Parameters:

y_true (ndarray) – Binary ground truth (0/1).
y_prob (ndarray) – Predicted probabilities in [0, 1].
n_bins (int) – Number of equal-width probability bins.

Returns:

ECE in [0, 1].

Raises:

ValueError – If shapes mismatch or n_bins < 1.

Return type:

float
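
The equal-width binning described above can be sketched in plain Python as follows. This is a minimal illustration, not the library's implementation: the function name ece_sketch, the handling of a probability of exactly 1.0, and the error on empty input are assumptions, and plain sequences are used instead of ndarray to keep the example dependency-free.

```python
from typing import Sequence


def ece_sketch(y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 20) -> float:
    """Equal-width ECE: sum over bins of (n_b / N) * |mean_predicted - empirical_rate|."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    if n_bins < 1:
        raise ValueError("n_bins must be >= 1")
    if len(y_prob) == 0:
        # Assumption: an empty input is treated as an error in this sketch.
        raise ValueError("inputs must be non-empty")

    prob_sums = [0.0] * n_bins   # sum of predicted probabilities per bin
    label_sums = [0.0] * n_bins  # sum of 0/1 labels per bin
    for label, prob in zip(y_true, y_prob):
        # Assumption: prob == 1.0 is folded into the last bin.
        idx = min(int(prob * n_bins), n_bins - 1)
        prob_sums[idx] += prob
        label_sums[idx] += label

    # (n_b / N) * |mean_pred - rate| simplifies to |sum_pred - sum_label| / N per bin.
    gap = sum(abs(p - l) for p, l in zip(prob_sums, label_sums))
    return gap / len(y_prob)


ece_value = ece_sketch([0, 0, 1, 1], [0.1, 0.1, 0.9, 0.9], n_bins=10)
```

In the toy call, each occupied bin is off by 0.1 on average and the two bins hold equal shares of the data, so the weighted gap comes out at roughly 0.1.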

flightrisk.eval.metrics.lift_at_decile(y_true, y_prob, *, decile=1)

Compute the lift of the top-ranked slice over the base rate, where decile=1 selects the top 10% of rows by predicted probability.

Parameters:

y_true (ndarray) – Binary ground truth.
y_prob (ndarray) – Predicted probabilities.
decile (int) – Decile rank, 1 for the top 10%, 2 for the top 20%, etc.

Returns:

Ratio of in-decile churn rate to overall churn rate.

Raises:

ValueError – If decile falls outside [1, 10].

Return type:

float
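
A minimal sketch of the lift computation, assuming the slice is the top decile × 10% of rows ranked by predicted probability. The name lift_at_decile_sketch, the round-up rule for the slice size, and the NaN return when the base rate is zero are assumptions of this example rather than documented behaviour.

```python
import math
from typing import Sequence


def lift_at_decile_sketch(y_true: Sequence[int], y_prob: Sequence[float], decile: int = 1) -> float:
    """Ratio of the churn rate in the top `decile * 10%` rows to the overall churn rate."""
    if not 1 <= decile <= 10:
        raise ValueError("decile must be in [1, 10]")
    n = len(y_true)
    # Rank rows from highest to lowest predicted probability.
    order = sorted(range(n), key=lambda i: y_prob[i], reverse=True)
    # Assumption: the slice size rounds up and contains at least one row.
    top_n = max(1, math.ceil(n * decile / 10))
    top_rate = sum(y_true[i] for i in order[:top_n]) / top_n
    base_rate = sum(y_true) / n
    # Assumption: lift is undefined (NaN) when there are no positives at all.
    return top_rate / base_rate if base_rate > 0 else float("nan")


labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
scores = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]
top_lift = lift_at_decile_sketch(labels, scores, decile=1)
full_lift = lift_at_decile_sketch(labels, scores, decile=10)
```

Here the single top-ranked row is a churner while the base rate is 20%, so the top-decile lift is about 5; taking all ten deciles recovers the base rate and a lift of 1.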

flightrisk.eval.metrics.risk_metrics(y_true, y_prob, *, n_calibration_bins=20)

Compute the standard suite of risk metrics in one call.

Parameters:

y_true (ndarray) – Binary ground truth.
y_prob (ndarray) – Predicted probabilities.
n_calibration_bins (int) – Number of bins for ECE.

Returns:

A RiskMetrics bundle.

Return type:

RiskMetrics

flightrisk.eval.metrics.calibration_table(y_true, y_prob, *, n_bins=20)

Return a per-bin calibration table for plotting reliability diagrams.

Parameters:

y_true (ndarray) – Binary ground truth.
y_prob (ndarray) – Predicted probabilities.
n_bins (int) – Number of equal-width bins.

Returns:

Frame with columns bin, count, mean_predicted, empirical_rate.

Return type:

DataFrame

class flightrisk.eval.survival_metrics.SurvivalMetrics(c_index, time_dependent_auc_mean, integrated_brier)

Bases: object

Metrics emitted by the survival track.

Parameters:

c_index (float) – Harrell’s concordance on censored data.
time_dependent_auc_mean (float) – Mean of time-dependent AUC across horizons.
integrated_brier (float) – Integrated Brier score across horizons.

c_index: float

time_dependent_auc_mean: float

integrated_brier: float

as_dict()

Return the metrics as a plain dict for MLflow logging.

NaN entries are dropped so MLflow’s metric store accepts the call when the IPCW estimator could not be computed (e.g. degenerate censoring).

Returns:

Mapping from metric name to finite value.

Return type:

Mapping[str, float]
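
The NaN-dropping behaviour described above might look like the following. SurvivalMetricsSketch is a hypothetical stand-in, and both the dataclass layout and the key names are assumptions; only the filtering rule comes from the description.

```python
import math
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class SurvivalMetricsSketch:
    """Hypothetical stand-in mirroring the documented SurvivalMetrics fields."""

    c_index: float
    time_dependent_auc_mean: float
    integrated_brier: float

    def as_dict(self) -> Dict[str, float]:
        # Drop NaN entries so a metric store that rejects NaN still accepts the call.
        return {name: value for name, value in asdict(self).items() if not math.isnan(value)}


# Made-up values; the NaN stands for a metric that could not be estimated.
partial = SurvivalMetricsSketch(
    c_index=0.7, time_dependent_auc_mean=float("nan"), integrated_brier=0.15
)
logged = partial.as_dict()
```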

flightrisk.eval.survival_metrics.survival_metrics(*, train_durations, train_events, test_durations, test_events, risk_scores, survival_at_horizons, horizons)

Compute the survival metric suite.

Parameters:

train_durations (ndarray) – Train follow-up times.
train_events (ndarray) – Train 0/1 event indicators.
test_durations (ndarray) – Test follow-up times.
test_events (ndarray) – Test 0/1 event indicators.
risk_scores (ndarray) – Per-row risk score on test (higher = earlier event).
survival_at_horizons (ndarray) – (n_samples, len(horizons)) matrix of S(t).
horizons (ndarray) – 1-D array of horizons matching survival_at_horizons.

Returns:

A SurvivalMetrics bundle.

Return type:

SurvivalMetrics
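
To make the c_index component concrete, here is a plain-Python sketch of Harrell's concordance on the test arrays, using the documented convention that a higher risk score means an earlier event. It is an O(n²) illustration under stated assumptions: the name harrell_c_index_sketch is hypothetical, pairs with tied durations are skipped, and the IPCW-based metrics (time-dependent AUC, integrated Brier score) are not covered.

```python
from typing import Sequence


def harrell_c_index_sketch(
    durations: Sequence[float], events: Sequence[int], risk_scores: Sequence[float]
) -> float:
    """Fraction of comparable pairs in which the earlier event has the higher risk score."""
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        if not events[i]:
            # A censored row cannot anchor a pair: its true event time is unknown.
            continue
        for j in range(n):
            # Comparable pair: row i had an observed event strictly before row j's time.
            if durations[i] < durations[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half-concordant
    # Assumption: return NaN when no pair is comparable (e.g. every row is censored).
    return concordant / comparable if comparable else float("nan")


test_durations = [1.0, 2.0, 3.0, 4.0]
test_events = [1, 1, 0, 1]
perfect = harrell_c_index_sketch(test_durations, test_events, [4.0, 3.0, 2.0, 1.0])
reversed_rank = harrell_c_index_sketch(test_durations, test_events, [1.0, 2.0, 3.0, 4.0])
```

With risk scores ordered exactly opposite to the durations, every comparable pair is concordant and the index is 1; reversing the scores drives it to 0.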

class flightrisk.eval.uplift_metrics.UpliftMetrics(qini, auuc, uplift_at_k)

Bases: object

Metrics emitted by the uplift track.

Parameters:

qini (float) – Qini coefficient (area between Qini curve and random line).
auuc (float) – Area under the uplift curve.
uplift_at_k (Mapping[int, float]) – Mapping from k (percent) to estimated uplift in the top-k% slice.

qini: float

auuc: float

uplift_at_k: Mapping[int, float]

as_dict()

Return the metrics as a flat dict for MLflow logging.

Returns:

Mapping of scalar metrics, with one entry per k.

Return type:

Mapping[str, float]
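
Flattening the nested uplift_at_k mapping into one scalar entry per k could be done along these lines. UpliftMetricsSketch is a hypothetical stand-in, and the uplift_at_{k} key format is an assumption; the actual key names used by the library are not documented here.

```python
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass(frozen=True)
class UpliftMetricsSketch:
    """Hypothetical stand-in mirroring the documented UpliftMetrics fields."""

    qini: float
    auuc: float
    uplift_at_k: Mapping[int, float]

    def as_dict(self) -> Dict[str, float]:
        flat = {"qini": self.qini, "auuc": self.auuc}
        for k, value in self.uplift_at_k.items():
            # Assumption: one scalar key per k, named "uplift_at_<k>".
            flat[f"uplift_at_{k}"] = value
        return flat


# Made-up values, used only to show the flattened shape.
bundle = UpliftMetricsSketch(qini=0.05, auuc=0.12, uplift_at_k={10: 0.08, 20: 0.06})
flat_uplift = bundle.as_dict()
```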

flightrisk.eval.uplift_metrics.qini_curve(uplift, treatment, outcome)

Build the Qini curve in cumulative form.

Parameters:

uplift (ndarray) – Per-row uplift score.
treatment (ndarray) – 0/1 treatment indicator.
outcome (ndarray) – 0/1 outcome.

Returns:

Frame with columns rank, cum_treated_outcomes, cum_control_outcomes, cum_treated_n, cum_control_n, and qini.

Return type:

DataFrame
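
The cumulative columns listed above can be sketched as a single pass over the rows ranked by uplift score. The qini value here follows one common definition (cumulative treated outcomes minus cumulative control outcomes rescaled to the treated count); whether the library uses exactly this formula is an assumption, as are the name qini_curve_sketch and the choice to return a list of dicts rather than a DataFrame.

```python
from typing import Dict, List, Sequence


def qini_curve_sketch(
    uplift: Sequence[float], treatment: Sequence[int], outcome: Sequence[int]
) -> List[Dict[str, float]]:
    """Build cumulative Qini rows, ranking from highest to lowest uplift score."""
    order = sorted(range(len(uplift)), key=lambda i: uplift[i], reverse=True)
    rows = []
    treated_y = control_y = treated_n = control_n = 0
    for rank, i in enumerate(order, start=1):
        if treatment[i]:
            treated_n += 1
            treated_y += outcome[i]
        else:
            control_n += 1
            control_y += outcome[i]
        # Rescale control outcomes to the treated group size; 0 while no controls are seen.
        scaled_control = control_y * treated_n / control_n if control_n else 0.0
        rows.append(
            {
                "rank": rank,
                "cum_treated_outcomes": treated_y,
                "cum_control_outcomes": control_y,
                "cum_treated_n": treated_n,
                "cum_control_n": control_n,
                "qini": treated_y - scaled_control,
            }
        )
    return rows


curve = qini_curve_sketch(
    uplift=[0.9, 0.8, 0.7, 0.6], treatment=[1, 0, 1, 0], outcome=[1, 0, 0, 1]
)
```

Each row can be fed straight into a DataFrame constructor if a tabular form is needed for plotting.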

flightrisk.eval.uplift_metrics.qini_coefficient(uplift, treatment, outcome)

Return the Qini coefficient against the random-targeting baseline.

Parameters:

uplift (ndarray) – Per-row uplift score.
treatment (ndarray) – 0/1 treatment indicator.
outcome (ndarray) – 0/1 outcome.

Returns:

The Qini coefficient.

Return type:

float

flightrisk.eval.uplift_metrics.auuc(uplift, treatment, outcome)

Compute the area under the uplift curve.

Parameters:

uplift (ndarray) – Per-row uplift score.
treatment (ndarray) – 0/1 treatment indicator.
outcome (ndarray) – 0/1 outcome.

Returns:

AUUC normalised by the number of observations.

Return type:

float

flightrisk.eval.uplift_metrics.uplift_at_k(uplift, treatment, outcome, *, k_percent)

Estimate the uplift in the top-k% ranked slice.

Parameters:

uplift (ndarray) – Per-row uplift score.
treatment (ndarray) – 0/1 treatment indicator.
outcome (ndarray) – 0/1 outcome.
k_percent (float) – Top-percentile cutoff in (0, 100].

Returns:

Difference between treated outcome rate and control outcome rate within the top slice; nan if either arm is empty in the slice.

Raises:

ValueError – If k_percent is outside (0, 100].

Return type:

float
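
A plain-Python sketch of the top-k% uplift estimate, following the documented contract: a difference of outcome rates within the slice, NaN when either arm is empty, and ValueError for an out-of-range cutoff. The name uplift_at_k_sketch and the round-up rule for the slice size are assumptions.

```python
import math
from typing import Sequence


def uplift_at_k_sketch(
    uplift: Sequence[float],
    treatment: Sequence[int],
    outcome: Sequence[int],
    k_percent: float,
) -> float:
    """Treated outcome rate minus control outcome rate within the top-k% ranked slice."""
    if not 0 < k_percent <= 100:
        raise ValueError("k_percent must be in (0, 100]")
    n = len(uplift)
    # Assumption: the slice size rounds up and contains at least one row.
    top_n = max(1, math.ceil(n * k_percent / 100))
    top = sorted(range(n), key=lambda i: uplift[i], reverse=True)[:top_n]
    treated = [outcome[i] for i in top if treatment[i]]
    control = [outcome[i] for i in top if not treatment[i]]
    if not treated or not control:
        # The difference is undefined when one arm is missing from the slice.
        return float("nan")
    return sum(treated) / len(treated) - sum(control) / len(control)


scores = [0.9, 0.8, 0.7, 0.6]
arm = [1, 0, 1, 0]
converted = [1, 0, 0, 1]
top_half = uplift_at_k_sketch(scores, arm, converted, k_percent=50)
top_quarter = uplift_at_k_sketch(scores, arm, converted, k_percent=25)
```

In the toy data the top half holds one converting treated row and one non-converting control row, giving an uplift of 1; the top quarter holds only a treated row, so the estimate is NaN.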

flightrisk.eval.uplift_metrics.uplift_metrics(*, uplift, treatment, outcome, k_percentiles=(10, 20, 30))

Compute the standard uplift metric suite in one call.

Parameters:

uplift (ndarray) – Per-row uplift score.
treatment (ndarray) – 0/1 treatment indicator.
outcome (ndarray) – 0/1 outcome.
k_percentiles (tuple[int, ...]) – Top-k percent cutoffs, each passed to uplift_at_k() as k_percent.

Returns:

An UpliftMetrics bundle.

Return type:

UpliftMetrics

class flightrisk.eval.simulator.SimulatorConfig(cost_per_treated=5.0, revenue_per_retained=50.0, bootstrap_iters=1000, alpha=0.05, random_state=1337)

Bases: object

Configuration for the campaign ROI simulator.

Parameters:

cost_per_treated (float) – Marginal cost of treating one customer.
revenue_per_retained (float) – Revenue captured by retaining one customer.
bootstrap_iters (int) – Number of bootstrap resamples for confidence bands.
alpha (float) – Two-sided significance level for the confidence interval.
random_state (int) – Seed used by the bootstrap resampler.

cost_per_treated: float = 5.0

revenue_per_retained: float = 50.0

bootstrap_iters: int = 1000

alpha: float = 0.05

random_state: int = 1337

class flightrisk.eval.simulator.SimulationResult(budget, n_targeted, expected_retained, expected_revenue, ci_lower, ci_upper, policy)

Bases: object

Outcome of a budgeted targeting policy simulation.

Parameters:

budget (float) – Budget cap in dollars.
n_targeted (int) – Number of customers actually treated under the policy.
expected_retained (float) – Mean retained customers across bootstrap samples.
expected_revenue (float) – Mean retained revenue net of treatment cost.
ci_lower (float) – Lower bound of the bootstrap CI for revenue.
ci_upper (float) – Upper bound of the bootstrap CI for revenue.
policy (str) – Policy identifier: one of "risk", "uplift", or "random".

budget: float

n_targeted: int

expected_retained: float

expected_revenue: float

ci_lower: float

ci_upper: float

policy: str

as_dict()

Return the result as a flat dict suitable for MLflow.

Returns:

Mapping of fields to values.

Return type:

Mapping[str, float | int | str]

flightrisk.eval.simulator.simulate_risk_policy(*, risk_scores, treatment_lift, base_outcome, budget, config=None)

Simulate a top-by-P(churn) campaign at the given budget.

The policy treats the top-k customers by risk_scores where k is determined by budget / cost_per_treated. Each treated customer’s retention probability is increased by treatment_lift (clipped to [0, 1 - base_outcome]); each untreated customer is assumed to retain at base_outcome.

Parameters:

risk_scores (ndarray) – Per-row predicted churn probability (Track A output).
treatment_lift (ndarray) – Per-row uplift in retention if treated (e.g. the average lift from an RCT, repeated as a constant for every row).
base_outcome (ndarray) – Baseline retention probability per customer.
budget (float) – Total dollars available.
config (SimulatorConfig | None) – Optional simulator configuration.

Returns:

A SimulationResult with bootstrapped CI bands.

Return type:

SimulationResult
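
The policy described above (treat the top-k rows by risk, add a clipped lift for treated rows, then bootstrap a confidence band) can be sketched in plain Python as below. This is an illustration under several assumptions, not the library's algorithm: k is taken as the floor of budget / cost_per_treated, the bootstrap resamples customers with replacement and sums expected retention and treatment cost per resample, and the interval uses simple percentile indexing. The helper names policy_retention_probs and bootstrap_revenue are hypothetical; the defaults mirror SimulatorConfig.

```python
import random
from typing import Dict, List, Sequence, Tuple


def policy_retention_probs(
    risk_scores: Sequence[float],
    treatment_lift: Sequence[float],
    base_outcome: Sequence[float],
    budget: float,
    cost_per_treated: float = 5.0,
) -> Tuple[List[bool], List[float]]:
    """Return (treated flags, per-customer retention probability) for a top-k policy."""
    n = len(risk_scores)
    # Assumption: k is the floor of budget / cost, capped at the population size.
    k = min(n, int(budget // cost_per_treated))
    top = set(sorted(range(n), key=lambda i: risk_scores[i], reverse=True)[:k])
    treated = [i in top for i in range(n)]
    probs = []
    for i in range(n):
        # Lift is clipped to [0, 1 - base_outcome] so the probability stays <= 1.
        lift = min(max(treatment_lift[i], 0.0), 1.0 - base_outcome[i]) if treated[i] else 0.0
        probs.append(base_outcome[i] + lift)
    return treated, probs


def bootstrap_revenue(
    treated: Sequence[bool],
    probs: Sequence[float],
    cost_per_treated: float = 5.0,
    revenue_per_retained: float = 50.0,
    bootstrap_iters: int = 1000,
    alpha: float = 0.05,
    random_state: int = 1337,
) -> Dict[str, float]:
    """Bootstrap net revenue by resampling customers with replacement (assumed scheme)."""
    n = len(probs)
    rng = random.Random(random_state)
    retained_draws, revenue_draws = [], []
    for _ in range(bootstrap_iters):
        rows = [rng.randrange(n) for _ in range(n)]
        retained = sum(probs[i] for i in rows)
        cost = cost_per_treated * sum(treated[i] for i in rows)
        retained_draws.append(retained)
        revenue_draws.append(retained * revenue_per_retained - cost)
    revenue_draws.sort()
    # Percentile interval: alpha / 2 in each tail.
    lower = revenue_draws[int((alpha / 2) * (bootstrap_iters - 1))]
    upper = revenue_draws[int((1 - alpha / 2) * (bootstrap_iters - 1))]
    return {
        "n_targeted": sum(treated),
        "expected_retained": sum(retained_draws) / bootstrap_iters,
        "expected_revenue": sum(revenue_draws) / bootstrap_iters,
        "ci_lower": lower,
        "ci_upper": upper,
    }


flags, retain_probs = policy_retention_probs(
    risk_scores=[0.9, 0.1, 0.5, 0.3],
    treatment_lift=[0.2, 0.2, 0.2, 0.2],
    base_outcome=[0.7, 0.7, 0.9, 0.7],
    budget=10.0,
)
summary = bootstrap_revenue(flags, retain_probs, bootstrap_iters=200)
```

With a budget of 10 and a cost of 5 per treatment, two customers are treated; the one with a 0.9 baseline only gains 0.1 because the lift is clipped at 1 - base_outcome.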

flightrisk.eval.simulator.simulate_uplift_policy(*, uplift_scores, treatment_lift, base_outcome, budget, config=None)

Simulate a top-by-uplift campaign at the given budget.

The policy treats the top-k customers by uplift_scores (Track C output).

Parameters:

uplift_scores (ndarray) – Per-row uplift estimate.
treatment_lift (ndarray) – Per-row uplift in retention if treated.
base_outcome (ndarray) – Baseline retention probability per customer.
budget (float) – Total dollars available.
config (SimulatorConfig | None) – Optional simulator configuration.

Returns:

A SimulationResult with bootstrapped CI bands.

Return type:

SimulationResult

flightrisk.eval.simulator.simulate_random_policy(*, n, treatment_lift, base_outcome, budget, config=None)

Simulate a uniformly random targeting policy at the given budget.

Parameters:

n (int) – Population size.
treatment_lift (ndarray) – Per-row uplift in retention if treated.
base_outcome (ndarray) – Baseline retention probability per customer.
budget (float) – Total dollars available.
config (SimulatorConfig | None) – Optional simulator configuration.

Returns:

A SimulationResult with bootstrapped CI bands.

Return type:

SimulationResult

flightrisk.eval.simulator.compare_policies(*, risk_scores, uplift_scores, treatment_lift, base_outcome, budgets, config=None)

Simulate risk, uplift, and random policies across multiple budgets.

Parameters:

risk_scores (ndarray) – Per-row predicted churn probability (Track A output).
uplift_scores (ndarray) – Per-row uplift estimate (Track C output).
treatment_lift (ndarray) – Per-row retention lift if treated.
base_outcome (ndarray) – Baseline retention probability per customer.
budgets (tuple[float, ...]) – Budgets to evaluate.
config (SimulatorConfig | None) – Optional simulator configuration.

Returns:

Tidy frame ready for plotting.

Return type:

DataFrame

flightrisk.eval.plots.save_roi_chart(comparison, *, output)

Render the headline ROI chart comparing policies across budgets.

Parameters:

comparison (DataFrame) – Output of flightrisk.eval.simulator.compare_policies().
output (Path) – Destination PNG path.

Returns:

The path written.

Return type:

Path