flightrisk.models

Risk track

class flightrisk.models.risk.lightgbm_model.LightGBMRiskParams(objective='binary', learning_rate=0.03, num_leaves=95, max_depth=-1, min_data_in_leaf=250, feature_fraction=0.8, bagging_fraction=0.8, bagging_freq=5, lambda_l2=1.5, n_estimators=1500, early_stopping_rounds=80, verbose=-1)[source]

Bases: object

Hyperparameters for the LightGBM risk learner.

Parameters:
  • objective (str) – LightGBM objective. Defaults to "binary".

  • learning_rate (float) – Boosting learning rate.

  • num_leaves (int) – Maximum number of leaves per tree.

  • max_depth (int) – Maximum tree depth (-1 means unbounded).

  • min_data_in_leaf (int) – Minimum samples per leaf.

  • feature_fraction (float) – Column subsample ratio per tree.

  • bagging_fraction (float) – Row subsample ratio per iteration.

  • bagging_freq (int) – Iterations between bagging refreshes.

  • lambda_l2 (float) – L2 regularisation strength.

  • n_estimators (int) – Maximum boosting rounds.

  • early_stopping_rounds (int) – Patience used when a validation set is given.

  • verbose (int) – LightGBM verbosity flag.

objective: str = 'binary'
learning_rate: float = 0.03
num_leaves: int = 95
max_depth: int = -1
min_data_in_leaf: int = 250
feature_fraction: float = 0.8
bagging_fraction: float = 0.8
bagging_freq: int = 5
lambda_l2: float = 1.5
n_estimators: int = 1500
early_stopping_rounds: int = 80
verbose: int = -1

as_native()[source]

Return the params in the form LightGBM expects.

Returns:

Mapping consumable by lightgbm.LGBMClassifier.

Return type:

Mapping[str, Any]
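
As a rough illustration of the pattern above, the sketch below mimics a params dataclass whose as_native() hands back a plain mapping. The class name DemoParams and its reduced field set are hypothetical, and the real as_native() may rename or drop keys; this sketch only assumes a one-to-one copy of fields to keys.

```python
from dataclasses import asdict, dataclass
from typing import Any, Mapping


@dataclass
class DemoParams:
    """Hypothetical stand-in for LightGBMRiskParams with a reduced field set."""

    objective: str = "binary"
    learning_rate: float = 0.03
    num_leaves: int = 95
    max_depth: int = -1

    def as_native(self) -> Mapping[str, Any]:
        # Assumption: every field maps one-to-one onto a native keyword.
        return asdict(self)


# Override one default and inspect the mapping that would be passed on.
native = DemoParams(learning_rate=0.05).as_native()
print(native)
```

A mapping of this shape could then be unpacked into an estimator constructor with the ** operator.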

class flightrisk.models.risk.lightgbm_model.LightGBMRiskModel(params=None)[source]

Bases: object

Thin wrapper around lightgbm.LGBMClassifier for the risk track.

The wrapper records the feature ordering used at fit time so prediction matches the training schema even when the upstream feature builder shuffles columns or drops auxiliary IDs.

Parameters:

params (LightGBMRiskParams | None)

property estimator: LGBMClassifier

Return the underlying fitted estimator.

Returns:

The fitted lightgbm.LGBMClassifier.

Raises:

RuntimeError – If the model has not been fitted.

fit(X, y, *, X_val=None, y_val=None, categorical_features=None)[source]

Fit the model with optional early stopping on a validation slice.

Parameters:
  • X (DataFrame) – Training features.

  • y (ndarray) – Training labels (0/1).

  • X_val (DataFrame | None) – Optional validation features.

  • y_val (ndarray | None) – Optional validation labels.

  • categorical_features (list[str] | None) – Optional categorical column names.

Returns:

self for chaining.

Return type:

LightGBMRiskModel

predict_proba(X)[source]

Predict the positive-class probability.

Parameters:

X (DataFrame) – Features whose columns must include the ones seen at fit time.

Returns:

1-D array of probabilities in [0, 1].

Raises:

RuntimeError – If the model has not been fitted.

Return type:

ndarray
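
The feature-ordering behaviour described for LightGBMRiskModel can be pictured with a small standard-library sketch. OrderedFeatureSelector and the column names are hypothetical and are not part of flightrisk; the sketch only shows the idea of remembering the fit-time order and restoring it at prediction time.

```python
from typing import Dict, List, Sequence


class OrderedFeatureSelector:
    """Hypothetical helper that remembers the column order seen at fit time."""

    def __init__(self) -> None:
        self.feature_names: List[str] = []

    def fit(self, columns: Sequence[str]) -> "OrderedFeatureSelector":
        self.feature_names = list(columns)
        return self

    def align(self, row: Dict[str, float]) -> List[float]:
        if not self.feature_names:
            raise RuntimeError("selector has not been fitted")
        missing = [name for name in self.feature_names if name not in row]
        if missing:
            raise KeyError(f"missing columns: {missing}")
        # Extra keys (for example an ID column) are ignored; fit order is restored.
        return [row[name] for name in self.feature_names]


selector = OrderedFeatureSelector().fit(["tenure_days", "plan_price", "n_logins"])
shuffled_row = {"msno": 17.0, "n_logins": 4.0, "tenure_days": 210.0, "plan_price": 149.0}
aligned = selector.align(shuffled_row)
print(aligned)
```

With a pandas frame the same effect is usually obtained by selecting X[feature_names] before calling the estimator.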

class flightrisk.models.risk.xgboost_model.XGBoostRiskParams(objective='binary:logistic', eta=0.05, max_depth=6, min_child_weight=5.0, subsample=0.85, colsample_bytree=0.85, reg_lambda=1.0, n_estimators=800, early_stopping_rounds=50, tree_method='hist', eval_metric='logloss')[source]

Bases: object

Hyperparameters for the XGBoost risk learner.

Parameters:
  • objective (str) – XGBoost objective. Defaults to "binary:logistic".

  • eta (float) – Boosting learning rate.

  • max_depth (int) – Maximum tree depth.

  • min_child_weight (float) – Minimum sum of instance weights per leaf.

  • subsample (float) – Row subsample ratio per iteration.

  • colsample_bytree (float) – Column subsample ratio per tree.

  • reg_lambda (float) – L2 regularisation strength.

  • n_estimators (int) – Maximum boosting rounds.

  • early_stopping_rounds (int) – Patience used when a validation set is given.

  • tree_method (str) – Tree construction algorithm. "hist" is fast and works on CPU and GPU.

  • eval_metric (str) – Default evaluation metric.

objective: str = 'binary:logistic'
eta: float = 0.05
max_depth: int = 6
min_child_weight: float = 5.0
subsample: float = 0.85
colsample_bytree: float = 0.85
reg_lambda: float = 1.0
n_estimators: int = 800
early_stopping_rounds: int = 50
tree_method: str = 'hist'
eval_metric: str = 'logloss'

as_native()[source]

Return the params in the form XGBoost expects.

Returns:

Mapping consumable by xgboost.XGBClassifier.

Return type:

Mapping[str, Any]

class flightrisk.models.risk.xgboost_model.XGBoostRiskModel(params=None)[source]

Bases: object

Thin wrapper around xgboost.XGBClassifier for the risk track.

Parameters:

params (XGBoostRiskParams | None)

property estimator: XGBClassifier

Return the underlying fitted estimator.

Returns:

The fitted xgboost.XGBClassifier.

Raises:

RuntimeError – If the model has not been fitted.

fit(X, y, *, X_val=None, y_val=None)[source]

Fit the model with optional early stopping on a validation slice.

Parameters:
  • X (DataFrame) – Training features.

  • y (ndarray) – Training labels (0/1).

  • X_val (DataFrame | None) – Optional validation features.

  • y_val (ndarray | None) – Optional validation labels.

Returns:

self for chaining.

Return type:

XGBoostRiskModel

predict_proba(X)[source]

Predict the positive-class probability.

Parameters:

X (DataFrame) – Features whose columns must include the ones seen at fit time.

Returns:

1-D array of probabilities in [0, 1].

Raises:

RuntimeError – If the model has not been fitted.

Return type:

ndarray

class flightrisk.models.risk.calibration.CalibratedRiskModel(base, *, method='isotonic')[source]

Bases: object

Wrap a probabilistic model with isotonic or Platt calibration.

The calibrator is fit on the wrapped model’s predict_proba output for a held-out validation slice, so it never sees training rows.

Parameters:
  • base (_ProbaModel)

  • method (CalibrationMethod)

fit(X_val, y_val)[source]

Fit the calibrator on a held-out validation slice.

Parameters:
  • X_val (DataFrame) – Validation features.

  • y_val (ndarray) – Validation labels (0/1).

Returns:

self for chaining.

Return type:

CalibratedRiskModel

predict_proba(X)[source]

Return calibrated probabilities for X.

Parameters:

X (DataFrame) – Features compatible with self.base.

Returns:

1-D array of calibrated probabilities.

Raises:

RuntimeError – If the calibrator has not been fitted.

Return type:

ndarray
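
To make the isotonic option concrete, here is a minimal pool-adjacent-violators sketch in pure Python. It is not the flightrisk implementation: the function names are hypothetical, the scores stand in for base.predict_proba(X_val), and the lookup is a simple step function where a library calibrator may interpolate instead.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def fit_isotonic(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Pool-adjacent-violators fit; returns (upper score bound, calibrated value) per block."""
    # Each block holds [sum of labels, row count, largest score in the block].
    blocks: List[List[float]] = []
    for score, label in sorted(zip(scores, labels)):
        blocks.append([float(label), 1.0, score])
        # Merge backwards while neighbouring block means break monotonicity.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    uppers = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return uppers, values


def apply_isotonic(uppers: Sequence[float], values: Sequence[float], score: float) -> float:
    # First block whose upper bound is >= score; scores past the end are clamped.
    idx = min(bisect_left(uppers, score), len(values) - 1)
    return values[idx]


# Hypothetical held-out scores and 0/1 labels (distinct scores assumed).
val_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
val_labels = [0, 1, 0, 0, 1, 1]
uppers, values = fit_isotonic(val_scores, val_labels)
calibrated = apply_isotonic(uppers, values, 0.35)
print(values, calibrated)
```

The fitted values are non-decreasing in the score, which is the property that distinguishes isotonic calibration from the parametric Platt alternative.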

class flightrisk.models.risk.trainer.RiskTrainingResult(model, metrics, calibration, feature_names=<factory>)[source]

Bases: object

Bundle of artifacts produced by train_risk_model().

Parameters:
  • model (object) – The fitted estimator (calibrated when calibration is set).

  • metrics (RiskMetrics) – Evaluation metrics on the test slice.

  • calibration (DataFrame) – Calibration table on the test slice.

  • feature_names (list[str]) – Feature ordering used at fit time.

model: object
metrics: RiskMetrics
calibration: DataFrame
feature_names: list[str]

flightrisk.models.risk.trainer.train_risk_model(features, labels, *, train_idx, val_idx, test_idx, estimator='lightgbm', calibration='isotonic')[source]

Train a risk model end to end, with optional calibration on the validation slice.

Parameters:
  • features (DataFrame) – Wide feature frame (with optional msno ID column).

  • labels (Series | ndarray) – Aligned 0/1 churn labels.

  • train_idx (ndarray) – Row positions used for training.

  • val_idx (ndarray) – Row positions used for early stopping and calibration.

  • test_idx (ndarray) – Row positions held out for final metrics.

  • estimator (str) – "lightgbm" or "xgboost".

  • calibration (Literal['isotonic', 'platt'] | None) – "isotonic", "platt", or None to skip.

Returns:

A RiskTrainingResult with the fitted model and metrics.

Return type:

RiskTrainingResult
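
The three index arguments are row positions, not labels. One way such positions might be produced is sketched below with the standard library only; split_positions, the fractions, and the seed are hypothetical, and the project may well split differently (for example by time).

```python
import random
from typing import List, Tuple


def split_positions(
    n_rows: int, val_frac: float, test_frac: float, seed: int = 1337
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle row positions once and cut them into train/val/test."""
    positions = list(range(n_rows))
    random.Random(seed).shuffle(positions)
    n_test = int(n_rows * test_frac)
    n_val = int(n_rows * val_frac)
    test_idx = sorted(positions[:n_test])
    val_idx = sorted(positions[n_test:n_test + n_val])
    train_idx = sorted(positions[n_test + n_val:])
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = split_positions(20, val_frac=0.25, test_frac=0.25)
print(len(train_idx), len(val_idx), len(test_idx))
```

The resulting lists are disjoint and cover every row once; they would need converting to arrays before being passed as train_idx, val_idx, and test_idx.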

flightrisk.models.risk.trainer.save_calibration_plot(table, *, output)[source]

Render and save a reliability diagram from a calibration table.

Parameters:
  • table – Calibration table to plot.

  • output – Destination path for the saved plot.

Returns:

The path written.

Return type:

Path
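
A reliability diagram is drawn from binned predictions. The sketch below shows one common way to build such a table; the function name, the equal-width binning, and the tuple layout are assumptions, and the columns of the actual calibration table are not documented here.

```python
from typing import List, Sequence, Tuple


def calibration_table(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> List[Tuple[int, float, float, int]]:
    """Return (bin index, mean predicted, observed rate, row count) for non-empty bins."""
    sums = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Equal-width bins on [0, 1]; p == 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        sums[idx][0] += p
        sums[idx][1] += y
        sums[idx][2] += 1
    return [(i, sp / n, sy / n, n) for i, (sp, sy, n) in enumerate(sums) if n > 0]


# Hypothetical test-slice predictions and labels.
table = calibration_table([0.05, 0.15, 0.55, 0.65, 0.95], [0, 0, 1, 0, 1], n_bins=2)
for row in table:
    print(row)
```

Plotting mean predicted probability against observed rate per bin gives the reliability curve; a well-calibrated model lies close to the diagonal.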

Survival track

class flightrisk.models.survival.labels.SurvivalLabels(msno, duration_days, event_observed)[source]

Bases: object

Right-censored survival labels keyed by msno.

Parameters:
  • msno (Series) – Subscriber ID.

  • duration_days (Series) – Time from cutoff until event or censoring, in days.

  • event_observed (Series) – 1 if churn was observed before horizon_days; 0 if the user was still active at the horizon (right-censored).

msno: Series
duration_days: Series
event_observed: Series

to_frame()[source]

Return the labels as a tidy frame.

Returns:

Frame with msno, duration_days, event_observed.

Return type:

DataFrame

flightrisk.models.survival.labels.build_survival_labels(transactions, *, cutoff, horizon_days=90)[source]

Convert KKBox transactions into right-censored survival labels.

For each user the event time is the first is_cancel == 1 transaction after cutoff; if none occurs within horizon_days, the user is censored at the horizon. Users with no post-cutoff transactions are also censored at the horizon.

Parameters:
  • transactions (DataFrame) – Validated transactions frame.

  • cutoff (Timestamp) – Build cutoff. The clock starts here.

  • horizon_days (int) – Maximum follow-up window in days.

Returns:

A SurvivalLabels bundle.

Raises:

ValueError – If horizon_days is non-positive.

Return type:

SurvivalLabels
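
The censoring rule described above can be sketched for a single subscriber with the standard library. survival_label is a hypothetical helper, not the library function, and treating a cancellation exactly at the horizon as an observed event is an assumption of this sketch.

```python
from datetime import date
from typing import Iterable, Tuple


def survival_label(
    cancel_dates: Iterable[date], cutoff: date, horizon_days: int = 90
) -> Tuple[int, int]:
    """Return (duration_days, event_observed) for one subscriber."""
    if horizon_days <= 0:
        raise ValueError("horizon_days must be positive")
    # Days from cutoff to each cancellation that happens after the cutoff.
    after = [(d - cutoff).days for d in cancel_dates if d > cutoff]
    # Assumption: the horizon boundary is inclusive.
    in_window = [days for days in after if days <= horizon_days]
    if in_window:
        return min(in_window), 1
    # No cancellation inside the window: right-censored at the horizon.
    return horizon_days, 0


cutoff = date(2017, 1, 1)
churned = survival_label([date(2017, 1, 31)], cutoff)
late = survival_label([date(2017, 6, 1)], cutoff)
silent = survival_label([], cutoff)
print(churned, late, silent)
```

The second and third cases both end up censored at 90 days, matching the rule that users without a qualifying post-cutoff cancellation are censored at the horizon.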

flightrisk.models.survival.labels.to_structured_array(durations, events)[source]

Pack durations and event indicators into the structured array sksurv expects.

Parameters:
  • durations (ndarray) – Float durations.

  • events (ndarray) – Integer 0/1 event indicators.

Returns:

Structured numpy array with fields event (bool) and time.

Return type:

ndarray

class flightrisk.models.survival.cox_model.CoxParams(penalizer=0.01, l1_ratio=0.0)[source]

Bases: object

Hyperparameters for the Cox proportional-hazards fitter.

Parameters:
  • penalizer (float) – Elastic-net penalty strength.

  • l1_ratio (float) – Mix between L1 and L2 (0 = ridge, 1 = lasso).

penalizer: float = 0.01
l1_ratio: float = 0.0

class flightrisk.models.survival.cox_model.CoxSurvivalModel(params=None)[source]

Bases: object

Lifelines Cox PH wrapper that returns hazards on a fixed schema.

Parameters:

params (CoxParams | None)

fit(X, durations, events)[source]

Fit the Cox model.

Parameters:
  • X (DataFrame) – Numeric feature frame; non-numeric columns must be encoded upstream.

  • durations (ndarray) – Durations or follow-up times.

  • events (ndarray) – 0/1 event indicators.

Returns:

self for chaining.

Return type:

CoxSurvivalModel

property fitter: CoxPHFitter

Return the fitted lifelines fitter.

Returns:

The lifelines.CoxPHFitter instance.

Raises:

RuntimeError – If the model has not been fitted.

survival_function(X, *, times=None)[source]

Return survival probabilities S(t) for each row in X.

Parameters:
  • X (DataFrame) – Feature frame with the columns seen at fit time.

  • times (ndarray | None) – Optional times at which to evaluate S(t).

Returns:

Frame indexed by time with one column per row in X.

Raises:

RuntimeError – If the model has not been fitted.

Return type:

DataFrame

hazard_at_horizon(X, *, horizon_days)[source]

Return 1 - S(horizon) per row, the probability that the event occurs by the horizon (cumulative incidence, not the cumulative hazard -log S).

Parameters:
  • X (DataFrame) – Feature frame compatible with the fit.

  • horizon_days (int) – Horizon at which to evaluate the hazard.

Returns:

Array of event probabilities in [0, 1].

Return type:

ndarray
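
For reference, 1 - S(t) and the cumulative hazard H(t) = -log S(t) are related but distinct quantities. The short sketch below computes both from a single survival probability; the function names and the value 0.8 are illustrative only.

```python
import math


def event_probability_by(survival_at_horizon: float) -> float:
    """Probability that the event has happened by the horizon: 1 - S(t)."""
    return 1.0 - survival_at_horizon


def cumulative_hazard_at(survival_at_horizon: float) -> float:
    """Cumulative hazard H(t) = -log S(t); unbounded above, unlike a probability."""
    return -math.log(survival_at_horizon)


s_90 = 0.8  # hypothetical S(90) for one subscriber
prob = event_probability_by(s_90)
haz = cumulative_hazard_at(s_90)
print(round(prob, 4), round(haz, 4))
```

The two values are close when S(t) is near 1 and diverge as S(t) falls, which is why only 1 - S(t) can be read as a probability in [0, 1].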

class flightrisk.models.survival.rsf_model.RSFParams(n_estimators=400, max_depth=None, min_samples_leaf=20, min_samples_split=40, max_features='sqrt', n_jobs=-1, random_state=1337)[source]

Bases: object

Hyperparameters for the Random Survival Forest.

Parameters:
  • n_estimators (int) – Number of trees in the forest.

  • max_depth (int | None) – Maximum tree depth (None means unbounded).

  • min_samples_leaf (int) – Minimum samples per leaf.

  • min_samples_split (int) – Minimum samples per split.

  • max_features (str) – Number of features considered at each split.

  • n_jobs (int) – Parallelism level (-1 means use all cores).

  • random_state (int) – Seed for tree bagging.

n_estimators: int = 400
max_depth: int | None = None
min_samples_leaf: int = 20
min_samples_split: int = 40
max_features: str = 'sqrt'
n_jobs: int = -1
random_state: int = 1337

as_native()[source]

Return the params in the form scikit-survival expects.

Returns:

Mapping consumable by sksurv.ensemble.RandomSurvivalForest.

Return type:

Mapping[str, Any]

class flightrisk.models.survival.rsf_model.RSFSurvivalModel(params=None)[source]

Bases: object

Wrapper around sksurv.ensemble.RandomSurvivalForest.

Parameters:

params (RSFParams | None)

fit(X, durations, events)[source]

Fit the random survival forest.

Parameters:
  • X (DataFrame) – Numeric feature frame.

  • durations (ndarray) – Durations or follow-up times.

  • events (ndarray) – 0/1 event indicators.

Returns:

self for chaining.

Return type:

RSFSurvivalModel

property estimator: RandomSurvivalForest

Return the fitted estimator.

Returns:

The fitted RandomSurvivalForest.

Raises:

RuntimeError – If the model has not been fitted.

predict_risk(X)[source]

Return the model’s risk score (cumulative hazard) per row.

Parameters:

X (DataFrame) – Features compatible with the fit.

Returns:

1-D array of risk scores; higher means earlier predicted churn.

Return type:

ndarray

survival_at_times(X, times)[source]

Return S(t) evaluated at times for each row in X.

Parameters:
  • X (DataFrame) – Features compatible with the fit.

  • times (ndarray) – 1-D array of times.

Returns:

(n_samples, len(times)) array of survival probabilities.

Return type:

ndarray

hazard_at_horizon(X, *, horizon_days)[source]

Return 1 - S(horizon) per row.

Parameters:
  • X (DataFrame) – Features compatible with the fit.

  • horizon_days (int) – Horizon at which to evaluate the hazard.

Returns:

1-D array of event probabilities in [0, 1].

Return type:

ndarray

class flightrisk.models.survival.trainer.SurvivalTrainingResult(model, metrics, horizons_days, survival_at_horizons, feature_names=<factory>)[source]

Bases: object

Bundle returned by train_survival_model().

Parameters:
  • model (object) – The fitted estimator.

  • metrics (SurvivalMetrics) – Aggregated survival metrics.

  • horizons_days (ndarray) – Horizons evaluated.

  • survival_at_horizons (ndarray) – (n_test, len(horizons)) S(t) matrix.

  • feature_names (list[str]) – Feature ordering used at fit time.

model: object
metrics: SurvivalMetrics
horizons_days: ndarray
survival_at_horizons: ndarray
feature_names: list[str]

flightrisk.models.survival.trainer.train_survival_model(features, durations, events, *, train_idx, test_idx, estimator='rsf', horizons_days=(30, 60, 90))[source]

Train a survival model end to end.

Parameters:
  • features (DataFrame) – Wide feature frame (with optional msno ID column).

  • durations (ndarray) – Aligned follow-up times in days.

  • events (ndarray) – Aligned 0/1 event indicators.

  • train_idx (ndarray) – Row positions used for fitting.

  • test_idx (ndarray) – Row positions used for evaluation.

  • estimator (Literal['cox', 'rsf']) – "cox" or "rsf".

  • horizons_days (tuple[int, ...]) – Horizons evaluated for the metric suite.

Returns:

A SurvivalTrainingResult with the fitted model.

Return type:

SurvivalTrainingResult

flightrisk.models.survival.trainer.save_survival_curve_plot(survival_at_horizons, horizons_days, *, output, n_curves=30)[source]

Render a small selection of S(t) curves as a sanity-check plot.

Parameters:
  • survival_at_horizons (ndarray) – (n_samples, len(horizons)) matrix of S(t).

  • horizons_days (ndarray) – Matching horizons in days.

  • output (Path) – Destination PNG path.

  • n_curves (int) – Number of randomly-selected curves to draw.

Returns:

The path written.

Return type:

Path

Uplift track

class flightrisk.models.uplift.meta_learners.LightGBMUpliftParams(objective='binary', learning_rate=0.04, num_leaves=95, min_data_in_leaf=100, n_estimators=800, feature_fraction=0.8, verbose=-1)[source]

Bases: object

Hyperparameters for the LightGBM base learners used inside meta-learners.

Parameters:
  • objective (str) – "binary" for outcomes that are 0/1.

  • learning_rate (float) – Boosting learning rate.

  • num_leaves (int) – Maximum number of leaves per tree.

  • min_data_in_leaf (int) – Minimum samples per leaf.

  • n_estimators (int) – Boosting rounds.

  • feature_fraction (float) – Column subsample ratio.

  • verbose (int) – LightGBM verbosity flag.

objective: str = 'binary'
learning_rate: float = 0.04
num_leaves: int = 95
min_data_in_leaf: int = 100
n_estimators: int = 800
feature_fraction: float = 0.8
verbose: int = -1

class flightrisk.models.uplift.meta_learners.TLearner(params=None)[source]

Bases: object

T-learner: one outcome model per treatment arm; uplift is their difference.

Parameters:

params (LightGBMUpliftParams | None)

fit(X, treatment, outcome)[source]

Fit the two outcome models.

Parameters:
  • X (DataFrame) – Feature frame.

  • treatment (ndarray) – 0/1 treatment indicator.

  • outcome (ndarray) – 0/1 outcome.

Returns:

self for chaining.

Raises:

ValueError – If either arm has no rows.

Return type:

TLearner

predict_uplift(X)[source]

Return per-row uplift estimates E[Y|T=1] - E[Y|T=0].

Parameters:

X (DataFrame) – Feature frame.

Returns:

1-D array of uplift estimates.

Raises:

RuntimeError – If the model has not been fitted.

Return type:

ndarray
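
The T-learner idea can be shown end to end with a toy base learner. In the sketch below a per-segment mean replaces the LightGBM outcome models, and both class names are hypothetical; only the structure (one model per arm, uplift as the difference) follows the description above.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


class MeanBySegment:
    """Toy outcome model: predicts the mean outcome of the row's segment."""

    def fit(self, segments: Sequence[Hashable], outcomes: Sequence[float]) -> "MeanBySegment":
        totals = defaultdict(lambda: [0.0, 0])
        for seg, y in zip(segments, outcomes):
            totals[seg][0] += y
            totals[seg][1] += 1
        self.means_ = {seg: s / n for seg, (s, n) in totals.items()}
        # Fallback for segments never seen in this arm.
        self.default_ = sum(outcomes) / len(outcomes)
        return self

    def predict(self, segments: Sequence[Hashable]) -> List[float]:
        return [self.means_.get(seg, self.default_) for seg in segments]


class ToyTLearner:
    """One outcome model per arm; uplift is E[Y|T=1] - E[Y|T=0]."""

    def fit(self, segments, treatment, outcome) -> "ToyTLearner":
        treated = [i for i, t in enumerate(treatment) if t == 1]
        control = [i for i, t in enumerate(treatment) if t == 0]
        if not treated or not control:
            raise ValueError("both treatment arms need at least one row")
        self.model_t_ = MeanBySegment().fit(
            [segments[i] for i in treated], [outcome[i] for i in treated]
        )
        self.model_c_ = MeanBySegment().fit(
            [segments[i] for i in control], [outcome[i] for i in control]
        )
        return self

    def predict_uplift(self, segments) -> List[float]:
        if not hasattr(self, "model_t_"):
            raise RuntimeError("model has not been fitted")
        mu1 = self.model_t_.predict(segments)
        mu0 = self.model_c_.predict(segments)
        return [a - b for a, b in zip(mu1, mu0)]


# Hypothetical data: one categorical feature, 0/1 treatment, 0/1 outcome.
segments = ["a", "a", "a", "a", "b", "b", "b", "b"]
treatment = [1, 1, 0, 0, 1, 1, 0, 0]
outcome = [1, 1, 0, 1, 0, 1, 0, 1]
uplift = ToyTLearner().fit(segments, treatment, outcome).predict_uplift(["a", "b"])
print(uplift)
```

Segment "a" shows a positive estimated effect while segment "b" shows none, because the treated and control means coincide there.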

class flightrisk.models.uplift.meta_learners.XLearner(params=None)[source]

Bases: object

X-learner: combines outcome models with treatment-imputation residuals.

The variant implemented here uses a propensity-weighted average of the treated-side and control-side imputed treatment effects. The propensity is estimated from the data unless provided.

Parameters:

params (LightGBMUpliftParams | None)

fit(X, treatment, outcome)[source]

Fit the X-learner stack.

Parameters:
  • X (DataFrame) – Feature frame.

  • treatment (ndarray) – 0/1 treatment indicator.

  • outcome (ndarray) – 0/1 outcome.

Returns:

self for chaining.

Raises:

ValueError – If either arm has no rows.

Return type:

XLearner

predict_uplift(X)[source]

Return per-row uplift via the propensity-weighted X-learner formula.

Parameters:

X (DataFrame) – Feature frame.

Returns:

1-D array of uplift estimates.

Raises:

RuntimeError – If the model has not been fitted.

Return type:

ndarray
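
The propensity-weighted combination can be written out in a few lines. This sketch follows the commonly cited X-learner convention, in which the control-side effect model is weighted by the propensity g(x) and the treated-side model by 1 - g(x); whether XLearner uses exactly this weighting is an assumption, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def imputed_effects(
    treatment: Sequence[int],
    outcome: Sequence[float],
    mu0: Sequence[float],
    mu1: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Return (treated-side, control-side) imputed treatment effects."""
    # Treated rows: observed outcome minus the control model's prediction.
    d_treated = [y - m0 for t, y, m0 in zip(treatment, outcome, mu0) if t == 1]
    # Control rows: the treated model's prediction minus the observed outcome.
    d_control = [m1 - y for t, y, m1 in zip(treatment, outcome, mu1) if t == 0]
    return d_treated, d_control


def combine(
    propensity: Sequence[float],
    tau_treated: Sequence[float],
    tau_control: Sequence[float],
) -> List[float]:
    """tau(x) = g(x) * tau_control(x) + (1 - g(x)) * tau_treated(x)."""
    return [g * tc + (1.0 - g) * tt for g, tt, tc in zip(propensity, tau_treated, tau_control)]


# Hypothetical predictions for one treated row and one control row.
d_treated, d_control = imputed_effects([1, 0], [1, 0], mu0=[0.2, 0.1], mu1=[0.9, 0.6])
blended = combine([0.25], tau_treated=[0.8], tau_control=[0.4])
print(d_treated, d_control, blended)
```

In a full X-learner the imputed effects are themselves regressed on the features to obtain tau_treated(x) and tau_control(x); the sketch skips that second stage.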

class flightrisk.models.uplift.meta_learners.DRLearner(params=None)[source]

Bases: object

Doubly-robust learner: treatment-effect regression on AIPW pseudo-outcomes.

Parameters:

params (LightGBMUpliftParams | None)

fit(X, treatment, outcome)[source]

Fit the DR-learner stack.

Parameters:
  • X (DataFrame) – Feature frame.

  • treatment (ndarray) – 0/1 treatment indicator.

  • outcome (ndarray) – 0/1 outcome.

Returns:

self for chaining.

Raises:

ValueError – If either arm has no rows.

Return type:

DRLearner

predict_uplift(X)[source]

Return per-row uplift estimates.

Parameters:

X (DataFrame) – Feature frame.

Returns:

1-D array of uplift estimates.

Raises:

RuntimeError – If the model has not been fitted.

Return type:

ndarray
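
The AIPW pseudo-outcome that a doubly-robust learner regresses on the features has a standard closed form, sketched below for a single row. The function name and the propensity clipping threshold are assumptions of this sketch, not documented behaviour of DRLearner.

```python
def aipw_pseudo_outcome(
    t: int, y: float, mu0: float, mu1: float, e: float, clip: float = 0.01
) -> float:
    """AIPW pseudo-outcome for one row.

    t: 0/1 treatment, y: outcome, mu0/mu1: outcome-model predictions under
    control/treatment, e: estimated propensity P(T=1 | x).
    """
    # Assumption: clip the propensity to keep the inverse weights bounded.
    e = min(max(e, clip), 1.0 - clip)
    return (mu1 - mu0) + t * (y - mu1) / e - (1 - t) * (y - mu0) / (1.0 - e)


# Hypothetical nuisance predictions for a treated and a control row.
phi_treated = aipw_pseudo_outcome(t=1, y=1.0, mu0=0.2, mu1=0.6, e=0.5)
phi_control = aipw_pseudo_outcome(t=0, y=0.0, mu0=0.2, mu1=0.6, e=0.5)
print(phi_treated, phi_control)
```

The first term is the plug-in effect estimate and the remaining terms correct it with the inverse-propensity-weighted residual of whichever arm the row belongs to.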

class flightrisk.models.uplift.causal_forest.CausalForestParams(n_estimators=200, min_samples_leaf=30, max_depth=None, random_state=1337, discrete_treatment=True)[source]

Bases: object

Hyperparameters for the econml CausalForestDML wrapper.

Parameters:
  • n_estimators (int) – Number of forest trees.

  • min_samples_leaf (int) – Minimum samples per leaf.

  • max_depth (int | None) – Maximum tree depth (None means unbounded).

  • random_state (int) – Seed for tree bagging.

  • discrete_treatment (bool) – Whether the treatment is binary/categorical.

n_estimators: int = 200
min_samples_leaf: int = 30
max_depth: int | None = None
random_state: int = 1337
discrete_treatment: bool = True

class flightrisk.models.uplift.causal_forest.CausalForestUpliftModel(params=None)[source]

Bases: object

Wrap econml.dml.CausalForestDML with a uniform uplift API.

Parameters:

params (CausalForestParams | None)

fit(X, treatment, outcome)[source]

Fit the causal forest with default outcome and treatment nuisance models.

Parameters:
  • X (DataFrame) – Feature frame.

  • treatment (ndarray) – 0/1 treatment indicator.

  • outcome (ndarray) – 0/1 outcome.

Returns:

self for chaining.

Return type:

CausalForestUpliftModel

predict_uplift(X)[source]

Return per-row CATE estimates.

Parameters:

X (DataFrame) – Feature frame.

Returns:

1-D array of treatment-effect estimates.

Raises:

RuntimeError – If the model has not been fitted.

Return type:

ndarray

class flightrisk.models.uplift.trainer.UpliftTrainingResult(model, metrics, qini_curve, uplift_test, feature_names=<factory>)[source]

Bases: object

Bundle returned by train_uplift_model().

Parameters:
  • model (object) – The fitted estimator.

  • metrics (UpliftMetrics) – Aggregated uplift metrics.

  • qini_curve (DataFrame) – Frame describing the Qini curve on the test slice.

  • uplift_test (ndarray) – Per-row uplift on the test slice.

  • feature_names (list[str]) – Feature ordering used at fit time.

model: object
metrics: UpliftMetrics
qini_curve: DataFrame
uplift_test: ndarray
feature_names: list[str]

flightrisk.models.uplift.trainer.train_uplift_model(features, treatment, outcome, *, train_idx, test_idx, estimator='t_learner', k_percentiles=(10, 20, 30))[source]

Train an uplift model end to end on the Orange Belgium RCT.

Parameters:
  • features (DataFrame) – Wide feature frame.

  • treatment (Series | ndarray) – 0/1 treatment indicator.

  • outcome (Series | ndarray) – 0/1 outcome indicator.

  • train_idx (ndarray) – Row positions used for fitting.

  • test_idx (ndarray) – Row positions used for evaluation.

  • estimator (Literal['t_learner', 'x_learner', 'dr_learner', 'causal_forest']) – One of the supported uplift estimators.

  • k_percentiles (tuple[int, ...]) – Top-percentiles for uplift_at_k.

Returns:

An UpliftTrainingResult with the fitted model and metrics.

Return type:

UpliftTrainingResult
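
One plausible reading of an uplift-at-k metric is the observed treated-minus-control outcome rate among the top k percent of rows ranked by predicted uplift. The sketch below implements that reading; the exact definition used for uplift_at_k in this package is not documented here, so the function should be taken as an assumption.

```python
from typing import Sequence


def uplift_at_k(
    scores: Sequence[float],
    treatment: Sequence[int],
    outcome: Sequence[float],
    k_percent: float,
) -> float:
    """Observed uplift among the top k percent of rows by predicted uplift."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_top = max(1, int(len(order) * k_percent / 100))
    top = order[:n_top]
    treated = [outcome[i] for i in top if treatment[i] == 1]
    control = [outcome[i] for i in top if treatment[i] == 0]
    if not treated or not control:
        # Undefined when one arm is absent from the top slice.
        return float("nan")
    return sum(treated) / len(treated) - sum(control) / len(control)


# Hypothetical predicted uplift, treatment flags, and outcomes for 8 rows.
scores = [0.9, 0.8, 0.7, 0.6, 0.1, 0.2, 0.3, 0.4]
treatment = [1, 0, 1, 0, 1, 0, 1, 0]
outcome = [1, 0, 1, 1, 0, 0, 0, 1]
top_half = uplift_at_k(scores, treatment, outcome, k_percent=50)
print(top_half)
```

Because the data come from a randomised trial, the difference in outcome rates inside the top slice can be read as an estimate of the average effect for the rows the model ranks highest.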

flightrisk.models.uplift.trainer.save_qini_plot(curve, *, output)[source]

Render and save a Qini curve from the curve frame.

Parameters:
  • curve – Qini curve frame to plot.

  • output – Destination path for the saved plot.

Returns:

The path written.

Return type:

Path
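
The points of a Qini curve can be computed with a single pass over rows sorted by predicted uplift. The sketch below uses the usual form Q(k) = Y_t(k) - Y_c(k) * N_t(k) / N_c(k); the function name and the (rank, value) layout are hypothetical, and the columns of the actual curve frame are not documented here.

```python
from typing import List, Sequence, Tuple


def qini_curve(
    scores: Sequence[float], treatment: Sequence[int], outcome: Sequence[float]
) -> List[Tuple[int, float]]:
    """Return (rows targeted, Qini value) points, starting at the origin."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_t = n_c = 0
    y_t = y_c = 0.0
    points: List[Tuple[int, float]] = [(0, 0.0)]
    for rank, i in enumerate(order, start=1):
        if treatment[i] == 1:
            n_t += 1
            y_t += outcome[i]
        else:
            n_c += 1
            y_c += outcome[i]
        # Rescale control outcomes to the treated group size seen so far.
        scale = n_t / n_c if n_c else 0.0
        points.append((rank, y_t - y_c * scale))
    return points


# Hypothetical predicted uplift, treatment flags, and outcomes for 8 rows.
scores = [0.9, 0.8, 0.7, 0.6, 0.1, 0.2, 0.3, 0.4]
treatment = [1, 0, 1, 0, 1, 0, 1, 0]
outcome = [1, 0, 1, 1, 0, 0, 0, 1]
points = qini_curve(scores, treatment, outcome)
print(points)
```

A model with useful ranking pushes the curve above the straight line joining its first and last points; the area between the two is what Qini-based metrics summarise.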