flightrisk.models
Risk track
- class flightrisk.models.risk.lightgbm_model.LightGBMRiskParams(objective='binary', learning_rate=0.03, num_leaves=95, max_depth=-1, min_data_in_leaf=250, feature_fraction=0.8, bagging_fraction=0.8, bagging_freq=5, lambda_l2=1.5, n_estimators=1500, early_stopping_rounds=80, verbose=-1)
Bases: object
Hyperparameters for the LightGBM risk learner.
- Parameters:
objective (str) – LightGBM objective. Defaults to "binary".
learning_rate (float) – Boosting learning rate.
num_leaves (int) – Maximum number of leaves per tree.
max_depth (int) – Maximum tree depth (-1 means unbounded).
min_data_in_leaf (int) – Minimum samples per leaf.
feature_fraction (float) – Column subsample ratio per tree.
bagging_fraction (float) – Row subsample ratio per iteration.
bagging_freq (int) – Iterations between bagging refreshes.
lambda_l2 (float) – L2 regularisation strength.
n_estimators (int) – Maximum boosting rounds.
early_stopping_rounds (int) – Patience used when a validation set is given.
verbose (int) – LightGBM verbosity flag.
- class flightrisk.models.risk.lightgbm_model.LightGBMRiskModel(params=None)
Bases: object
Thin wrapper around lightgbm.LGBMClassifier for the risk track.
The wrapper records the feature ordering used at fit time so prediction matches the training schema even when the upstream feature builder shuffles columns or drops auxiliary IDs.
- Parameters:
params (LightGBMRiskParams | None)
- property estimator: LGBMClassifier
Return the underlying fitted estimator.
- Returns:
The fitted lightgbm.LGBMClassifier.
- Raises:
RuntimeError – If the model has not been fitted.
- fit(X, y, *, X_val=None, y_val=None, categorical_features=None)
Fit the model with optional early stopping on a validation slice.
- Parameters:
- Returns:
self for chaining.
- Return type:
- predict_proba(X)
Predict the positive-class probability.
- Parameters:
X (DataFrame) – Features whose columns must include the ones seen at fit time.
- Returns:
1-D array of probabilities in [0, 1].
- Raises:
RuntimeError – If the model has not been fitted.
- Return type:
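The schema-recording behaviour described for LightGBMRiskModel can be sketched without LightGBM itself. This is a minimal illustration, not the wrapper's actual code: SchemaRecorder and its methods are hypothetical names, and plain dicts stand in for DataFrame rows.

```python
class SchemaRecorder:
    """Hypothetical helper: remembers the feature order seen at fit time."""

    def __init__(self):
        self._feature_names = None

    def fit(self, rows):
        if not rows:
            raise ValueError("need at least one training row")
        # Record the column order of the training data.
        self._feature_names = list(rows[0])
        return self

    def transform(self, rows):
        if self._feature_names is None:
            raise RuntimeError("fit must be called before transform")
        aligned = []
        for row in rows:
            missing = [n for n in self._feature_names if n not in row]
            if missing:
                raise KeyError(f"missing features: {missing}")
            # Reorder to the training schema and drop extra columns (e.g. IDs).
            aligned.append([row[n] for n in self._feature_names])
        return aligned


train_rows = [{"tenure": 12, "plan_price": 149.0}]
# Scoring rows arrive shuffled and carry an auxiliary ID column.
scoring_rows = [{"msno": "u1", "plan_price": 99.0, "tenure": 3}]

recorder = SchemaRecorder().fit(train_rows)
aligned = recorder.transform(scoring_rows)
```

Reordering by name rather than by position is what keeps predictions stable when the upstream frame changes column order.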
- class flightrisk.models.risk.xgboost_model.XGBoostRiskParams(objective='binary:logistic', eta=0.05, max_depth=6, min_child_weight=5.0, subsample=0.85, colsample_bytree=0.85, reg_lambda=1.0, n_estimators=800, early_stopping_rounds=50, tree_method='hist', eval_metric='logloss')
Bases: object
Hyperparameters for the XGBoost risk learner.
- Parameters:
objective (str) – XGBoost objective. Defaults to "binary:logistic".
eta (float) – Boosting learning rate.
max_depth (int) – Maximum tree depth.
min_child_weight (float) – Minimum sum of instance weights per leaf.
subsample (float) – Row subsample ratio per iteration.
colsample_bytree (float) – Column subsample ratio per tree.
reg_lambda (float) – L2 regularisation strength.
n_estimators (int) – Maximum boosting rounds.
early_stopping_rounds (int) – Patience used when a validation set is given.
tree_method (str) – "hist" is fast and works on CPU and GPU.
eval_metric (str) – Default evaluation metric.
- class flightrisk.models.risk.xgboost_model.XGBoostRiskModel(params=None)
Bases: object
Thin wrapper around xgboost.XGBClassifier for the risk track.
- Parameters:
params (XGBoostRiskParams | None)
- property estimator: XGBClassifier
Return the underlying fitted estimator.
- Returns:
The fitted xgboost.XGBClassifier.
- Raises:
RuntimeError – If the model has not been fitted.
- fit(X, y, *, X_val=None, y_val=None)
Fit the model with optional early stopping on a validation slice.
- Parameters:
- Returns:
self for chaining.
- Return type:
- predict_proba(X)
Predict the positive-class probability.
- Parameters:
X (DataFrame) – Features whose columns must include the ones seen at fit time.
- Returns:
1-D array of probabilities in [0, 1].
- Raises:
RuntimeError – If the model has not been fitted.
- Return type:
- class flightrisk.models.risk.calibration.CalibratedRiskModel(base, *, method='isotonic')
Bases: object
Wrap a probabilistic model with isotonic or Platt calibration.
The calibrator is fit on the wrapped model's predict_proba output over a held-out validation slice, so the calibrator does not see training rows.
- Parameters:
base (_ProbaModel)
method (CalibrationMethod)
- fit(X_val, y_val)
Fit the calibrator on a held-out validation slice.
- Parameters:
X_val (DataFrame) – Validation features.
y_val (ndarray) – Validation labels (0/1).
- Returns:
self for chaining.
- Return type:
- predict_proba(X)
Return calibrated probabilities for X.
- Parameters:
X (DataFrame) – Features compatible with self.base.
- Returns:
1-D array of calibrated probabilities.
- Raises:
RuntimeError – If the calibrator has not been fitted.
- Return type:
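To make the isotonic option concrete, the following sketch fits a monotone step function to held-out scores with the pool-adjacent-violators algorithm. It is an illustration under assumptions, not the calibrator CalibratedRiskModel uses: fit_isotonic and predict_isotonic are hypothetical names, and the lookup is a plain step function rather than the interpolation a library implementation may apply.

```python
def fit_isotonic(scores, labels):
    """Pool-adjacent-violators fit; returns (upper_score, value) steps."""
    blocks = []  # each block is [label_sum, count, upper_score]
    for score, label in sorted(zip(scores, labels)):
        blocks.append([float(label), 1, score])
        # Merge backwards while the previous block's mean exceeds this one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [(upper, total / count) for total, count, upper in blocks]


def predict_isotonic(steps, score):
    """Return the value of the first block whose upper bound covers score."""
    for upper, value in steps:
        if score <= upper:
            return value
    return steps[-1][1]  # scores above the fitted range take the last value


# Raw validation-slice scores from a base model, with observed 0/1 labels.
val_scores = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9]
val_labels = [0, 1, 0, 0, 1, 1]

steps = fit_isotonic(val_scores, val_labels)
calibrated = [predict_isotonic(steps, s) for s in (0.05, 0.25, 0.95)]
```

Fitting on validation rows only, as the class description requires, keeps the calibrator from inheriting the base model's overconfidence on rows it has already seen.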
- class flightrisk.models.risk.trainer.RiskTrainingResult(model, metrics, calibration, feature_names=<factory>)
Bases: object
Bundle of artifacts produced by train_risk_model().
- Parameters:
model (object) – The fitted estimator (calibrated when calibration is set).
metrics (RiskMetrics) – Evaluation metrics on the test slice.
calibration (DataFrame) – Calibration table on the test slice.
feature_names (list[str]) – Feature ordering used at fit time.
- metrics: RiskMetrics
- calibration: DataFrame
- flightrisk.models.risk.trainer.train_risk_model(features, labels, *, train_idx, val_idx, test_idx, estimator='lightgbm', calibration='isotonic')
Train a risk model end to end with optional calibration on validation.
- Parameters:
features (DataFrame) – Wide feature frame (with optional msno ID column).
labels (Series | ndarray) – Aligned 0/1 churn labels.
train_idx (ndarray) – Row positions used for training.
val_idx (ndarray) – Row positions used for early stopping and calibration.
test_idx (ndarray) – Row positions held out for final metrics.
estimator (str) – "lightgbm" or "xgboost".
calibration (Literal['isotonic', 'platt'] | None) – "isotonic", "platt", or None to skip.
- Returns:
A RiskTrainingResult with the fitted model and metrics.
- Return type:
- flightrisk.models.risk.trainer.save_calibration_plot(table, *, output)
Render and save a reliability diagram from a calibration table.
- Parameters:
table (DataFrame) – Output of flightrisk.eval.metrics.calibration_table().
output (Path) – Destination PNG path.
- Returns:
The path written.
- Return type:
Survival track
- class flightrisk.models.survival.labels.SurvivalLabels(msno, duration_days, event_observed)
Bases: object
Right-censored survival labels keyed by msno.
- Parameters:
msno (Series) – Subscriber ID.
duration_days (Series) – Time from cutoff until event or censoring, in days.
event_observed (Series) – 1 if churn was observed before horizon_days; 0 if the user was still active at horizon (right-censored).
- msno: Series
- duration_days: Series
- event_observed: Series
- flightrisk.models.survival.labels.build_survival_labels(transactions, *, cutoff, horizon_days=90)
Convert KKBox transactions into right-censored survival labels.
For each user the event time is the first is_cancel == 1 transaction after cutoff; if none occurs within horizon_days, the user is censored at the horizon. Users with no post-cutoff transactions are also censored at the horizon.
- Parameters:
transactions (DataFrame) – Validated transactions frame.
cutoff (Timestamp) – Build cutoff. The clock starts here.
horizon_days (int) – Maximum follow-up window in days.
- Returns:
A SurvivalLabels bundle.
- Raises:
ValueError – If horizon_days is non-positive.
- Return type:
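The censoring rule described for build_survival_labels can be illustrated for a single subscriber. This is a sketch under assumptions rather than the library's code: survival_label is a hypothetical helper, cancellations are passed as a list of dates, and a cancellation falling exactly on the horizon day is treated as observed, which the documentation does not specify.

```python
from datetime import date


def survival_label(cancel_dates, cutoff, horizon_days=90):
    """Return (duration_days, event_observed) for one subscriber."""
    if horizon_days <= 0:
        raise ValueError("horizon_days must be positive")
    # Days from the cutoff to each cancellation strictly after it.
    offsets = [(d - cutoff).days for d in cancel_dates if d > cutoff]
    # Assumption: the follow-up window includes the horizon day itself.
    in_window = [o for o in offsets if o <= horizon_days]
    if in_window:
        return min(in_window), 1  # first cancellation is the event
    return horizon_days, 0  # right-censored at the horizon


cutoff = date(2017, 1, 1)
churned = survival_label([date(2017, 1, 31), date(2017, 3, 1)], cutoff)
late_cancel = survival_label([date(2017, 6, 1)], cutoff)
no_activity = survival_label([], cutoff)
```

Both the late canceller and the subscriber with no post-cutoff rows end up with the full horizon as duration and an event flag of 0.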
- flightrisk.models.survival.labels.to_structured_array(durations, events)
Pack durations and event indicators into the structured array sksurv expects.
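For orientation, a structured array of the kind scikit-survival accepts can be built with NumPy alone: a boolean event field followed by a float time field. The sketch below is an assumption about what such packing might look like, not the body of to_structured_array; pack_survival and the field names "event" and "time" are illustrative choices.

```python
import numpy as np


def pack_survival(durations, events):
    """Pack durations and 0/1 event flags into one structured array."""
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events).astype(bool)
    if durations.shape != events.shape:
        raise ValueError("durations and events must have the same length")
    # Event indicator first, observed time second.
    out = np.empty(durations.shape[0], dtype=[("event", bool), ("time", float)])
    out["event"] = events
    out["time"] = durations
    return out


y = pack_survival([30, 90, 12], [1, 0, 1])
```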
- class flightrisk.models.survival.cox_model.CoxParams(penalizer=0.01, l1_ratio=0.0)
Bases: object
Hyperparameters for the Cox proportional-hazards fitter.
- Parameters:
- class flightrisk.models.survival.cox_model.CoxSurvivalModel(params=None)
Bases: object
Lifelines Cox PH wrapper that returns hazards on a fixed schema.
- Parameters:
params (CoxParams | None)
- fit(X, durations, events)
Fit the Cox model.
- Parameters:
- Returns:
self for chaining.
- Return type:
- property fitter: CoxPHFitter
Return the fitted lifelines fitter.
- Returns:
The lifelines.CoxPHFitter instance.
- Raises:
RuntimeError – If the model has not been fitted.
- survival_function(X, *, times=None)
Return survival probabilities S(t) for each row in X.
- Parameters:
X (DataFrame) – Feature frame with the columns seen at fit time.
times (ndarray | None) – Optional times at which to evaluate S(t).
- Returns:
Frame indexed by time with one column per row in X.
- Raises:
RuntimeError – If the model has not been fitted.
- Return type:
DataFrame
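The shape of this output follows from the proportional-hazards form S(t | x) = exp(-H0(t) * exp(x · beta)), where H0 is the baseline cumulative hazard. The sketch below evaluates that formula directly. It is not how CoxSurvivalModel or lifelines computes the result: cox_survival is a hypothetical helper, the coefficients and baseline values are made up for illustration, and details such as covariate centring that a library may apply are ignored.

```python
import math


def cox_survival(baseline_cumhaz, beta, rows):
    """Map each time to a list of S(t | x), one entry per input row.

    baseline_cumhaz: dict of time -> baseline cumulative hazard H0(t).
    beta: list of coefficients; rows: list of feature lists.
    """
    # Partial hazard exp(x . beta) for each row.
    partial = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in rows]
    return {
        t: [math.exp(-h0 * p) for p in partial]
        for t, h0 in sorted(baseline_cumhaz.items())
    }


# Illustrative inputs: two evaluation times, one feature, two subscribers.
curves = cox_survival({30: 0.10, 60: 0.25}, beta=[0.5], rows=[[0.0], [1.0]])
```

Each key plays the role of a row in the time index, and each list position corresponds to one column of the returned frame.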
- class flightrisk.models.survival.rsf_model.RSFParams(n_estimators=400, max_depth=None, min_samples_leaf=20, min_samples_split=40, max_features='sqrt', n_jobs=-1, random_state=1337)
Bases: object
Hyperparameters for the Random Survival Forest.
- Parameters:
n_estimators (int) – Number of trees in the forest.
max_depth (int | None) – Maximum tree depth (None means unbounded).
min_samples_leaf (int) – Minimum samples per leaf.
min_samples_split (int) – Minimum samples per split.
max_features (str) – Number of features considered at each split.
n_jobs (int) – Parallelism level (-1 means use all cores).
random_state (int) – Seed for tree bagging.
- class flightrisk.models.survival.rsf_model.RSFSurvivalModel(params=None)
Bases: object
Wrapper around sksurv.ensemble.RandomSurvivalForest.
- Parameters:
params (RSFParams | None)
- fit(X, durations, events)
Fit the random survival forest.
- Parameters:
- Returns:
self for chaining.
- Return type:
- property estimator: RandomSurvivalForest
Return the fitted estimator.
- Raises:
RuntimeError – If the model has not been fitted.
- Returns:
The fitted RandomSurvivalForest.
- predict_risk(X)
Return the model’s risk score (cumulative hazard) per row.
- Parameters:
X (DataFrame) – Features compatible with the fit.
- Returns:
1-D array of risk scores; higher means earlier predicted churn.
- Return type:
- class flightrisk.models.survival.trainer.SurvivalTrainingResult(model, metrics, horizons_days, survival_at_horizons, feature_names=<factory>)
Bases: object
Bundle returned by train_survival_model().
- Parameters:
- metrics: SurvivalMetrics
- flightrisk.models.survival.trainer.train_survival_model(features, durations, events, *, train_idx, test_idx, estimator='rsf', horizons_days=(30, 60, 90))
Train a survival model end to end.
- Parameters:
features (DataFrame) – Wide feature frame (with optional msno ID column).
durations (ndarray) – Aligned follow-up times in days.
events (ndarray) – Aligned 0/1 event indicators.
train_idx (ndarray) – Row positions used for fitting.
test_idx (ndarray) – Row positions used for evaluation.
estimator (Literal['cox', 'rsf']) – "cox" or "rsf".
horizons_days (tuple[int, ...]) – Horizons evaluated for the metric suite.
- Returns:
A SurvivalTrainingResult with the fitted model.
- Return type:
Uplift track
- class flightrisk.models.uplift.meta_learners.LightGBMUpliftParams(objective='binary', learning_rate=0.04, num_leaves=95, min_data_in_leaf=100, n_estimators=800, feature_fraction=0.8, verbose=-1)
Bases: object
Hyperparameters for the LightGBM base learners used inside meta-learners.
- Parameters:
objective (str) – "binary" for outcomes that are 0/1.
learning_rate (float) – Boosting learning rate.
num_leaves (int) – Maximum number of leaves per tree.
min_data_in_leaf (int) – Minimum samples per leaf.
n_estimators (int) – Boosting rounds.
feature_fraction (float) – Column subsample ratio.
verbose (int) – LightGBM verbosity flag.
- class flightrisk.models.uplift.meta_learners.TLearner(params=None)
Bases: object
T-learner: one outcome model per treatment arm; uplift is their difference.
- Parameters:
params (LightGBMUpliftParams | None)
- fit(X, treatment, outcome)
Fit the two outcome models.
- Parameters:
- Returns:
self for chaining.
- Raises:
ValueError – If either arm has no rows.
- Return type:
- predict_uplift(X)
Return per-row uplift estimates E[Y|T=1] - E[Y|T=0].
- Parameters:
X (DataFrame) – Feature frame.
- Returns:
1-D array of uplift estimates.
- Raises:
RuntimeError – If the model has not been fitted.
- Return type:
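The T-learner recipe (fit one outcome model per arm, then subtract predictions) can be shown with a toy base learner. This sketch is illustrative only: SegmentMean is a hypothetical stand-in for the LightGBM base learners, a single categorical segment replaces the feature frame, and t_learner_uplift is not part of the package.

```python
from collections import defaultdict


class SegmentMean:
    """Toy outcome model: mean outcome per segment (stand-in for LightGBM)."""

    def fit(self, segments, outcomes):
        sums, counts = defaultdict(float), defaultdict(int)
        for seg, y in zip(segments, outcomes):
            sums[seg] += y
            counts[seg] += 1
        self.means_ = {seg: sums[seg] / counts[seg] for seg in sums}
        self.default_ = sum(outcomes) / len(outcomes)  # unseen segments
        return self

    def predict(self, segments):
        return [self.means_.get(seg, self.default_) for seg in segments]


def t_learner_uplift(segments, treatment, outcome, new_segments):
    treated = [(s, y) for s, t, y in zip(segments, treatment, outcome) if t == 1]
    control = [(s, y) for s, t, y in zip(segments, treatment, outcome) if t == 0]
    if not treated or not control:
        raise ValueError("both treatment arms need at least one row")
    model_treated = SegmentMean().fit(*zip(*treated))  # estimates E[Y | T=1]
    model_control = SegmentMean().fit(*zip(*control))  # estimates E[Y | T=0]
    return [
        p1 - p0
        for p1, p0 in zip(model_treated.predict(new_segments),
                          model_control.predict(new_segments))
    ]


segments = ["a", "a", "a", "a", "b", "b", "b", "b"]
treatment = [1, 1, 0, 0, 1, 1, 0, 0]
outcome = [1, 1, 0, 1, 0, 1, 0, 1]
uplift = t_learner_uplift(segments, treatment, outcome, ["a", "b"])
```

Segment "a" responds to treatment (1.0 versus 0.5) while segment "b" does not (0.5 in both arms), so only "a" receives a positive uplift estimate.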
- class flightrisk.models.uplift.meta_learners.XLearner(params=None)
Bases: object
X-learner: combines outcome models with treatment-imputation residuals.
The variant implemented here uses a propensity-weighted average of the treated-side and control-side imputed treatment effects. The propensity is estimated from the data unless provided.
- Parameters:
params (LightGBMUpliftParams | None)
- fit(X, treatment, outcome)
Fit the X-learner stack.
- Parameters:
- Returns:
self for chaining.
- Raises:
ValueError – If either arm has no rows.
- Return type:
- predict_uplift(X)
Return per-row uplift via propensity-weighted X-learner formula.
- Parameters:
X (DataFrame) – Feature frame.
- Returns:
1-D array of uplift estimates.
- Raises:
RuntimeError – If the model has not been fitted.
- Return type:
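A common form of the propensity-weighted combination is tau(x) = e(x) * tau_control(x) + (1 - e(x)) * tau_treated(x), where e(x) is the propensity and the two tau terms are models fit on each arm's imputed effects. The sketch below applies that convention. Which side receives which weight in XLearner is not stated in this reference, so treat the weighting here as an assumption; imputed_effects and x_learner_combine are hypothetical helpers.

```python
def imputed_effects(y, t, mu0, mu1):
    """Imputed individual effect for one row.

    Treated rows: observed outcome minus the control model's prediction.
    Control rows: treated model's prediction minus the observed outcome.
    """
    return y - mu0 if t == 1 else mu1 - y


def x_learner_combine(tau_treated, tau_control, propensity):
    """Blend the two effect estimates row by row using the propensity."""
    combined = []
    for t1, t0, e in zip(tau_treated, tau_control, propensity):
        if not 0.0 <= e <= 1.0:
            raise ValueError("propensity must lie in [0, 1]")
        # Assumed convention: control-side estimate weighted by e(x).
        combined.append(e * t0 + (1.0 - e) * t1)
    return combined


d_treated = imputed_effects(y=1, t=1, mu0=0.25, mu1=0.75)
d_control = imputed_effects(y=0, t=0, mu0=0.25, mu1=0.75)
tau = x_learner_combine([0.2, 0.2], [0.4, 0.4], [0.25, 0.5])
```

With a propensity of 0.5, as in a balanced randomized trial, the combination reduces to a simple average of the two estimates.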
- class flightrisk.models.uplift.meta_learners.DRLearner(params=None)
Bases: object
Doubly-robust learner: treatment-effect regression on AIPW pseudo-outcomes.
- Parameters:
params (LightGBMUpliftParams | None)
- fit(X, treatment, outcome)
Fit the DR-learner stack.
- Parameters:
- Returns:
self for chaining.
- Raises:
ValueError – If either arm has no rows.
- Return type:
- predict_uplift(X)
Return per-row uplift estimates.
- Parameters:
X (DataFrame) – Feature frame.
- Returns:
1-D array of uplift estimates.
- Raises:
RuntimeError – If the model has not been fitted.
- Return type:
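The AIPW pseudo-outcome that a doubly-robust learner regresses on has the standard form mu1(x) - mu0(x) + T * (Y - mu1(x)) / e(x) - (1 - T) * (Y - mu0(x)) / (1 - e(x)). The sketch below computes it for single rows. aipw_pseudo_outcome is a hypothetical helper with made-up nuisance values, and it says nothing about how DRLearner fits those nuisance models (for example, whether it cross-fits them).

```python
def aipw_pseudo_outcome(y, t, mu0, mu1, e):
    """AIPW pseudo-outcome for one row.

    y: observed outcome; t: 0/1 treatment flag.
    mu0, mu1: predicted outcomes under control and treatment.
    e: propensity P(T=1 | x), strictly between 0 and 1.
    """
    if not 0.0 < e < 1.0:
        raise ValueError("propensity must lie strictly between 0 and 1")
    plug_in = mu1 - mu0  # outcome-model estimate of the effect
    # Inverse-propensity-weighted residual correction for the observed arm.
    correction = t * (y - mu1) / e - (1 - t) * (y - mu0) / (1.0 - e)
    return plug_in + correction


phi_treated = aipw_pseudo_outcome(y=1, t=1, mu0=0.2, mu1=0.6, e=0.5)
phi_control = aipw_pseudo_outcome(y=0, t=0, mu0=0.2, mu1=0.6, e=0.5)
```

A final regression of these pseudo-outcomes on the features then yields the per-row uplift estimates.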
- class flightrisk.models.uplift.causal_forest.CausalForestParams(n_estimators=200, min_samples_leaf=30, max_depth=None, random_state=1337, discrete_treatment=True)
Bases: object
Hyperparameters for the econml CausalForestDML wrapper.
- Parameters:
- class flightrisk.models.uplift.causal_forest.CausalForestUpliftModel(params=None)
Bases: object
Wrap econml.dml.CausalForestDML with a uniform uplift API.
- Parameters:
params (CausalForestParams | None)
- fit(X, treatment, outcome)
Fit the causal forest with default outcome and treatment nuisance models.
- Parameters:
- Returns:
self for chaining.
- Return type:
- predict_uplift(X)
Return per-row CATE estimates.
- Parameters:
X (DataFrame) – Feature frame.
- Returns:
1-D array of treatment-effect estimates.
- Raises:
RuntimeError – If the model has not been fitted.
- Return type:
- class flightrisk.models.uplift.trainer.UpliftTrainingResult(model, metrics, qini_curve, uplift_test, feature_names=<factory>)
Bases: object
Bundle returned by train_uplift_model().
- Parameters:
model (object) – The fitted estimator.
metrics (UpliftMetrics) – Aggregated uplift metrics.
qini_curve (DataFrame) – Frame describing the Qini curve on the test slice.
uplift_test (ndarray) – Per-row uplift on the test slice.
feature_names (list[str]) – Feature ordering used at fit time.
- metrics: UpliftMetrics
- qini_curve: DataFrame
- flightrisk.models.uplift.trainer.train_uplift_model(features, treatment, outcome, *, train_idx, test_idx, estimator='t_learner', k_percentiles=(10, 20, 30))
Train an uplift model end to end on the Orange Belgium RCT.
- Parameters:
features (DataFrame) – Wide feature frame.
treatment (Series | ndarray) – 0/1 treatment indicator.
outcome (Series | ndarray) – 0/1 outcome indicator.
train_idx (ndarray) – Row positions used for fitting.
test_idx (ndarray) – Row positions used for evaluation.
estimator (Literal['t_learner', 'x_learner', 'dr_learner', 'causal_forest']) – One of the supported uplift estimators.
k_percentiles (tuple[int, ...]) – Top-percentiles for uplift_at_k.
- Returns:
An UpliftTrainingResult with the fitted model and metrics.
- Return type:
- flightrisk.models.uplift.trainer.save_qini_plot(curve, *, output)
Render and save a Qini curve from the curve frame.
- Parameters:
curve (DataFrame) – Output of flightrisk.eval.uplift_metrics.qini_curve().
output (Path) – Destination PNG path.
- Returns:
The path written.
- Return type:
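For readers unfamiliar with the plotted quantity, a Qini curve ranks rows by predicted uplift and tracks, at each cutoff k, the treated positives minus the control positives rescaled to the treated group size. The sketch below computes those points in plain Python. The columns of the frame returned by qini_curve() are not documented here, so qini_points and its (k, value) output are assumptions, and the outcome is assumed to be coded so that 1 is the desirable result.

```python
def qini_points(uplift, treatment, outcome):
    """Return [(k, qini_value)] for k = 0..n, ranked by descending uplift."""
    order = sorted(range(len(uplift)), key=lambda i: -uplift[i])
    n_treated = n_control = y_treated = y_control = 0
    points = [(0, 0.0)]
    for k, i in enumerate(order, start=1):
        if treatment[i] == 1:
            n_treated += 1
            y_treated += outcome[i]
        else:
            n_control += 1
            y_control += outcome[i]
        # Rescale control positives to the treated group size seen so far.
        scale = n_treated / n_control if n_control else 0.0
        points.append((k, y_treated - y_control * scale))
    return points


uplift_scores = [0.9, 0.8, 0.3, 0.1]
treatment_flags = [1, 0, 1, 0]
outcomes = [1, 0, 0, 1]
curve_points = qini_points(uplift_scores, treatment_flags, outcomes)
```

A model that ranks persuadable rows first produces a curve that rises early and then flattens or falls back toward the overall treatment effect.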