flightrisk.features
- flightrisk.features.time_helpers.yyyymmdd_to_datetime(values)
Convert KKBox YYYYMMDD integer dates to datetime64[ns].
Invalid or out-of-range integers become pandas.NaT.
- Parameters:
values (Series) – Integer-valued series in YYYYMMDD form.
- Returns:
Datetime series of the same length.
- Return type:
Series
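The documented behaviour can be approximated with plain pandas. The following is a minimal sketch under that assumption, not the library's implementation; the helper name yyyymmdd_sketch is hypothetical.

```python
import pandas as pd


def yyyymmdd_sketch(values: pd.Series) -> pd.Series:
    """Hypothetical stand-in for yyyymmdd_to_datetime (not the library code)."""
    # Render each integer as text so it can be parsed against a fixed format.
    # The nullable Int64 cast lets missing entries pass through as "<NA>".
    as_text = values.astype("Int64").astype(str)
    # errors="coerce" turns anything that is not a valid calendar date into NaT.
    return pd.to_datetime(as_text, format="%Y%m%d", errors="coerce")


# 20170230 (30 February) and 0 are not valid dates, so both become NaT.
dates = yyyymmdd_sketch(pd.Series([20170228, 20170230, 0]))
```

The output keeps the length of the input, so it can be assigned straight back to the source frame.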
- flightrisk.features.time_helpers.days_since(reference, anchor)
Return the number of days between each timestamp and a fixed anchor.
Negative values mean reference lies after anchor.
- Parameters:
reference (Series) – Datetime series.
anchor (Timestamp) – Reference timestamp.
- Returns:
Float series of day deltas (NaN where reference is NaT).
- Return type:
Series
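The sign convention is easiest to see in a small example. The sketch below assumes the delta is computed as anchor minus reference, which is what the documented sign implies; days_since_sketch is a hypothetical name, not the library function.

```python
import pandas as pd


def days_since_sketch(reference: pd.Series, anchor: pd.Timestamp) -> pd.Series:
    """Hypothetical stand-in for days_since (not the library code)."""
    # anchor - reference is positive for timestamps before the anchor and
    # negative for timestamps after it.
    delta = anchor - reference
    # Dividing by a one-day Timedelta yields floats; NaT propagates as NaN.
    return delta / pd.Timedelta(days=1)


ref = pd.Series(pd.to_datetime(["2017-01-01", "2017-02-10", None]))
deltas = days_since_sketch(ref, pd.Timestamp("2017-01-31"))
```

Here the first entry lies 30 days before the anchor, the second lies 10 days after it (so its delta is negative), and the missing timestamp yields NaN.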
- flightrisk.features.time_helpers.restrict_before(frame, *, date_col, cutoff)
Drop rows whose date_col is on or after cutoff.
Used to enforce no-future-leakage at feature build time.
- Parameters:
frame (DataFrame) – Source frame.
date_col (str) – Name of the datetime column in frame.
cutoff (Timestamp) – Strict upper bound (rows >= cutoff are dropped).
- Returns:
A copy of frame containing only past rows.
- Return type:
DataFrame
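A strict cutoff filter of this kind can be sketched with a boolean mask. This is an illustration under assumptions, not the library's code: restrict_before_sketch and the example column names are hypothetical, and the handling of NaT dates in particular is a guess.

```python
import pandas as pd


def restrict_before_sketch(
    frame: pd.DataFrame, *, date_col: str, cutoff: pd.Timestamp
) -> pd.DataFrame:
    """Hypothetical stand-in for restrict_before (not the library code)."""
    # Strict comparison: a row dated exactly at the cutoff is dropped too.
    # NaT compares as False, so undated rows are also dropped here; whether
    # the library treats them the same way is an assumption.
    mask = frame[date_col] < cutoff
    # .copy() keeps later edits to the result from touching the source frame.
    return frame.loc[mask].copy()


logs = pd.DataFrame(
    {
        "msno": ["a", "b", "c"],
        "date": pd.to_datetime(["2017-02-28", "2017-03-01", "2017-03-02"]),
    }
)
past = restrict_before_sketch(logs, date_col="date", cutoff=pd.Timestamp("2017-03-01"))
```

Only the row for user a survives, because the row dated exactly on the cutoff counts as "on or after".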
- class flightrisk.features.kkbox.KKBoxFeatureConfig(windows_days=(7, 30, 90), include_rolling_diversity=True, include_session_stats=True, include_payment_dynamics=True)
Bases: object
Configuration for KKBox feature engineering.
- Parameters:
windows_days (tuple[int, ...]) – Rolling windows used to aggregate listening behaviour.
include_rolling_diversity (bool) – If true, add unique-artist diversity proxies.
include_session_stats (bool) – If true, include session-length statistics.
include_payment_dynamics (bool) – If true, include charge deltas and refund counts.
- flightrisk.features.kkbox.member_features(members, *, cutoff)
Derive tenure-style features from the KKBox member master table.
- Parameters:
members (DataFrame) – Validated member frame.
cutoff (Timestamp) – Build cutoff used to compute tenure.
- Returns:
One row per msno with tenure and registration features.
- Return type:
DataFrame
- flightrisk.features.kkbox.transaction_features(transactions, *, cutoff, include_payment_dynamics=True)
Aggregate transaction history up to cutoff per user.
- Parameters:
transactions (DataFrame) – Validated transactions frame.
cutoff (Timestamp) – Build cutoff; later transactions are dropped to avoid leakage.
include_payment_dynamics (bool) – Include charge deltas and refund counts.
- Returns:
One row per msno with aggregated payment features.
- Return type:
DataFrame
- flightrisk.features.kkbox.listening_features(user_logs, *, cutoff, config=None)
Build rolling listening features from KKBox daily logs.
- Parameters:
user_logs (DataFrame) – Validated daily log frame.
cutoff (Timestamp) – Build cutoff; later logs are dropped to avoid leakage.
config (KKBoxFeatureConfig | None) – Optional feature configuration.
- Returns:
One row per msno with windowed listening features and a days_since_last_login column.
- Return type:
DataFrame
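To make "windowed listening features" concrete, here is one way trailing-window aggregates could be computed. It is only a sketch: the function name, the input columns (msno, date, total_secs), the output column names, and the choice of a plain sum are all assumptions rather than the library's actual feature set.

```python
import pandas as pd


def windowed_sums_sketch(
    user_logs: pd.DataFrame,
    *,
    cutoff: pd.Timestamp,
    windows_days: tuple = (7, 30, 90),
    value_col: str = "total_secs",  # assumed column name
) -> pd.DataFrame:
    """Hypothetical illustration of per-user trailing-window aggregates."""
    # Drop logs on or after the cutoff so no future information leaks in.
    past = user_logs.loc[user_logs["date"] < cutoff]
    # One row per user seen before the cutoff.
    last_seen = past.groupby("msno")["date"].max()
    out = pd.DataFrame(index=last_seen.index)
    for days in windows_days:
        start = cutoff - pd.Timedelta(days=days)
        recent = past.loc[past["date"] >= start]
        sums = recent.groupby("msno")[value_col].sum()
        # Users with no activity inside the window get 0 rather than NaN.
        out[f"{value_col}_{days}d"] = sums.reindex(out.index, fill_value=0.0)
    out["days_since_last_login"] = (cutoff - last_seen) / pd.Timedelta(days=1)
    return out.reset_index()


logs = pd.DataFrame(
    {
        "msno": ["a", "a", "b", "a"],
        "date": pd.to_datetime(["2017-02-27", "2017-02-01", "2016-12-15", "2017-03-05"]),
        "total_secs": [100.0, 50.0, 30.0, 999.0],
    }
)
feats = windowed_sums_sketch(logs, cutoff=pd.Timestamp("2017-03-01"))
```

The log dated 2017-03-05 falls after the cutoff and contributes to none of the windows.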
- flightrisk.features.kkbox.derived_features(frame)
Add interaction, ratio, and log-transformed features to the matrix.
These are cheap, deterministic combinations of base features that tend to move tree-based models on subscription churn data:
  - log1p of long-tailed counters (tenure, plays, songs)
  - recent-vs-lifetime engagement ratios
  - spend efficiency (actual / list) and a charge_drop flag
  - recency band indicators (never_logged_in, inactive_30d)
  - tenure x auto-renew interaction (loyalty proxy)
- Parameters:
frame (DataFrame) – A frame produced by build_kkbox_feature_matrix().
- Returns:
A copy of frame with extra columns appended.
- Return type:
DataFrame
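The feature families listed above might look roughly like this in pandas. Every input column name, every output name other than charge_drop, never_logged_in and inactive_30d, and the exact definitions of the flags are assumptions made for illustration; the real function may define them differently.

```python
import numpy as np
import pandas as pd


def derived_features_sketch(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical illustration of the listed feature families."""
    out = frame.copy()  # leave the caller's frame untouched
    # log1p compresses long-tailed counters.
    out["log1p_tenure_days"] = np.log1p(out["tenure_days"].clip(lower=0))
    # Recent-vs-lifetime ratio; a zero lifetime count gives NaN, not inf.
    lifetime = out["plays_lifetime"].where(out["plays_lifetime"] > 0)
    out["plays_30d_share"] = out["plays_30d"] / lifetime
    # Spend efficiency (actual / list); charge_drop is assumed to mean
    # "paid less than list price".
    list_price = out["list_price"].where(out["list_price"] > 0)
    out["spend_efficiency"] = out["actual_paid"] / list_price
    out["charge_drop"] = (out["actual_paid"] < out["list_price"]).astype(int)
    # Recency bands: a missing recency is assumed to mean "never logged in".
    out["never_logged_in"] = out["days_since_last_login"].isna().astype(int)
    out["inactive_30d"] = (out["days_since_last_login"] > 30).astype(int)
    # Tenure x auto-renew interaction.
    out["tenure_x_auto_renew"] = out["tenure_days"] * out["is_auto_renew"]
    return out


base = pd.DataFrame(
    {
        "tenure_days": [100, 0],
        "plays_30d": [10, 0],
        "plays_lifetime": [40, 0],
        "actual_paid": [99, 149],
        "list_price": [149, 149],
        "days_since_last_login": [3.0, np.nan],
        "is_auto_renew": [1, 0],
    }
)
enriched = derived_features_sketch(base)
```

Guarding the denominators keeps infinities out of the matrix, which matters if the frame is later passed to a model that rejects non-finite values.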
- flightrisk.features.kkbox.build_kkbox_feature_matrix(members, transactions, user_logs, *, cutoff, config=None)
Combine member, transaction, and listening features into one frame.
- Parameters:
members (DataFrame) – Validated member master table.
transactions (DataFrame) – Validated transactions frame.
user_logs (DataFrame) – Validated daily log frame.
cutoff (Timestamp) – Build cutoff used to enforce no-leakage joins.
config (KKBoxFeatureConfig | None) – Optional feature configuration.
- Returns:
One row per msno ready to merge with labels or hazard times.
- Return type:
DataFrame
- class flightrisk.features.orange.OrangeFeatureConfig(drop_low_variance_threshold=0.0, fill_strategy='median')
Bases: object
Configuration for Orange Belgium feature post-processing.
- Parameters:
- flightrisk.features.orange.prepare_orange_features(features, *, config=None)
Prepare the 178 Orange Belgium features for downstream models.
The dataset already provides anonymised, model-ready columns, so the pipeline is intentionally minimal: drop low-variance columns and impute missing values to keep tree models happy.
- Parameters:
features (DataFrame) – Raw feature frame from flightrisk.data.loaders.load_orange_belgium().
config (OrangeFeatureConfig | None) – Optional feature configuration.
- Returns:
A new frame ready for fitting. The original is not modified.
- Raises:
ValueError – If fill_strategy is unknown.
- Return type:
DataFrame
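The two steps described above (drop low-variance columns, then impute) can be sketched as follows. This is an assumption-laden illustration, not the library's code: the function name is hypothetical, the "mean" strategy is invented for the example (only 'median' appears in the documented signature), and treating the threshold as an exclusive lower bound on variance is a guess.

```python
import pandas as pd


def prepare_features_sketch(
    features: pd.DataFrame,
    *,
    drop_low_variance_threshold: float = 0.0,
    fill_strategy: str = "median",
) -> pd.DataFrame:
    """Hypothetical stand-in for prepare_orange_features (numeric columns only)."""
    if fill_strategy not in {"median", "mean"}:
        raise ValueError(f"unknown fill_strategy: {fill_strategy!r}")
    # Keep columns whose variance exceeds the threshold (assumed semantics).
    # With the default of 0.0 this removes constant columns.
    variances = features.var(ddof=0)
    kept = features.loc[:, variances > drop_low_variance_threshold]
    # Per-column fill values; NaNs are skipped when computing them.
    fill_values = kept.median() if fill_strategy == "median" else kept.mean()
    # fillna returns a new frame, so the input stays unmodified.
    return kept.fillna(fill_values)


raw = pd.DataFrame(
    {
        "v1": [1.0, None, 3.0],
        "v2": [5.0, 5.0, 5.0],  # constant, so it is dropped
        "v3": [0.0, 2.0, None],
    }
)
clean = prepare_features_sketch(raw)
```

In this example v2 is removed, the gap in v1 is filled with 2.0, and the gap in v3 with 1.0.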
- class flightrisk.features.pipeline.KKBoxFeatureBundle(features, labels, cutoff)
Bases: object
Output of the KKBox feature pipeline.
- Parameters:
features (DataFrame) – Wide feature matrix keyed by msno.
labels (DataFrame) – Churn labels aligned to features by msno.
cutoff (Timestamp) – Build cutoff used to assemble the matrix.
- features: DataFrame
- labels: DataFrame
- cutoff: Timestamp
- flightrisk.features.pipeline.build_kkbox_bundle(artifacts, *, cutoff, config=None)
Run the full KKBox feature pipeline and align it with labels.
- Parameters:
artifacts (KKBoxArtifacts) – Validated raw frames from flightrisk.data.loaders.load_kkbox().
cutoff (str | Timestamp) – Build cutoff (string or pandas.Timestamp).
config (KKBoxFeatureConfig | None) – Optional feature configuration.
- Returns:
A KKBoxFeatureBundle ready for modeling.
- Return type:
KKBoxFeatureBundle
- class flightrisk.features.pipeline.OrangeFeatureBundle(features, treatment, outcome)
Bases: object
Output of the Orange Belgium feature pipeline.
- Parameters:
features (DataFrame) – Cleaned feature frame.
treatment (Series) – 0/1 treatment column from the original RCT.
outcome (Series) – 0/1 outcome column.
- features: DataFrame
- treatment: Series
- outcome: Series
- flightrisk.features.pipeline.build_orange_bundle(artifacts, *, config=None)
Prepare the Orange Belgium dataset for uplift modelling.
- Parameters:
artifacts (OrangeBelgiumArtifacts) – Validated raw frame from flightrisk.data.loaders.load_orange_belgium().
config (OrangeFeatureConfig | None) – Optional feature configuration.
- Returns:
- Return type:
- flightrisk.features.pipeline.write_kkbox_bundle(bundle, *, name='kkbox')
Persist a KKBox bundle to data/features/{name}/.
- Parameters:
bundle (KKBoxFeatureBundle) – Bundle returned by build_kkbox_bundle().
name (str) – Subdirectory name under data/features.
- Returns:
The directory the bundle was written to.
- Return type:
- flightrisk.features.pipeline.write_orange_bundle(bundle, *, name='orange')
Persist an Orange Belgium bundle to data/features/{name}/.
- Parameters:
bundle (OrangeFeatureBundle) – Bundle returned by build_orange_bundle().
name (str) – Subdirectory name under data/features.
- Returns:
The directory the bundle was written to.
- Return type: