flightrisk.features

flightrisk.features.time_helpers.yyyymmdd_to_datetime(values)[source]

Convert KKBox YYYYMMDD integer dates to datetime64[ns].

Invalid or out-of-range integers become pandas.NaT.

Parameters:

values (Series) – Integer-valued series in YYYYMMDD form.

Returns:

Datetime series of the same length.

Return type:

Series
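
The conversion described above can be approximated with plain pandas. This is a minimal sketch, not the library's source: the function name yyyymmdd_to_datetime_sketch is hypothetical, and it assumes the input holds integers with no missing values.

```python
import pandas as pd


def yyyymmdd_to_datetime_sketch(values: pd.Series) -> pd.Series:
    """Parse YYYYMMDD integers; impossible dates become NaT (assumed behaviour)."""
    # Integers are rendered as text so the format string can be applied.
    as_text = values.astype(str)
    # errors="coerce" maps values that are not real calendar dates to NaT.
    return pd.to_datetime(as_text, format="%Y%m%d", errors="coerce")


# 20170230 is not a real date (February has no 30th day).
dates = yyyymmdd_to_datetime_sketch(pd.Series([20170228, 20170230]))
```

The result has the same length as the input, which matches the documented contract.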

flightrisk.features.time_helpers.days_since(reference, anchor)[source]

Return the number of days between each timestamp and a fixed anchor.

Negative values mean reference lies after anchor.

Parameters:
  • reference (Series) – Datetime series.

  • anchor (Timestamp) – Fixed timestamp against which each value in reference is measured.

Returns:

Float series of day deltas (NaN where reference is NaT).

Return type:

Series
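
The sign convention can be illustrated with a short sketch. It assumes the delta is computed as anchor minus reference, which is consistent with the note that negative values mean reference lies after anchor; days_since_sketch is a hypothetical stand-in, and whether the real function keeps fractional days is not stated here.

```python
import pandas as pd


def days_since_sketch(reference: pd.Series, anchor: pd.Timestamp) -> pd.Series:
    # anchor - reference is positive when the timestamp precedes the anchor.
    delta = anchor - reference
    # Dividing by one day gives float days; NaT inputs come out as NaN.
    return delta / pd.Timedelta(days=1)


ref = pd.Series(pd.to_datetime(["2017-01-01", "2017-03-02", None]))
gaps = days_since_sketch(ref, pd.Timestamp("2017-02-01"))
# gaps holds 31.0, -29.0 and NaN in that order.
```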

flightrisk.features.time_helpers.restrict_before(frame, *, date_col, cutoff)[source]

Drop rows whose date_col is on or after cutoff.

Used to enforce no-future-leakage at feature build time.

Parameters:
  • frame (DataFrame) – Source frame.

  • date_col (str) – Name of the datetime column in frame.

  • cutoff (Timestamp) – Exclusive upper bound; rows where date_col >= cutoff are dropped.

Returns:

A copy of frame containing only rows dated strictly before cutoff.

Return type:

DataFrame
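
The filter can be sketched as a strict less-than mask followed by a copy. restrict_before_sketch is a hypothetical stand-in, and the column names in the sample frame are illustrative only. One consequence worth noting: under this sketch a NaT date never compares as earlier than cutoff, so such rows are dropped too.

```python
import pandas as pd


def restrict_before_sketch(frame: pd.DataFrame, *, date_col: str, cutoff: pd.Timestamp) -> pd.DataFrame:
    # Strict comparison: a row dated exactly at the cutoff is excluded.
    mask = frame[date_col] < cutoff
    # .copy() keeps later edits from touching the caller's frame.
    return frame.loc[mask].copy()


logs = pd.DataFrame({
    "msno": ["u1", "u2", "u3"],
    "date": pd.to_datetime(["2017-01-31", "2017-02-01", "2017-02-02"]),
})
past = restrict_before_sketch(logs, date_col="date", cutoff=pd.Timestamp("2017-02-01"))
```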

class flightrisk.features.kkbox.KKBoxFeatureConfig(windows_days=(7, 30, 90), include_rolling_diversity=True, include_session_stats=True, include_payment_dynamics=True)[source]

Bases: object

Configuration for KKBox feature engineering.

Parameters:
  • windows_days (tuple[int, ...]) – Rolling windows used to aggregate listening behaviour.

  • include_rolling_diversity (bool) – If true, add unique-artist diversity proxies.

  • include_session_stats (bool) – If true, include session-length statistics.

  • include_payment_dynamics (bool) – If true, include charge deltas and refund counts.

windows_days: tuple[int, ...] = (7, 30, 90)
include_rolling_diversity: bool = True
include_session_stats: bool = True
include_payment_dynamics: bool = True

flightrisk.features.kkbox.member_features(members, *, cutoff)[source]

Derive tenure-style features from the KKBox member master table.

Parameters:
  • members (DataFrame) – Validated member frame.

  • cutoff (Timestamp) – Build cutoff used to compute tenure.

Returns:

One row per msno with tenure and registration features.

Return type:

DataFrame
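
A tenure feature of this kind might be computed as follows. This is a sketch under assumptions: the input column registration_init_time follows the public KKBox members table and is assumed to be parsed to datetime already, while the output names tenure_days and registration_year are hypothetical.

```python
import pandas as pd


def member_tenure_sketch(members: pd.DataFrame, *, cutoff: pd.Timestamp) -> pd.DataFrame:
    out = members[["msno"]].copy()
    # Tenure is measured back from the build cutoff, not from the current date.
    out["tenure_days"] = (cutoff - members["registration_init_time"]).dt.days
    out["registration_year"] = members["registration_init_time"].dt.year
    return out


members = pd.DataFrame({
    "msno": ["u1", "u2"],
    "registration_init_time": pd.to_datetime(["2016-01-01", "2017-01-31"]),
})
tenure = member_tenure_sketch(members, cutoff=pd.Timestamp("2017-02-01"))
```

Anchoring tenure to cutoff keeps the feature reproducible when the matrix is rebuilt later.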

flightrisk.features.kkbox.transaction_features(transactions, *, cutoff, include_payment_dynamics=True)[source]

Aggregate transaction history up to cutoff per user.

Parameters:
  • transactions (DataFrame) – Validated transactions frame.

  • cutoff (Timestamp) – Build cutoff; later transactions are dropped to avoid leakage.

  • include_payment_dynamics (bool) – Include charge deltas and refund counts.

Returns:

One row per msno with aggregated payment features.

Return type:

DataFrame
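
The per-user aggregation could look roughly like this. The sketch assumes the column names of the public KKBox transactions table (transaction_date, actual_amount_paid, is_cancel) and a strict cutoff as in restrict_before; the output column names are hypothetical, and the payment-dynamics features are left out.

```python
import pandas as pd


def transaction_features_sketch(transactions: pd.DataFrame, *, cutoff: pd.Timestamp) -> pd.DataFrame:
    # Drop anything dated at or after the cutoff before aggregating.
    past = transactions.loc[transactions["transaction_date"] < cutoff]
    return (
        past.groupby("msno")
        .agg(
            n_transactions=("transaction_date", "count"),
            total_paid=("actual_amount_paid", "sum"),
            n_cancels=("is_cancel", "sum"),
        )
        .reset_index()
    )


tx = pd.DataFrame({
    "msno": ["u1", "u1", "u1", "u2"],
    "transaction_date": pd.to_datetime(["2017-01-05", "2017-01-20", "2017-02-10", "2017-01-15"]),
    "actual_amount_paid": [149, 149, 149, 99],
    "is_cancel": [0, 1, 0, 0],
})
tx_feats = transaction_features_sketch(tx, cutoff=pd.Timestamp("2017-02-01"))
```

The third u1 transaction falls after the cutoff, so it contributes to none of the aggregates.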

flightrisk.features.kkbox.listening_features(user_logs, *, cutoff, config=None)[source]

Build rolling listening features from KKBox daily logs.

Parameters:
  • user_logs (DataFrame) – Validated daily log frame.

  • cutoff (Timestamp) – Build cutoff; later logs are dropped to avoid leakage.

  • config (KKBoxFeatureConfig | None) – Optional feature configuration.

Returns:

One row per msno with windowed listening features and a days_since_last_login column.

Return type:

DataFrame
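
One way to picture the windowed aggregation is the sketch below. It assumes each window covers the half-open interval [cutoff - N days, cutoff), uses the date and total_secs columns from the public KKBox daily logs, and invents the total_secs_{N}d output names; the real feature set is likely wider.

```python
import pandas as pd


def listening_features_sketch(user_logs, *, cutoff, windows_days=(7, 30, 90)):
    # Remove future logs first so no window can see past the cutoff.
    past = user_logs.loc[user_logs["date"] < cutoff]
    out = pd.DataFrame(index=pd.Index(past["msno"].unique(), name="msno"))
    for window in windows_days:
        start = cutoff - pd.Timedelta(days=window)
        recent = past.loc[past["date"] >= start]
        # Assignment aligns on msno; users with no rows in the window get NaN.
        out[f"total_secs_{window}d"] = recent.groupby("msno")["total_secs"].sum()
    out = out.fillna(0.0)
    last_login = past.groupby("msno")["date"].max()
    out["days_since_last_login"] = (cutoff - last_login).dt.days
    return out.reset_index()


logs = pd.DataFrame({
    "msno": ["u1", "u1", "u1", "u1", "u2"],
    "date": pd.to_datetime(["2017-01-30", "2017-01-10", "2016-12-01", "2017-02-02", "2016-11-20"]),
    "total_secs": [100, 200, 400, 999, 50],
})
listening = listening_features_sketch(logs, cutoff=pd.Timestamp("2017-02-01"))
```

In this sample the 2017-02-02 log is ignored entirely, and u2 only shows activity in the 90-day window.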

flightrisk.features.kkbox.derived_features(frame)[source]

Add interaction, ratio, and log-transformed features to the matrix.

These are cheap, deterministic combinations of base features that tend to help tree-based models on subscription churn data:

  • log1p of long-tailed counters (tenure, plays, songs)

  • recent-vs-lifetime engagement ratios

  • spend efficiency (actual amount paid / list price) and a charge_drop flag

  • recency band indicators (never_logged_in, inactive_30d)

  • tenure x auto-renew interaction (loyalty proxy)

Parameters:

frame (DataFrame) – A frame produced by build_kkbox_feature_matrix().

Returns:

A copy of frame with extra columns appended.

Return type:

DataFrame
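
The five families listed above might be derived along these lines. Everything here is illustrative: the input column names are assumptions, the 90-day total stands in for a lifetime counter, and charge_drop is defined as "paid less than list price", which may differ from the library's definition.

```python
import numpy as np
import pandas as pd


def derived_features_sketch(frame: pd.DataFrame) -> pd.DataFrame:
    out = frame.copy()
    # log1p compresses long-tailed counters and is defined at zero.
    out["log_tenure_days"] = np.log1p(out["tenure_days"])
    # Treat a zero denominator as missing, then report a ratio of 0.
    longer = out["total_secs_90d"].replace(0, np.nan)
    out["recent_share_7d"] = (out["total_secs_7d"] / longer).fillna(0.0)
    out["spend_efficiency"] = out["actual_amount_paid"] / out["plan_list_price"]
    out["charge_drop"] = (out["actual_amount_paid"] < out["plan_list_price"]).astype(int)
    # Recency bands: a missing recency value is read as "never logged in".
    out["never_logged_in"] = out["days_since_last_login"].isna().astype(int)
    out["inactive_30d"] = (out["days_since_last_login"] > 30).astype(int)
    out["tenure_x_auto_renew"] = out["tenure_days"] * out["is_auto_renew"]
    return out


base = pd.DataFrame({
    "tenure_days": [99, 0],
    "total_secs_7d": [50.0, 0.0],
    "total_secs_90d": [200.0, 0.0],
    "actual_amount_paid": [100, 50],
    "plan_list_price": [100, 100],
    "days_since_last_login": [2.0, np.nan],
    "is_auto_renew": [1, 0],
})
derived = derived_features_sketch(base)
```

The input frame is left untouched, in line with the documented "copy of frame" return value.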

flightrisk.features.kkbox.build_kkbox_feature_matrix(members, transactions, user_logs, *, cutoff, config=None)[source]

Combine member, transaction, and listening features into one frame.

Parameters:
  • members (DataFrame) – Validated member master table.

  • transactions (DataFrame) – Validated transactions frame.

  • user_logs (DataFrame) – Validated daily log frame.

  • cutoff (Timestamp) – Build cutoff used to enforce no-leakage joins.

  • config (KKBoxFeatureConfig | None) – Optional feature configuration.

Returns:

One row per msno ready to merge with labels or hazard times.

Return type:

DataFrame
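
Combining the three feature blocks can be sketched as a chain of left joins on msno. The choice of the member block as the spine is an assumption, as are the function name and the sample columns; the real join strategy is not shown in this reference.

```python
import pandas as pd


def combine_feature_blocks_sketch(member_block: pd.DataFrame, *other_blocks: pd.DataFrame) -> pd.DataFrame:
    matrix = member_block
    for block in other_blocks:
        # validate="one_to_one" raises if either side repeats an msno,
        # which protects the one-row-per-user guarantee.
        matrix = matrix.merge(block, on="msno", how="left", validate="one_to_one")
    return matrix


member_block = pd.DataFrame({"msno": ["u1", "u2", "u3"], "tenure_days": [397, 1, 30]})
tx_block = pd.DataFrame({"msno": ["u1", "u2"], "n_transactions": [2, 1]})
listen_block = pd.DataFrame({"msno": ["u1"], "total_secs_30d": [300.0]})
matrix = combine_feature_blocks_sketch(member_block, tx_block, listen_block)
```

Users without transactions or logs keep their row and receive NaN in the corresponding columns, which a later imputation step would need to handle.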

class flightrisk.features.orange.OrangeFeatureConfig(drop_low_variance_threshold=0.0, fill_strategy='median')[source]

Bases: object

Configuration for Orange Belgium feature post-processing.

Parameters:
  • drop_low_variance_threshold (float) – Minimum variance to keep a column. Set to 0 to disable filtering.

  • fill_strategy (str) – How to fill missing numeric values: "median" or "zero".

drop_low_variance_threshold: float = 0.0
fill_strategy: str = 'median'

flightrisk.features.orange.prepare_orange_features(features, *, config=None)[source]

Prepare the 178 Orange Belgium features for downstream models.

The dataset already provides anonymised, model-ready columns, so the pipeline is intentionally minimal: drop low-variance columns and impute missing values to keep tree models happy.

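Parameters:
  • features (DataFrame) – Raw Orange Belgium feature frame.

  • config (OrangeFeatureConfig | None) – Optional post-processing configuration.

Returns: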

A new frame ready for fitting. The original is not modified.

Raises:

ValueError – If fill_strategy is unknown.

Return type:

DataFrame
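
The two post-processing steps can be sketched as below. The function takes the two configuration fields directly rather than an OrangeFeatureConfig, and it assumes variance filtering and imputation apply to numeric columns only; both choices are simplifications for illustration.

```python
import pandas as pd


def prepare_orange_sketch(features, *, drop_low_variance_threshold=0.0, fill_strategy="median"):
    if fill_strategy not in ("median", "zero"):
        raise ValueError(f"unknown fill_strategy: {fill_strategy!r}")
    out = features.copy()
    numeric = out.select_dtypes(include="number")
    # A threshold of 0 disables filtering, as described for the config.
    if drop_low_variance_threshold > 0:
        variances = numeric.var()
        low = variances[variances < drop_low_variance_threshold].index
        out = out.drop(columns=low)
        numeric = numeric.drop(columns=low)
    # Median imputation uses one value per column; "zero" uses a constant.
    fill_values = numeric.median() if fill_strategy == "median" else 0.0
    out[numeric.columns] = numeric.fillna(fill_values)
    return out


raw = pd.DataFrame({"a": [1.0, None, 3.0], "b": [5.0, 5.0, 5.0]})
clean = prepare_orange_sketch(raw, drop_low_variance_threshold=0.1)
```

Here the constant column b is dropped and the gap in a is filled with its median, 2.0, while raw keeps its missing value.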

class flightrisk.features.pipeline.KKBoxFeatureBundle(features, labels, cutoff)[source]

Bases: object

Output of the KKBox feature pipeline.

Parameters:
  • features (DataFrame) – Wide feature matrix keyed by msno.

  • labels (DataFrame) – Churn labels aligned to features by msno.

  • cutoff (Timestamp) – Build cutoff used to assemble the matrix.

features: DataFrame
labels: DataFrame
cutoff: Timestamp

flightrisk.features.pipeline.build_kkbox_bundle(artifacts, *, cutoff, config=None)[source]

Run the full KKBox feature pipeline and align it with labels.

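Parameters:
  • artifacts – Loaded KKBox dataset artifacts.

  • cutoff (Timestamp) – Build cutoff used to enforce no-leakage joins.

  • config (KKBoxFeatureConfig | None) – Optional feature configuration.

Returns: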

A KKBoxFeatureBundle ready for modeling.

Return type:

KKBoxFeatureBundle
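
The alignment step might work as in the sketch below, which keeps only users present in both frames and gives both outputs the same row order. BundleSketch and align_bundle_sketch are hypothetical stand-ins, the inner-join policy is an assumption, and is_churn is the label column name from the public KKBox data.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass(frozen=True)
class BundleSketch:
    features: pd.DataFrame
    labels: pd.DataFrame
    cutoff: pd.Timestamp


def align_bundle_sketch(features, labels, *, cutoff):
    # Users present in both frames, in the order they appear in features.
    shared = features[["msno"]].merge(labels[["msno"]], on="msno", how="inner")
    # Re-attach the remaining columns so both frames share one row order.
    aligned_features = shared.merge(features, on="msno", how="left")
    aligned_labels = shared.merge(labels, on="msno", how="left")
    return BundleSketch(aligned_features, aligned_labels, cutoff)


features = pd.DataFrame({"msno": ["u1", "u2", "u3"], "tenure_days": [397, 1, 30]})
labels = pd.DataFrame({"msno": ["u3", "u1"], "is_churn": [1, 0]})
bundle = align_bundle_sketch(features, labels, cutoff=pd.Timestamp("2017-02-01"))
```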

class flightrisk.features.pipeline.OrangeFeatureBundle(features, treatment, outcome)[source]

Bases: object

Output of the Orange Belgium feature pipeline.

Parameters:
  • features (DataFrame) – Cleaned feature frame.

  • treatment (Series) – 0/1 treatment column from the original RCT.

  • outcome (Series) – 0/1 outcome column.

features: DataFrame
treatment: Series
outcome: Series

flightrisk.features.pipeline.build_orange_bundle(artifacts, *, config=None)[source]

Prepare the Orange Belgium dataset for uplift modelling.

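Parameters:
  • artifacts – Loaded Orange Belgium dataset artifacts.

  • config (OrangeFeatureConfig | None) – Optional post-processing configuration.

Returns: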

An OrangeFeatureBundle.

Return type:

OrangeFeatureBundle

flightrisk.features.pipeline.write_kkbox_bundle(bundle, *, name='kkbox')[source]

Persist a KKBox bundle to data/features/{name}/.

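Parameters:
  • bundle (KKBoxFeatureBundle) – Bundle to persist.

  • name (str) – Subdirectory of data/features/ to write to.

Returns: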

The directory the bundle was written to.

Return type:

Path
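
Persisting a bundle to data/features/{name}/ can be pictured as follows. The on-disk format is not documented here, so the CSV files and their names are assumptions, as is the root parameter, which exists only so the example can write into a temporary directory.

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_bundle_sketch(features, labels, *, root, name="kkbox"):
    # Mirrors the documented layout: <root>/data/features/<name>/
    out_dir = Path(root) / "data" / "features" / name
    out_dir.mkdir(parents=True, exist_ok=True)
    # File names and CSV format are hypothetical.
    features.to_csv(out_dir / "features.csv", index=False)
    labels.to_csv(out_dir / "labels.csv", index=False)
    return out_dir


features = pd.DataFrame({"msno": ["u1"], "tenure_days": [397]})
labels = pd.DataFrame({"msno": ["u1"], "is_churn": [0]})
with tempfile.TemporaryDirectory() as tmp:
    out_dir = write_bundle_sketch(features, labels, root=tmp)
    written = sorted(p.name for p in out_dir.iterdir())
```

Returning the directory, as the documented function does, lets callers log or reload the output without rebuilding the path.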

flightrisk.features.pipeline.write_orange_bundle(bundle, *, name='orange')[source]

Persist an Orange Belgium bundle to data/features/{name}/.

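Parameters:
  • bundle (OrangeFeatureBundle) – Bundle to persist.

  • name (str) – Subdirectory of data/features/ to write to.

Returns: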

The directory the bundle was written to.

Return type:

Path