CrossPredict Crossvalidation

CrossVal

class crosspredict.crossval.CrossLightgbmModel(iterator, params, feature_name, col_target, cols_cat='auto', num_boost_round=99999, early_stopping_rounds=50, valid=True, random_state=0, cross_target_encoder=None)
class crosspredict.crossval.CrossXgboostModel(iterator, params, feature_name, col_target, cols_cat='auto', num_boost_round=99999, early_stopping_rounds=50, valid=True, random_state=0, cross_target_encoder=None)

CrossPredict TargetEncoder

TargetEncoder

CrossPredict Iterator

class crosspredict.iterator.Iterator(n_splits=5, n_repeats=1, random_state=0, col_target=None, col_client=None, cv_byclient=False)

Splits data into K folds using one of three crossvalidation strategies:

  • crossvalidation by users (RepeatedKFold): pass col_client and cv_byclient=True

  • stratified crossvalidation by target column (RepeatedStratifiedKFold): pass col_target and cv_byclient=False

  • simple crossvalidation (RepeatedKFold): pass col_target=None and cv_byclient=False
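
The idea behind crossvalidation by users can be sketched without any dependencies: the unique users are shuffled and dealt into folds, and every row follows its user, so no user appears on both the training and the validation side. This is only an illustration of the technique under that assumption; the function name client_kfold and its arguments are hypothetical and are not part of the crosspredict API, and the library's own splitting may differ in detail.

```python
import random


def client_kfold(clients, n_splits=5, n_repeats=1, random_state=0):
    """Yield (train_idx, valid_idx) lists of row positions.

    Hypothetical helper for illustration only, not a crosspredict function.
    `clients` holds one user identifier per row.
    """
    if n_splits < 2:
        raise ValueError("n_splits must be at least 2")

    # Unique users in order of first appearance.
    unique_clients = list(dict.fromkeys(clients))
    if len(unique_clients) < n_splits:
        raise ValueError("need at least n_splits distinct users")

    rng = random.Random(random_state)
    for _ in range(n_repeats):
        # Each repeat reshuffles the users, giving a different partition.
        shuffled = unique_clients[:]
        rng.shuffle(shuffled)
        for fold in range(n_splits):
            # Every n_splits-th user (starting at `fold`) goes to validation.
            valid_clients = set(shuffled[fold::n_splits])
            train_idx = [i for i, c in enumerate(clients) if c not in valid_clients]
            valid_idx = [i for i, c in enumerate(clients) if c in valid_clients]
            yield train_idx, valid_idx


if __name__ == "__main__":
    rows = ["a", "a", "b", "c", "c", "c", "d", "e", "f", "f"]
    for train_idx, valid_idx in client_kfold(rows, n_splits=3):
        print(train_idx, valid_idx)
```

Splitting on users rather than on rows keeps several rows from the same user from leaking between training and validation data.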

Parameters
  • n_splits (int, default=5) – Number of folds. Must be at least 2.

  • n_repeats (int, default=1) – Number of times cross-validator needs to be repeated.

  • random_state (int or RandomState instance, default=0) – Pass an int for reproducible output across multiple function calls.

  • col_target (str, default=None) – Column name for stratified crossvalidation

  • col_client (str, default=None) – Column name for crossvalidation by users

  • cv_byclient (bool, default=False) – Set to True to run crossvalidation by users

__init__(n_splits=5, n_repeats=1, random_state=0, col_target=None, col_client=None, cv_byclient=False)
Parameters
  • n_splits (int, default=5) – Number of folds. Must be at least 2.

  • n_repeats (int, default=1) – Number of times cross-validator needs to be repeated.

  • random_state (int or RandomState instance, default=0) – Pass an int for reproducible output across multiple function calls.

  • col_target (str, default=None) – Column name for stratified crossvalidation

  • col_client (str, default=None) – Column name for crossvalidation by users

  • cv_byclient (bool, default=False) – Set to True to run crossvalidation by users

CrossPredict ReportBinary

Report Binary

class crosspredict.report_binary.ReportBinary(cols_score, cols_target, col_generation_apps=None, col_generation_deals=None)

Makes a report for a binary classification problem.

Parameters
  • cols_score (List[str]) – List of column names with model probabilities

  • cols_target (List[str]) – List of column names with true binary labels

  • col_generation_apps (Optional[str]) – Column name with the month of the event date (for PSI calculation; true labels are not needed in all rows)

  • col_generation_deals (Optional[str]) – Column name with the month of the event date (for metric calculation; only for data with true labels)
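
The PSI (Population Stability Index) mentioned for col_generation_apps compares the score distribution of one period against a reference period and needs no labels. The sketch below shows the commonly used formula, the sum over bins of (actual share - expected share) * ln(actual share / expected share), with equal-width bins on [0, 1]. It is an assumption-laden illustration: the function name psi, the binning scheme and the eps floor are hypothetical choices, and ReportBinary may compute PSI differently.

```python
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between two samples of probabilities.

    Hypothetical helper for illustration only, not a crosspredict function.
    Scores are assumed to lie in [0, 1].
    """
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")

    def bin_shares(scores):
        counts = [0] * n_bins
        for s in scores:
            # Clamp the index so that 0.0 and 1.0 fall in the edge bins.
            idx = min(max(int(s * n_bins), 0), n_bins - 1)
            counts[idx] += 1
        total = len(scores)
        # eps floors empty bins to avoid log(0) and division by zero.
        return [max(c / total, eps) for c in counts]

    e = bin_shares(expected)
    a = bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    reference = [i / 100 for i in range(100)]
    current = [min(x + 0.3, 1.0) for x in reference]
    print(psi(reference, current))
```

Identical distributions give a PSI of zero, and the value grows as the two score distributions drift apart.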

__init__(cols_score, cols_target, col_generation_apps=None, col_generation_deals=None)


fit(df)

Precalculates metrics and statistics for the given pd.DataFrame.

Parameters
  • df (DataFrame) – pd.DataFrame

Returns
self class

Return type
ReportBinary

plot_report(report_shape, report=None, cols_score=None, cols_target=None)

Plots report of given configuration.

Parameters
Returns