CrossPredict Crossvalidation

CrossVal

class crosspredict.crossval.CrossLightgbmModel(iterator, params, feature_name, col_target, cols_cat='auto', num_boost_round=99999, early_stopping_rounds=50, valid=True, random_state=0, cross_target_encoder=None)

class crosspredict.crossval.CrossXgboostModel(iterator, params, feature_name, col_target, cols_cat='auto', num_boost_round=99999, early_stopping_rounds=50, valid=True, random_state=0, cross_target_encoder=None)
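Both classes default to num_boost_round=99999 with early_stopping_rounds=50, so the round limit acts as a ceiling and training is presumably ended by early stopping on the validation fold. The minimal sketch below shows the usual early-stopping rule behind the early_stopping_rounds option of LightGBM and XGBoost, written in plain Python. It is an illustration only, not CrossPredict's code, and the function name early_stopping_round is hypothetical.

```python
from typing import List, Tuple

# Hypothetical helper: illustrates the early-stopping rule, not CrossPredict's API.


def early_stopping_round(
    valid_scores: List[float],
    early_stopping_rounds: int = 50,
    higher_is_better: bool = False,
) -> Tuple[int, int]:
    """Return (best_round, stopped_round), both 1-based.

    valid_scores[i] is the validation metric after boosting round i + 1.
    Training stops once `early_stopping_rounds` consecutive rounds pass
    without improving on the best score seen so far.
    """
    best_score = None
    best_round = 0
    for i, score in enumerate(valid_scores, start=1):
        improved = best_score is None or (
            score > best_score if higher_is_better else score < best_score
        )
        if improved:
            best_score, best_round = score, i
        elif i - best_round >= early_stopping_rounds:
            # Patience exhausted: stop and keep the best round's model.
            return best_round, i
    # Round limit (num_boost_round) reached before early stopping fired.
    return best_round, len(valid_scores)
```

For example, with validation losses [0.9, 0.7, 0.6, 0.65, 0.66, 0.67] and early_stopping_rounds=2, the best round is 3 and training stops after round 5.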

CrossPredict TargetEncoder

CrossVal

CrossPredict Iterator

class crosspredict.iterator.Iterator(n_splits=5, n_repeats=1, random_state=0, col_target=None, col_client=None, cv_byclient=False)

K-fold splitting of the data with three cross-validation strategies:

Cross-validation by users (RepeatedKFold): pass col_client and cv_byclient=True.
Stratified cross-validation by the target column (RepeatedStratifiedKFold): pass col_target and cv_byclient=False.
Simple cross-validation (RepeatedKFold): pass col_target=None and cv_byclient=False.
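The three strategies can be sketched in plain Python as follows. This is not CrossPredict's implementation, which per the list above relies on the scikit-learn-style RepeatedKFold and RepeatedStratifiedKFold splitters; the function split and its arguments are hypothetical. It assumes the by-user strategy applies K-fold to unique client ids, and it assigns folds by a simple round-robin deal rather than scikit-learn's exact algorithm.

```python
import random
from collections import defaultdict
from typing import Hashable, Iterable, List, Optional, Sequence, Tuple

# Hypothetical sketch of the three Iterator strategies; not CrossPredict's code.
# `targets` and `clients`, if given, must have length n_rows.


def _assign_folds(
    n_rows: int,
    n_splits: int,
    rng: random.Random,
    targets: Optional[Sequence[Hashable]] = None,
    clients: Optional[Sequence[Hashable]] = None,
) -> List[int]:
    """Return a fold id in [0, n_splits) for every row."""
    if clients is not None:
        # By client: shuffle unique client ids and deal them across folds,
        # so all rows of one client share a fold.
        unique = list(dict.fromkeys(clients))
        rng.shuffle(unique)
        client_fold = {c: j % n_splits for j, c in enumerate(unique)}
        return [client_fold[c] for c in clients]

    fold_of = [0] * n_rows
    if targets is not None:
        # Stratified: deal each class's shuffled rows round-robin, so every
        # fold gets (almost) the same share of each class.
        by_class = defaultdict(list)
        for i, t in enumerate(targets):
            by_class[t].append(i)
        pos = 0
        for rows in by_class.values():
            rng.shuffle(rows)
            for i in rows:
                fold_of[i] = pos % n_splits
                pos += 1
        return fold_of

    # Simple: shuffle all rows and deal them round-robin.
    order = list(range(n_rows))
    rng.shuffle(order)
    for j, i in enumerate(order):
        fold_of[i] = j % n_splits
    return fold_of


def split(
    n_rows: int,
    n_splits: int = 5,
    n_repeats: int = 1,
    random_state: int = 0,
    targets: Optional[Sequence[Hashable]] = None,
    clients: Optional[Sequence[Hashable]] = None,
) -> Iterable[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs, n_splits per repeat."""
    if n_splits < 2:
        raise ValueError("n_splits must be at least 2")
    rng = random.Random(random_state)
    for _ in range(n_repeats):
        # Each repeat draws a fresh shuffle from the same seeded generator.
        fold_of = _assign_folds(n_rows, n_splits, rng, targets, clients)
        for k in range(n_splits):
            test = [i for i in range(n_rows) if fold_of[i] == k]
            train = [i for i in range(n_rows) if fold_of[i] != k]
            yield train, test
```

Passing clients keeps every user's rows on one side of each split, which is what prevents leakage between train and test when one user contributes several rows.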

Parameters

n_splits (int, default=5) – Number of folds. Must be at least 2.
n_repeats (int, default=1) – Number of times cross-validator needs to be repeated.
random_state (int or RandomState instance, default=0) – Pass an int for reproducible output across multiple function calls.
col_target (str, default=None) – Column name for stratified cross-validation.
col_client (str, default=None) – Column name for cross-validation by users.
cv_byclient (bool, default=False) – Whether cross-validation by users is needed.

__init__(n_splits=5, n_repeats=1, random_state=0, col_target=None, col_client=None, cv_byclient=False)


CrossPredict ReportBinary

Report Binary

class crosspredict.report_binary.ReportBinary(cols_score, cols_target, col_generation_apps=None, col_generation_deals=None)

Builds a report for a binary classification problem.

Parameters

cols_score (List[str]) – List of column names with model probabilities.
cols_target (List[str]) – List of column names with true binary labels.
col_generation_apps (Optional[str]) – Column name with the month of the event date, used for PSI calculation (does not need true labels in all rows).
col_generation_deals (Optional[str]) – Column name with the month of the event date, used for metric calculation (only for rows with true labels).
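The docstring says col_generation_apps is used for PSI (Population Stability Index) calculation, which compares score distributions between periods and needs no labels. A minimal sketch of the standard PSI formula follows. The quantile binning of the reference sample, the number of bins and the eps floor are assumptions, since this page does not document how ReportBinary bins scores, and the function name psi is hypothetical.

```python
import math
from bisect import bisect_right
from typing import List, Sequence

# Hypothetical helper: standard PSI formula, not ReportBinary's implementation.


def psi(
    expected: Sequence[float],
    actual: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Population Stability Index of `actual` scores against `expected` ones.

    PSI = sum_i (a_i - e_i) * ln(a_i / e_i), where e_i and a_i are the shares
    of each sample falling into bin i. Bins are quantiles of `expected`
    (an assumption of this sketch).
    """
    ref = sorted(expected)
    # Inner bin edges at the reference quantiles; bisect_right maps a score
    # to a bin index in [0, n_bins).
    edges = [ref[len(ref) * k // n_bins] for k in range(1, n_bins)]

    def shares(scores: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for s in scores:
            counts[bisect_right(edges, s)] += 1
        # Floor empty bins at eps so the logarithm stays finite.
        return [max(c / len(scores), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

In use, expected would hold one model's scores from a reference month and actual the scores from a later month in col_generation_apps; identical distributions give a PSI of 0, and every term is non-negative, so any shift increases it.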

__init__(cols_score, cols_target, col_generation_apps=None, col_generation_deals=None)

fit(df)

Precalculates metrics and statistics for the given pd.DataFrame.

Parameters

df (pd.DataFrame) – Input data.

Returns

self

Return type

ReportBinary
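fit is only documented as precalculating "metrics and statistics". Assuming the per-month metric is ROC AUC or its Gini transform (common for binary scoring models, but not confirmed on this page), the sketch below groups labeled rows by the col_generation_deals month and computes a Gini per month. roc_auc and gini_by_generation are hypothetical names, not ReportBinary methods.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical helpers: per-month Gini sketch, not ReportBinary's implementation.


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the rank-sum (Mann-Whitney U) formula; labels are 0/1."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores and give each the average 1-based rank.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both positive and negative labels")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def gini_by_generation(
    scores: Sequence[float],
    labels: Sequence[int],
    generations: Sequence[Hashable],
) -> Dict[Hashable, float]:
    """Gini = 2 * AUC - 1, computed separately for each generation (month)."""
    groups: Dict[Hashable, Tuple[List[float], List[int]]] = defaultdict(
        lambda: ([], [])
    )
    for s, y, g in zip(scores, labels, generations):
        groups[g][0].append(s)
        groups[g][1].append(y)
    return {g: 2 * roc_auc(s, y) - 1 for g, (s, y) in groups.items()}
```

Only labeled rows should be passed in, which matches the docstring's note that col_generation_deals covers data with true labels.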

plot_report(report_shape, report=None, cols_score=None, cols_target=None)

Plots a report with the given configuration.

Parameters

report_shape (List[int]) – Shape of the subplot axes grid. Read more: https://matplotlib.org/3.1.1/gallery/userdemo/demo_gridspec01.html#sphx-glr-gallery-userdemo-demo-gridspec01-py
report (Optional[Dict]) – Dict with the reports and their locations. Read more: https://matplotlib.org/3.1.1/gallery/userdemo/demo_gridspec01.html#sphx-glr-gallery-userdemo-demo-gridspec01-py
cols_score (Optional[List[str]]) – Sublist of the column names with model probabilities.
cols_target (Optional[List[str]]) – Sublist of the column names with true binary labels.

Returns