CrossPredict Crossvalidation
CrossVal
CrossPredict TargetEncoder
CrossVal
CrossPredict Iterator
-
class crosspredict.iterator.Iterator(n_splits=5, n_repeats=1, random_state=0, col_target=None, col_client=None, cv_byclient=False)
Splits data into K folds using one of three cross-validation strategies:
cross-validation by users (RepeatedKFold): pass col_client and cv_byclient=True
stratified cross-validation by target column (RepeatedStratifiedKFold): pass col_target and cv_byclient=False
simple cross-validation (RepeatedKFold): pass col_target=None and cv_byclient=False
- Parameters
n_splits (int, default=5) – Number of folds. Must be at least 2.
n_repeats (int, default=1) – Number of times cross-validator needs to be repeated.
random_state (int or RandomState instance, default=0) – Pass an int for reproducible output across multiple function calls.
col_target (str, default=None) – Column name used for stratified cross-validation
col_client (str, default=None) – Column name used for cross-validation by users
cv_byclient (bool, default=False) – Set to True to enable cross-validation by users
-
__init__(n_splits=5, n_repeats=1, random_state=0, col_target=None, col_client=None, cv_byclient=False)
- Parameters
n_splits (int, default=5) – Number of folds. Must be at least 2.
n_repeats (int, default=1) – Number of times cross-validator needs to be repeated.
random_state (int or RandomState instance, default=0) – Pass an int for reproducible output across multiple function calls.
col_target (str, default=None) – Column name used for stratified cross-validation
col_client (str, default=None) – Column name used for cross-validation by users
cv_byclient (bool, default=False) – Set to True to enable cross-validation by users
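The three strategies can be illustrated directly with scikit-learn's splitters. This is a minimal sketch under stated assumptions, not the Iterator implementation: the toy arrays and variable names are made up, and the by-client variant assumes that "cross-validation by users" means splitting the unique client ids with RepeatedKFold and mapping them back to rows.

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold, RepeatedStratifiedKFold

# Toy data (made up): 12 rows, 4 clients, a balanced binary target.
client = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4])
target = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1])
rows = np.arange(len(target))

# 1) Simple cross-validation: rows are shuffled into folds.
simple = RepeatedKFold(n_splits=2, n_repeats=1, random_state=0)
simple_folds = list(simple.split(rows))

# 2) Stratified by target: each fold keeps the class balance.
strat = RepeatedStratifiedKFold(n_splits=2, n_repeats=1, random_state=0)
strat_folds = list(strat.split(rows, target))

# 3) By client (assumed behaviour): split the unique client ids, then map
#    them back to row indices so no client is in both train and validation.
unique_clients = np.unique(client)
by_client_folds = []
for train_c, val_c in simple.split(unique_clients):
    train_idx = rows[np.isin(client, unique_clients[train_c])]
    val_idx = rows[np.isin(client, unique_clients[val_c])]
    by_client_folds.append((train_idx, val_idx))

for train_idx, val_idx in by_client_folds:
    shared = set(client[train_idx]) & set(client[val_idx])
    print(len(train_idx), len(val_idx), "shared clients:", shared)
```

Splitting on client ids rather than rows matters when one client contributes several rows: a row-level split would let the model see the same client in both training and validation data.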
CrossPredict ReportBinary
Report Binary
-
class crosspredict.report_binary.ReportBinary(cols_score, cols_target, col_generation_apps=None, col_generation_deals=None)
Makes a report for a binary classification problem.
- Parameters
cols_score (List[str]) – List of column names with model probabilities
cols_target (List[str]) – List of column names with true binary labels
col_generation_apps (Optional[str]) – Column name with the month of the event date (for PSI calculation; true labels are not needed in all rows)
col_generation_deals (Optional[str]) – Column name with the month of the event date (for metric calculation; only for data with true labels)
-
__init__(cols_score, cols_target, col_generation_apps=None, col_generation_deals=None)
Initialize self. See help(type(self)) for accurate signature.
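The PSI (Population Stability Index) mentioned for col_generation_apps compares the score distribution of one month against a baseline, which is why it needs no true labels. The sketch below uses the standard formula, PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i), where e_i and a_i are the baseline and current shares of scores in bin i. It is an illustration only: the function name `psi`, the equal-width binning, and the `n_bins` and `eps` parameters are assumptions, and ReportBinary may bin and compute PSI differently.

```python
import math


def psi(expected, actual, n_bins=5, eps=1e-6):
    """Population Stability Index between two lists of scores in [0, 1]."""

    def shares(scores):
        counts = [0] * n_bins
        for s in scores:
            # Equal-width bins on [0, 1]; a score of 1.0 goes in the last bin.
            counts[min(int(s * n_bins), n_bins - 1)] += 1
        # eps keeps empty bins from causing log(0) or division by zero.
        return [max(c / len(scores), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Made-up scores: a baseline month and a month shifted toward high scores.
baseline_month = [0.05, 0.15, 0.25, 0.45, 0.55, 0.65, 0.85, 0.95]
shifted_month = [0.70, 0.75, 0.80, 0.85, 0.90, 0.92, 0.95, 0.99]

print(psi(baseline_month, baseline_month))
print(psi(baseline_month, shifted_month))
```

Identical distributions give a PSI of zero, and every bin contributes a non-negative term, so the value grows as the score distribution drifts away from the baseline.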
-
fit(df)
Precalculates metrics and statistics for the given pd.DataFrame.
- Parameters
df (DataFrame) – pd.DataFrame
- Returns
self
- Return type
ReportBinary
-
plot_report(report_shape, report=None, cols_score=None, cols_target=None)
Plots a report of the given configuration.
- Parameters
report_shape (List[int]) – Shape of subplot axes. Read more: https://matplotlib.org/3.1.1/gallery/userdemo/demo_gridspec01.html#sphx-glr-gallery-userdemo-demo-gridspec01-py
report (Optional[Dict]) – Dict with reports and their location. Read more: https://matplotlib.org/3.1.1/gallery/userdemo/demo_gridspec01.html#sphx-glr-gallery-userdemo-demo-gridspec01-py
cols_score (Optional[List[str]]) – Sublist of column names with model probabilities
cols_target (Optional[List[str]]) – Sublist of column names with true binary labels
- Returns
-