bm.prep package

Data preprocessing module

bm.prep.baggingPU module

Bagging meta-estimator for PU learning.

class bm.prep.baggingPU.BaggingClassifierPU(base_estimator=None, n_estimators=10, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=True, warm_start=False, n_jobs=1, random_state=None, verbose=0)

Bases: BaseBaggingPU, ClassifierMixin

A Bagging PU classifier.

Adapted from sklearn.ensemble.BaggingClassifier, based on "A bagging SVM to learn from positive and unlabeled examples" by Mordelet and Vert (2013): http://dx.doi.org/10.1016/j.patrec.2013.06.010 (PDF: http://members.cbio.mines-paristech.fr/~jvert/svn/bibli/local/Mordelet2013bagging.pdf)

Parameters
  • base_estimator (object or None, optional (default=None)) – The base estimator to fit on random subsets of the dataset. If None, then the base estimator is a decision tree.

  • n_estimators (int, optional (default=10)) – The number of base estimators in the ensemble.

  • max_samples (int or float, optional (default=1.0)) – The number of unlabeled samples to draw to train each base estimator.

  • max_features (int or float, optional (default=1.0)) –

    The number of features to draw from X to train each base estimator.

    • If int, then draw max_features features.

    • If float, then draw max_features * X.shape[1] features.

  • bootstrap (boolean, optional (default=True)) – Whether samples are drawn with replacement.

  • bootstrap_features (boolean, optional (default=False)) – Whether features are drawn with replacement.

  • oob_score (bool, optional (default=True)) – Whether to use out-of-bag samples to estimate the generalization error.

  • warm_start (bool, optional (default=False)) – When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new ensemble.

  • n_jobs (int, optional (default=1)) – The number of jobs to run in parallel for both fit and predict. If -1, then the number of jobs is set to the number of cores.

  • random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

  • verbose (int, optional (default=0)) – Controls the verbosity of the building process.
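The sampling parameters above can be illustrated with a minimal, standard-library sketch of how one base estimator's training subset might be drawn. It is not the library's code. It assumes labels of 1 for labeled positives and 0 for unlabeled samples, that every positive is kept in each subset (as in the Mordelet and Vert scheme), and that a float max_samples is a fraction of the unlabeled count. The helper names resolve_count and draw_pu_subset are hypothetical.

```python
import random


def resolve_count(value, total):
    """Treat an int as an absolute count and a float as a fraction of total."""
    if isinstance(value, float):
        return max(1, int(value * total))
    return value


def draw_pu_subset(y, n_features, max_samples=1.0, max_features=1.0,
                   bootstrap=True, bootstrap_features=False, rng=None):
    """Return (sample_indices, feature_indices) for one base estimator.

    Assumption: y uses 1 for labeled positives and 0 for unlabeled samples.
    """
    rng = rng or random.Random()
    positives = [i for i, label in enumerate(y) if label == 1]
    unlabeled = [i for i, label in enumerate(y) if label != 1]

    # max_samples only controls how many unlabeled samples are drawn.
    n_draw = resolve_count(max_samples, len(unlabeled))
    if bootstrap:
        drawn = [rng.choice(unlabeled) for _ in range(n_draw)]  # with replacement
    else:
        drawn = rng.sample(unlabeled, n_draw)  # without replacement

    # max_features: int -> that many features, float -> fraction of X.shape[1].
    n_feat = resolve_count(max_features, n_features)
    features = list(range(n_features))
    if bootstrap_features:
        chosen = [rng.choice(features) for _ in range(n_feat)]
    else:
        chosen = rng.sample(features, n_feat)

    # Every positive is kept; only the unlabeled part is subsampled.
    return positives + drawn, sorted(chosen)
```

With six unlabeled samples, max_samples=0.5 draws three of them, so each subset in this sketch holds all positives plus three unlabeled samples.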

base_estimator_

The base estimator from which the ensemble is grown.

Type

estimator

estimators_

The collection of fitted base estimators.

Type

list of estimators

estimators_samples_

The subset of drawn samples (i.e., the in-bag samples) for each base estimator. Each subset is defined by a boolean mask.

Type

list of arrays

estimators_features_

The subset of drawn features for each base estimator.

Type

list of arrays

classes_

The class labels.

Type

array of shape = [n_classes]

n_classes_

The number of classes.

Type

int or list

oob_score_

Score of the training dataset obtained using an out-of-bag estimate.

Type

float

oob_decision_function_

Decision function computed with out-of-bag estimate on the training set. Positive data points, and possibly some of the unlabeled ones, are never out-of-bag for any base estimator, so no out-of-bag estimate exists for them. In these cases, oob_decision_function_ contains NaN.

Type

array of shape = [n_samples, n_classes]
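To show where the NaN entries come from, here is a small sketch of an out-of-bag average over boolean in-bag masks like those described for estimators_samples_. It works on one score column for brevity. The function name oob_average and its input layout are assumptions made for illustration, not the library's internals.

```python
import math


def oob_average(in_bag_masks, per_estimator_scores):
    """Average each sample's score over the estimators that did not train on it.

    in_bag_masks[e][i] is True when estimator e trained on sample i.
    per_estimator_scores[e][i] is estimator e's score for sample i.
    """
    n_samples = len(in_bag_masks[0])
    result = []
    for i in range(n_samples):
        held_out = [scores[i]
                    for mask, scores in zip(in_bag_masks, per_estimator_scores)
                    if not mask[i]]
        # A sample that was in-bag for every estimator has no OOB estimate.
        result.append(sum(held_out) / len(held_out) if held_out else math.nan)
    return result
```

Any code that consumes such an array should filter the NaN rows first, for example with math.isnan, before computing a score.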

decision_function(X)

Average of the decision functions of the base classifiers.

Parameters

X ({array-like, sparse matrix} of shape = [n_samples, n_features]) – The input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns

score – The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1; otherwise k == n_classes.

Return type

array, shape = [n_samples, k]

predict(X)

Predict class for X.

The predicted class of an input sample is computed as the class with the highest mean predicted probability. If base estimators do not implement a predict_proba method, then it resorts to voting.

Parameters

X ({array-like, sparse matrix} of shape = [n_samples, n_features]) – The input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns

y – The predicted classes.

Return type

array of shape = [n_samples]
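The "highest mean predicted probability" rule can be sketched as follows. This is a plain-Python illustration rather than the library's implementation; the function name predict_from_probas and the nested-list layout are assumptions.

```python
def predict_from_probas(probas_per_estimator, classes):
    """Pick, per sample, the class with the highest mean predicted probability.

    probas_per_estimator[e][i][k] is estimator e's probability of
    classes[k] for sample i.
    """
    n_estimators = len(probas_per_estimator)
    n_samples = len(probas_per_estimator[0])
    predictions = []
    for i in range(n_samples):
        # Mean probability of each class across the ensemble.
        mean = [sum(p[i][k] for p in probas_per_estimator) / n_estimators
                for k in range(len(classes))]
        best = max(range(len(classes)), key=mean.__getitem__)
        predictions.append(classes[best])
    return predictions
```

Averaging probabilities can disagree with a majority vote: if two estimators give class 1 a probability of 0.45 and a third gives 0.95, the mean is about 0.62, so class 1 is predicted even though two of the three estimators lean toward class 0.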

predict_log_proba(X)

Predict class log-probabilities for X.

The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.

Parameters

X ({array-like, sparse matrix} of shape = [n_samples, n_features]) – The input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns

p – The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

Return type

array of shape = [n_samples, n_classes]
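Note that the description says "log of the mean", not "mean of the logs". A minimal sketch for a single sample, using a hypothetical helper ensemble_log_proba, makes the order of operations explicit:

```python
import math


def ensemble_log_proba(probas_per_estimator):
    """Log of the mean class probabilities for a single sample.

    probas_per_estimator[e][k] is estimator e's probability of class k.
    """
    n_estimators = len(probas_per_estimator)
    n_classes = len(probas_per_estimator[0])
    # Average first, then take the logarithm.
    mean = [sum(p[k] for p in probas_per_estimator) / n_estimators
            for k in range(n_classes)]
    # math.log(0) raises, so a zero mean is mapped to -inf explicitly.
    return [math.log(m) if m > 0 else -math.inf for m in mean]
```

Taking the log after averaging keeps the result a valid log-distribution (its exponentials sum to 1), and one estimator assigning zero probability to a class does not force that class to negative infinity.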

predict_proba(X)

Predict class probabilities for X.

The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If base estimators do not implement a predict_proba method, then it resorts to voting, and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.

Parameters

X ({array-like, sparse matrix} of shape = [n_samples, n_features]) – The input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns

p – The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

Return type

array of shape = [n_samples, n_classes]
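The voting fallback described above can be sketched for one sample as the share of estimators voting for each class. The helper name vote_proportions is hypothetical, and the sketch assumes each base estimator contributes exactly one hard prediction.

```python
from collections import Counter


def vote_proportions(votes, classes):
    """Turn hard votes for one sample into class proportions.

    votes holds one predicted label per base estimator; the output
    follows the order of classes, mirroring the classes_ attribute.
    """
    counts = Counter(votes)
    return [counts[c] / len(votes) for c in classes]
```

For example, votes of [1, 0, 1, 1] over classes [0, 1] give [0.25, 0.75]. Proportions from n estimators can only take n + 1 distinct values, so they are coarser than averaged probabilities.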