bm.prep package

Data preprocessing module.

bm.prep.baggingPU module

Bagging meta-estimator for PU learning.

class bm.prep.baggingPU.BaggingClassifierPU(base_estimator=None, n_estimators=10, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=True, warm_start=False, n_jobs=1, random_state=None, verbose=0)

Bases: BaseBaggingPU, ClassifierMixin

A Bagging PU classifier.

Adapted from sklearn.ensemble.BaggingClassifier, based on "A bagging SVM to learn from positive and unlabeled examples" (2013) by Mordelet and Vert: http://dx.doi.org/10.1016/j.patrec.2013.06.010 (PDF: http://members.cbio.mines-paristech.fr/~jvert/svn/bibli/local/Mordelet2013bagging.pdf)

Parameters

base_estimator (object or None, optional (default=None)) – The base estimator to fit on random subsets of the dataset. If None, then the base estimator is a decision tree.
n_estimators (int, optional (default=10)) – The number of base estimators in the ensemble.
max_samples (int or float, optional (default=1.0)) – The number of unlabeled samples to draw to train each base estimator.
max_features (int or float, optional (default=1.0)) –
The number of features to draw from X to train each base estimator.
- If int, then draw max_features features.
- If float, then draw max_features * X.shape[1] features.
bootstrap (boolean, optional (default=True)) – Whether samples are drawn with replacement.
bootstrap_features (boolean, optional (default=False)) – Whether features are drawn with replacement.
oob_score (bool, optional (default=True)) – Whether to use out-of-bag samples to estimate the generalization error.
warm_start (bool, optional (default=False)) – When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new ensemble.
n_jobs (int, optional (default=1)) – The number of jobs to run in parallel for both fit and predict. If -1, then the number of jobs is set to the number of cores.
random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator. If a RandomState instance, random_state is the random number generator. If None, the random number generator is the RandomState instance used by np.random.
verbose (int, optional (default=0)) – Controls the verbosity of the building process.
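The sampling parameters above can be illustrated with a minimal plain-Python sketch. It is not the class's own code: the helper names resolve_max_features and draw_pu_bag are hypothetical, the label convention (1 = positive, 0 = unlabeled) is an assumption, and truncating the float case to an integer is also an assumption. Following the Mordelet and Vert scheme, each bag is assumed to keep every positive sample and to draw only from the unlabeled ones.

```python
import random


def resolve_max_features(max_features, n_features):
    """Turn an int or float max_features into a feature count."""
    if isinstance(max_features, float):
        # Float: a fraction of the columns (truncation is an assumption).
        return int(max_features * n_features)
    # Int: used as an absolute count.
    return max_features


def draw_pu_bag(y, n_unlabeled_draws, bootstrap=True, seed=None):
    """Return training indices: every positive plus a draw of unlabeled.

    Assumes y uses 1 for positive and 0 for unlabeled samples.
    """
    rng = random.Random(seed)
    positives = [i for i, label in enumerate(y) if label == 1]
    unlabeled = [i for i, label in enumerate(y) if label == 0]
    if bootstrap:
        # With replacement: the same unlabeled index may appear twice.
        drawn = [rng.choice(unlabeled) for _ in range(n_unlabeled_draws)]
    else:
        # Without replacement: all drawn indices are distinct.
        drawn = rng.sample(unlabeled, n_unlabeled_draws)
    return positives + drawn


y = [1, 1, 0, 0, 0, 0, 0, 0]
bag = draw_pu_bag(y, n_unlabeled_draws=3, seed=0)
n_cols = resolve_max_features(0.5, 10)
```

Because the positives are never subject to the draw, only the unlabeled part of each bag changes from one base estimator to the next.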

base_estimator_

The base estimator from which the ensemble is grown.

Type: estimator

estimators_

The collection of fitted base estimators.

Type: list of estimators

estimators_samples_

The subset of drawn samples (i.e., the in-bag samples) for each base estimator. Each subset is defined by a boolean mask.

Type: list of arrays
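As a rough illustration of the boolean-mask representation, the sketch below converts a list of drawn indices into a mask and reads the out-of-bag samples off it. The helper indices_to_mask is a hypothetical name written for this example, and plain lists stand in for arrays.

```python
def indices_to_mask(indices, n_samples):
    """Build a boolean in-bag mask from drawn sample indices."""
    mask = [False] * n_samples
    for i in indices:
        # Repeated indices (bootstrap duplicates) collapse to one True.
        mask[i] = True
    return mask


# Index 4 was drawn twice; the mask only records that it is in-bag.
in_bag = indices_to_mask([0, 1, 4, 4, 6], n_samples=8)

# Out-of-bag samples are the ones the mask marks False.
out_of_bag = [i for i, used in enumerate(in_bag) if not used]
```

Note that a mask records membership only, so the number of times a sample was drawn cannot be recovered from it.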

estimators_features_

The subset of drawn features for each base estimator.

Type: list of arrays

classes_

The class labels.

Type: array of shape = [n_classes]

n_classes_

The number of classes.

Type: int or list

oob_score_

Score of the training dataset obtained using an out-of-bag estimate.

Type: float

oob_decision_function_

Decision function computed with out-of-bag estimate on the training set. Positive data points, and possibly some unlabeled ones, are never out-of-bag during the bootstrap, so they have no out-of-bag estimate. For these points, oob_decision_function_ contains NaN.

Type: array of shape = [n_samples, n_classes]
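The NaN behaviour can be sketched in plain Python. This is an illustration under assumptions, not the class's implementation: the function oob_average is hypothetical, each base estimator is assumed to provide per-class probabilities for every training sample, and a sample's estimate averages only the estimators that did not train on it.

```python
import math


def oob_average(probas_per_estimator, in_bag_masks):
    """Average class probabilities over estimators that did not see a sample."""
    n_samples = len(in_bag_masks[0])
    n_classes = len(probas_per_estimator[0][0])
    result = []
    for i in range(n_samples):
        # Keep only the rows from estimators for which sample i was out-of-bag.
        rows = [
            probas[i]
            for probas, mask in zip(probas_per_estimator, in_bag_masks)
            if not mask[i]
        ]
        if not rows:
            # Never out-of-bag: no estimate is available for this sample.
            result.append([math.nan] * n_classes)
        else:
            result.append(
                [sum(r[c] for r in rows) / len(rows) for c in range(n_classes)]
            )
    return result


probas = [
    [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]],  # estimator 0
    [[0.2, 0.8], [0.6, 0.4], [0.3, 0.7]],  # estimator 1
]
masks = [
    [True, False, True],   # estimator 0 trained on samples 0 and 2
    [True, True, False],   # estimator 1 trained on samples 0 and 1
]
oob = oob_average(probas, masks)
```

Here sample 0 stands in for a positive point that every estimator trained on, so its row is NaN and would need to be skipped before computing any score.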

decision_function(X)

Average of the decision functions of the base classifiers.

Parameters: X ({array-like, sparse matrix} of shape = [n_samples, n_features]) – The input samples. Sparse matrices are accepted only if they are supported by the base estimator.
Returns: score – The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.
Return type: array, shape = [n_samples, k]
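For the binary case (k == 1), the averaging can be pictured with a short sketch. The function average_decision is a hypothetical helper, and it assumes every base classifier scores all samples with equal weight.

```python
def average_decision(scores_per_estimator):
    """Average per-sample decision scores across base classifiers."""
    n_estimators = len(scores_per_estimator)
    # zip(*...) regroups the scores so each tuple holds one sample's scores.
    return [sum(col) / n_estimators for col in zip(*scores_per_estimator)]


scores = [
    [1.0, -2.0, 0.5],   # estimator 0, three samples
    [3.0, -1.0, -0.5],  # estimator 1, same three samples
]
avg = average_decision(scores)
```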

predict(X)

Predict class for X.

The predicted class of an input sample is computed as the class with the highest mean predicted probability. If base estimators do not implement a predict_proba method, then it resorts to voting.

Parameters: X ({array-like, sparse matrix} of shape = [n_samples, n_features]) – The input samples. Sparse matrices are accepted only if they are supported by the base estimator.
Returns: y – The predicted classes.
Return type: array of shape = [n_samples]
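The rule "class with the highest mean predicted probability" can be sketched as follows. The function predict_from_probas is hypothetical and only mirrors the description above; it assumes each estimator's probability columns follow the order of classes.

```python
def predict_from_probas(probas_per_estimator, classes):
    """Pick, per sample, the class with the highest mean probability."""
    n_estimators = len(probas_per_estimator)
    predictions = []
    # Each 'rows' tuple holds one sample's probability row from every estimator.
    for rows in zip(*probas_per_estimator):
        mean = [sum(col) / n_estimators for col in zip(*rows)]
        predictions.append(classes[mean.index(max(mean))])
    return predictions


classes = [0, 1]
probas = [
    [[0.9, 0.1], [0.4, 0.6]],  # estimator 0, two samples
    [[0.7, 0.3], [0.2, 0.8]],  # estimator 1
    [[0.2, 0.8], [0.6, 0.4]],  # estimator 2
]
labels = predict_from_probas(probas, classes)
```

In this toy input, estimator 2 disagrees with the other two on both samples, but the mean probability still favours class 0 for the first sample and class 1 for the second.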

predict_log_proba(X)

Predict class log-probabilities for X.

The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.

Parameters: X ({array-like, sparse matrix} of shape = [n_samples, n_features]) – The input samples. Sparse matrices are accepted only if they are supported by the base estimator.
Returns: p – The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
Return type: array of shape = [n_samples, n_classes]
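Taking the log of the mean is not the same as averaging log-probabilities, which the sketch below tries to make concrete for a single sample. The function log_of_mean_proba is a hypothetical helper; returning negative infinity for a zero mean is an assumption made here to avoid a math domain error.

```python
import math


def log_of_mean_proba(probas_for_sample):
    """Log of the mean class probabilities for one sample."""
    n_estimators = len(probas_for_sample)
    mean = [sum(col) / n_estimators for col in zip(*probas_for_sample)]
    # math.log(0) raises, so a zero mean is mapped to -inf explicitly.
    return [math.log(p) if p > 0 else -math.inf for p in mean]


# The second estimator assigns probability 0 to the second class.
rows = [[0.5, 0.5], [1.0, 0.0]]
log_p = log_of_mean_proba(rows)
```

Because the mean is taken first, the second class keeps a finite log-probability (the log of 0.25) even though one estimator gave it zero; averaging the logs instead would yield negative infinity.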

predict_proba(X)

Predict class probabilities for X.

The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If base estimators do not implement a predict_proba method, then it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.

Parameters: X ({array-like, sparse matrix} of shape = [n_samples, n_features]) – The input samples. Sparse matrices are accepted only if they are supported by the base estimator.
Returns: p – The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
Return type: array of shape = [n_samples, n_classes]
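The voting fallback can be sketched in a few lines. The function vote_proportions is a hypothetical helper that assumes each base estimator returns one hard class label per sample.

```python
from collections import Counter


def vote_proportions(votes_per_estimator, classes):
    """Per sample, the fraction of estimators voting for each class."""
    n_estimators = len(votes_per_estimator)
    result = []
    # Each 'votes' tuple holds one sample's predicted label from every estimator.
    for votes in zip(*votes_per_estimator):
        counts = Counter(votes)
        # Counter returns 0 for classes that received no vote.
        result.append([counts[c] / n_estimators for c in classes])
    return result


votes = [
    [1, 0],  # estimator 0, two samples
    [1, 0],  # estimator 1
    [0, 0],  # estimator 2
    [1, 1],  # estimator 3
]
proba = vote_proportions(votes, classes=[0, 1])
```

With four estimators the proportions can only take values in steps of 0.25, so probabilities obtained by voting are coarser than averaged predict_proba outputs.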