bm.prep package
Data preprocessing module
bm.prep.baggingPU module
Bagging meta-estimator for PU learning.
- class bm.prep.baggingPU.BaggingClassifierPU(base_estimator=None, n_estimators=10, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=True, warm_start=False, n_jobs=1, random_state=None, verbose=0)
Bases: BaseBaggingPU, ClassifierMixin
A Bagging PU classifier.
Adapted from sklearn.ensemble.BaggingClassifier, based on "A bagging SVM to learn from positive and unlabeled examples" (2013) by Mordelet and Vert: http://dx.doi.org/10.1016/j.patrec.2013.06.010 and http://members.cbio.mines-paristech.fr/~jvert/svn/bibli/local/Mordelet2013bagging.pdf
- Parameters
base_estimator (object or None, optional (default=None)) – The base estimator to fit on random subsets of the dataset. If None, then the base estimator is a decision tree.
n_estimators (int, optional (default=10)) – The number of base estimators in the ensemble.
max_samples (int or float, optional (default=1.0)) – The number of unlabeled samples to draw to train each base estimator.
max_features (int or float, optional (default=1.0)) –
The number of features to draw from X to train each base estimator.
If int, then draw max_features features.
If float, then draw max_features * X.shape[1] features.
bootstrap (boolean, optional (default=True)) – Whether samples are drawn with replacement.
bootstrap_features (boolean, optional (default=False)) – Whether features are drawn with replacement.
oob_score (bool, optional (default=True)) – Whether to use out-of-bag samples to estimate the generalization error.
warm_start (bool, optional (default=False)) – When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
n_jobs (int, optional (default=1)) – The number of jobs to run in parallel for both fit and predict. If -1, then the number of jobs is set to the number of cores.
random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
verbose (int, optional (default=0)) – Controls the verbosity of the building process.
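As a rough illustration of how n_estimators, max_samples, and bootstrap interact, the sketch below draws the training indices for each base estimator in the way the bagging scheme of Mordelet and Vert is usually described: every bag keeps all positive samples and adds a random draw of unlabeled samples. This is not the module's code. The names resolve_count and draw_pu_bags, the label convention (1 = positive, anything else = unlabeled), and the truncating float-to-count conversion are all assumptions made for the example.

```python
import random


def resolve_count(value, total):
    """Turn an int-or-float setting into an absolute count (assumed rule)."""
    if isinstance(value, float):
        return int(value * total)  # assumption: truncate toward zero
    return value


def draw_pu_bags(y, n_estimators=10, max_samples=1.0, bootstrap=True, seed=None):
    """Return one list of training indices per base estimator.

    y uses the assumed convention 1 = positive, anything else = unlabeled.
    """
    rng = random.Random(seed)
    positives = [i for i, label in enumerate(y) if label == 1]
    unlabeled = [i for i, label in enumerate(y) if label != 1]
    n_draw = resolve_count(max_samples, len(unlabeled))

    bags = []
    for _ in range(n_estimators):
        if bootstrap:
            # Draw unlabeled indices with replacement.
            drawn = [rng.choice(unlabeled) for _ in range(n_draw)]
        else:
            # Draw unlabeled indices without replacement.
            drawn = rng.sample(unlabeled, n_draw)
        # Every bag contains all positives plus the drawn unlabeled samples.
        bags.append(positives + drawn)
    return bags


y = [1, 1, 0, 0, 0, 0, 0, 0]
bags = draw_pu_bags(y, n_estimators=3, max_samples=0.5, seed=0)
```

With six unlabeled samples and max_samples=0.5, each of the three bags holds the two positives followed by three unlabeled indices.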
- base_estimator_
The base estimator from which the ensemble is grown.
- Type
estimator
- estimators_
The collection of fitted base estimators.
- Type
list of estimators
- estimators_samples_
The subset of drawn samples (i.e., the in-bag samples) for each base estimator. Each subset is defined by a boolean mask.
- Type
list of arrays
- estimators_features_
The subset of drawn features for each base estimator.
- Type
list of arrays
- classes_
The class labels.
- Type
array of shape = [n_classes]
- n_classes_
The number of classes.
- Type
int or list
- oob_score_
Score of the training dataset obtained using an out-of-bag estimate.
- Type
float
- oob_decision_function_
Decision function computed with out-of-bag estimate on the training set. Positive data points, and possibly some unlabeled ones, never receive an out-of-bag estimate; for those samples, oob_decision_function_ contains NaN.
- Type
array of shape = [n_samples, n_classes]
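To make the NaN entries concrete, here is a minimal sketch of an out-of-bag average for a single score column. It assumes that a sample's estimate is the mean score over the estimators whose bag did not contain it, and that a sample present in every bag gets NaN. The function name oob_average and the data layout are hypothetical; the real attribute has one column per class.

```python
def oob_average(n_samples, bags, scores_per_estimator):
    """Average each sample's scores over the estimators that did not train on it.

    bags[k] holds the in-bag indices of estimator k, and
    scores_per_estimator[k][i] is estimator k's score for sample i.
    """
    totals = [0.0] * n_samples
    counts = [0] * n_samples
    for bag, scores in zip(bags, scores_per_estimator):
        in_bag = set(bag)
        for i in range(n_samples):
            if i not in in_bag:
                totals[i] += scores[i]
                counts[i] += 1
    # A sample that was in every bag has no out-of-bag score: report NaN.
    return [t / c if c else float("nan") for t, c in zip(totals, counts)]


# Sample 0 plays the role of a positive: it is in every bag.
bags = [[0, 1, 1], [0, 2, 3]]
scores = [[0.9, 0.2, 0.4, 0.6], [0.8, 0.1, 0.3, 0.5]]
oob = oob_average(4, bags, scores)
```

Here sample 0 ends up as NaN, while samples 1 to 3 each take the score of the one estimator that did not see them.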
- decision_function(X)
Average of the decision functions of the base classifiers.
- Parameters
X ({array-like, sparse matrix} of shape = [n_samples, n_features]) – The input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns
score – The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.
- Return type
array, shape = [n_samples, k]
- predict(X)
Predict class for X.
The predicted class of an input sample is computed as the class with the highest mean predicted probability. If base estimators do not implement a predict_proba method, then it resorts to voting.
- Parameters
X ({array-like, sparse matrix} of shape = [n_samples, n_features]) – The input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns
y – The predicted classes.
- Return type
array of shape = [n_samples]
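The "highest mean predicted probability" rule can be sketched in plain Python as follows. The function name predict_from_probas and the nested-list layout are hypothetical, and the sketch covers only the case where every base estimator provides probabilities.

```python
def predict_from_probas(probas_per_estimator, classes):
    """Pick, for each sample, the class with the highest mean probability.

    probas_per_estimator[k][i][j] is estimator k's probability that
    sample i belongs to classes[j].
    """
    n_estimators = len(probas_per_estimator)
    n_samples = len(probas_per_estimator[0])
    predictions = []
    for i in range(n_samples):
        # Mean probability of each class across all estimators.
        mean = [
            sum(est[i][j] for est in probas_per_estimator) / n_estimators
            for j in range(len(classes))
        ]
        # Index of the largest mean, mapped back to its class label.
        best = max(range(len(classes)), key=mean.__getitem__)
        predictions.append(classes[best])
    return predictions


classes = [0, 1]
probas = [
    [[0.9, 0.1], [0.4, 0.6]],  # estimator 1, two samples
    [[0.7, 0.3], [0.2, 0.8]],  # estimator 2, two samples
]
labels = predict_from_probas(probas, classes)  # [0, 1]
```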
- predict_log_proba(X)
Predict class log-probabilities for X.
The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.
- Parameters
X ({array-like, sparse matrix} of shape = [n_samples, n_features]) – The input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns
p – The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- Return type
array of shape = [n_samples, n_classes]
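The order of operations matters here: taking the log of the mean is not the same as averaging per-estimator logs. The sketch below contrasts the two for a single sample. All function names are hypothetical, and mapping a zero mean probability to negative infinity is an assumption of the example.

```python
import math


def mean_proba(probas):
    """Average one sample's class probabilities across estimators."""
    n = len(probas)
    return [sum(p[j] for p in probas) / n for j in range(len(probas[0]))]


def log_of_mean(probas):
    """Log of the averaged probabilities, as the description states."""
    # Assumption: a zero mean probability maps to negative infinity.
    return [math.log(m) if m > 0 else float("-inf") for m in mean_proba(probas)]


def mean_of_log(probas):
    """Average of per-estimator logs; shown only for contrast."""
    n = len(probas)
    return [sum(math.log(p[j]) for p in probas) / n for j in range(len(probas[0]))]


probas = [[0.9, 0.1], [0.5, 0.5]]  # two estimators, one sample
a = log_of_mean(probas)
b = mean_of_log(probas)
```

Exponentiating a recovers probabilities that sum to one, which is not true of b in general. By Jensen's inequality, each entry of a is at least as large as the matching entry of b.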
- predict_proba(X)
Predict class probabilities for X.
The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If base estimators do not implement a predict_proba method, then it resorts to voting, and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.
- Parameters
X ({array-like, sparse matrix} of shape = [n_samples, n_features]) – The input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns
p – The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- Return type
array of shape = [n_samples, n_classes]
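For the voting fallback, the "proportion of estimators predicting each class" can be sketched for a single sample as below. The function name vote_proportions is hypothetical, and the example assumes each base estimator returns one hard label per sample.

```python
def vote_proportions(votes, classes):
    """Fraction of estimators that predicted each class, for one sample."""
    return [sum(1 for v in votes if v == c) / len(votes) for c in classes]


classes = [0, 1]
votes = [1, 0, 1, 1]  # hard predictions of four estimators for one sample
proportions = vote_proportions(votes, classes)  # [0.25, 0.75]
```

The result follows the order of classes, matching the column order described for the returned array.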