bm.visual package
Visualization
bm.visual.color_style module
- bm.visual.color_style.find_text_color(base_color, dark_color='black', light_color='white', coef_choice=0)
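Chooses a text color for a given background color. The user can specify the dark and light text colors, or accept the defaults of black and white.
- Parameters
base_color – Background color, in RGB
dark_color – Dark text color (a matplotlib color); defaults to 'black'
light_color – Light text color; defaults to 'white'
coef_choice – Index, 0 or 1; defaults to 0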
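The selection described above can be sketched as a brightness test on the background color. This is a minimal illustration, not the library's implementation: the helper name `pick_text_color`, the Rec. 601 luma weights, and the 0.5 threshold are all assumptions.

```python
def pick_text_color(base_color, dark_color="black", light_color="white"):
    """Return dark_color on bright backgrounds and light_color on dark ones.

    base_color is an (r, g, b) tuple with components in [0, 1].
    The weights are the Rec. 601 luma coefficients (an assumption of this sketch).
    """
    r, g, b = base_color[:3]
    brightness = 0.299 * r + 0.587 * g + 0.114 * b
    # Bright backgrounds need dark text, and vice versa.
    return dark_color if brightness > 0.5 else light_color


print(pick_text_color((1.0, 1.0, 0.8)))  # pale yellow background -> black
print(pick_text_color((0.1, 0.1, 0.3)))  # navy background -> white
```

A second coefficient set, as hinted by coef_choice, would only change the three weights.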
bm.visual.features_visual module
Visualization of the correlations between feature columns and between features and labels.
- class bm.visual.features_visual.BinningPlot(ax, bin_method='interpolate', picture_name='feature_bin', save_file=False, **kwargs)
Bases: FeatureVisualizer
WOE visualization: bins a feature by computing WOE and IV.
- Parameters
ax (ax, default: None) –
bin_X (DataFrame) – Binned data
title (str, default: None) – Feature name
display_iv (bool, default: False) – Whether to display the IV
- draw(bad_rate, samples_num, column, **kwargs)
- Parameters
bad_rate (DataFrame) – Number of bad samples
samples_num (DataFrame) – Corresponding number of samples
- finalize(**kwargs)
Decorates the axes and returns them.
- Parameters
kwargs (dict) – Generic keyword arguments
- init_bin_method(data, column, target=None, target_value=None, num_clusters=5, max_interval=10, special_attributes=None, tree_params=None)
- Parameters
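data (DataFrame) – Input data
column (str) – Feature column
target (str) – Name of the label column
target_value (Any) – Value of the target column
num_clusters (int) – Number of clusters
max_interval (int, default: 10) – Maximum number of bin intervals
special_attributes (str, default: None) – Special attributes
tree_params (dict, default: None) – Decision tree parameters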
- visual(data, column, target=None, target_value=None, num_clusters=5, max_interval=10, special_attributes=None, tree_params=None, bad_rate_plot=False, **kwargs)
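Draws the binning plot using the selected binning method.
- Parameters
data (DataFrame, shape (n, m)) – Input data
column (str) – Feature column to bin
target (str, default: None) – Target (the classification label column)
target_value (str, default: None) – Currently only binary classification is supported (e.g. bad, good)
num_clusters (int, default: 5) – Number of clusters
max_interval (int, default: 10) – Maximum number of intervals
special_attributes (str, default: None) – Name of the special feature
tree_params (dict, default: None) – Dictionary of decision tree parameters
bad_rate_plot (bool, default: False) – Whether to plot the bad rate per bin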
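For readers unfamiliar with WOE and IV, here is a minimal sketch of how the two quantities are commonly computed from per-bin good/bad counts. It is not taken from the library: the function name `woe_iv` is hypothetical, and the sign convention (good share over bad share) is an assumption, since some references use the inverse ratio.

```python
import math


def woe_iv(bins):
    """Compute WOE per bin and the total IV.

    bins is a list of (good_count, bad_count) tuples, one per bin.
    All counts must be positive, otherwise the logarithm is undefined.
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    woes, iv = [], 0.0
    for good, bad in bins:
        good_share = good / total_good   # share of all good samples in this bin
        bad_share = bad / total_bad      # share of all bad samples in this bin
        woe = math.log(good_share / bad_share)
        iv += (good_share - bad_share) * woe
        woes.append(woe)
    return woes, iv


woes, iv = woe_iv([(80, 20), (60, 40), (40, 60)])
print(woes, iv)
```

Each IV term is non-negative, so IV grows as the bins separate good from bad samples more strongly.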
- class bm.visual.features_visual.FeatureCorrelationPlot(ax, columns, picture_name='feature_correlation', save_file=False, **kwargs)
Bases: FeatureVisualizer
Categorical feature statistics plot; computes statistics for non-numeric features.
- Parameters
ax (matplotlib Axes, default: None) – Axes to draw on. If hist=True, histogram axes are added above (xhax) and to the right (yhax)
X (DataFrame, default: None) – Input data
columns (list, default: None) –
sub_col (int, default: 2) –
label (str, default: None) –
kwargs (dict) – Keyword arguments
Examples
>>> viz = FeatureCorrelationPlot()
>>> viz.visual(X)
>>> viz.show()
- draw(corr_df, colormap, mask=False, **kwargs)
- Parameters
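corr_df (DataFrame) – Feature correlation data
colormap (plt) – Color settings
mask (bool) – Whether to apply a mask
kwargs (dict) – Keyword arguments
- finalize()
Decorates the axes and returns them.
- Parameters
kwargs (dict) – Generic keyword arguments
- visual(X)
- Parameters
X (DataFrame) – Input categorical features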
- class bm.visual.features_visual.FeaturesBoxPlot(ax, columns, sub_col, picture_name='feature_box', save_file=False, **kwargs)
Bases: FeatureVisualizer
- Parameters
ax (matplotlib Axes, default: None) – Axes to draw on. If hist=True, histogram axes are added above (xhax) and to the right (yhax)
X (DataFrame, default: None) – Input data
columns (list, default: None) – List of features
sub_col (int, default: 2) –
kwargs (dict) – Keyword arguments
Examples
>>> viz = FeaturesBoxPlot()
>>> viz.visual(X)
>>> viz.show()
- draw(x, idx, feat, **kwargs)
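Creates the canvas, processes the input data, and computes the minimum (min), lower quartile (Q1), median, upper quartile (Q3), and maximum (max).
- Parameters
x (DataFrame, default: None) – Input data
idx (int) – Index of the feature column
feat (any) – Feature corresponding to each index
kwargs (dict) – Keyword arguments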
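The five statistics that draw computes for each box can be sketched with the standard library alone. The helper name `five_number_summary` is hypothetical, and the "inclusive" quartile method is an assumption; the library may use a different interpolation.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a sequence of numbers."""
    data = sorted(values)
    # n=4 yields the three quartile cut points; "inclusive" interpolates
    # linearly between data points and treats min and max as the 0% and 100% points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # (1, 3.0, 5.0, 7.0, 9)
```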
- finalize()
Adjusts some figure parameters.
- visual(X)
- Parameters
X (DataFrame) – Input data
- class bm.visual.features_visual.FeaturesCategoryCount(ax, columns, label, sub_col, picture_name='feature_category', save_file=False, **kwargs)
Bases: FeatureVisualizer
Categorical feature statistics plot; counts the categories of non-numeric features.
- Parameters
ax (matplotlib Axes, default: None) – Axes to draw on. If hist=True, histogram axes are added above (xhax) and to the right (yhax)
X (DataFrame, default: None) – Input data
columns (list, default: None) –
sub_col (int, default: 2) –
label (str, default: None) –
kwargs (dict) – Keyword arguments
Examples
>>> viz = FeaturesCategoryCount()
>>> viz.visual(X)
>>> viz.show()
- draw(x, idx, feat, **kwargs)
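Creates the canvas and processes the input data.
- Parameters
x (DataFrame, default: None) – Input data
idx (int) – Index of each categorical feature
feat (any) – Feature corresponding to each index
- finalize()
Adjusts some figure properties.
- visual(X)
- Parameters
X (DataFrame) – Input categorical feature data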
- class bm.visual.features_visual.FeaturesDistributionPlot(ax, columns, label, sub_col, picture_name='feature_distribute', save_file=False, **kwargs)
Bases: FeatureVisualizer
Feature distribution plot; draws the distribution of each feature.
- Parameters
ax (Axes, default: None) –
columns (str or list, default: None) – Feature column names
label (str) – Label
sub_col (int, default: 5) –
kwargs (dict) – Keyword arguments
Examples
>>> viz = FeaturesDistributionPlot()
>>> viz.visual(X)
>>> viz.show()
- draw(x, idx, feat, **kwargs)
- Parameters
x (DataFrame, default: None) – Input data
idx (int) – Index of each feature column
feat (str) – Feature corresponding to the index
- finalize(**kwargs)
Adjusts the figure properties.
- visual(X)
- Parameters
X (DataFrame) – Input data
- class bm.visual.features_visual.FeaturesVisualPlot(ax=None, columns=None, correlation='pearson', kind='scatter', hist=True, alpha=0.65, joint_kws=None, hist_kws=None, picture_name='features_visual', save_file=False, **kwargs)
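Bases: FeatureVisualizer
Feature data visualization that allows interactive comparison between different features. It can also compute the correlation between features and the label with different algorithms and visualize the result.
The "columns" parameter can be used to specify the indices of the two desired columns in "X".
Setting the "hist" parameter to True adds histograms of the frequency distribution; setting it to "density" shows the probability density function instead.
- Parameters
ax (matplotlib Axes, default: None) – Axes to draw on. If hist=True, histogram axes are added above (xhax) and to the right (yhax)
columns (int, str, [int, int], [str, str], default: None) –
correlation (str, default: 'pearson') – Correlation method; one of 'pearson', 'covariance', 'spearman', 'kendalltau'
kind (str in {'scatter', 'hex'}, default: 'scatter') – Type of plot to draw. Note that when kind='hex' the target cannot be plotted by color.
hist ({True, False, None, 'density', 'frequency'}, default: True) – By default, histograms showing the distributions of the two input variables are drawn. If set to 'density', the probability density function is plotted. If set to True or 'frequency', frequencies are plotted.
alpha (float, default: 0.65) – Transparency, where 1 is fully opaque and 0 is fully transparent. This makes densely clustered points easier to see.
kwargs (dict) – Keyword arguments
Examples
>>> viz = FeaturesVisualPlot(columns=["temp", "humidity"])
>>> viz.visual(X, y)
>>> viz.show()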
- draw(x, y, xlabel=None, ylabel=None)
- Parameters
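x (1D array-like) – Values plotted on the x-axis
y (1D array-like) – Values plotted on the y-axis
xlabel (str) – Label for the x-axis
ylabel (str) – Label for the y-axis
- finalize(**kwargs)
Adjusts the figure properties.
- is_dataframe(data)
Converts the input data to a DataFrame.
- Parameters
data (instance) – Input data
- visual(X, y=None)
Runs the visualization on the input data.
- Parameters
X (array-like) – A 1D or 2D numpy array, usually 2D.
y (array-like, default: None) – 1D array of labels
- property xhax
Axes of the histogram along the x-axis
- property yhax
Axes of the histogram along the y-axis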
- class bm.visual.features_visual.WoeIvPlot(ax, title=None, display_iv=False, picture_name='woe_iv', save_file=False, **kwargs)
Bases: FeatureVisualizer
WOE-IV binning visualization
- binx : DataFrame
Binning result
- title : str
Figure title
- display_iv : bool
Whether to display the corresponding IV value
Examples
>>> viz = WoeIvPlot()
>>> viz.visual(X)
>>> viz.show()
- draw(binx, ind, y_left_max, y_right_max, **kwargs)
- Parameters
binx (DataFrame) – Binning result
ind (list) – Tick values for the x-axis
y_left_max (int) – Left offset
y_right_max (int) – Right offset
kwargs (dict) – Keyword arguments
- finalize()
Decorates the axes and returns them.
- Parameters
kwargs (dict) – Generic keyword arguments
- visual(binx, **kwargs)
- Parameters
binx (DataFrame) – Binning result
kwargs (dict) – Keyword arguments
bm.visual.interpretability_visual module
- class bm.visual.interpretability_visual.FeatureImportancePlot(ax, picture_name='features_importance_visual', save_file=False, **kwargs)
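Bases: FeatureVisualizer
Feature importance visualization
- Parameters
ax (matplotlib Axes, default: None) – Axes to draw on. If hist=True, histogram axes are added above (xhax) and to the right (yhax)
picture_name (str, default: features_importance_visual) – Name used when saving the image
save_file (boolean, default: False) – Whether to save the image
kwargs (dict) – Keyword arguments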
- draw(X, top_n=50, figsize=(20, 12))
- Parameters
X (DataFrame) – Feature importance DataFrame
top_n (int) – Number of top features to show
figsize (tuple) – Figure size
- finalize(X)
- Parameters
X (DataFrame) – Feature importances
- visual(X)
- Parameters
X (DataFrame) – Input data
- class bm.visual.interpretability_visual.ShapPlot(estimator, ax=None, picture_name='feature_shap_plot', save_file=False, mode=None, **kwargs)
Bases: FeatureVisualizer
Interpretability visualization for the selected features
- Parameters
ax (matplotlib Axes, default: None) – Axes to draw on. If hist=True, histogram axes are added above (xhax) and to the right (yhax)
picture_name (str, default: feature_shap_plot) –
save_file (boolean, default: False) –
mode (str, default: None) – Either 'force' or 'summary'
kwargs (dict) – Keyword arguments
- draw(explainer, shap_values, X, feature_names, show)
- Parameters
estimator (pipeline) – The model
X (DataFrame) – Input selected features
- finalize()
Decorates the axes and returns them.
- Parameters
kwargs (dict) – Generic keyword arguments
- visual(estimator, X, feature_names, show)
- Parameters
estimator (pipeline) – The model
X (DataFrame) – Input selected features
bm.visual.model_visual module
Evaluation plots for model training and prediction
- class bm.visual.model_visual.CNTPlot(estimator, ax=None, per_class=True, binary=False, classes=None, encoder=None, is_fitted='auto', force_model=False, picture_name='cnt', save_file=False, **kwargs)
Bases: ClassificationVisualizer
- Parameters
estimator (pipeline) – The model to use
ax (Axes, default: None) –
per_class (bool, default: True) – If True, plots the ROC curve for each class; set it to False if only the macro or micro average curves are needed
binary (bool, default: False) – Whether the task is binary classification
classes (list of str, default: None) – Class labels
encoder (dict or LabelEncoder, default: None) – Label encoder, as in sklearn
is_fitted (boolean or str, default: 'auto') – Whether the model has already been fitted
force_model (bool, default: False) – Controls the model type check
kwargs (dict) – Keyword arguments
Examples
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import train_test_split
>>> data = load_data("occupancy")
>>> features = ["temp", "relative humidity", "light", "C02", "humidity"]
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> oz = CNTPlot(LogisticRegression())
>>> oz.fit(X_train, y_train)
>>> oz.score(X_test, y_test)
>>> oz.show()
- draw()
- finalize(**kwargs)
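Applies the figure settings.
- Parameters
kwargs (dict) – Keyword arguments
- fit(X, y=None, **kwargs)
Fits the model.
- score(X, y, **kwargs)
Generates predictions with the trained model and evaluates them.
- Parameters
X (ndarray or DataFrame, shape (n, m)) – Input matrix with m features
y (ndarray or Series, shape (n,)) – 1D array of class labels
- Returns
score_ – The evaluation score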
- Return type
float
- class bm.visual.model_visual.ClassificationReportPlot(estimator, ax=None, classes=None, cmap='YlOrRd', support=None, encoder=None, is_fitted='auto', force_model=False, colorbar=True, fontsize=None, picture_name='classification_report', save_file=False, **kwargs)
Bases: ClassificationVisualizer
Visualization of the classification report
- Parameters
estimator (pipeline) – The model
ax (matplotlib Axes, default: None) –
classes (list or str, default: None) – Class labels
cmap (str, default: 'YlOrRd') – Colormap
support ({True, False, None, 'percent', 'count'}, default: None) – Whether and how the support is displayed: as a raw count or as a percentage
encoder (dict or LabelEncoder, default: None) – Label encoder, as in sklearn
is_fitted (boolean or str, default: 'auto') – Whether the model has already been fitted
force_model (bool, default: False) – Controls the model type check
colorbar (bool, default: True) – Whether to draw the colorbar
fontsize (int or None, default: None) – Font size
kwargs (dict) – Keyword arguments
Examples
>>> from sklearn.linear_model import LogisticRegression
>>> viz = ClassificationReportPlot(LogisticRegression())
>>> viz.fit(X_train, y_train)
>>> viz.scores(X_test, y_test)
>>> viz.show()
- draw()
- finalize(**kwargs)
Sets the title and other properties of the generated figure.
- Parameters
kwargs (dict) – Keyword arguments
- scores(X, y)
Generates the classification report.
- Parameters
X (ndarray or DataFrame of shape n x m) – An (n x m) feature matrix
y (ndarray or Series of length n) – The corresponding labels
- Returns
score_ – Accuracy
- Return type
float
- class bm.visual.model_visual.ConfusionMaxtrixPlot(estimator, ax=None, sample_weight=None, percent=False, classes=None, encoder=None, cmap='YlOrRd', fontsize=None, is_fitted='auto', force_model=False, label_transfer=None, picture_name='confusion_maxtrix', save_file=False, **kwargs)
Bases: ClassificationVisualizer
Confusion matrix visualization for classification
- Parameters
estimator (model) – The model to use
ax (matplotlib Axes, default: None) –
sample_weight (array-like, shape (n_samples,)) – Optional; sample weights
percent (bool, default: False) – Whether to display percentages instead of counts
classes (list or str, default: None) – Class labels
cmap (str, default: 'YlOrRd') – Colormap
encoder (dict or LabelEncoder, default: None) – Label encoder, as in sklearn
is_fitted (boolean or str, default: 'auto') – Whether the model has already been fitted
force_model (bool, default: False) – Controls the model type check
fontsize (int or None, default: None) – Font size
kwargs (dict) – Keyword arguments
Examples
>>> from sklearn.linear_model import LogisticRegression
>>> viz = ConfusionMaxtrixPlot(LogisticRegression())
>>> viz.fit(X_train, y_train)
>>> viz.score(X_test, y_test)
>>> viz.show()
- draw()
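Draws the confusion matrix.
- finalize(**kwargs)
Decorates the axes and returns them.
- Parameters
kwargs (dict) – Generic keyword arguments
- score(X, y, **kwargs)
Draws the confusion matrix for the provided test data by comparing the predictions on X with the true values given by the target vector y.
- Parameters
X (ndarray or DataFrame of shape n x m) – An (n x m) feature matrix
y (ndarray or Series of length n) – The corresponding labels
- Returns
score_ – Accuracy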
- Return type
float
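To make the relation between the matrix and the returned accuracy concrete, here is a small standard-library sketch. The function names are hypothetical, and the layout (rows are true classes, columns are predicted classes) is an assumption rather than a documented property of ConfusionMaxtrixPlot.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, classes):
    """Count (true, predicted) pairs; rows are true classes, columns predicted."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in classes] for t in classes]


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. predicted correctly."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


y_true = ["good", "good", "bad", "bad", "good"]
y_pred = ["good", "bad", "bad", "bad", "good"]
cm = confusion_matrix(y_true, y_pred, classes=["bad", "good"])
print(cm)            # [[2, 0], [1, 2]]
print(accuracy(cm))  # 0.8
```

Dividing each row by its sum would give the percentage view that the percent parameter refers to.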
- show(outpath=None, **kwargs)
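Displays the figure.
- Parameters
outpath (string, default: None) – Path where the figure is saved
clear_figure (Boolean, default: False) – If True, clears the figure after it has been saved to a file or shown on screen.
kwargs (dict) – Generic keyword arguments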
- class bm.visual.model_visual.FeatureImportancePlot(estimator, ax=None, labels=None, relative=True, absolute=False, xlabel=None, stack=False, colors=None, colormap=None, is_fitted='auto', topn=None, picture_name='feature_important', save_file=False, **kwargs)
Bases: ModelVisualizer
Ranks features by importance.
- Parameters
estimator (Estimator) – An initialized model
ax (matplotlib Axes, default: None) – Axes to draw on
labels (list, default: None) – List of labels
relative (bool, default: True) – Show relative importance
absolute (bool, default: False) – Show absolute importance
xlabel (str, default: None) – Label for the x-axis
stack (bool, default: False) – Whether to draw a stacked plot
colors (list of strings) – If stack==False, specifies a color for each bar in the chart.
colormap (string or matplotlib cmap) – If stack==True, specifies a colormap used to color the classes.
is_fitted (bool or str, default: 'auto') – Whether the estimator has already been fitted
topn (int, default: None) – Show only the top-n results; all are shown by default
kwargs (dict) – Keyword arguments
Examples
>>> from sklearn.ensemble import GradientBoostingClassifier
>>> visualizer = FeatureImportancePlot(GradientBoostingClassifier())
>>> visualizer.fit(X, y)
>>> visualizer.show()
- draw(**kwargs)
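Draws the feature importance plot.
Note
No feature selection is applied; the ranking produced by the model is used directly.
- finalize(**kwargs)
Adjusts the figure properties.
- fit(X, y=None, **kwargs)
Fits the model.
- Parameters
X (numpy.ndarray or DataFrame, shape (n, m)) – Input training data
y (numpy.ndarray or Series, shape (n,)) – Input labels
kwargs (dict) – Keyword arguments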
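The interplay of relative, absolute, and topn can be illustrated with a short sketch. It assumes that relative importance means scaling by the largest absolute value to a 0-100 range; that interpretation and the helper name `rank_importances` are assumptions, not documented behavior.

```python
def rank_importances(names, importances, relative=True, absolute=False, topn=None):
    """Return (name, value) pairs sorted from most to least important."""
    values = [abs(v) for v in importances] if absolute else list(importances)
    if relative:
        # Scale so that the strongest feature sits at 100 (assumed convention).
        largest = max(abs(v) for v in values)
        values = [100.0 * v / largest for v in values]
    ranked = sorted(zip(names, values), key=lambda pair: pair[1], reverse=True)
    return ranked[:topn] if topn is not None else ranked


ranked = rank_importances(["age", "income", "tenure"], [0.2, 0.5, 0.1], topn=2)
print(ranked)
```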
- class bm.visual.model_visual.KDEPlot(estimator, ax=None, per_class=True, binary=False, classes=None, encoder=None, is_fitted='auto', force_model=False, picture_name='kde', save_file=False, **kwargs)
Bases: ClassificationVisualizer
Kernel density estimate (KDE) plot
- Parameters
estimator (estimator) – The model to use
ax (matplotlib Axes, default: None) –
per_class (bool, default: True) – If True, plots the ROC curve for each class; set it to False if only the macro or micro average curves are needed
binary (bool, default: False) – Whether the task is binary classification
classes (list of str, default: None) – Class labels
encoder (dict or LabelEncoder, default: None) – Label encoder, as in sklearn
is_fitted (boolean or str, default: 'auto') – Whether the model has already been fitted
force_model (bool, default: False) –
kwargs (dict) – Keyword arguments
Examples
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import train_test_split
>>> data = load_data("occupancy")
>>> features = ["temp", "relative humidity", "light", "C02", "humidity"]
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> oz = KDEPlot(LogisticRegression())
>>> oz.fit(X_train, y_train)
>>> oz.score(X_test, y_test)
>>> oz.show()
- draw()
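Draws the plot from the data.
- Returns
ax – The matplotlib Axes
- Return type
ax
- finalize(**kwargs)
Adjusts the figure.
- Parameters
kwargs (dict) – Keyword arguments
- fit(X, y=None, **kwargs)
Overrides the fit process inherited from the sklearn base module.
- score(X, y, **kwargs)
Generates predictions with the trained model and evaluates them.
- Parameters
X (ndarray or DataFrame, shape (n, m)) – Input matrix with m features
y (ndarray or Series, shape (n,)) – 1D array of class labels
- Returns
score_ – The evaluation score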
- Return type
float
- class bm.visual.model_visual.KSPlot(estimator, ax=None, classes=None, encoder=None, is_fitted='auto', force_model=False, picture_name='ks', save_file=False, **kwargs)
Bases: ClassificationVisualizer
Draws the KS curve.
- Parameters
estimator (pipeline) – The model
ax (Axes, default: None) –
classes (list or str, default: None) – Classes
encoder (dict or LabelEncoder, default: None) – Label encoder
is_fitted (bool or str, default: "auto") – Whether the model has already been fitted
force_model (bool, default: False) –
kwargs (dict) – Keyword arguments
- draw(score_bin, good_rates, bad_rates, ks_lst, **kwargs)
- Parameters
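score_bin (list) – Index of the scores
good_rates (list) – Good-sample rates
bad_rates (list) – Bad-sample rates
ks_lst (list) – Corresponding KS values
kwargs (dict) – Keyword arguments
- finalize()
Decorates the axes and returns them.
- Parameters
kwargs (dict) – Generic keyword arguments
- fit(X, y=None, **kwargs)
Overrides the model's fit process inherited from the sklearn base class.
- Parameters
X (DataFrame) – Training data
y (list) – Labels for the training data
kwargs (dict) – Keyword arguments
- score(X_train, y_train, X_test=None, y_test=None, **kwargs)
Returns the predicted values from model training.
- Parameters
X_train (DataFrame) – Training data
y_train (list or ndarray) – Labels for the training data
X_test (DataFrame) – Test data
y_test (list or ndarray) – Labels for the test data
kwargs (dict) – Keyword arguments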
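The quantities passed to draw (good_rates, bad_rates, ks_lst) can be illustrated with a plain-Python sketch of the KS statistic. This is one common formulation, not the library's code: the helper name `ks_curve` is hypothetical, samples are ranked one at a time rather than in score bins, and tied scores are not grouped.

```python
def ks_curve(y_true, scores, bad_label=1):
    """Return cumulative good/bad rates and their gaps, ranked by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_bad = sum(1 for y in y_true if y == bad_label)
    total_good = len(y_true) - total_bad
    bad_seen = good_seen = 0
    good_rates, bad_rates, ks_lst = [], [], []
    for i in order:
        if y_true[i] == bad_label:
            bad_seen += 1
        else:
            good_seen += 1
        bad_rates.append(bad_seen / total_bad)
        good_rates.append(good_seen / total_good)
        # The KS value at this point is the gap between the two cumulative rates.
        ks_lst.append(abs(bad_rates[-1] - good_rates[-1]))
    return good_rates, bad_rates, ks_lst


y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
good_rates, bad_rates, ks_lst = ks_curve(y_true, scores)
ks = max(ks_lst)  # the KS statistic is the largest gap
print(ks)
```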
- class bm.visual.model_visual.LearningCurve(estimator, ax=None, train_sizes=array([0.1, 0.325, 0.55, 0.775, 1.0]), cv=None, scoring=None, exploit_incremental_learning=False, n_jobs=1, pre_dispatch='all', shuffle=False, random_state=None, picture_name='learning_curve', save_file=False, **kwargs)
Bases: ModelVisualizer
Draws the learning curve of the model on the data.
- Parameters
estimator (pipeline) – The learner (an initialized model)
ax (Axes, default: None) –
train_sizes (array-like, shape (n_ticks,), default: np.linspace(0.1,1.0,5)) –
cv (int, default: None) – Number of folds the data is split into for cross-validation; one fold serves as the validation set and the remaining n-1 folds are used for training
scoring (string, callable or None, default: None) – Scoring metric; string options include 'accuracy', 'adjusted_rand_score', 'average_precision', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'neg_log_loss', 'neg_mean_absolute_error', 'neg_mean_squared_error', 'neg_median_absolute_error', 'precision', 'precision_macro', 'precision_micro', 'precision_samples', 'precision_weighted', 'r2', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 'roc_auc'
exploit_incremental_learning (boolean, default: False) – If the estimator supports incremental learning, this is used to speed up fitting for the different training set sizes.
n_jobs (int, optional, default: 1) – Number of parallel jobs
pre_dispatch (integer or string, optional, default: all) – Number of jobs pre-dispatched for parallel execution
shuffle (boolean, optional) – Whether to shuffle the data
random_state (int, RandomState instance or None, optional (default=None)) – Random seed; used when shuffle = True
kwargs (dict) – Keyword arguments
Examples
>>> from sklearn.naive_bayes import GaussianNB
>>> model = LearningCurve(GaussianNB())
>>> model.fit(X, y)
>>> model.show()
- draw(**kwargs)
Renders the training and test learning curves.
- finalize(**kwargs)
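Sets the title and the axis labels.
- fit(X, y=None, **kwargs)
Overrides the fit process inherited from the sklearn base class.
- Parameters
X (DataFrame) – Training data
y (list or ndarray) – Labels for the training data
kwargs (dict) – Keyword arguments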
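The default train_sizes are fractions of the largest training fold, so it can help to see how they map to sample counts under k-fold cross-validation. The sketch below is an illustration only: the helper names are hypothetical, and rounding to the nearest integer is an assumption (other implementations truncate).

```python
def linspace(start, stop, num):
    """Evenly spaced values from start to stop inclusive (like np.linspace)."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def absolute_train_sizes(n_samples, cv=5, fractions=None):
    """Translate fractional train sizes into sample counts for k-fold CV."""
    fractions = linspace(0.1, 1.0, 5) if fractions is None else fractions
    # With k folds, one fold is held out, so training uses the rest.
    max_train = n_samples - n_samples // cv
    return [round(f * max_train) for f in fractions]


sizes = absolute_train_sizes(1000, cv=5)
print(sizes)  # [80, 260, 440, 620, 800]
```

Each count is one tick on the x-axis of the learning curve.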
- class bm.visual.model_visual.LiftPlot(estimator, ax=None, per_class=True, binary=False, classes=None, encoder=None, is_fitted='auto', force_model=False, picture_name='lit', save_file=False, **kwargs)
Bases: ClassificationVisualizer
Lift curve
- Parameters
estimator (pipeline) – The model to use
ax (matplotlib Axes, default: None) –
per_class (bool, default: True) – If True, plots the ROC curve for each class; set it to False if only the macro or micro average curves are needed
binary (bool, default: False) – Whether the task is binary classification
classes (list of str, default: None) – Class labels
encoder (dict or LabelEncoder, default: None) – Label encoder, as in sklearn
is_fitted (boolean or str, default: 'auto') – Whether the model has already been fitted
force_model (bool, default: False) – Controls the model type check
kwargs (dict) – Keyword arguments
Examples
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import train_test_split
>>> data = load_data("occupancy")
>>> features = ["temp", "relative humidity", "light", "C02", "humidity"]
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> oz = LiftPlot(LogisticRegression())
>>> oz.fit(X_train, y_train)
>>> oz.score(X_test, y_test)
>>> oz.show()
- draw()
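Connects to the matplotlib interface and renders the data the visualizer was fitted on, as a figure or axes.
- Parameters
kwargs (dict) – Generic keyword arguments
- finalize(**kwargs)
- Parameters
kwargs (dict) – Keyword arguments
- fit(X, y=None, **kwargs)
Overrides the fit process inherited from the sklearn base class.
- Parameters
X (DataFrame) – Training data
y (list or ndarray) – Labels for the training data
kwargs (dict) – Keyword arguments
- score(X, y, **kwargs)
Generates predictions with the trained model and evaluates them.
- Parameters
X (ndarray or DataFrame, shape (n, m)) – Input matrix with m features
y (ndarray or Series, shape (n,)) – 1D array of class labels
- Returns
score_ – The evaluation score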
- Return type
float
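As background for the lift curve, the sketch below computes lift per score bin: the positive rate inside a bin divided by the overall positive rate. It illustrates the general technique only; the helper name `lift_by_bin`, the equal-sized bins, and the requirement that there be at least n_bins samples are assumptions.

```python
def lift_by_bin(y_true, scores, n_bins=5, positive=1):
    """Lift of each score bin, with bins ordered from highest to lowest score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    overall_rate = sum(1 for y in y_true if y == positive) / len(y_true)
    lifts = []
    for b in range(n_bins):
        # Slice the ranked samples into n_bins nearly equal groups.
        start = b * len(order) // n_bins
        stop = (b + 1) * len(order) // n_bins
        members = order[start:stop]
        bin_rate = sum(1 for i in members if y_true[i] == positive) / len(members)
        lifts.append(bin_rate / overall_rate)
    return lifts


y_true = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
lifts = lift_by_bin(y_true, scores, n_bins=5)
print(lifts)
```

A lift above 1 in the top bins means the model concentrates positives among its highest scores.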
- class bm.visual.model_visual.MarketingPlot(estimator, ax=None, per_class=True, binary=False, classes=None, encoder=None, is_fitted='auto', force_model=False, picture_name='market_form', save_file=False, **kwargs)
Bases: ClassificationVisualizer
Marketing report generation, combining the various plots needed for marketing
- Parameters
estimator (pipeline) – The model to use
ax (matplotlib Axes, default: None) –
per_class (bool, default: True) – If True, plots the ROC curve for each class; set it to False if only the macro or micro average curves are needed
binary (bool, default: False) – Whether the task is binary classification
classes (list of str, default: None) – Class labels
encoder (dict or LabelEncoder, default: None) – Label encoder, as in sklearn
is_fitted (boolean or str, default: 'auto') – Whether the model has already been fitted
force_model (bool, default: False) –
kwargs (dict) – Keyword arguments
Examples
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import train_test_split
>>> data = load_data("occupancy")
>>> features = ["temp", "relative humidity", "light", "C02", "humidity"]
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> oz = MarketingPlot(LogisticRegression())
>>> oz.fit(X_train, y_train)
>>> oz.score(X_test, y_test)
>>> oz.show()
- cob_draw()
- cpr_draw()
- draw()
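Connects to the matplotlib interface and renders the data the visualizer was fitted on, as a figure or axes.
- Parameters
kwargs (dict) – Generic keyword arguments
- finalize(**kwargs)
Sets the figure properties.
- Parameters
kwargs (dict) – Keyword arguments
- fit(X, y=None, **kwargs)
Overrides the fit process inherited from the sklearn base class.
- Parameters
X (DataFrame) – Training data
y (list or ndarray) – Labels for the training data
kwargs (dict) – Keyword arguments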
- ker_draw()
- ks_draw()
- prc_draw()
- roc_draw()
- score(X, y, **kwargs)
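Generates predictions with the trained model and evaluates them.
- Parameters
X (ndarray or DataFrame, shape (n, m)) – Input matrix with m features
y (ndarray or Series, shape (n,)) – 1D array of class labels
- Returns
score_ – The evaluation score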
- Return type
float
- class bm.visual.model_visual.PrecisionRecallPlot(estimator, ax=None, classes=None, colors=None, cmap=None, encoder=None, fill_area=None, ap_score=True, micro=True, iso_f1_curves=False, iso_f1_values=(0.2, 0.4, 0.6, 0.8), per_class=False, fill_opacity=0.2, line_opacity=0.8, is_fitted='auto', force_model=False, pr_change=False, picture_name='precision_recall', save_file=False, **kwargs)
Bases: ClassificationVisualizer
Plot of precision against recall
- Parameters
estimator (pipeline) – The model
ax (matplotlib Axes, default: None) –
classes (list or str, default: None) – Class labels
cmap (str or colormap, default: None) – Colormap
encoder (dict or LabelEncoder, default: None) – Label encoder
fill_area (bool, default: None) – Whether to fill the area under the curve
ap_score (bool, default: True) – Whether to annotate the plot with the score
micro (bool, default: True) – Whether to plot the micro average
iso_f1_curves (bool, default: False) – Whether to draw the iso-F1 curves
iso_f1_values (tuple, default: (0.2, 0.4, 0.6, 0.8)) – F1 values at which the iso-F1 curves are drawn
per_class (bool, default: False) – In the multi-label case, whether to draw a curve for each class
fill_opacity (float, default: 0.2) – Alpha of the filled area
line_opacity (float, default: 0.8) – Alpha of the lines
is_fitted (bool or str, default: 'auto') – Whether the learner has already been fitted
force_model (bool, default: False) –
kwargs (dict) – Keyword arguments
Examples
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.svm import LinearSVC
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> viz = PrecisionRecallPlot(LinearSVC())
>>> viz.fit(X_train, y_train)
>>> viz.score(X_test, y_test)
>>> viz.show()
- draw()
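Connects to the matplotlib interface and renders the data the visualizer was fitted on, as a figure or axes.
- Parameters
kwargs (dict) – Generic keyword arguments
- finalize()
Adjusts the axes.
- fit(X, y=None, **kwargs)
Overrides the model's fit process inherited from the sklearn base class.
- Parameters
X (DataFrame) – Training data
y (list) – Labels for the training data
kwargs (dict) – Keyword arguments
- score(X, y, **kwargs)
- Parameters
X (DataFrame) – Training data
y (list) – Labels for the training data
kwargs (dict) – Keyword arguments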
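An iso-F1 curve connects all (recall, precision) pairs that share one F1 value. Solving F1 = 2PR / (P + R) for precision gives P = F1 · R / (2R − F1), which the sketch below evaluates. The function names are hypothetical, and how the library samples these curves is not stated in this reference.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def iso_f1_precision(f1, recall):
    """Precision that yields the given F1 at the given recall."""
    # Below this recall, no precision <= 1 can reach the requested F1.
    if recall < f1 / (2 - f1):
        raise ValueError("recall too low to reach this F1")
    return f1 * recall / (2 * recall - f1)


for f1 in (0.2, 0.4, 0.6, 0.8):
    print(f1, [round(iso_f1_precision(f1, r), 3) for r in (0.9, 1.0)])
```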
- class bm.visual.model_visual.PredictErrorPlot(estimator, ax=None, classes=None, encoder=None, is_fitted='auto', force_model=False, picture_name='predict_error', save_file=False, **kwargs)
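Bases: ClassificationVisualizer
Prediction error visualization: statistics of the prediction errors for each class
- Parameters
estimator (estimator) – The learner
ax (Axes, default: None) –
classes (list or str, default: None) – Classes
encoder (dict or LabelEncoder, default: None) – Label encoder
is_fitted (bool or str, default: "auto") – Whether the model has already been fitted
force_model (bool, default: False) –
kwargs (dict) – Keyword arguments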
- draw()
Renders the class prediction error across the axis.
- Returns
ax – The axes on which the figure is plotted
- Return type
Matplotlib Axes
- finalize(**kwargs)
Adjusts the figure properties.
- score(X, y, **kwargs)
Runs the prediction.
- Parameters
X (ndarray or DataFrame, shape (n, m)) – A matrix with n rows and m columns
y (ndarray or Series, shape (n,)) – An array of labels
- Returns
score_ – Accuracy score
- Return type
float
- class bm.visual.model_visual.ROCAUCPlot(estimator, ax=None, micro=True, macro=True, per_class=True, binary=False, classes=None, encoder=None, is_fitted='auto', force_model=False, picture_name='roc_auc', save_file=False, **kwargs)
Bases: ClassificationVisualizer
ROC and AUC curve plot
- Parameters
estimator (pipeline) – The model to use
ax (matplotlib Axes, default: None) –
micro (bool, default: True) – Whether to plot the micro average
macro (bool, default: True) – Whether to plot the macro average
per_class (bool, default: True) – If True, plots the ROC curve for each class; set it to False if only the macro or micro average curves are needed
binary (bool, default: False) – Whether the task is binary classification
classes (list of str, default: None) – Class labels
encoder (dict or LabelEncoder, default: None) – Label encoder, as in sklearn
is_fitted (boolean or str, default: 'auto') – Whether the model has already been fitted
force_model (bool, default: False) – Controls the model type check
kwargs (dict) – Keyword arguments
Examples
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import train_test_split
>>> data = load_data("occupancy")
>>> features = ["temp", "relative humidity", "light", "C02", "humidity"]
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> oz = ROCAUCPlot(LogisticRegression())
>>> oz.fit(X_train, y_train)
>>> oz.score(X_test, y_test)
>>> oz.show()
- draw()
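Connects to the matplotlib interface and renders the data the visualizer was fitted on, as a figure or axes.
- Parameters
kwargs (dict) – Generic keyword arguments
- finalize(**kwargs)
Adjusts the ROCAUC figure.
- Parameters
kwargs (dict) – Keyword arguments
- fit(X, y=None, **kwargs)
Overrides the model's fit process inherited from the sklearn base class.
- Parameters
X (DataFrame) – Training data
y (list) – Labels for the training data
kwargs (dict) – Keyword arguments
- score(X, y, **kwargs)
Generates predictions with the trained model and evaluates them.
- Parameters
X (ndarray or DataFrame, shape (n, m)) – Input matrix with m features
y (ndarray or Series, shape (n,)) – 1D array of class labels
- Returns
score_ – The evaluation score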
- Return type
float
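For the binary case, the ROC curve and its AUC can be sketched in plain Python as follows. This illustrates the standard construction (one point per distinct threshold, trapezoidal area) under hypothetical function names; it says nothing about how ROCAUCPlot computes the micro or macro averages.

```python
def roc_points(y_true, scores, positive=1):
    """Return (fpr, tpr) lists with one point per distinct score threshold."""
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for y in y_true if y == positive)
    n_neg = len(y_true) - n_pos
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for k, (score, label) in enumerate(pairs):
        if label == positive:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample sharing this score.
        if k + 1 == len(pairs) or pairs[k + 1][0] != score:
            fpr.append(fp / n_neg)
            tpr.append(tp / n_pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the curve by the trapezoidal rule."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
               for i in range(1, len(fpr)))


fpr, tpr = roc_points([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.3, 0.2])
print(auc(fpr, tpr))
```

In this example 8 of the 9 positive-negative pairs are ranked correctly, which matches the area of 8/9.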
- class bm.visual.model_visual.RedidualsPlot(estimator, ax=None, hist=True, qqplot=False, train_color='b', test_color='g', line_color='#111111', train_alpha=0.75, test_alpha=0.75, is_fitted='auto', picture_name='redidual', save_file=False, **kwargs)
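Bases: RegressionVisualizer
Prediction residual visualization
Plots the residuals between the predicted and the true values.
- Parameters
estimator (regression model) – A trained regression model
ax (matplotlib Axes, default: None) –
hist ({True, False, None, 'density', 'frequency'}, default: True) – Residual distribution plot; 'density' draws a density plot and 'frequency' a frequency plot
qqplot ({True, False}, default: False) – Whether to draw a Q-Q plot of the residuals
train_color (color, default: 'b') – Color used for the training data
test_color (color, default: 'g') – Color used for the test data
line_color (color, default: dark grey) – Line color
train_alpha (float, default: 0.75) – Transparency of the training data
test_alpha (float, default: 0.75) – Transparency of the test data
is_fitted (boolean or str, default: 'auto') – Whether the model has already been fitted
kwargs (dict) – Keyword arguments
Examples
>>> from sklearn.linear_model import Ridge
>>> model = RedidualsPlot(Ridge())
>>> model.fits(X_train, y_train)
>>> model.score(X_test, y_test)
>>> model.show()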
- draw(y_pred, residuals, train=False, **kwargs)
Draws the figure from the data.
- Parameters
y_pred (ndarray) – 1D array of predicted values
residuals (ndarray) – 1D array of residuals
train (boolean, default: False) – Whether in training mode
kwargs (dict) – Keyword arguments
- finalize(**kwargs)
Adjusts the title and other properties of the figure.
- Parameters
kwargs (dict) – Keyword arguments
- fits(X, y, **kwargs)
- Parameters
X (ndarray or DataFrame, shape (n, m)) – Input data
y (ndarray or Series, shape (n,)) – Input labels
kwargs (dict) – Keyword arguments
- Returns
self – The instance
- Return type
RedidualsPlot
- property hax
Returns the histogram axes, creating them only on demand.
- property qqax
Returns the corresponding Q-Q plot axes.
- score(X, y=None, train=False, **kwargs)
Generates predictions.
- Parameters
X (array-like) – Input data
y (array-like) – Input labels
train (boolean) – Switches between training and prediction mode
- Returns
score – Output for the corresponding mode
- Return type
float
bm.visual.quick_visual module
Quick-use wrappers around the packaged visualizers
- bm.visual.quick_visual.binning_plot(bin_method, data, column, ax=None, target=None, target_value=None, num_clusters=5, max_interval=10, special_attributes=None, tree_params=None, bad_rate_plot=False, show=False, picture_name='binning_plot', save_file=True, **kwargs)
- Parameters
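bin_method (str, default: interpolate) – One of [interpolate, quantile, distance, mixed, decision_tree, chi_square, kmeans, best_ks]
data (DataFrame) – Input data
column (str) – Feature column to bin
ax (ax, default: None) – Custom axes; set automatically if not given
target (str, default: None) – Target (the classification label column)
target_value (str or int, default: None) – Currently only binary classification is supported (e.g. bad, good)
num_clusters (int, default: 5) – Number of clusters
max_interval (int, default: 10) – Maximum number of intervals
special_attributes (str, default: None) – Name of the special feature
tree_params (dict, default: None) – Dictionary of decision tree parameters
bad_rate_plot (bool, default: False) – Whether to plot the bad rate per bin
show (bool) – Whether to display the figure
picture_name (str) – File name (path) used when saving the image
save_file (bool) – Whether to save the figure
kwargs (dict) – Keyword arguments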
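To show how the choice of bin_method can change the result, the sketch below contrasts equal-width and equal-frequency bin edges on skewed data. Mapping these to the 'distance' and 'quantile' options is an assumption based on the option names alone, and both helper functions are hypothetical.

```python
import statistics


def equal_width_edges(values, n_bins):
    """Bin edges that split the value range into n_bins equal-width intervals."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    return [low + i * width for i in range(n_bins + 1)]


def quantile_edges(values, n_bins):
    """Bin edges that put roughly the same number of samples in each bin."""
    inner = statistics.quantiles(values, n=n_bins, method="inclusive")
    return [min(values)] + inner + [max(values)]


values = [1, 2, 2, 3, 3, 3, 4, 10, 20, 100]
print(equal_width_edges(values, 4))  # [1.0, 25.75, 50.5, 75.25, 100.0]
print(quantile_edges(values, 4))     # [1, 2.25, 3.0, 8.5, 100]
```

With equal-width edges, nine of the ten values fall into the first bin, while the quantile edges spread them evenly.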
- bm.visual.quick_visual.feature_importance_plot(X, ax=None, picture_name='特征重要性', save_file=True, show=False, **kwargs)
- Parameters
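X (DataFrame) – The features retained after selection
ax (ax, default: None) – Custom axes; set automatically if not given
picture_name (str) – Name used when saving the image
save_file (bool) – Whether to save the image
show (bool) – Whether to display the figure
kwargs (dict) – Keyword arguments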
- bm.visual.quick_visual.featurebox_plot(X, ax=None, columns=None, sub_col=None, show=False, picture_name='箱形图', save_file=True, **kwargs)
- Parameters
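X (DataFrame) – Input data
ax (ax, default: None) – Custom axes; set automatically if not given
columns (list) – List of features
sub_col (int) – Sub-columns
show (bool) – Whether to display the figure
picture_name (str) – File name (path) used when saving the image
save_file (bool) – Whether to save the figure
kwargs (dict) – Keyword arguments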
- bm.visual.quick_visual.featurecategory_plot(X, ax=None, columns=None, label=None, sub_col=None, show=False, picture_name='类别型特征分布图', save_file=True, **kwargs)
- Parameters
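X (DataFrame) – Input data
ax (ax, default: None) – Custom axes; set automatically if not given
columns (list) – List of features
label (str) – Name of the target label
sub_col (int) – Sub-columns
show (bool) – Whether to display the figure
picture_name (str) – File name (path) used when saving the image
save_file (bool) – Whether to save the figure
kwargs (dict) – Keyword arguments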
- bm.visual.quick_visual.featurecor_plot(X, ax=None, columns=None, show=False, picture_name='相关性热图', save_file=True, **kwargs)
- Parameters
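X (DataFrame) – Input data
ax (ax, default: None) – Custom axes; set automatically if not given
columns (list) – List of features
show (bool) – Whether to display the figure
picture_name (str) – File name (path) used when saving the image
save_file (bool) – Whether to save the figure
kwargs (dict) – Keyword arguments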
- bm.visual.quick_visual.featuredis_plot(X, ax=None, columns=None, label=None, sub_col=None, show=False, picture_name='数值型特征分布图', save_file=True, **kwargs)
- Parameters
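X (DataFrame) – Input data
ax (ax, default: None) – Custom axes; set automatically if not given
columns (list) – List of features
label (str) – Name of the target label
sub_col (int) – Sub-columns
show (bool) – Whether to display the figure
picture_name (str) – File name (path) used when saving the image
save_file (bool) – Whether to save the figure
kwargs (dict) – Keyword arguments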
- bm.visual.quick_visual.reports_plot(estimator, X_train, y_train, X_test=None, y_test=None, fit_params={}, ax=None, per_class=True, picture_name='模型报表', binary=False, classes=None, encoder=None, is_fitted='auto', force_model=False, show=False, save_file=True, **kwargs)
- Parameters
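estimator (pipeline) – The pipeline model
X_train (DataFrame) – Training data
y_train (list or ndarray) – Labels for the training data
X_test (DataFrame) – Test data
y_test (list or ndarray) – Labels for the test data
fit_params (dict) – Model initialization parameters
ax (ax, default: None) – Custom axes; set automatically if not given
per_class (bool) – If True, plots the ROC curve for each class; set it to False if only the macro or micro average curves are needed
picture_name (str) – Name used when saving the image
binary (bool) – Whether the task is binary classification
classes (list) – Class labels; can be set to [0, 1]
encoder (bool) – Whether to encode the labels; not needed by default
is_fitted (bool, default: auto) – Whether the model has already been trained
force_model (False) –
show (bool) – Whether to display the figure
save_file (bool) – Whether to save the figure
kwargs (dict) – Keyword arguments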
- bm.visual.quick_visual.shap_plot(estimator, X, feature_names, ax=None, picture_name='SHAP', save_file=True, mode=None, show=False, **kwargs)
- Parameters
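estimator (pipeline) – The pipeline model
X (DataFrame) – Training data after feature selection
feature_names (list) – Names of the selected features, including both categorical and numeric feature names
ax (ax, default: None) – Custom axes; set automatically if not given
picture_name (str) – Name used when saving the image
save_file (bool) – Whether to save the image
mode (str) – Type of plot to save; either force or summary
show (bool) – Whether to display the figure (when save_file is True, show must be set to False)
kwargs (dict) – Keyword arguments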
- bm.visual.quick_visual.wiplot(binx, title, ax=None, display_iv=False, show=False, picture_name='WOE-IV', save_file=True, **kwargs)
- Parameters
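binx (DataFrame) – Binned data
title (str) – Target label
ax (ax, default: None) – Custom axes; set automatically if not given
display_iv (bool) – Whether to display the IV
show (bool) – Whether to display the figure
picture_name (str) – File name (path) used when saving the image
save_file (bool) – Whether to save the figure
kwargs (dict) – Keyword arguments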
bm.visual.target_visual module
- class bm.visual.target_visual.FeatureCorrelationPlot(ax=None, method='pearson', labels=None, sort=False, feature_index=None, feature_names=None, color=None, picture_name='feature_correlation', save_file=False, **kwargs)
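Bases: TargetVisualizer
This visualizer computes the Pearson correlation coefficients and the mutual information between the features and the dependent variable. It can be used for feature selection.
- Parameters
ax (ax, default: None) – Axes of the canvas
method (string, default: "pearson") – Method for computing the correlation between features and the label; one of pearson, mutual_info-regression, mutual_info-classification
labels (list, default: None) – List of feature column names
sort (boolean, default: False) – Whether to sort the features when drawing
feature_index (list) – Indices of the features in the list
feature_names (list) – List of feature names
color (string) – Plot color
kwargs (dict) – Keyword arguments
Examples
>>> viz = FeatureCorrelationPlot()
>>> viz.visual(X, y)
>>> viz.show()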
- draw()
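Draws the feature correlation plot.
- finalize()
Sets the labels and title of the figure.
- is_dataframe(data)
Converts the input data to a DataFrame.
- Parameters
data (instance) – Input data
- visual(X, y, **kwargs)
Computes the correlation between the features and the label.
- Parameters
X (numpy.ndarray or DataFrame, shape (n, m)) – A matrix of n samples with m features
y (numpy.ndarray or DataFrame, shape (n,)) – An array of n labels
kwargs (dict) – Keyword arguments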
- Returns
self
- Return type
visualbase
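For the default method, the per-feature computation amounts to a Pearson coefficient between each column and the target. The following standard-library sketch illustrates this under assumptions: both function names are hypothetical, columns are passed as plain lists, and sort is taken to mean ascending order by coefficient.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def feature_target_correlation(columns, y, sort=False):
    """columns maps a feature name to a list of values aligned with y."""
    scores = {name: pearson(values, y) for name, values in columns.items()}
    if sort:
        scores = dict(sorted(scores.items(), key=lambda item: item[1]))
    return scores


columns = {"temp": [1, 2, 3, 4], "noise": [2, 1, 2, 1], "cold": [4, 3, 2, 1]}
y = [10, 20, 30, 40]
scores = feature_target_correlation(columns, y, sort=True)
print(scores)
```

A constant column would make the denominator zero, so real code needs a guard for that case.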
- class bm.visual.target_visual.TargetBalancedReferencePlot(ax=None, target=None, bins=4, picture_name='target_balance', save_file=False, **kwargs)
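Bases: TargetVisualizer
To account for label imbalance, bins the target labels and visualizes them, giving a reference for the data behind each class label.
- Parameters
ax (matplotlib axes, default: None) – inherited from the visual_base class
target (string, default: "y") – the variable y in the dataset
bins (int, default: 4) – number of bins
kwargs (dict) – keyword arguments inherited from the base class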
Examples
>>> visualizer = TargetBalancedReferencePlot()
>>> visualizer.visual(y)
>>> visualizer.show()
- draw(y, **kwargs)
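Draws the binned histogram.
- Parameters
y (ndarray or Series) – one-dimensional numpy.ndarray or Series
kwargs (dict) – additional keyword arguments
- finalize(**kwargs)
Adds the x-axis label and manages the tick labels so that they remain visible.
- Parameters
kwargs (dict) – generic keyword arguments
- visual(y, **kwargs)
Sets y for the figure and checks the type of the input data.
- Parameters
y (ndarray or Series) – one-dimensional numpy.ndarray or Series
kwargs (dict) – additional keyword arguments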
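The binning behind such a histogram can be sketched with numpy.histogram. Using that function is an assumption about the approach, not a description of the library's internals; the toy target values are made up, and bins=4 mirrors the documented default.

```python
import numpy as np

# Toy continuous target; values are made up for illustration.
y = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0])

# Split the range of y into 4 equal-width bins and count samples per bin.
counts, edges = np.histogram(y, bins=4)
```

Here edges holds the five bin boundaries and counts holds the number of samples falling into each of the four bins; strongly unequal counts would point to an imbalanced target.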
- class bm.visual.target_visual.TargetStatisticsPlot(ax=None, labels=None, colors=None, colormap=None, picture_name='target_statis', save_file=False, **kwargs)
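Bases: TargetVisualizer
Counts the labels in the data and plots the result.
- Two display modes are available:
Statistics mode: how often each label occurs in the data
Compare mode: label counts in the test data compared with the training data
- Parameters
ax (ax, default: None) – axes of the figure
labels (list) – optional; list of encoded labels
colors (string) – color setting
colormap (string or matplotlib cmap) –
kwargs (dict) – optional; additional keyword arguments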
Examples
>>> viz = TargetStatisticsPlot()
>>> viz.visual(y)
>>> viz.show()
>>> from sklearn.model_selection import train_test_split
>>> _, _, y_train, y_test = train_test_split(X, y, test_size=0.2)
>>> viz = TargetStatisticsPlot()
>>> viz.visual(y_train, y_test)
>>> viz.show()
- draw()
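Determines the values for the axes and applies some settings.
- finalize(**kwargs)
Sets figure options such as the title and legend.
- Parameters
kwargs (dict) – additional keyword arguments
- visual(y_train, y_test=None)
- The mode is chosen by the number of arguments passed:
Passing only y_train selects statistics mode.
Passing both selects compare mode.
- Parameters
y_train (array-like) – one-dimensional array, shape(n,)
y_test (array-like) – optional; one-dimensional array, shape(m,)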
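The two modes amount to counting labels in one array or in two. The sketch below shows that counting step only, with a hypothetical helper label_counts that is not part of bm.visual; the plotting itself is left out.

```python
from collections import Counter


def label_counts(y_train, y_test=None):
    """Count labels in statistics mode, or per split in compare mode.

    Hypothetical helper for illustration; not part of bm.visual.
    """
    if y_test is None:
        # Statistics mode: frequency of each label in a single array.
        return dict(Counter(y_train))
    # Compare mode: (train count, test count) for every label in either split.
    train, test = Counter(y_train), Counter(y_test)
    return {
        label: (train[label], test[label])
        for label in sorted(set(train) | set(test))
    }


stats = label_counts(["good", "bad", "good"])
compare = label_counts(["good", "bad", "good"], ["bad", "bad"])
```

A label missing from one split is reported with a count of 0 for that split, because Counter returns 0 for unseen keys.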
bm.visual.visual_base module
Visualization base classes that inherit from sklearn.
- class bm.visual.visual_base.ClassificationVisualizer(estimator, ax=None, fig=None, classes=None, encoder=None, is_fitted='auto', force_model=False, **kwargs)
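Bases: ScoreVisual
Visual monitoring of classification model training and prediction.
- Parameters
estimator (sklearn estimator) – an sklearn learner, i.e. a classification or regression model
ax (matplotlib axes, default: None) – axes to draw on
fig (matplotlib figure, default: None) – figure instance
classes (list or str, default: None) – list of class labels
is_fitted (bool or str, default: "auto") –
force_model (bool, default: False) – model check
kwargs (dict) – additional keyword arguments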
- property class_colors_
- fit(X, y=None, **kwargs)
Sets the data.
- Parameters
X (ndarray or DataFrame, shape(n,m)) – feature matrix of the instances
y (ndarray or Series, shape(n,)) – label array
- Returns
self – the estimator instance
- Return type
instance
- score(X, y, **kwargs)
Computes the evaluation score on test data.
- Parameters
X (array-like) – input test data
y (array-like) – corresponding test labels
- Returns
score – the output value
- Return type
float
- class bm.visual.visual_base.FeatureVisualizer(ax=None, fig=None, **kwargs)
Bases: VisualBase, TransformerMixin
Base class for feature visualizers.
- Parameters
ax (matplotlib.Axes, default: None) –
fig (matplotlib Figure, default: None) –
kwargs (dict) – any other keyword arguments to pass to the base visualizer
- transform(X, y=None)
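Defined on the parent class for subclasses to override.
- Parameters
X (array-like, shape (n_samples, n_features)) – features to transform
y (array-like, shape (n_samples,)) – labels corresponding to the input features
- Returns
X – the original input features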
- Return type
array-like, shape (n_samples, n_features)
- class bm.visual.visual_base.ModelVisualizer(estimator, ax=None, fig=None, is_fitted='auto', **kwargs)
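Bases: VisualBase, Wrapper
Wraps an sklearn model. The visualizer acts as a proxy for the model object and only needs to draw on behalf of the wrapped model.
- Parameters
estimator (sklearn estimator) – an sklearn learner, i.e. a classification or regression model
ax (ax, default: None) – axes to draw on
fig (matplotlib, default: None) – figure instance
is_fitted (bool or str, default: "auto") – whether the model has already been trained (fitted)
kwargs (dict) – additional keyword arguments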
- fit(X, y=None, **kwargs)
- Parameters
X (DataFrame) – input data
y (ndarray or list) – corresponding labels
kwargs (dict) – additional keyword arguments
- get_params(deep=True)
- Parameters
deep (bool, default: True) –
- set_params(**params)
- Parameters
params (dict) – parameter dictionary
- class bm.visual.visual_base.RegressionVisualizer(estimator, ax=None, fig=None, force_model=False, **kwargs)
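Bases: ScoreVisual
Base class for regression models.
Wraps a regression model to produce a visualization when the scoring method is called, which generally lets users compare performance across models effectively.
- Parameters
estimator (sklearn estimator) – an sklearn learner, i.e. a classification or regression model
ax (ax, default: None) – axes to draw on
fig (matplotlib, default: None) – figure instance
force_model (bool, default: False) – model check
kwargs (dict) – additional keyword arguments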
- score(X, y, **kwargs)
Computes the evaluation score on test data.
- Parameters
X (array-like) – input test data
y (array-like) – corresponding test labels
- Returns
score – the output value
- Return type
float
- class bm.visual.visual_base.ScoreVisual(estimator, ax=None, fig=None, is_fitted='auto', **kwargs)
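Bases: ModelVisualizer
Returns the predictive performance of the model.
- Parameters
estimator (sklearn estimator) – an sklearn learner, i.e. a classification or regression model
ax (matplotlib axes, default: None) – axes to draw on
fig (matplotlib figure, default: None) – figure instance
is_fitted (bool or str, default: "auto") – whether the model has already been trained (fitted)
kwargs (dict) – additional keyword arguments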
- score(X, y, **kwargs)
- class bm.visual.visual_base.TargetVisualizer(ax=None, fig=None, **kwargs)
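Bases: VisualBase
Base class for target (label) visualizers.
- Parameters
ax (matplotlib Axes, default: None) – axes for the labels
fig (matplotlib Figure, default: None) – figure for the labels
kwargs (dict) – required arguments, inherited from sklearn
- label_encoder(y)
Encodes the labels.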
- class bm.visual.visual_base.VisualBase(ax=None, fig=None, **kwargs)
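Bases: BaseEstimator
Base class that defines how visualizations are created, stored, and displayed with matplotlib. Inherits from sklearn's BaseEstimator. Its main role is to define conventions such as the data input specification for visualizations.
- Parameters
ax (matplotlib axes, default: None) – axes to draw the figure on. If none is passed, the current axes are used (or generated as needed).
fig (matplotlib figure, default: None) – figure to draw the visualization on. If none is passed, the current figure is used (or generated as needed).
kwargs (dict) – keyword arguments needed for plotting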
- property ax
- draw(**kwargs)
Hooks into the matplotlib interface and renders the data the visualizer was trained on as a figure or axes.
- Parameters
kwargs (dict) – generic keyword arguments
- property fig
- finalize()
Applies the final decorations to the axes.
- Parameters
kwargs (dict) – generic keyword arguments
- set_title(title=None)
Sets the title of the current axes.
- Parameters
title (string, default: None) – title to add to the figure
- show(outpath=None, clear_figure=False, **kwargs)
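Displays the figure.
- Parameters
outpath (string, default: None) – path where the figure is saved
clear_figure (bool, default: False) – if True, clears the figure after it is saved to file or shown on screen
kwargs (dict) – generic keyword arguments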
- property size
- vis(X, y=None, **kwargs)
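Main entry point for visualization, intended to be overridden by subclasses.
- Parameters
X (ndarray or DataFrame, shape(n,m)) – input data as a DataFrame or numpy.ndarray
y (ndarray or Series, shape(n,)) – class labels as a numpy.ndarray or Series
kwargs (dict) – required arguments inherited from sklearn
- Returns
self – returns the base class instance to support later use in pipelines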
- Return type
bm.visual.visual_utils module
- exception bm.visual.visual_utils.BrickError
Bases: Exception
The root exception for all yellowbrick related errors.
- class bm.visual.visual_utils.ColorPalette(name_or_list)
Bases: list
A wrapper for functionality surrounding a list of colors, including a context manager that allows the palette to be set with a with statement.
- as_hex()
Return a color palette with hex codes instead of RGB values.
- as_rgb()
Return a color palette with RGB values instead of hex codes.
- plot(size=1)
Plot the values in the color palette as a horizontal array. See Seaborn’s palplot function for inspiration.
- Parameters
size (int) – scaling factor for size of the plot
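The conversion that as_hex() and as_rgb() describe can be sketched with two small functions. These are hypothetical helpers, not the ColorPalette methods themselves, and they assume RGB values are floats in the range 0 to 1, as in matplotlib.

```python
def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of floats in [0, 1] to a '#rrggbb' string."""
    return "#" + "".join(f"{round(c * 255):02x}" for c in rgb)


def hex_to_rgb(code):
    """Convert a '#rrggbb' string to an (r, g, b) tuple of floats in [0, 1]."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) / 255 for i in (0, 2, 4))


# Toy palette: pure red and a blue with a little green.
palette_rgb = [(1.0, 0.0, 0.0), (0.0, 0.2, 1.0)]
palette_hex = [rgb_to_hex(c) for c in palette_rgb]
```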
- class bm.visual.visual_utils.ContribEstimator(estimator, estimator_type=None)
Bases: object
A wrapper.
- exception bm.visual.visual_utils.ModelError
Bases: BrickError
A problem when interacting with sklearn or the ML framework.
- exception bm.visual.visual_utils.NotFitted
Bases: ModelError
An action was called that requires a fitted model.
- classmethod from_estimator(estimator, method=None)
- class bm.visual.visual_utils.Wrapper(obj)
Bases: object
Object wrapper class.
Provides a getattr method for accessing the wrapped object's methods.
- Parameters
obj (object) – the object to wrap
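Attribute delegation of this kind is usually built on Python's __getattr__ hook. The class below is a hypothetical stand-in that shows the pattern under that assumption; it is not the bm.visual Wrapper itself.

```python
class SimpleWrapper:
    """Delegates unknown attribute lookups to the wrapped object.

    Hypothetical stand-in for illustration; not the bm.visual Wrapper.
    """

    def __init__(self, obj):
        self._wrapped = obj

    def __getattr__(self, name):
        # Called only when normal lookup fails, so _wrapped is found first.
        return getattr(self._wrapped, name)


wrapped = SimpleWrapper([3, 1, 2])
wrapped.sort()             # list.sort, reached through delegation
first = wrapped.index(1)   # list.index, also delegated
```

Because __getattr__ runs only after normal lookup fails, attributes defined on the wrapper itself take precedence over those of the wrapped object.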
- bm.visual.visual_utils.bar_stack(data, ax=None, labels=None, ticks=None, colors=None, colormap=None, orientation='vertical', legend=True, legend_kws=None, **kwargs)
An advanced bar chart plotting utility that can draw bar and stacked bar charts from data, wrapping calls to the specified matplotlib.Axes object.
- Parameters
data (2D array-like) – The data passed to the Visualizer. Rows represent each stack in the bar chart and columns represent each bar. Therefore, a single bar chart is created by passing a 2D array containing a single row, while the data to create a bar chart with 3 stacks would have a shape of (3, b).
ax (matplotlib.Axes, default: None) – The axes object to draw the barplot on, uses plt.gca() if not specified.
labels (list of str, default: None) – The labels for each row in the bar stack, used to create a legend.
ticks (list of str, default: None) – The labels for each bar, added to the x-axis for a vertical plot, or the y-axis for a horizontal plot.
colors (array-like, default: None) – Specify the colors of each bar, each row in the stack, or every segment.
colormap (string or matplotlib cmap) – Specify a colormap for each bar, each row in the stack, or every segment.
orientation (‘vertical’ or ‘horizontal’) – Specifies a horizontal or vertical bar chart.
legend (boolean, default: True) – If True, the function adds a legend to the plot.
legend_kws (dict, default: None) – Additional keyword arguments for the legend components.
kwargs (dict) – Additional keyword arguments to pass to ax.bar.
- bm.visual.visual_utils.check_fitted(estimator, is_fitted_by='auto', **kwargs)
- Parameters
estimator (sklearn.Estimator) – the model
is_fitted_by (bool or str, default: 'auto') –
kwargs (dict) – additional keyword arguments
- Returns
is_fitted – Whether or not the model is already fitted
- Return type
bool
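One plausible way to decide whether an estimator is fitted is the scikit-learn convention that fit() sets attributes ending in an underscore. The sketch below relies on that convention; the handling of is_fitted_by (an explicit bool wins, 'auto' triggers detection) is an assumption, since the parameter is undocumented above, and check_fitted_sketch and ToyEstimator are hypothetical names.

```python
def check_fitted_sketch(estimator, is_fitted_by="auto"):
    """Decide whether an estimator counts as fitted.

    Hypothetical helper; the handling of is_fitted_by is an assumption.
    """
    if isinstance(is_fitted_by, bool):
        # Assumed behavior: an explicit bool overrides any detection.
        return is_fitted_by
    # scikit-learn convention: fit() sets attributes with a trailing underscore.
    return any(
        name.endswith("_") and not name.startswith("__")
        for name in vars(estimator)
    )


class ToyEstimator:
    """Placeholder that mimics the fit() attribute convention."""

    def fit(self, X):
        self.mean_ = sum(X) / len(X)
        return self


before = check_fitted_sketch(ToyEstimator())
after = check_fitted_sketch(ToyEstimator().fit([1, 2, 3]))
forced = check_fitted_sketch(ToyEstimator(), is_fitted_by=True)
```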
- bm.visual.visual_utils.color_palette(palette=None, n_colors=None)
Return a color palette object with color definition and handling.
Calling this function with palette=None will return the current matplotlib color cycle.
This function can also be used in a with statement to temporarily set the color cycle for a plot or set of plots.
- Parameters
palette (None or str or sequence) – Name of a palette or None to return the current palette. If a sequence the input colors are used but possibly cycled.
Available palette names from yellowbrick.colors.palettes are: accent, dark, paired, pastel, bold, muted, colorblind, sns_colorblind, sns_deep, sns_muted, sns_pastel, sns_bright, sns_dark, flatui, neural_paint
n_colors (None or int) – Number of colors in the palette. If None, the default will depend on how palette is specified. Named palettes default to 6 colors which allow the use of the names “bgrmyck”, though others do have more or less colors; therefore reducing the size of the list can only be done by specifying this parameter. Asking for more colors than exist in the palette will cause it to cycle.
- Returns
list(tuple) – Returns a ColorPalette object, which behaves like a list, but can be used as a context manager and possesses functions to convert colors.
See also
set_palette() – Set the default color cycle for all plots.
set_color_codes() – Reassign color codes like "b", "g", etc. to colors from one of the yellowbrick palettes.
colors.resolve_colors() – Resolve a color map or listed sequence of colors.
- bm.visual.visual_utils.color_sequence(palette=None, n_colors=None)
Return a ListedColormap object from a named sequence palette. Useful for continuous color scheme values and color maps.
Calling this function with palette=None will return the default color sequence: Color Brewer RdBu.
- Parameters
palette (None or str or sequence) – Name of a palette or None to return the default palette. If a sequence the input colors are used to create a ListedColormap. The currently implemented color sequences are from Color Brewer.
Available palette names from yellowbrick.colors.palettes are: Blues, BrBG, BuGn, BuPu, GnBu, Greens, Greys, OrRd, Oranges, PRGn, PiYG, PuBu, PuBuGn, PuOr, PuRd, Purples, RdBu, RdGy, RdPu, RdYlBu, RdYlGn, Reds, Spectral, YlGn, YlGnBu, YlOrBr, YlOrRd, ddl_heat
n_colors (None or int) – Number of colors in the palette. If None, the default will depend on how palette is specified, selecting the largest sequence for that palette name. Note that sequences have a minimum length of 3. If a number of colors is specified that is not available for the sequence, a ValueError is raised.
- Returns
Returns a ListedColormap object, an artist object from the matplotlib library that can be used wherever a colormap is necessary.
- Return type
colormap
- bm.visual.visual_utils.div_safe(numerator, denominator)
Ufunc-extension that returns 0 instead of nan when dividing numpy arrays
- Parameters
numerator (array-like) –
denominator (scalar or array-like that can be validly divided by the numerator) –
- Returns
a numpy array
Examples
div_safe( [-1, 0, 1], 0 ) == [0, 0, 0]
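The zero-instead-of-nan behavior described for div_safe can be sketched with NumPy as follows. This is a hypothetical re-implementation for illustration, not the library's code; the name div_safe_sketch is made up, and replacing infinities as well as nan is inferred from the documented example.

```python
import numpy as np


def div_safe_sketch(numerator, denominator):
    """Element-wise division that yields 0 wherever the result is not finite.

    Hypothetical re-implementation for illustration only.
    """
    numerator = np.asarray(numerator, dtype=float)
    denominator = np.asarray(denominator, dtype=float)
    # Silence divide-by-zero and 0/0 warnings; bad entries are replaced below.
    with np.errstate(divide="ignore", invalid="ignore"):
        result = np.true_divide(numerator, denominator)
    result = np.asarray(result)
    # Replace nan and +/-inf with 0.
    result[~np.isfinite(result)] = 0.0
    return result


out = div_safe_sketch([-1, 0, 1], 0)
```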
- bm.visual.visual_utils.get_color_cycle()
Returns the current color cycle from matplotlib.
- bm.visual.visual_utils.get_model_name(model)
Gets the name of the model.
- Parameters
model (class or instance) – the model object
- Returns
name – the name of the model
- Return type
string
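Accepting either a class or an instance, as the parameter type indicates, can be handled by checking whether the argument is itself a class. The helper below is a hypothetical sketch of that idea, not the bm.visual implementation; ToyModel is a placeholder.

```python
def model_name(model):
    """Return the class name for either a class or an instance.

    Hypothetical helper for illustration; not the bm.visual implementation.
    """
    # A class is an instance of type, so use its own __name__;
    # for an instance, look the name up on its class.
    if isinstance(model, type):
        return model.__name__
    return type(model).__name__


class ToyModel:
    """Placeholder standing in for an estimator."""


from_class = model_name(ToyModel)
from_instance = model_name(ToyModel())
```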
- bm.visual.visual_utils.is_classifier(estimator)
- bm.visual.visual_utils.is_dataframe(obj)
Returns True if the given object is a Pandas Data Frame.
- Parameters
obj (instance) – The object to test whether or not is a Pandas DataFrame.
- bm.visual.visual_utils.is_estimator(model)
Determines whether the model is an estimator.
- Parameters
model (class or instance) –
- bm.visual.visual_utils.is_fitted(estimator)
Checks that the model has already been fitted.
- bm.visual.visual_utils.is_regressor(estimator)
- bm.visual.visual_utils.memoized(fget)
- bm.visual.visual_utils.resolve_colors(n_colors=None, colormap=None, colors=None)
Generates a list of colors based on common color arguments, for example the name of a colormap or palette or another iterable of colors. The list is then truncated (or multiplied) to the specific number of requested colors.
- Parameters
n_colors (int, default: None) – Specify the length of the list of returned colors, which will either truncate or multiply the colors available. If None the length of the colors will not be modified.
colormap (str, yellowbrick.style.palettes.ColorPalette, matplotlib.cm, default: None) – The name of the matplotlib color map with which to generate colors.
colors (iterable, default: None) – A collection of colors to use specifically with the plot. Overrides colormap if both are specified.
- Returns
colors – A list of colors that can be used in matplotlib plots.
- Return type
list
Notes
This function was originally based on a similar function in the pandas plotting library that has been removed in the new version of the library.
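The truncate-or-multiply step applied to the color list can be sketched with itertools. The helper fit_colors is hypothetical and covers only that step; colormap and palette name lookup are left out.

```python
from itertools import cycle, islice


def fit_colors(colors, n_colors=None):
    """Truncate or repeat a color list so it has exactly n_colors entries.

    Hypothetical helper for illustration; not the bm.visual implementation.
    """
    colors = list(colors)
    if n_colors is None:
        # No target length given: return the colors unchanged.
        return colors
    # cycle() repeats the list endlessly; islice() cuts it at n_colors.
    return list(islice(cycle(colors), n_colors))


shorter = fit_colors(["r", "g", "b"], 2)
longer = fit_colors(["r", "g", "b"], 5)
```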
- bm.visual.visual_utils.set_color_codes(palette='accent')
Change how matplotlib color shorthands are interpreted.
Calling this will change how shorthand codes like “b” or “g” are interpreted by matplotlib in subsequent plots.
- Parameters
palette (str) – Named yellowbrick palette to use as the source of colors.
See also
set_palette – Color codes can also be set through the function that sets the matplotlib color cycle.
- bm.visual.visual_utils.set_palette(palette, n_colors=None, color_codes=False)
Set the matplotlib color cycle using a seaborn palette.
- Parameters
palette (yellowbrick color palette | seaborn color palette (with sns_ prepended)) – Palette definition. Should be something that color_palette() can process.
n_colors (int) – Number of colors in the cycle. The default number of colors will depend on the format of palette, see the color_palette() documentation for more information.
color_codes (bool) – If True and palette is a seaborn palette, remap the shorthand color codes (e.g. “b”, “g”, “r”, etc.) to the colors from this palette.