bm.visual package

Visualization

bm.visual.color_style module

bm.visual.color_style.find_text_color(base_color, dark_color='black', light_color='white', coef_choice=0)

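Chooses a text color that stays readable on the given background color. The user can specify the dark and light text colors or accept the defaults, black and white.

Parameters
  • base_color – Background color as an RGB tuple

  • dark_color – matplotlib color used for text on light backgrounds

  • light_color – matplotlib color used for text on dark backgrounds

  • coef_choice – Index (0 or 1) selecting the set of weighting coefficients; default 0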
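The selection rule can be sketched as below. This is a minimal illustration, not the bm code. It assumes that base_color holds RGB floats in [0, 1] and that the two coefficient sets chosen by coef_choice are the perceived-brightness weights (0.241, 0.691, 0.068) and (0.299, 0.587, 0.114). pick_text_color and COEFFICIENTS are hypothetical names.

```python
import math

# Assumed weighting coefficients; coef_choice indexes into this list.
COEFFICIENTS = [
    (0.241, 0.691, 0.068),
    (0.299, 0.587, 0.114),
]


def pick_text_color(base_color, dark_color="black", light_color="white", coef_choice=0):
    """Return dark_color on bright backgrounds and light_color on dark ones (hypothetical helper)."""
    r, g, b = base_color[:3]  # ignore an alpha channel if one is present
    wr, wg, wb = COEFFICIENTS[coef_choice]
    # Perceived brightness: weighted root-mean-square of the RGB channels.
    brightness = math.sqrt(wr * r ** 2 + wg * g ** 2 + wb * b ** 2)
    return dark_color if brightness > 0.5 else light_color
```

For example, a pale yellow background such as (1.0, 1.0, 0.8) yields dark text, and a navy background yields light text.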

bm.visual.features_visual module

Visualization of the correlation between feature columns and between features and labels.

class bm.visual.features_visual.BinningPlot(ax, bin_method='interpolate', picture_name='feature_bin', save_file=False, **kwargs)

Bases: FeatureVisualizer

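WOE visualization: bins a feature by computing its WOE and IV.

Parameters
  • ax (matplotlib Axes, default: None) – The axes to draw on

  • bin_X (Dataframe) – Binned data

  • title (str, default: None) – Feature name

  • display_iv (bool, default: False) – Whether to display the IV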

draw(bad_rate, samples_num, column, **kwargs)
Parameters
  • bad_rate (DataFrame) – Number of bad samples

  • samples_num (DataFrame) – Corresponding number of samples

finalize(**kwargs)

Applies the final decorations to the axes before they are returned.

Parameters

kwargs (dict) – Generic keyword arguments

init_bin_method(data, column, target=None, target_value=None, num_clusters=5, max_interval=10, special_attributes=None, tree_params=None)
Parameters
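  • data (Dataframe) – Input data

  • column (str) – Feature column

  • target (str) – Name of the label column

  • target_value (Any) – Value of the target column

  • num_clusters (int) – Number of clusters

  • max_interval (int, default: 10) – Maximum number of bin intervals

  • special_attributes (str, default: None) – Special attributes

  • tree_params (dict, default: None) – Decision tree parameters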

visual(data, column, target=None, target_value=None, num_clusters=5, max_interval=10, special_attributes=None, tree_params=None, bad_rate_plot=False, **kwargs)

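Draws the binning plot using the selected binning method.

Parameters
  • data (DataFrame, shape(n,m)) – Input data

  • column (str) – Feature column to bin

  • target (str, default: None) – Target (class label column)

  • target_value (str, default: None) – Target value; only binary classification is currently supported (e.g. bad, good)

  • num_clusters (int, default: 5) – Number of clusters

  • max_interval (int, default: 10) – Maximum number of intervals

  • special_attributes (str, default: None) – Name of special features

  • tree_params (dict, default: None) – Decision tree parameter dictionary

  • bad_rate_plot (bool, default: False) – Whether to plot the bad rate of each bin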
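The per-bin sample counts and bad counts that draw() takes as input can be derived along these lines. This is a hedged sketch: it assumes that 'distance' means equal-width bins and 'quantile' means equal-frequency bins (the source only lists the method names), treats label 1 as bad, and uses hypothetical helper names. bm's own binning may differ.

```python
from bisect import bisect_right


def equal_width_edges(values, n_bins):
    """Interior edges of n_bins equal-width bins (assumed 'distance' method)."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + step * i for i in range(1, n_bins)]


def equal_freq_edges(values, n_bins):
    """Interior edges of n_bins roughly equal-frequency bins (assumed 'quantile' method)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // n_bins] for i in range(1, n_bins)]


def bin_stats(values, labels, edges, bad_value=1):
    """Per-bin sample counts, bad counts and bad rates."""
    n_bins = len(edges) + 1
    samples = [0] * n_bins
    bads = [0] * n_bins
    for v, y in zip(values, labels):
        idx = bisect_right(edges, v)  # a value equal to an edge goes to the right-hand bin
        samples[idx] += 1
        if y == bad_value:
            bads[idx] += 1
    rates = [b / s if s else 0.0 for b, s in zip(bads, samples)]
    return samples, bads, rates
```

With values 1 to 10 split into two equal-frequency bins, the edge falls at 6, so each bin holds five samples.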

class bm.visual.features_visual.FeatureCorrelationPlot(ax, columns, picture_name='feature_correlation', save_file=False, **kwargs)

Bases: FeatureVisualizer

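Feature correlation heatmap: plots the pairwise correlation between feature columns.

Parameters
  • ax (matplotlib Axes, default: None) – The axes to draw on

  • X (Dataframe, default: None) – Input data

  • columns (list, default: None) – Feature columns to include

  • sub_col (int, default: 2) – Number of subplot columns

  • label (str, default: None) – Name of the label column

  • kwargs (dict) – Keyword arguments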

Examples

>>> viz = FeatureCorrelationPlot(ax=None, columns=list(X.columns))
>>> viz.visual(X)
>>> viz.show()
draw(corr_df, colormap, mask=False, **kwargs)
Parameters
  • corr_df (Dataframe) – Feature correlation matrix

  • colormap (plt) – Colormap

  • mask (bool) – Whether to mask part of the matrix

  • kwargs (dict) – Keyword arguments

finalize()

Applies the final decorations to the axes before they are returned.

Parameters

kwargs (dict) – Generic keyword arguments

visual(X)
Parameters

X (Dataframe) – Input feature data
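The corr_df and mask arguments of draw() can be illustrated with a pure-Python sketch. It builds a Pearson correlation matrix over named columns and an upper-triangle mask of the kind commonly used to hide the redundant half of a correlation heatmap. The function names are hypothetical, and the assumption that mask hides the upper triangle is ours, not the source's.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """columns maps column name -> list of numbers; returns (names, matrix)."""
    names = list(columns)
    return names, [[pearson(columns[a], columns[b]) for b in names] for a in names]


def upper_triangle_mask(size):
    """True above the diagonal: the cells a heatmap would hide."""
    return [[j > i for j in range(size)] for i in range(size)]
```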

class bm.visual.features_visual.FeaturesBoxPlot(ax, columns, sub_col, picture_name='feature_box', save_file=False, **kwargs)

Bases: FeatureVisualizer

Box plots of the selected features.

Parameters
  • ax (matplotlib Axes, default: None) – The axes to draw on

  • X (Dataframe, default: None) – Input data

  • columns (list, default: None) – List of features

  • sub_col (int, default: 2) – Number of subplot columns

  • kwargs (dict) – Keyword arguments

Examples

>>> viz = FeaturesBoxPlot(ax=None, columns=["feature_a", "feature_b"], sub_col=2)
>>> viz.visual(X)
>>> viz.show()
draw(x, idx, feat, **kwargs)

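Creates the canvas, processes the input data, and computes the minimum (min), lower quartile (Q1), median, upper quartile (Q3), and maximum (max).

Parameters
  • x (Dataframe, default: None) – Input data

  • idx (int) – Index of the feature column

  • feat (any) – Feature corresponding to the index

  • kwargs (dict) – Keyword arguments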

finalize()

Adjusts figure properties.

visual(X)
Parameters

X (Dataframe) – Input data
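The five statistics that draw() is described as computing can be obtained with the standard library, as sketched below. Using method="inclusive" reproduces the linear interpolation that numpy.percentile applies by default. five_number_summary is a hypothetical helper, not part of bm.

```python
from statistics import median, quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a sequence of numbers."""
    # quantiles() sorts internally; n=4 gives the three quartile cut points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median(values), q3, max(values)
```

For example, five_number_summary([4, 1, 3, 2]) gives (1, 1.75, 2.5, 3.25, 4).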

class bm.visual.features_visual.FeaturesCategoryCount(ax, columns, label, sub_col, picture_name='feature_category', save_file=False, **kwargs)

Bases: FeatureVisualizer

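Categorical feature count plot: counts the values of non-numeric (categorical) features.

Parameters
  • ax (matplotlib Axes, default: None) – The axes to draw on

  • X (Dataframe, default: None) – Input data

  • columns (list, default: None) – List of categorical features

  • sub_col (int, default: 2) – Number of subplot columns

  • label (str, default: None) – Name of the target label column

  • kwargs (dict) – Keyword arguments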

Examples

>>> viz = FeaturesCategoryCount(ax=None, columns=["feature_a"], label="target", sub_col=2)
>>> viz.visual(X)
>>> viz.show()
draw(x, idx, feat, **kwargs)

Creates the canvas and processes the input data.

Parameters
  • x (Dataframe, default: None) – Input data

  • idx (int) – Index of each categorical feature

  • feat (any) – Feature corresponding to each index

finalize()

Adjusts figure properties.

visual(X)
Parameters

X (Dataframe) – Input categorical feature data
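The statistic behind a category count plot is a frequency table. The sketch below builds one, optionally split by the values of the label column. It is an assumed illustration with a hypothetical function name, not bm's implementation.

```python
from collections import Counter, defaultdict


def category_counts(categories, labels=None):
    """Count each category; if labels are given, count per (category, label)."""
    if labels is None:
        return dict(Counter(categories))
    table = defaultdict(Counter)
    for cat, lab in zip(categories, labels):
        table[cat][lab] += 1
    return {cat: dict(counts) for cat, counts in table.items()}
```

Each inner dictionary corresponds to one group of bars when the counts are split by label.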

class bm.visual.features_visual.FeaturesDistributionPlot(ax, columns, label, sub_col, picture_name='feature_distribute', save_file=False, **kwargs)

Bases: FeatureVisualizer

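Feature distribution plot: plots the distribution of each feature.

Parameters
  • ax (matplotlib Axes, default: None) – The axes to draw on

  • columns (str or list, default: None) – Feature column names

  • label (str) – Name of the label column

  • sub_col (int, default: 5) – Number of subplot columns

  • kwargs (dict) – Keyword arguments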

Examples

>>> viz = FeaturesDistributionPlot(ax=None, columns=["feature_a", "feature_b"], label="target", sub_col=5)
>>> viz.visual(X)
>>> viz.show()
draw(x, idx, feat, **kwargs)
Parameters
  • x (Dataframe, default: None) – Input data

  • idx (int) – Index of each feature column

  • feat (str) – Feature corresponding to the index

finalize(**kwargs)

Adjusts figure properties.

visual(X)
Parameters

X (Dataframe) – Input data
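The sub_col parameter suggests that features are laid out on a grid with sub_col subplots per row, and that draw() receives each feature's position as idx. Under that assumption, which the source does not confirm, the grid shape and each subplot's row and column follow from integer division. The resulting shape could then be passed to matplotlib's plt.subplots(n_rows, n_cols). Both helper names are hypothetical.

```python
import math


def grid_shape(n_features, sub_col):
    """Rows and columns needed to fit n_features subplots, sub_col per row."""
    n_cols = min(sub_col, n_features)
    return math.ceil(n_features / sub_col), n_cols


def grid_position(idx, sub_col):
    """Zero-based (row, col) of the subplot for feature number idx."""
    return divmod(idx, sub_col)
```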

class bm.visual.features_visual.FeaturesVisualPlot(ax=None, columns=None, correlation='pearson', kind='scatter', hist=True, alpha=0.65, joint_kws=None, hist_kws=None, picture_name='features_visual', save_file=False, **kwargs)

Bases: FeatureVisualizer

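Feature data visualization that supports interactive comparison between different features. It can also compute the correlation between a feature and the label with different methods and visualize the result.

The “columns” parameter can be used to specify the indices of the two desired columns in “X”.

Setting the “hist” parameter to “True” adds histograms of the frequency distribution, or of the “density” (probability density function).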

Parameters
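  • ax (matplotlib Axes, default: None) – The axes to draw on; if hist=True, histograms are added above (xhax) and to the right (yhax)

  • columns (int, str, [int, int], [str, str], default: None) – Indices or names of the columns of X to plot

  • correlation (str, default: 'pearson') – Correlation method: one of ‘pearson’, ‘covariance’, ‘spearman’, ‘kendalltau’

  • kind (str in {'scatter', 'hex'}, default: 'scatter') – Plot type. Note that when kind=‘hex’, the target cannot be plotted by color.

  • hist ({True, False, None, 'density', 'frequency'}, default: True) – By default, histograms showing the distribution of the two input variables are drawn. If set to “density”, the probability density function is plotted; if set to True or “frequency”, the frequency is plotted.

  • alpha (float, default: 0.65) – Transparency, where 1 is fully opaque and 0 is fully transparent. This makes densely clustered points easier to see.

  • kwargs (dict) – Keyword arguments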

Examples

>>> viz = FeaturesVisualPlot(columns=["temp", "humidity"])
>>> viz.visual(X, y)
>>> viz.show()
draw(x, y, xlabel=None, ylabel=None)
Parameters
  • x (1D array-like) – Values of the feature on the x-axis

  • y (1D array-like) – Values of the feature on the y-axis

  • xlabel (str) – Label of the x-axis

  • ylabel (str) – Label of the y-axis

finalize(**kwargs)

Adjusts figure properties.

is_dataframe(data)

Converts the input data to a DataFrame.

Parameters

data (instance) – Input data

visual(X, y=None)

Runs the visualization on the given input data.

Parameters
  • X (array-like) – A 1D or 2D numpy array, usually 2D.

  • y (array-like, default: None) – 1D array of labels

property xhax

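Histogram axes for the x variable (above the main plot)

property yhax

Histogram axes for the y variable (to the right of the main plot)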

class bm.visual.features_visual.WoeIvPlot(ax, title=None, display_iv=False, picture_name='woe_iv', save_file=False, **kwargs)

Bases: FeatureVisualizer

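WOE-IV binning visualization

Parameters
  • binx (DataFrame) – Binning result

  • title (str) – Figure title

  • display_iv (bool) – Whether to display the corresponding IV values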

Examples

>>> viz = WoeIvPlot(ax=None, title="feature_name", display_iv=True)
>>> viz.visual(binx)
>>> viz.show()
draw(binx, ind, y_left_max, y_right_max, **kwargs)
Parameters
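  • binx (Dataframe) – Binning result

  • ind (list) – Tick positions on the x-axis

  • y_left_max (int) – Offset (maximum) of the left y-axis

  • y_right_max (int) – Offset (maximum) of the right y-axis

  • kwargs (dict) – Keyword arguments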

finalize()

Applies the final decorations to the axes before they are returned.

Parameters

kwargs (dict) – Generic keyword arguments

visual(binx, **kwargs)
Parameters
  • binx (Dataframe) – Binning result

  • kwargs (dict) – Keyword arguments
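A common definition of the plotted quantities is WOE_i = ln(%good_i / %bad_i) and IV = Σ (%good_i − %bad_i) × WOE_i. Here %good_i and %bad_i are the bin's shares of all good and all bad samples. Some libraries use the inverse ratio, which only flips the sign of WOE and leaves IV unchanged. The sketch below assumes the good/bad convention and is not taken from bm. The epsilon floor for empty bins is also our assumption.

```python
import math


def woe_iv(good_counts, bad_counts, eps=1e-6):
    """Per-bin WOE values and the total IV from per-bin good/bad counts."""
    total_good = sum(good_counts)
    total_bad = sum(bad_counts)
    woes = []
    iv = 0.0
    for g, b in zip(good_counts, bad_counts):
        pg = max(g / total_good, eps)  # bin's share of all good samples
        pb = max(b / total_bad, eps)   # bin's share of all bad samples
        woe = math.log(pg / pb)
        woes.append(woe)
        iv += (pg - pb) * woe
    return woes, iv
```

Bins with identical good and bad shares have a WOE of 0 and contribute nothing to IV. The more the shares diverge, the larger the IV.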

bm.visual.interpretability_visual module

class bm.visual.interpretability_visual.FeatureImportancePlot(ax, picture_name='features_importance_visual', save_file=False, **kwargs)

Bases: FeatureVisualizer

Feature importance visualization

Parameters
  • ax (matplotlib Axes, default: None) – The axes to draw on

  • picture_name (str, default: features_importance_visual) – Name used when saving the figure

  • save_file (boolean, default: False) – Whether to save the figure

  • kwargs (dict) – Keyword arguments

draw(X, top_n=50, figsize=(20, 12))
Parameters
  • X (Dataframe) – DataFrame of feature importances

  • top_n (int) – Number of top features to plot

  • figsize (tuple) – Figure size

finalize(X)
Parameters

X (Dataframe) – Feature importances

visual(X)
Parameters

X (dataframe) – Input data
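Selecting the top_n features before drawing amounts to sorting by importance and truncating, as in this sketch. It assumes importances are given as (feature, importance) pairs. The actual X is a DataFrame whose column layout the source does not specify, and top_n_features is a hypothetical name.

```python
def top_n_features(importances, top_n=50):
    """Return up to top_n (feature, importance) pairs, most important first."""
    ranked = sorted(importances, key=lambda item: item[1], reverse=True)
    return ranked[:top_n]
```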

class bm.visual.interpretability_visual.ShapPlot(estimator, ax=None, picture_name='feature_shap_plot', save_file=False, mode=None, **kwargs)

Bases: FeatureVisualizer

Interpretability visualization of the selected features

Parameters
  • ax (matplotlib Axes, default: None) – The axes to draw on

  • picture_name (str, default: feature_shap_plot) – Name used when saving the figure

  • save_file (boolean, default: False) – Whether to save the figure

  • mode (str, default: None) – Plot type: force or summary

  • kwargs (dict) – Keyword arguments

draw(explainer, shap_values, X, feature_names, show)
Parameters
  • estimator (pipeline) – Model

  • X (Dataframe) – Selected input features

finalize()

Applies the final decorations to the axes before they are returned.

Parameters

kwargs (dict) – Generic keyword arguments

visual(estimator, X, feature_names, show)
Parameters
  • estimator (pipeline) – Model

  • X (Dataframe) – Selected input features

bm.visual.model_visual module

Evaluation plots for model training and prediction

class bm.visual.model_visual.CNTPlot(estimator, ax=None, per_class=True, binary=False, classes=None, encoder=None, is_fitted='auto', force_model=False, picture_name='cnt', save_file=False, **kwargs)

Bases: ClassificationVisualizer

Parameters
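  • estimator (pipeline) – Model to use

  • ax (matplotlib Axes, default: None) – The axes to draw on

  • per_class (bool, default: True) – If True, draw a curve for each class; set it to False if only the macro- or micro-averaged curves are needed

  • binary (bool, default: False) – Whether the task is binary classification

  • classes (list of str, default: None) – Class labels

  • encoder (dict or LabelEncoder, default: None) – Label encoder, e.g. sklearn's LabelEncoder

  • is_fitted (boolean or str, default: 'auto') – Whether the model has already been fitted

  • force_model (bool, default: False) – If True, skip the check that the estimator is a classifier

  • kwargs (dict) – Keyword arguments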

Examples

>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import train_test_split
>>> X, y = make_classification(n_samples=500, random_state=42)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> oz = CNTPlot(LogisticRegression())
>>> oz.fit(X_train, y_train)
>>> oz.score(X_test, y_test)
>>> oz.show()
draw()
finalize(**kwargs)

Sets figure properties

Parameters

kwargs (dict) – Keyword arguments

fit(X, y=None, **kwargs)

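Fits the model

score(X, y, **kwargs)

Makes predictions with the model and computes the evaluation score

Parameters
  • X (ndarray or DataFrame, shape(n,m)) – Input matrix with m features

  • y (ndarray or Series, shape(n,)) – 1D class labels

Returns

score_ – Evaluation score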

Return type

float

class bm.visual.model_visual.ClassificationReportPlot(estimator, ax=None, classes=None, cmap='YlOrRd', support=None, encoder=None, is_fitted='auto', force_model=False, colorbar=True, fontsize=None, picture_name='classification_report', save_file=False, **kwargs)

Bases: ClassificationVisualizer

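Classification report visualization

Parameters
  • estimator (pipeline) – Model

  • ax (matplotlib Axes, default: None) – The axes to draw on

  • classes (list or str, default: None) – Class labels

  • cmap (str, default: 'YlOrRd') – Colormap

  • support ({True, False, None, 'percent', 'count'}, default: None) – Whether and how to display the support (number of samples per class)

  • encoder (dict or LabelEncoder, default: None) – Label encoder, e.g. sklearn's LabelEncoder

  • is_fitted (boolean or str, default: 'auto') – Whether the model has already been fitted

  • force_model (bool, default: False) – If True, skip the check that the estimator is a classifier

  • colorbar (bool, default: True) – Whether to draw a colorbar

  • fontsize (int or None, default: None) – Font size

  • kwargs (dict) – Keyword arguments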

Examples

>>> from sklearn.linear_model import LogisticRegression
>>> viz = ClassificationReportPlot(LogisticRegression())
>>> viz.fit(X_train, y_train)
>>> viz.scores(X_test, y_test)
>>> viz.show()
draw()
finalize(**kwargs)

Sets the title and other properties of the generated figure

Parameters

kwargs (dict) – Keyword arguments

scores(X, y)

Generates the classification report.

Parameters
  • X (ndarray or DataFrame of shape n x m) – An (n x m) feature matrix

  • y (ndarray or Series of length n) – Corresponding labels

Returns

score_ – Accuracy

Return type

float
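The per-class numbers in a classification report follow standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), F1 is their harmonic mean, and support is the number of true samples of the class. The sketch below computes them from label lists, together with the accuracy that scores() is documented to return. It mirrors what sklearn.metrics.classification_report reports, but it is an illustration with a hypothetical name, not bm's code.

```python
def classification_report_scores(y_true, y_pred):
    """Per-class precision, recall, F1 and support, plus overall accuracy."""
    report = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[cls] = {"precision": precision, "recall": recall, "f1": f1, "support": tp + fn}
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return report, accuracy
```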

class bm.visual.model_visual.ConfusionMaxtrixPlot(estimator, ax=None, sample_weight=None, percent=False, classes=None, encoder=None, cmap='YlOrRd', fontsize=None, is_fitted='auto', force_model=False, label_transfer=None, picture_name='confusion_maxtrix', save_file=False, **kwargs)

Bases: ClassificationVisualizer

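Confusion matrix visualization for classification

Parameters
  • estimator (estimator) – Model to use

  • ax (matplotlib Axes, default: None) – The axes to draw on

  • sample_weight (array-like, shape(n_samples,)) – Optional sample weights

  • percent (bool, default: False) – Show counts (False) or percentages (True)

  • classes (list or str, default: None) – Class labels

  • cmap (str, default: 'YlOrRd') – Colormap

  • encoder (dict or LabelEncoder, default: None) – Label encoder, e.g. sklearn's LabelEncoder

  • is_fitted (boolean or str, default: 'auto') – Whether the model has already been fitted

  • force_model (bool, default: False) – If True, skip the check that the estimator is a classifier

  • fontsize (int or None, default: None) – Font size

  • kwargs (dict) – Keyword arguments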

Examples

>>> from sklearn.linear_model import LogisticRegression
>>> viz = ConfusionMaxtrixPlot(LogisticRegression())
>>> viz.fit(X_train, y_train)
>>> viz.score(X_test, y_test)
>>> viz.show()
draw()

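Draws the confusion matrix

finalize(**kwargs)

Applies the final decorations to the axes before they are returned.

Parameters

kwargs (dict) – Generic keyword arguments

score(X, y, **kwargs)

Draws the confusion matrix for the given test data by comparing the predictions on X with the ground truth in the target vector y.

Parameters
  • X (ndarray or DataFrame of shape n x m) – An (n x m) feature matrix

  • y (ndarray or Series of length n) – Corresponding labels

Returns

score_ – Accuracy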

Return type

float
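A confusion matrix counts, for each true class (row), how often each class was predicted (column). The sketch below supports sample weights and a percent mode. It assumes that percent normalizes each row to the percentage of its true class, a common convention that the source does not spell out. The function is a hypothetical illustration, not bm's draw().

```python
def confusion_matrix(y_true, y_pred, classes, sample_weight=None, percent=False):
    """Rows are true classes, columns are predicted classes."""
    if sample_weight is None:
        sample_weight = [1] * len(y_true)
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p, w in zip(y_true, y_pred, sample_weight):
        matrix[index[t]][index[p]] += w
    if percent:
        # Normalize each row so it gives percentages of that true class.
        totals = [sum(row) for row in matrix]
        matrix = [
            [100 * v / total if total else 0.0 for v in row]
            for row, total in zip(matrix, totals)
        ]
    return matrix
```

The accuracy returned by score() corresponds to the diagonal sum divided by the total of the unnormalized matrix.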

show(outpath=None, **kwargs)

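Shows the figure

Parameters
  • outpath (string, default: None) – Path to save the figure to

  • clear_figure (Boolean, default: False) – If True, clear the figure after it is saved to file or shown on screen.

  • kwargs (dict) – Generic keyword arguments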


class bm.visual.model_visual.FeatureImportancePlot(estimator, ax=None, labels=None, relative=True, absolute=False, xlabel=None, stack=False, colors=None, colormap=None, is_fitted='auto', topn=None, picture_name='feature_important', save_file=False, **kwargs)

Bases: ModelVisualizer

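Ranks features by importance

Parameters
  • estimator (Estimator) – Initialized model

  • ax (matplotlib Axes, default: None) – The axes to draw on

  • labels (list, default: None) – List of labels

  • relative (bool, default: True) – Whether to show relative importance

  • absolute (bool, default: False) – Whether to show absolute importance

  • xlabel (str, default: None) – Label of the x-axis

  • stack (bool, default: False) – Plot type: whether to draw stacked bars

  • colors (list of strings) – If stack==False, the color of each bar in the chart.

  • colormap (string or matplotlib cmap) – If stack==True, a colormap used to color the classes.

  • is_fitted (bool or str, default: 'auto') – Whether the model has already been fitted

  • topn (int, default: None) – Show only the top-n results; all are shown by default

  • kwargs (dict) – Keyword arguments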

Examples

>>> from sklearn.ensemble import GradientBoostingClassifier
>>> visualizer = FeatureImportancePlot(GradientBoostingClassifier())
>>> visualizer.fit(X, y)
>>> visualizer.show()
draw(**kwargs)

Draws the feature importance plot

Note

No feature selection is applied; the ranking produced by the model is used directly.

finalize(**kwargs)

Adjusts figure properties

fit(X, y=None, **kwargs)

Fits the model

Parameters
  • X (numpy.ndarray or DataFrame, shape(n,m)) – Input training data

  • y (numpy.ndarray or Series, shape(n,)) – Input labels

  • kwargs (dict) – Keyword arguments
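One common convention for relative=True, used for example by yellowbrick's FeatureImportances, expresses each importance as a percentage of the largest absolute importance. absolute=True takes absolute values first, which matters for signed linear coefficients. The sketch below assumes bm follows the same convention and also applies topn truncation. prepare_importances is a hypothetical name.

```python
def prepare_importances(names, values, relative=True, absolute=False, topn=None):
    """Scale, sort (descending) and optionally truncate feature importances."""
    if absolute:
        values = [abs(v) for v in values]
    if relative:
        # Percentage of the strongest (largest absolute) importance.
        max_abs = max(abs(v) for v in values)
        values = [100.0 * v / max_abs for v in values]
    ranked = sorted(zip(names, values), key=lambda item: item[1], reverse=True)
    return ranked[:topn] if topn else ranked
```

Without absolute=True, a large negative coefficient lands at the bottom of the ranking. With it, the same coefficient ranks first.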

class bm.visual.model_visual.KDEPlot(estimator, ax=None, per_class=True, binary=False, classes=None, encoder=None, is_fitted='auto', force_model=False, picture_name='kde', save_file=False, **kwargs)

Bases: ClassificationVisualizer

Kernel Density Estimator Plot

Parameters
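  • estimator (estimator) – Model to use

  • ax (matplotlib Axes, default: None) – The axes to draw on

  • per_class (bool, default: True) – If True, draw a curve for each class; set it to False if only the macro- or micro-averaged curves are needed

  • binary (bool, default: False) – Whether the task is binary classification

  • classes (list of str, default: None) – Class labels

  • encoder (dict or LabelEncoder, default: None) – Label encoder, e.g. sklearn's LabelEncoder

  • is_fitted (boolean or str, default: 'auto') – Whether the model has already been fitted

  • force_model (bool, default: False) – If True, skip the check that the estimator is a classifier

  • kwargs (dict) – Keyword arguments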

Examples

>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import train_test_split
>>> X, y = make_classification(n_samples=500, random_state=42)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> oz = KDEPlot(LogisticRegression())
>>> oz.fit(X_train, y_train)
>>> oz.score(X_test, y_test)
>>> oz.show()
draw()

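Draws the plot from the data

Returns

ax – The matplotlib axes

Return type

matplotlib Axes

finalize(**kwargs)

Adjusts figure properties

Parameters

kwargs (dict) – Keyword arguments

fit(X, y=None, **kwargs)

Overrides the fit process inherited from the sklearn base module

score(X, y, **kwargs)

Makes predictions with the model and computes the evaluation score

Parameters
  • X (ndarray or DataFrame, shape(n,m)) – Input matrix with m features

  • y (ndarray or Series, shape(n,)) – 1D class labels

Returns

score_ – Evaluation score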

Return type

float
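A kernel density estimate places a kernel on every observation and averages them: f(x) = (1 / (n·h)) Σ K((x − x_i) / h). The sketch below uses a Gaussian kernel and Scott's rule h = σ·n^(−1/5), which is also the default of scipy.stats.gaussian_kde in one dimension. Whether bm's KDEPlot uses this bandwidth is an assumption. Applied separately to a classifier's predicted scores for each class, it yields per-class score densities. kde_1d is a hypothetical name.

```python
import math
from statistics import stdev


def kde_1d(samples, bandwidth=None):
    """Return a function that evaluates a 1D Gaussian KDE of samples."""
    n = len(samples)
    # Scott's rule: sample standard deviation times n ** (-1/5).
    h = bandwidth if bandwidth is not None else stdev(samples) * n ** (-1 / 5)

    def density(x):
        total = sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
        return total / (n * h * math.sqrt(2 * math.pi))

    return density
```

Because each kernel integrates to 1, the estimated density also integrates to approximately 1 over a wide enough range.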

class bm.visual.model_visual.KSPlot(estimator, ax=None, classes=None, encoder=None, is_fitted='auto', force_model=False, picture_name='ks', save_file=False, **kwargs)

Bases: ClassificationVisualizer

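Plots the KS (Kolmogorov-Smirnov) curve

Parameters
  • estimator (pipeline) – Model

  • ax (matplotlib Axes, default: None) – The axes to draw on

  • classes (list or str, default: None) – Class labels

  • encoder (dict or LabelEncoder, default: None) – Label encoder

  • is_fitted (bool or str, default: "auto") – Whether the model has already been fitted

  • force_model (bool, default: False) – If True, skip the check that the estimator is a classifier

  • kwargs (dict) – Keyword arguments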

draw(score_bin, good_rates, bad_rates, ks_lst, **kwargs)
Parameters
  • score_bin (list) – Score bins (index)

  • good_rates (list) – Good-sample rates

  • bad_rates (list) – Bad-sample rates

  • ks_lst (list) – Corresponding KS values

  • kwargs (dict) – Keyword arguments

finalize()

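Applies the final decorations to the axes before they are returned.

Parameters

kwargs (dict) – Generic keyword arguments

fit(X, y=None, **kwargs)

Overrides the model's fit process inherited from the sklearn base class

Parameters
  • X (Dataframe) – Training data

  • y (list) – Labels of the training data

  • kwargs (dict) – Keyword arguments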

score(X_train, y_train, X_test=None, y_test=None, **kwargs)

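Returns the model's predictions from the training process (and on the test data, if given)

Parameters
  • X_train (Dataframe) – Training data

  • y_train (list or ndarray) – Labels of the training data

  • X_test (Dataframe) – Test data

  • y_test (list or ndarray) – Labels of the test data

  • kwargs (dict) – Keyword arguments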
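The KS statistic on such a curve is the largest gap between the cumulative distributions of bad and good samples, taken in order of model score. The minimal sketch below assumes that higher scores mean "more likely bad" and that label 1 marks bad samples; both are assumptions. Its returned lists correspond loosely to the good_rates, bad_rates and ks_lst arguments of draw(). Tied scores are not grouped, which is a simplification.

```python
def ks_curve(scores, labels, bad_value=1):
    """Cumulative good/bad rates, the KS gap at each step, and the maximum KS."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    total_bad = sum(1 for _, y in pairs if y == bad_value)
    total_good = len(pairs) - total_bad
    good_rates, bad_rates, ks_values = [], [], []
    cum_good = cum_bad = 0
    for _, y in pairs:
        if y == bad_value:
            cum_bad += 1
        else:
            cum_good += 1
        good_rates.append(cum_good / total_good)
        bad_rates.append(cum_bad / total_bad)
        ks_values.append(abs(bad_rates[-1] - good_rates[-1]))
    return good_rates, bad_rates, ks_values, max(ks_values)
```

A model that ranks every bad sample above every good one reaches KS = 1.0. A model with no discriminating power stays near 0.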

class bm.visual.model_visual.LearningCurve(estimator, ax=None, train_sizes=array([0.1, 0.325, 0.55, 0.775, 1.0]), cv=None, scoring=None, exploit_incremental_learning=False, n_jobs=1, pre_dispatch='all', shuffle=False, random_state=None, picture_name='learning_curve', save_file=False, **kwargs)

Bases: ModelVisualizer

Plots the learning curve of the model on the data

Parameters
  • estimator (pipeline) – Estimator (initialized model)

  • ax (matplotlib Axes, default: None) – The axes to draw on

  • train_sizes (array-like, shape (n_ticks,), default: np.linspace(0.1,1.0,5)) – Training set sizes at which the curve is evaluated: fractions of the maximum training set size (floats) or absolute numbers of samples (ints)

  • cv (int, default: None) – Number of cross-validation folds: the data is split into n parts, one is used as the validation set and the remaining n-1 for training

  • scoring (string, callable or None, default: None) – One of [‘accuracy’, ‘adjusted_rand_score’, ‘average_precision’, ‘f1’, ‘f1_macro’, ‘f1_micro’, ‘f1_samples’, ‘f1_weighted’, ‘neg_log_loss’, ‘neg_mean_absolute_error’, ‘neg_mean_squared_error’, ‘neg_median_absolute_error’, ‘precision’, ‘precision_macro’, ‘precision_micro’, ‘precision_samples’, ‘precision_weighted’, ‘r2’, ‘recall’, ‘recall_macro’, ‘recall_micro’, ‘recall_samples’, ‘recall_weighted’, ‘roc_auc’]

  • exploit_incremental_learning (boolean, default: False) – If the estimator supports incremental learning, use it to speed up fitting at the different training set sizes.

  • n_jobs (int, optional, default: 1) – Number of parallel jobs

  • pre_dispatch (integer or string, optional, default: all) – Number of jobs pre-dispatched for parallel execution

  • shuffle (boolean, optional) – Whether to shuffle the training data before taking subsets of it

  • random_state (int, RandomState instance or None, optional (default=None)) – Random seed; used when shuffle=True

  • kwargs (dict) – Keyword arguments

Examples

>>> from sklearn.naive_bayes import GaussianNB
>>> model = LearningCurve(GaussianNB())
>>> model.fit(X, y)
>>> model.show()
draw(**kwargs)

Renders the training and test learning curves.

finalize(**kwargs)

Sets the title and axis labels

fit(X, y=None, **kwargs)

Overrides the fit process inherited from the sklearn base class

Parameters
  • X (Dataframe) – Training data

  • y (list or ndarray) – Labels of the training data

  • kwargs (dict) – Keyword arguments
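train_sizes follows the convention of sklearn.model_selection.learning_curve: floats in (0, 1] are fractions of the largest training set, and integers are absolute sizes. The sketch below reproduces the default np.linspace(0.1, 1.0, 5) without numpy and converts fractions to sample counts by truncation. This approximates what scikit-learn does internally, but it is an implementation detail, so treat the sketch as approximate. Both helper names are hypothetical.

```python
def linspace(start, stop, num):
    """num evenly spaced floats from start to stop inclusive (num >= 2), like numpy.linspace."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num - 1)] + [stop]


def absolute_train_sizes(fractions, n_max_samples):
    """Convert fractional train sizes to sample counts (at least 1), dropping duplicates."""
    sizes = [max(1, min(n_max_samples, int(f * n_max_samples))) for f in fractions]
    return sorted(set(sizes))
```

With 1000 available training samples, the default fractions give five sizes running from 100 to 1000.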

class bm.visual.model_visual.LiftPlot(estimator, ax=None, per_class=True, binary=False, classes=None, encoder=None, is_fitted='auto', force_model=False, picture_name='lit', save_file=False, **kwargs)

Bases: ClassificationVisualizer

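Lift curve

Parameters
  • estimator (pipeline) – Model to use

  • ax (matplotlib Axes, default: None) – The axes to draw on

  • per_class (bool, default: True) – If True, draw a curve for each class; set it to False if only the macro- or micro-averaged curves are needed

  • binary (bool, default: False) – Whether the task is binary classification

  • classes (list of str, default: None) – Class labels

  • encoder (dict or LabelEncoder, default: None) – Label encoder, e.g. sklearn's LabelEncoder

  • is_fitted (boolean or str, default: 'auto') – Whether the model has already been fitted

  • force_model (bool, default: False) – If True, skip the check that the estimator is a classifier

  • kwargs (dict) – Keyword arguments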

Examples

>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import train_test_split
>>> X, y = make_classification(n_samples=500, random_state=42)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> oz = LiftPlot(LogisticRegression())
>>> oz.fit(X_train, y_train)
>>> oz.score(X_test, y_test)
>>> oz.show()
draw()

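Uses the matplotlib interface to render the data the visualizer was trained on onto a figure or axes.

Parameters

kwargs (dict) – Generic keyword arguments

finalize(**kwargs)
Parameters

kwargs (dict) – Keyword arguments

fit(X, y=None, **kwargs)

Overrides the fit process inherited from the sklearn base class

Parameters
  • X (Dataframe) – Training data

  • y (list or ndarray) – Labels of the training data

  • kwargs (dict) – Keyword arguments

score(X, y, **kwargs)

Makes predictions with the model and computes the evaluation score

Parameters
  • X (ndarray or DataFrame, shape(n,m)) – Input matrix with m features

  • y (ndarray or Series, shape(n,)) – 1D class labels

Returns

score_ – Evaluation score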

Return type

float
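Lift compares the positive rate among the highest-scored fraction of samples with the overall positive rate. In the minimal sketch below, label 1 is assumed to be the positive class, and the curve is evaluated at equal-sized cumulative buckets (deciles by default). Both choices are ours, not documented behavior of LiftPlot, and cumulative_lift is a hypothetical name.

```python
def cumulative_lift(scores, labels, n_buckets=10, positive=1):
    """Cumulative lift at each 1/n_buckets fraction of samples, highest scores first."""
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]
    n = len(ranked)
    base_rate = sum(1 for y in ranked if y == positive) / n
    lifts = []
    for k in range(1, n_buckets + 1):
        top = ranked[: max(1, round(n * k / n_buckets))]
        rate = sum(1 for y in top if y == positive) / len(top)
        lifts.append(rate / base_rate)
    return lifts
```

A lift of 2.0 in the first bucket means that the top-scored samples contain twice the overall share of positives. The last bucket always has a lift of 1.0.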

class bm.visual.model_visual.MarketingPlot(estimator, ax=None, per_class=True, binary=False, classes=None, encoder=None, is_fitted='auto', force_model=False, picture_name='market_form', save_file=False, **kwargs)

Bases: ClassificationVisualizer

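Generates a marketing report that combines the various plots needed for marketing analysis

Parameters
  • estimator (pipeline) – Model to use

  • ax (matplotlib Axes, default: None) – The axes to draw on

  • per_class (bool, default: True) – If True, plot the ROC curve for each class; set it to False if only the macro- or micro-averaged curves are needed

  • binary (bool, default: False) – Whether the task is binary classification

  • classes (list of str, default: None) – Class labels

  • encoder (dict or LabelEncoder, default: None) – Label encoder, e.g. sklearn's LabelEncoder

  • is_fitted (boolean or str, default: 'auto') – Whether the model has already been fitted

  • force_model (bool, default: False) – If True, skip the check that the estimator is a classifier

  • kwargs (dict) – Keyword arguments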

Examples

>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import train_test_split
>>> X, y = make_classification(n_samples=500, random_state=42)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> oz = MarketingPlot(LogisticRegression())
>>> oz.fit(X_train, y_train)
>>> oz.score(X_test, y_test)
>>> oz.show()
cob_draw()
cpr_draw()
draw()

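Uses the matplotlib interface to render the data the visualizer was trained on onto a figure or axes.

Parameters

kwargs (dict) – Generic keyword arguments

finalize(**kwargs)

Sets figure properties

Parameters

kwargs (dict) – Keyword arguments

fit(X, y=None, **kwargs)

Overrides the fit process inherited from the sklearn base class

Parameters
  • X (Dataframe) – Training data

  • y (list or ndarray) – Labels of the training data

  • kwargs (dict) – Keyword arguments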

ker_draw()
ks_draw()
prc_draw()
roc_draw()
score(X, y, **kwargs)

Makes predictions with the model and computes the evaluation score

Parameters
  • X (ndarray or DataFrame, shape(n,m)) – Input matrix with m features

  • y (ndarray or Series, shape(n,)) – 1D class labels

Returns

score_ – Evaluation score

Return type

float

class bm.visual.model_visual.PrecisionRecallPlot(estimator, ax=None, classes=None, colors=None, cmap=None, encoder=None, fill_area=None, ap_score=True, micro=True, iso_f1_curves=False, iso_f1_values=(0.2, 0.4, 0.6, 0.8), per_class=False, fill_opacity=0.2, line_opacity=0.8, is_fitted='auto', force_model=False, pr_change=False, picture_name='precision_recall', save_file=False, **kwargs)

Bases: ClassificationVisualizer

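Precision-recall curve

Parameters
  • estimator (pipeline) – Model

  • ax (matplotlib Axes, default: None) – The axes to draw on

  • classes (list or str, default: None) – Class labels

  • cmap (str or colormap, default: None) – Colormap

  • encoder (dict or LabelEncoder, default: None) – Label encoder

  • fill_area (bool, default: None) – Whether to fill the area under the curve

  • ap_score (bool, default: True) – Whether to annotate the plot (with the average precision score)

  • micro (bool, default: True) – Whether to plot the micro-averaged curve

  • iso_f1_curves (bool, default: False) – Whether to draw iso-F1 curves

  • iso_f1_values (tuple, default: (0.2,0.4,0.6,0.8)) – F1 values at which the iso-F1 curves are drawn

  • per_class (bool, default: False) – In the multi-label case, whether to draw a curve for each class

  • fill_opacity (float, default: 0.2) – Alpha (opacity) of the filled area

  • line_opacity (float, default: 0.8) – Alpha (opacity) of the lines

  • is_fitted (bool or str, default: 'auto') – Whether the model has already been fitted

  • force_model (bool, default: False) – If True, skip the check that the estimator is a classifier

  • kwargs (dict) – Keyword arguments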

Examples

>>> from sklearn.model_selection import train_test_split
>>> from sklearn.svm import LinearSVC
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> viz = PrecisionRecallPlot(LinearSVC())
>>> viz.fit(X_train, y_train)
>>> viz.score(X_test, y_test)
>>> viz.show()
draw()

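Uses the matplotlib interface to render the data the visualizer was trained on onto a figure or axes.

Parameters

kwargs (dict) – Generic keyword arguments

finalize()

Adjusts axes properties

fit(X, y=None, **kwargs)

Overrides the model's fit process inherited from the sklearn base class

Parameters
  • X (Dataframe) – Training data

  • y (list) – Labels of the training data

  • kwargs (dict) – Keyword arguments

score(X, y, **kwargs)
Parameters
  • X (Dataframe) – Input data

  • y (list) – Labels of the input data

  • kwargs (dict) – Keyword arguments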
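Two quantities this plot relies on can be sketched directly. Average precision is computed here with the step-wise definition AP = Σ_n (R_n − R_(n−1)) P_n, the one documented for sklearn.metrics.average_precision_score. An iso-F1 curve for a fixed F1 value f is the set of points with precision p = f·r / (2r − f), defined for recall r > f/2. The helper names are hypothetical, and tied scores are not grouped.

```python
def precision_recall_points(scores, labels, positive=1):
    """(recall, precision) after each sample, taking samples by descending score."""
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]
    total_pos = sum(1 for y in ranked if y == positive)
    points, tp = [], 0
    for k, y in enumerate(ranked, start=1):
        tp += int(y == positive)
        points.append((tp / total_pos, tp / k))
    return points


def average_precision(points):
    """Step-wise AP: sum of precision times recall increments."""
    ap, prev_recall = 0.0, 0.0
    for recall, precision in points:
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def iso_f1_precision(f1, recall):
    """Precision that gives the target F1 at this recall (requires recall > f1 / 2)."""
    return f1 * recall / (2 * recall - f1)
```

A perfect ranking gives AP = 1.0. Evaluating iso_f1_precision over a range of recalls for each value in iso_f1_values traces the reference curves.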

class bm.visual.model_visual.PredictErrorPlot(estimator, ax=None, classes=None, encoder=None, is_fitted='auto', force_model=False, picture_name='predict_error', save_file=False, **kwargs)

Bases: ClassificationVisualizer

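Prediction error visualization: counts the prediction errors for each class

Parameters
  • estimator (estimator) – Estimator

  • ax (matplotlib Axes, default: None) – The axes to draw on

  • classes (list or str, default: None) – Class labels

  • encoder (dict or LabelEncoder, default: None) – Label encoder

  • is_fitted (bool or str, default: "auto") – Whether the model has already been fitted

  • force_model (bool, default: False) – If True, skip the check that the estimator is a classifier

  • kwargs (dict) – Keyword arguments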

draw()

Renders the class prediction error across the axis.

Returns

ax – The axes on which the figure is plotted

Return type

Matplotlib Axes

finalize(**kwargs)

Adjusts figure properties

score(X, y, **kwargs)

Predicts on X and scores the predictions

Parameters
  • X (ndarray or DataFrame, shape(n,m)) – An n x m matrix

  • y (ndarray or Series, shape(n,)) – Array of labels

Returns

score_ – accuracy score

Return type

float

class bm.visual.model_visual.ROCAUCPlot(estimator, ax=None, micro=True, macro=True, per_class=True, binary=False, classes=None, encoder=None, is_fitted='auto', force_model=False, picture_name='roc_auc', save_file=False, **kwargs)

Bases: ClassificationVisualizer

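ROC curve and AUC plot

Parameters
  • estimator (pipeline) – Model to use

  • ax (matplotlib Axes, default: None) – The axes to draw on

  • micro (bool, default: True) – Whether to plot the micro-averaged curve

  • macro (bool, default: True) – Whether to plot the macro-averaged curve

  • per_class (bool, default: True) – If True, plot the ROC curve for each class; set it to False if only the macro- or micro-averaged curves are needed

  • binary (bool, default: False) – Whether the task is binary classification

  • classes (list of str, default: None) – Class labels

  • encoder (dict or LabelEncoder, default: None) – Label encoder, e.g. sklearn's LabelEncoder

  • is_fitted (boolean or str, default: 'auto') – Whether the model has already been fitted

  • force_model (bool, default: False) – If True, skip the check that the estimator is a classifier

  • kwargs (dict) – Keyword arguments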

Examples

>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import train_test_split
>>> X, y = make_classification(n_samples=500, random_state=42)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> oz = ROCAUCPlot(LogisticRegression())
>>> oz.fit(X_train, y_train)
>>> oz.score(X_test, y_test)
>>> oz.show()
draw()

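Uses the matplotlib interface to render the data the visualizer was trained on onto a figure or axes.

Parameters

kwargs (dict) – Generic keyword arguments

finalize(**kwargs)

Adjusts the ROC AUC figure

Parameters

kwargs (dict) – Keyword arguments

fit(X, y=None, **kwargs)

Overrides the model's fit process inherited from the sklearn base class

Parameters
  • X (Dataframe) – Training data

  • y (list) – Labels of the training data

  • kwargs (dict) – Keyword arguments

score(X, y, **kwargs)

Makes predictions with the model and computes the evaluation score

Parameters
  • X (ndarray or DataFrame, shape(n,m)) – Input matrix with m features

  • y (ndarray or Series, shape(n,)) – 1D class labels

Returns

score_ – Evaluation score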

Return type

float
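The binary building block of this plot is sketched below: ROC points are obtained by lowering the decision threshold one distinct score at a time, and the AUC is the trapezoidal area under them. For the micro and macro options, multi-class ROC is usually built one-vs-rest. Macro averaging averages the per-class curves or AUCs, while micro averaging pools every (class, sample) decision into one curve. Those options are not shown here, and the function names are hypothetical.

```python
def roc_points(scores, labels, positive=1):
    """(FPR, TPR) points obtained by lowering the threshold one distinct score at a time."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for _, y in pairs if y == positive)
    n_neg = len(pairs) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, y) in enumerate(pairs):
        if y == positive:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample sharing this score (handles ties).
        if i + 1 == len(pairs) or pairs[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

The AUC equals the probability that a randomly chosen positive is scored above a randomly chosen negative. For scores [0.9, 0.8, 0.7, 0.1] with labels [1, 0, 1, 0], that probability is 3/4.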

class bm.visual.model_visual.RedidualsPlot(estimator, ax=None, hist=True, qqplot=False, train_color='b', test_color='g', line_color='#111111', train_alpha=0.75, test_alpha=0.75, is_fitted='auto', picture_name='redidual', save_file=False, **kwargs)

Bases: RegressionVisualizer

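Prediction residuals visualization

Plots the residuals between the predicted and actual values.

Parameters
  • estimator (regression model) – Trained regression model

  • ax (matplotlib Axes, default: None) – The axes to draw on

  • hist ({True, False, None, 'density', 'frequency'}, default: True) – Histogram of the residual distribution; 'density' draws a density plot, 'frequency' a frequency plot

  • qqplot ({True, False}, default: False) – Whether to draw a Q-Q plot of the residual quantiles

  • train_color (color, default: 'b') – Color of the training data points

  • test_color (color, default: 'g') – Color of the test data points

  • line_color (color, default: '#111111') – Line color (dark grey)

  • train_alpha (float, default: 0.75) – Transparency of the training data

  • test_alpha (float, default: 0.75) – Transparency of the test data

  • is_fitted (boolean or str, default: 'auto') – Whether the model has already been fitted

  • kwargs (dict) – Keyword arguments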

Examples

>>> from sklearn.linear_model import Ridge
>>> model = RedidualsPlot(Ridge())
>>> model.fits(X_train, y_train)
>>> model.score(X_test, y_test)
>>> model.show()
draw(y_pred, residuals, train=False, **kwargs)

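Draws the plot from the data

Parameters
  • y_pred (ndarray) – 1D predicted values

  • residuals (ndarray) – 1D residuals

  • train (boolean, default: False) – Whether the data is training data

  • kwargs (dict) – Keyword arguments

finalize(**kwargs)

Sets the title and other figure properties

Parameters

kwargs (dict) – Keyword arguments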

fits(X, y, **kwargs)
Parameters
  • X (ndarray or DataFrame, shape(n,m)) – Input data

  • y (ndarray or Series, shape(n,)) – Input labels

  • kwargs (dict) – Keyword arguments

Returns

self – The visualizer instance

Return type

RedidualsPlot

property hax

Returns the histogram axes, creating it only on demand.

property qqax

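Returns the Q-Q plot axes

score(X, y=None, train=False, **kwargs)

Generates predictions and scores them

Parameters
  • X (array-like) – Input data

  • y (array-like) – Input labels

  • train (boolean) – Switches between training and test mode

Returns

score – Score for the corresponding mode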

Return type

float
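Residuals are defined as y − ŷ. For regressors, visualizers of this kind commonly report the coefficient of determination R² = 1 − Σ(y − ŷ)² / Σ(y − ȳ)² as the score, but that bm's score() returns R² is an assumption. The Q-Q option compares sorted residuals with normal quantiles, which the standard library can supply through statistics.NormalDist. The simple (i + 0.5) / n plotting positions below are one common choice. All function names are hypothetical.

```python
from statistics import NormalDist, fmean


def residuals(y_true, y_pred):
    """Residuals defined as actual minus predicted."""
    return [t - p for t, p in zip(y_true, y_pred)]


def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    mean = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def qq_points(res):
    """(theoretical normal quantile, sorted residual) pairs for a Q-Q plot."""
    n = len(res)
    normal = NormalDist()
    return [(normal.inv_cdf((i + 0.5) / n), r) for i, r in enumerate(sorted(res))]
```

Predicting the mean for every sample gives R² = 0, and perfect predictions give R² = 1.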

bm.visual.quick_visual module

Convenience functions for quickly using the wrapped visualizers

bm.visual.quick_visual.binning_plot(bin_method, data, column, ax=None, target=None, target_value=None, num_clusters=5, max_interval=10, special_attributes=None, tree_params=None, bad_rate_plot=False, show=False, picture_name='binning_plot', save_file=True, **kwargs)
Parameters
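  • bin_method (str) – Binning method; one of [interpolate, quantile, distance, mixed, decision_tree, chi_square, kmeans, best_ks]

  • data (Dataframe) – Input data

  • column (str) – Feature column to bin

  • ax (ax, default: None) – Custom axes; set automatically if None

  • target (str, default: None) – Target (class label column)

  • target_value (str or int, default: None) – Target value; only binary classification is currently supported (e.g. bad, good)

  • num_clusters (int, default: 5) – Number of clusters

  • max_interval (int, default: 10) – Maximum number of intervals

  • special_attributes (str, default: None) – Name of special features

  • tree_params (dict, default: None) – Decision tree parameter dictionary

  • bad_rate_plot (bool, default: False) – Whether to plot the bad rate of each bin

  • show (bool) – Whether to display the plot

  • picture_name (str) – Name (path) used when saving the figure

  • save_file (bool) – Whether to save the figure

  • kwargs (dict) – Keyword arguments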

bm.visual.quick_visual.feature_importance_plot(X, ax=None, picture_name='特征重要性', save_file=True, show=False, **kwargs)
Parameters
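  • X (Dataframe) – Features retained after selection

  • ax (ax, default: None) – Custom axes; set automatically if None

  • picture_name (str) – Name used when saving the figure

  • save_file (bool) – Whether to save the figure

  • show (bool) – Whether to display the plot

  • kwargs (dict) – Keyword arguments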

bm.visual.quick_visual.featurebox_plot(X, ax=None, columns=None, sub_col=None, show=False, picture_name='箱形图', save_file=True, **kwargs)
Parameters
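  • X (Dataframe) – Input data

  • ax (ax, default: None) – Custom axes; set automatically if None

  • columns (list) – List of features

  • sub_col (int) – Number of subplot columns

  • show (bool) – Whether to display the plot

  • picture_name (str) – Name (path) used when saving the figure

  • save_file (bool) – Whether to save the figure

  • kwargs (dict) – Keyword arguments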

bm.visual.quick_visual.featurecategory_plot(X, ax=None, columns=None, label=None, sub_col=None, show=False, picture_name='类别型特征分布图', save_file=True, **kwargs)
Parameters
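  • X (Dataframe) – Input data

  • ax (ax, default: None) – Custom axes; set automatically if None

  • columns (list) – List of features

  • label (str) – Name of the target label

  • sub_col (int) – Number of subplot columns

  • show (bool) – Whether to display the plot

  • picture_name (str) – Name (path) used when saving the figure

  • save_file (bool) – Whether to save the figure

  • kwargs (dict) – Keyword arguments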

bm.visual.quick_visual.featurecor_plot(X, ax=None, columns=None, show=False, picture_name='相关性热图', save_file=True, **kwargs)
Parameters
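  • X (Dataframe) – Input data

  • ax (ax, default: None) – Custom axes; set automatically if None

  • columns (list) – List of features

  • show (bool) – Whether to display the plot

  • picture_name (str) – Name (path) used when saving the figure

  • save_file (bool) – Whether to save the figure

  • kwargs (dict) – Keyword arguments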

bm.visual.quick_visual.featuredis_plot(X, ax=None, columns=None, label=None, sub_col=None, show=False, picture_name='数值型特征分布图', save_file=True, **kwargs)
Parameters
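  • X (Dataframe) – Input data

  • ax (ax, default: None) – Custom axes; set automatically if None

  • columns (list) – List of features

  • label (str) – Name of the target label

  • sub_col (int) – Number of subplot columns

  • show (bool) – Whether to display the plot

  • picture_name (str) – Name (path) used when saving the figure

  • save_file (bool) – Whether to save the figure

  • kwargs (dict) – Keyword arguments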

bm.visual.quick_visual.reports_plot(estimator, X_train, y_train, X_test=None, y_test=None, fit_params={}, ax=None, per_class=True, picture_name='模型报表', binary=False, classes=None, encoder=None, is_fitted='auto', force_model=False, show=False, save_file=True, **kwargs)
Parameters
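  • estimator (pipeline) – pipeline model

  • X_train (Dataframe) – Training data

  • y_train (list or ndarray) – Labels of the training data

  • X_test (Dataframe) – Test data

  • y_test (list or ndarray) – Labels of the test data

  • fit_params (dict) – Model initialization parameters

  • ax (ax, default: None) – Custom axes; set automatically if None

  • per_class (bool) – If True, plot the ROC curve for each class; set it to False if only the macro- or micro-averaged curves are needed

  • picture_name (str) – Name used when saving the figure

  • binary (bool) – Whether the task is binary classification

  • classes (list) – Class labels, e.g. [0,1]

  • encoder (bool) – Whether to encode the labels; not needed by default

  • is_fitted (bool or str, default: 'auto') – Whether the model has already been fitted

  • force_model (bool, default: False) – If True, skip the check that the estimator is a classifier

  • show (bool) – Whether to display the plot

  • save_file (bool) – Whether to save the figure

  • kwargs (dict) – Keyword arguments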

bm.visual.quick_visual.shap_plot(estimator, X, feature_names, ax=None, picture_name='SHAP', save_file=True, mode=None, show=False, **kwargs)
Parameters
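  • estimator (pipeline) – pipeline model

  • X (Dataframe) – Training data after feature selection

  • feature_names (list) – Names of the selected features, including both categorical and numeric feature names

  • ax (ax, default: None) – Custom axes; set automatically if None

  • picture_name (str) – Name used when saving the figure

  • save_file (bool) – Whether to save the figure

  • mode (str) – Type of plot to save: force or summary

  • show (bool) – Whether to display the plot (when save_file is True, show must be False)

  • kwargs (dict) – Keyword arguments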

bm.visual.quick_visual.wiplot(binx, title, ax=None, display_iv=False, show=False, picture_name='WOE-IV', save_file=True, **kwargs)
Parameters
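  • binx (DataFrame) – Binned data

  • title (str) – Target label

  • ax (matplotlib Axes, default: None) – Custom axes; created automatically if not provided

  • display_iv (bool) – Whether to display the IV value

  • show (bool) – Whether to display the plot

  • picture_name (str) – File name used when saving the figure

  • save_file (bool) – Whether to save the figure

  • kwargs (dict) – Additional keyword arguments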
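The WOE and IV values behind a WOE-IV plot can be computed from the good and bad counts in each bin. The sketch below is hypothetical (woe_iv is not part of bm) and assumes the convention WOE_i = ln(bad_share_i / good_share_i) and IV = Σ (bad_share_i − good_share_i) · WOE_i; some references use the inverse ratio, which flips the sign of WOE but leaves IV unchanged.

```python
import math


def woe_iv(good_counts, bad_counts):
    """Hypothetical helper: per-bin WOE and total IV from good/bad counts.

    Assumes every bin contains at least one good and one bad sample,
    otherwise the logarithm is undefined.
    """
    total_good = sum(good_counts)
    total_bad = sum(bad_counts)
    woe, iv = [], 0.0
    for good, bad in zip(good_counts, bad_counts):
        good_share = good / total_good  # share of all goods that fall in this bin
        bad_share = bad / total_bad     # share of all bads that fall in this bin
        w = math.log(bad_share / good_share)
        woe.append(w)
        iv += (bad_share - good_share) * w
    return woe, iv


woe, iv = woe_iv(good_counts=[50, 50], bad_counts=[10, 30])
print([round(w, 4) for w in woe], round(iv, 4))
```

Each IV term is non-negative because the share difference and the log ratio always have the same sign.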

bm.visual.target_visual module

class bm.visual.target_visual.FeatureCorrelationPlot(ax=None, method='pearson', labels=None, sort=False, feature_index=None, feature_names=None, color=None, picture_name='feature_correlation', save_file=False, **kwargs)

Bases: TargetVisualizer

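This visualizer computes the Pearson correlation coefficient or the mutual information (as selected by method) between each feature and the dependent variable. It can be used for feature selection.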

Parameters
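  • ax (matplotlib Axes, default: None) – Axes to draw on

  • method (string, default: "pearson") – Method used to compute the correlation between the features and the target: pearson, mutual_info-regression, or mutual_info-classification

  • labels (list, default: None) – List of feature column names

  • sort (boolean, default: False) – Whether to sort the features when plotting

  • feature_index (list) – Indices of the features to include

  • feature_names (list) – List of feature names

  • color (string) – Plot color

  • kwargs (dict) – Additional keyword arguments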
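For method="pearson", the per-feature score is the Pearson correlation between each column of X and y. The following NumPy sketch is a hypothetical helper (feature_target_pearson is not part of bm); it assumes no feature or target column is constant, since that would make the denominator zero. With sort=True it orders the scores ascending, which is only one possible reading of the sort option.

```python
import numpy as np


def feature_target_pearson(X, y, sort=False):
    """Hypothetical helper: Pearson r between each column of X and y.

    Returns (indices, scores); indices refer to columns of X.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)  # center each feature
    yc = y - y.mean()        # center the target
    scores = (Xc.T @ yc) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    indices = np.arange(X.shape[1])
    if sort:
        order = np.argsort(scores)  # ascending order of correlation
        indices, scores = indices[order], scores[order]
    return indices, scores


X = [[1, 3], [2, 2], [3, 1]]
y = [1, 2, 3]
idx, r = feature_target_pearson(X, y, sort=True)
```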

Examples

>>> viz = FeatureCorrelationPlot()
>>> viz.visual(X, y)
>>> viz.show()
draw()

Draw the feature correlation plot.

finalize()

Set the plot labels and title.

is_dataframe(data)

Convert the input data to a DataFrame.

Parameters

data (instance) – Input data

visual(X, y, **kwargs)

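Compute the correlation between the features and the target.

Parameters
  • X (numpy.ndarray or DataFrame, shape(n,m)) – Matrix of n samples with m features

  • y (numpy.ndarray or DataFrame, shape(n,)) – Array of n target labels

  • kwargs (dict) – Additional keyword arguments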

Returns

self

Return type

VisualBase

class bm.visual.target_visual.TargetBalancedReferencePlot(ax=None, target=None, bins=4, picture_name='target_balance', save_file=False, **kwargs)

Bases: TargetVisualizer

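Bins the target values and plots a histogram that serves as a reference for choosing bin boundaries across the classes, which helps when the labels are imbalanced.

Parameters
  • ax (matplotlib Axes, default: None) – Axes to draw on; inherited from VisualBase

  • target (string, default: None) – Name of the target variable y in the dataset

  • bins (int, default: 4) – Number of bins

  • kwargs (dict) – Keyword arguments passed to the base class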
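The binning step can be sketched with NumPy, assuming equal-width bins as produced by numpy.histogram (the binning rule the class uses is not stated here). The interior edges are the kind of cut points such a reference plot might mark; balanced_bin_reference is a hypothetical helper.

```python
import numpy as np


def balanced_bin_reference(y, bins=4):
    """Hypothetical helper: histogram counts and bin edges for a 1-D target."""
    y = np.asarray(y, dtype=float).ravel()
    # Equal-width bins spanning [y.min(), y.max()]; the last bin is closed on the right
    counts, edges = np.histogram(y, bins=bins)
    return counts, edges


counts, edges = balanced_bin_reference([0, 1, 2, 3, 4, 5, 6, 7], bins=4)
print(counts, edges)
```

Comparing the counts across bins shows how unevenly the target is distributed; edges[1:-1] are the interior cut points.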

Examples

>>> visualizer = TargetBalancedReferencePlot()
>>> visualizer.visual(y)
>>> visualizer.show()
draw(y, **kwargs)

Draw the binned histogram.

Parameters
  • y (ndarray or Series) – One-dimensional numpy.ndarray or Series

  • kwargs (dict) – Additional keyword arguments

finalize(**kwargs)

Add the x-axis label and manage the tick labels so that they remain visible.

Parameters

kwargs (dict) – Generic keyword arguments

visual(y, **kwargs)

Set y for the plot and check the input data type.

Parameters
  • y (ndarray or Series) – One-dimensional numpy.ndarray or Series

  • kwargs (dict) – Additional keyword arguments

class bm.visual.target_visual.TargetStatisticsPlot(ax=None, labels=None, colors=None, colormap=None, picture_name='target_statis', save_file=False, **kwargs)

Bases: TargetVisualizer

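Counts the labels in the data and plots the result.

Two display modes are available:

  • Statistics mode: how often each label occurs in the data

  • Compare mode: the count of each label in the training data compared with the test data

Parameters
  • ax (matplotlib Axes, default: None) – Axes of the figure

  • labels (list) – Optional list of encoded labels

  • colors (string) – Colors to use

  • colormap (string or matplotlib cmap) – Colormap to use

  • kwargs (dict) – Optional keyword arguments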

Examples

>>> from sklearn.model_selection import train_test_split
>>> # Statistics mode: only y is passed
>>> viz = TargetStatisticsPlot()
>>> viz.visual(y)
>>> viz.show()
>>> # Compare mode: both the training and test labels are passed
>>> _, _, y_train, y_test = train_test_split(X, y, test_size=0.2)
>>> viz = TargetStatisticsPlot()
>>> viz.visual(y_train, y_test)
>>> viz.show()
draw()

Determine the values on the axes and apply the plot settings.

finalize(**kwargs)

Set plot properties such as the title and legend.

Parameters

kwargs (dict) – Additional keyword arguments

visual(y_train, y_test=None)
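The mode is determined by the arguments passed:

  • Statistics mode: only y_train is given

  • Compare mode: both y_train and y_test are given

Parameters
  • y_train (array-like) – One-dimensional array, shape (n,)

  • y_test (array-like) – Optional one-dimensional array, shape (m,)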
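The two modes differ only in what is counted. This stdlib sketch (target_counts is hypothetical, not the class's implementation) shows the data each mode would plot, assuming compare mode aligns the training and test counts on the sorted union of labels.

```python
from collections import Counter


def target_counts(y_train, y_test=None):
    """Hypothetical helper: the counts each display mode would plot."""
    if y_test is None:
        # Statistics mode: frequency of each label in the data
        counts = Counter(y_train)
        return {"mode": "statistics", "counts": dict(sorted(counts.items()))}
    # Compare mode: per-label counts in train vs. test, aligned on the same labels
    labels = sorted(set(y_train) | set(y_test))
    train, test = Counter(y_train), Counter(y_test)
    return {
        "mode": "compare",
        "labels": labels,
        "train": [train[label] for label in labels],
        "test": [test[label] for label in labels],
    }


stats = target_counts(["a", "b", "a"])
compare = target_counts([0, 1, 1], [1, 2])
```

Labels that appear in only one split get a count of 0 in the other, since Counter returns 0 for missing keys.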

bm.visual.visual_base module

Base classes for visualizers, built on scikit-learn's base classes.

class bm.visual.visual_base.ClassificationVisualizer(estimator, ax=None, fig=None, classes=None, encoder=None, is_fitted='auto', force_model=False, **kwargs)

Bases: ScoreVisual

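Base visualizer for monitoring the training and prediction of classification models.

Parameters
  • estimator (sklearn estimator) – A scikit-learn estimator, such as a classification or regression model

  • ax (matplotlib Axes, default: None) – Axes to draw on

  • fig (matplotlib Figure, default: None) – Figure to draw on

  • classes (list or str, default: None) – List of class names

  • is_fitted (bool or str, default: "auto") – Whether the estimator is already fitted. If False, the estimator is fitted when the visualizer is fitted; if "auto", the visualizer checks whether the estimator is fitted before fitting it again.

  • force_model (Boolean, default: False) – If True, skip the check that the estimator is a classifier

  • kwargs (dict) – Additional keyword arguments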

property class_colors_
fit(X, y=None, **kwargs)

Fit the visualizer to the data.

Parameters
  • X (ndarray or DataFrame, shape(n,m)) – Feature matrix

  • y (ndarray or Series, shape(n,)) – Target labels

Returns

self – The estimator instance

Return type

instance

score(X, y, **kwargs)

Evaluate the model on test data.

Parameters
  • X (array-like) – Test data

  • y (array-like) – Corresponding test labels

Returns

score – The resulting score

Return type

float

class bm.visual.visual_base.FeatureVisualizer(ax=None, fig=None, **kwargs)

Bases: VisualBase, TransformerMixin

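Base class for feature visualizers.

Parameters
  • ax (matplotlib.Axes, default: None) – Axes to draw on

  • fig (matplotlib Figure, default: None) – Figure to draw on

  • kwargs (dict) – Any additional keyword arguments to pass to the base visualizer.

transform(X, y=None)

Base implementation for subclasses to override; it returns the input features unchanged.

Parameters
  • X (array-like, shape (n_samples, n_features)) – Features to transform

  • y (array-like, shape (n_samples,)) – Labels corresponding to the input features

Returns

X – The original input features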

Return type

array-like, shape (n_samples, n_features)

class bm.visual.visual_base.ModelVisualizer(estimator, ax=None, fig=None, is_fitted='auto', **kwargs)

Bases: VisualBase, Wrapper

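Wraps a scikit-learn model: the visualizer acts as a proxy for the model object and only needs to draw on behalf of the wrapped model.

Parameters
  • estimator (sklearn estimator) – A scikit-learn estimator, such as a classification or regression model

  • ax (matplotlib Axes, default: None) – Axes to draw on

  • fig (matplotlib Figure, default: None) – Figure to draw on

  • is_fitted (Boolean or str, default: auto) – Whether the estimator is already fitted, which determines whether it is fitted before predicting

  • kwargs (dict) – Additional keyword arguments

fit(X, y=None, **kwargs)
Parameters
  • X (DataFrame) – Input data

  • y (ndarray or list) – Corresponding labels

  • kwargs (dict) – Additional keyword arguments

get_params(deep=True)
Parameters

deep (bool, default: True) – Whether to also return the parameters of nested estimators (scikit-learn convention)

set_params(**params)
Parameters

params (dict) – Dictionary of parameters to set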

class bm.visual.visual_base.RegressionVisualizer(estimator, ax=None, fig=None, force_model=False, **kwargs)

Bases: ScoreVisual

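Base class for regression visualizers.

Wraps a regression model so that a visualization is produced when score is called, which typically lets users compare performance across models efficiently.

Parameters
  • estimator (sklearn estimator) – A scikit-learn estimator, such as a classification or regression model

  • ax (matplotlib Axes, default: None) – Axes to draw on

  • fig (matplotlib Figure, default: None) – Figure to draw on

  • force_model (Boolean, default: False) – If True, skip the check that the estimator is a regressor

  • kwargs (dict) – Additional keyword arguments

score(X, y, **kwargs)

Evaluate the model on test data.

Parameters
  • X (array-like) – Test data

  • y (array-like) – Corresponding test labels

Returns

score – The resulting score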

Return type

float

class bm.visual.visual_base.ScoreVisual(estimator, ax=None, fig=None, is_fitted='auto', **kwargs)

Bases: ModelVisualizer

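Base class for visualizers that report a model's predictive performance.

Parameters
  • estimator (sklearn estimator) – A scikit-learn estimator, such as a classification or regression model

  • ax (matplotlib Axes, default: None) – Axes to draw on

  • fig (matplotlib Figure, default: None) – Figure to draw on

  • is_fitted (Boolean or str, default: auto) – Whether the estimator is already fitted, which determines whether it is fitted before predicting

  • kwargs (dict) – Additional keyword arguments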

score(X, y, **kwargs)
class bm.visual.visual_base.TargetVisualizer(ax=None, fig=None, **kwargs)

Bases: VisualBase

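Base class for target (label) visualizers.

Parameters
  • ax (matplotlib Axes, default: None) – Axes for the target plot

  • fig (matplotlib Figure, default: None) – Figure for the target plot

  • kwargs (dict) – Additional keyword arguments, inherited from scikit-learn

label_encoder(y)

Encode the target labels.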
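Label encoding maps each distinct label to an integer code. The stdlib sketch below (label_encode is hypothetical) sorts the classes first, the same ordering convention as scikit-learn's LabelEncoder; whether label_encoder orders classes this way is an assumption.

```python
def label_encode(y):
    """Hypothetical helper: map each distinct label to an integer code.

    Classes are sorted first, so labels must be mutually comparable.
    """
    classes = sorted(set(y))
    mapping = {label: code for code, label in enumerate(classes)}
    return [mapping[label] for label in y], classes


codes, classes = label_encode(["good", "bad", "good", "bad", "bad"])
```

Keeping the classes list lets the integer codes be mapped back to the original labels, e.g. classes[code].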

class bm.visual.visual_base.VisualBase(ax=None, fig=None, **kwargs)

Bases: BaseEstimator

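Base class that defines how visualizations are created, stored, and displayed with matplotlib. It inherits from scikit-learn's BaseEstimator and mainly defines the input conventions for visualizers.

Parameters
  • ax (matplotlib Axes, default: None) – Axes to draw on. If not passed, the current axes is used (or created if needed).

  • fig (matplotlib Figure, default: None) – Figure to draw on. If not passed, the current figure is used (or created if needed).

  • kwargs (dict) – Keyword arguments required for plotting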

property ax
draw(**kwargs)

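Hooks into the matplotlib interface and creates a representation of the data the visualizer was trained on, in the form of a figure or axes.

Parameters

kwargs (dict) – Generic keyword arguments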

property fig
finalize()

Apply the final decorations (such as titles and labels) to the axes and return them.

Parameters

kwargs (dict) – Generic keyword arguments

set_title(title=None)

Set the title of the current axes.

Parameters

title (string, default: None) – Title to add to the plot

show(outpath=None, clear_figure=False, **kwargs)

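Display the figure, or save it to outpath if one is given.

Parameters
  • outpath (string, default: None) – Path to save the figure to

  • clear_figure (Boolean, default: False) – If True, clear the figure after it is saved to file or shown on screen.

  • kwargs (dict) – Generic keyword arguments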

Notes

property size
vis(X, y=None, **kwargs)

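Main entry point for visualization, intended to be overridden by subclasses.

Parameters
  • X (ndarray or DataFrame, shape(n,m)) – Input data as a DataFrame or numpy.ndarray

  • y (ndarray or Series, shape(n,)) – Class labels as a numpy.ndarray or Series

  • kwargs (dict) – Additional keyword arguments, inherited from scikit-learn

Returns

self – The instance itself, so that it can be used in pipelines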

Return type

VisualBase
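The method descriptions above (vis as the entry point returning self; draw, finalize, and show handling rendering) suggest a template-method flow. The plotting-free sketch below records the order of calls under that assumption; SketchVisualizer is hypothetical, and which calls VisualBase.vis actually makes is not documented here.

```python
class SketchVisualizer:
    """Hypothetical stand-in for the VisualBase call flow (no plotting)."""

    def __init__(self):
        self.calls = []

    def vis(self, X, y=None, **kwargs):
        # Store the data, draw it, and return self so calls can be chained
        self.X_, self.y_ = X, y
        self.draw(**kwargs)
        return self

    def draw(self, **kwargs):
        self.calls.append("draw")

    def finalize(self, **kwargs):
        self.calls.append("finalize")

    def show(self, outpath=None, clear_figure=False, **kwargs):
        self.finalize()
        # A real implementation would call plt.savefig(outpath) or plt.show() here
        self.calls.append("save" if outpath else "display")
        return self


viz = SketchVisualizer().vis([[1, 2], [3, 4]], [0, 1]).show()
```

Returning self from vis is what makes the chained vis(...).show() call possible.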

bm.visual.visual_utils module

exception bm.visual.visual_utils.BrickError

Bases: Exception

The root exception for all yellowbrick related errors.

class bm.visual.visual_utils.ColorPalette(name_or_list)

Bases: list

A wrapper for functionality surrounding a list of colors, including a context manager that allows the palette to be set with a with statement.

as_hex()

Return a color palette with hex codes instead of RGB values.

as_rgb()

Return a color palette with RGB values instead of hex codes.
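Converting between RGB tuples (floats in [0, 1], the matplotlib convention) and hex codes is a small formatting step. The helpers below are hypothetical stand-ins for what as_hex and as_rgb do per color; matplotlib itself provides matplotlib.colors.to_hex and matplotlib.colors.to_rgb for the same job.

```python
def rgb_to_hex(rgb):
    """Hypothetical helper: (r, g, b) floats in [0, 1] -> '#rrggbb'."""
    return "#" + "".join(f"{round(c * 255):02x}" for c in rgb)


def hex_to_rgb(code):
    """Hypothetical helper: '#rrggbb' -> (r, g, b) floats in [0, 1]."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) / 255 for i in (0, 2, 4))


palette = [(1.0, 0.0, 0.0), (0.0, 128 / 255, 0.0)]
hex_palette = [rgb_to_hex(c) for c in palette]
```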

plot(size=1)

Plot the values in the color palette as a horizontal array. See Seaborn’s palplot function for inspiration.

Parameters

size (int) – scaling factor for size of the plot

class bm.visual.visual_utils.ContribEstimator(estimator, estimator_type=None)

Bases: object

Wrapper for an estimator.

exception bm.visual.visual_utils.ModelError

Bases: BrickError

A problem when interacting with sklearn or the ML framework.

exception bm.visual.visual_utils.NotFitted

Bases: ModelError

An action was called that requires a fitted model.

classmethod from_estimator(estimator, method=None)
class bm.visual.visual_utils.Wrapper(obj)

Bases: object

Object wrapper class.

Provides a __getattr__ method that forwards attribute access to the wrapped object.

Parameters

obj (object) – The object to wrap
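Attribute forwarding of this kind is usually done with __getattr__, which Python calls only when normal lookup on the wrapper fails. SketchWrapper and Model below are hypothetical illustrations, not the bm classes.

```python
class SketchWrapper:
    """Hypothetical sketch of attribute delegation via __getattr__."""

    def __init__(self, obj):
        self._wrapped = obj

    def __getattr__(self, attr):
        # Only reached when the attribute is not found on the wrapper itself,
        # so the wrapper's own attributes take precedence over the wrapped object's.
        return getattr(self._wrapped, attr)


class Model:
    coef_ = [0.5]

    def predict(self, X):
        return [0 for _ in X]


w = SketchWrapper(Model())
```

Because lookups fall through to the wrapped object, a visualizer built this way can expose the model's methods and fitted attributes directly.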

bm.visual.visual_utils.bar_stack(data, ax=None, labels=None, ticks=None, colors=None, colormap=None, orientation='vertical', legend=True, legend_kws=None, **kwargs)

An advanced bar chart plotting utility that can draw bar and stacked bar charts from data, wrapping calls to the specified matplotlib.Axes object.

Parameters
  • data (2D array-like) – The data passed to the Visualizer. Rows represent each stack in the bar chart and columns represent each bar. Therefore, a single bar chart is created by passing a 2D array containing a single row, while the data to create a bar chart with 3 stacks would have a shape of (3, b).

  • ax (matplotlib.Axes, default: None) – The axes object to draw the barplot on, uses plt.gca() if not specified.

  • labels (list of str, default: None) – The labels for each row in the bar stack, used to create a legend.

  • ticks (list of str, default: None) – The labels for each bar, added to the x-axis for a vertical plot, or the y-axis for a horizontal plot.

  • colors (array-like, default: None) – Specify the colors of each bar, each row in the stack, or every segment.

  • colormap (string or matplotlib cmap) – Specify a colormap for each bar, each row in the stack, or every segment.

  • orientation (‘vertical’ or ‘horizontal’) – Specifies a horizontal or vertical bar chart.

  • legend (boolean, default: True) – If True, add a legend to the plot.

  • legend_kws (dict, default: None) – Additional keyword arguments for the legend components.

  • kwargs (dict) – Additional keyword arguments to pass to ax.bar.
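In a stacked bar chart each row of data is drawn on top of the rows before it, so each segment's offset is the cumulative sum of the rows below it. stack_offsets is a hypothetical NumPy helper illustrating that step for data of shape (n_stacks, n_bars).

```python
import numpy as np


def stack_offsets(data):
    """Hypothetical helper: bottom offset of every segment in a stacked bar chart.

    Row i is stacked on rows 0..i-1, so its offset is their cumulative sum.
    """
    data = np.atleast_2d(np.asarray(data, dtype=float))
    offsets = np.zeros_like(data)
    offsets[1:] = np.cumsum(data, axis=0)[:-1]
    return offsets


data = [[1, 2], [3, 4], [5, 6]]
offsets = stack_offsets(data)
```

With matplotlib, row i could then be drawn with ax.bar(x, data[i], bottom=offsets[i]) for a vertical chart, or ax.barh(y, data[i], left=offsets[i]) for a horizontal one.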

bm.visual.visual_utils.check_fitted(estimator, is_fitted_by='auto', **kwargs)
Parameters
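  • estimator (sklearn.Estimator) – The model to check

  • is_fitted_by (bool or str, default: 'auto') – Whether the estimator is already fitted. A bool is used as given; with 'auto', the estimator is checked automatically.

  • kwargs (dict) – Additional keyword arguments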

Returns

is_fitted – Whether or not the model is already fitted

Return type

bool
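One common way to implement the 'auto' check is the scikit-learn naming convention: attributes learned during fit end in an underscore (e.g. coef_), and the default heuristic in scikit-learn's sklearn.utils.validation.check_is_fitted looks for them. The sketch below assumes check_fitted works similarly; check_fitted_sketch is hypothetical.

```python
def check_fitted_sketch(estimator, is_fitted_by="auto"):
    """Hypothetical sketch of the is_fitted_by='auto' logic.

    An explicit bool is returned as given. With 'auto', the estimator counts as
    fitted if it has an instance attribute ending in '_' (but not a dunder).
    """
    if isinstance(is_fitted_by, bool):
        return is_fitted_by
    if is_fitted_by != "auto":
        raise ValueError("is_fitted_by must be True, False or 'auto'")
    return any(
        name.endswith("_") and not name.startswith("__")
        for name in vars(estimator)
    )


class ToyModel:
    pass


model = ToyModel()
before = check_fitted_sketch(model)
model.coef_ = [1.0]  # simulate an attribute set by fit()
after = check_fitted_sketch(model)
```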

bm.visual.visual_utils.color_palette(palette=None, n_colors=None)

Return a color palette object with color definition and handling.

Calling this function with palette=None will return the current matplotlib color cycle.

This function can also be used in a with statement to temporarily set the color cycle for a plot or set of plots.

Parameters
  • palette (None or str or sequence) –

    Name of a palette or None to return the current palette. If a sequence the input colors are used but possibly cycled.

    Available palette names from yellowbrick.colors.palettes are:

    • accent

    • dark

    • paired

    • pastel

    • bold

    • muted

    • colorblind

    • sns_colorblind

    • sns_deep

    • sns_muted

    • sns_pastel

    • sns_bright

    • sns_dark

    • flatui

    • neural_paint

  • n_colors (None or int) – Number of colors in the palette. If None, the default will depend on how palette is specified. Named palettes default to 6 colors which allow the use of the names “bgrmyck”, though others do have more or less colors; therefore reducing the size of the list can only be done by specifying this parameter. Asking for more colors than exist in the palette will cause it to cycle.

Returns

list(tuple) – Returns a ColorPalette object, which behaves like a list but can be used as a context manager and provides functions to convert colors.

See also

set_palette()

Set the default color cycle for all plots.

set_color_codes()

Reassign color codes like "b", "g", etc. to colors from one of the yellowbrick palettes.

colors.resolve_colors()

Resolve a color map or listed sequence of colors.

bm.visual.visual_utils.color_sequence(palette=None, n_colors=None)

Return a ListedColormap object from a named sequence palette. Useful for continuous color scheme values and color maps.

Calling this function with palette=None will return the default color sequence: Color Brewer RdBu.

Parameters
  • palette (None or str or sequence) –

    Name of a palette or None to return the default palette. If a sequence the input colors are used to create a ListedColormap.

    The currently implemented color sequences are from Color Brewer.

    Available palette names from yellowbrick.colors.palettes are:

    • Blues

    • BrBG

    • BuGn

    • BuPu

    • GnBu

    • Greens

    • Greys

    • OrRd

    • Oranges

    • PRGn

    • PiYG

    • PuBu

    • PuBuGn

    • PuOr

    • PuRd

    • Purples

    • RdBu

    • RdGy

    • RdPu

    • RdYlBu

    • RdYlGn

    • Reds

    • Spectral

    • YlGn

    • YlGnBu

    • YlOrBr

    • YlOrRd

    • ddl_heat

  • n_colors (None or int) – Number of colors in the palette. If None, the default depends on how palette is specified; the largest sequence for that palette name is selected. Note that sequences have a minimum length of 3; if a number of colors is specified that is not available for the sequence, a ValueError is raised.

Returns

Returns a ListedColormap object, an artist object from the matplotlib library that can be used wherever a colormap is necessary.

Return type

colormap

bm.visual.visual_utils.div_safe(numerator, denominator)

Ufunc-extension that returns 0 instead of nan when dividing numpy arrays

Parameters
  • numerator (array-like) – Values to divide

  • denominator (scalar or array-like) – Divisor; a scalar or an array-like that the numerator can be validly divided by

Returns

A numpy array

For example, div_safe([-1, 0, 1], 0) == [0, 0, 0].
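A safe division like this can be built from NumPy's error-state control: suppress the divide-by-zero warnings, then replace every non-finite result with 0. The sketch below (div_safe_sketch is hypothetical) also zeroes ±inf, which the example above requires, since −1/0 and 1/0 give infinities rather than nan.

```python
import numpy as np


def div_safe_sketch(numerator, denominator):
    """Hypothetical sketch: element-wise division with nan and ±inf replaced by 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        result = np.true_divide(np.asarray(numerator, dtype=float), denominator)
    # np.where also handles scalar results, unlike boolean-mask assignment
    return np.where(np.isfinite(result), result, 0.0)


out = div_safe_sketch([-1, 0, 1], 0)
```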

bm.visual.visual_utils.get_color_cycle()

Returns the current color cycle from matplotlib.

bm.visual.visual_utils.get_model_name(model)

Get the name of a model.

Parameters

model (class or instance) – Model object

Returns

name – Name of the model

Return type

string
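A model name is typically the class name, whether a class or an instance is passed. The sketch below is hypothetical (model_name is not the bm function) and additionally assumes that pipeline-like objects expose a scikit-learn style steps list of (name, estimator) pairs, using the final step's name.

```python
def model_name(model):
    """Hypothetical helper: class name of a model given as a class or an instance."""
    # Assumption: pipeline-like objects have `steps`, a list of (name, estimator) pairs
    steps = getattr(model, "steps", None)
    if steps:
        model = steps[-1][1]
    if isinstance(model, type):
        return model.__name__
    return type(model).__name__


class LogisticModel:
    pass


class FakePipeline:
    def __init__(self, steps):
        self.steps = steps


pipe = FakePipeline([("scale", object()), ("clf", LogisticModel())])
```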

bm.visual.visual_utils.is_classifier(estimator)
bm.visual.visual_utils.is_dataframe(obj)

Returns True if the given object is a Pandas Data Frame.

Parameters

obj (instance) – The object to test whether or not is a Pandas DataFrame.

bm.visual.visual_utils.is_estimator(model)

Check whether the model is an estimator.

Parameters

model (class or instance) – The model to check

bm.visual.visual_utils.is_fitted(estimator)

Check that the model has already been fitted.

bm.visual.visual_utils.is_regressor(estimator)
bm.visual.visual_utils.memoized(fget)
bm.visual.visual_utils.resolve_colors(n_colors=None, colormap=None, colors=None)

Generates a list of colors based on common color arguments, for example the name of a colormap or palette or another iterable of colors. The list is then truncated (or multiplied) to the specific number of requested colors.

Parameters
  • n_colors (int, default: None) – Specify the length of the list of returned colors, which will either truncate or multiply the colors available. If None, the length of the colors will not be modified.

  • colormap (str, yellowbrick.style.palettes.ColorPalette, matplotlib.cm, default: None) – The name of the matplotlib color map with which to generate colors.

  • colors (iterable, default: None) – A collection of colors to use specifically with the plot. Overrides colormap if both are specified.

Returns

colors – A list of colors that can be used in matplotlib plots.

Return type

list

Notes

This function was originally based on a similar function in the pandas plotting library that has been removed in the new version of the library.
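The truncate-or-multiply step described for n_colors can be expressed with itertools: cycle repeats the colors endlessly and islice takes exactly the requested number. resolve_colors_sketch is hypothetical and covers only that length adjustment, not colormap lookup.

```python
from itertools import cycle, islice


def resolve_colors_sketch(colors, n_colors=None):
    """Hypothetical helper: truncate or repeat a color list to n_colors entries."""
    colors = list(colors)
    if n_colors is None:
        return colors  # length left unchanged
    if not colors:
        raise ValueError("at least one color is required")
    # cycle() repeats the list endlessly; islice() takes exactly n_colors items
    return list(islice(cycle(colors), n_colors))


five = resolve_colors_sketch(["r", "g", "b"], 5)
```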

bm.visual.visual_utils.set_color_codes(palette='accent')

Change how matplotlib color shorthands are interpreted.

Calling this will change how shorthand codes like “b” or “g” are interpreted by matplotlib in subsequent plots.

Parameters

palette (str) – Named yellowbrick palette to use as the source of colors.

See also

set_palette

Color codes can also be set through the function that sets the matplotlib color cycle.

bm.visual.visual_utils.set_palette(palette, n_colors=None, color_codes=False)

Set the matplotlib color cycle using a seaborn palette.

Parameters
  • palette (yellowbrick color palette | seaborn color palette (with sns_ prepended)) – Palette definition. Should be something that color_palette() can process.

  • n_colors (int) – Number of colors in the cycle. The default number of colors will depend on the format of palette, see the color_palette() documentation for more information.

  • color_codes (bool) – If True and palette is a seaborn palette, remap the shorthand color codes (e.g. “b”, “g”, “r”, etc.) to the colors from this palette.