bm.visual package

Visualization

bm.visual.color_style module

bm.visual.color_style.find_text_color(base_color, dark_color='black', light_color='white', coef_choice=0)

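Chooses a text color that suits a background color. Users can specify the dark and light text colors or accept the defaults, black and white.

Parameters

base_color – background color, as RGB
dark_color – dark text color (any matplotlib color)
light_color – light text color
coef_choice – 0 or 1, used as an index; default 0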
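The idea behind this kind of helper can be sketched in a few lines. The example below is a hypothetical stand-in, not the library's implementation: the name pick_text_color, the 0..1 RGB scaling, the Rec. 601 luma weights, and the 0.5 threshold are all assumptions, and the role of coef_choice is not modeled.

```python
def pick_text_color(base_color, dark_color="black", light_color="white"):
    """Return dark_color for light backgrounds and light_color for dark ones."""
    r, g, b = base_color  # assumed: RGB components scaled to the range 0..1
    # Rec. 601 luma weights (an assumption; the library's weights are not documented here).
    luma = 0.299 * r + 0.587 * g + 0.114 * b
    return dark_color if luma > 0.5 else light_color


print(pick_text_color((1.0, 1.0, 0.8)))  # pale background
print(pick_text_color((0.1, 0.1, 0.4)))  # dark background
```

A perceptual weighting is used instead of a plain average because green contributes more to perceived brightness than blue.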

bm.visual.features_visual module

Visualizations of the correlation between feature columns and between features and labels

class bm.visual.features_visual.BinningPlot(ax, bin_method='interpolate', picture_name='feature_bin', save_file=False, **kwargs)

Bases: FeatureVisualizer

WOE visualization: bins a feature by computing WOE and IV.

Parameters

ax (ax, default: None) –
bin_X (DataFrame) – binned data
title (str, default: None) – feature name
display_iv (bool, default: False) – whether to display the IV

draw(bad_rate, samples_num, column, **kwargs)

Parameters

bad_rate (DataFrame) – number of bad samples
samples_num (DataFrame) – corresponding number of samples

finalize(**kwargs)

Applies final decorations to the axes.

Parameters: kwargs (dict) – generic keyword arguments

init_bin_method(data, column, target=None, target_value=None, num_clusters=5, max_interval=10, special_attributes=None, tree_params=None)

Parameters

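data (DataFrame) – input data
column (str) – feature column
target (str) – label column name
target_value (Any) – value of the target column
num_clusters (int) – number of clusters
max_interval (int, default: 10) – maximum number of bin intervals
special_attributes (str, default: None) – special attributes
tree_params (dict, default: None) – decision tree parameters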

visual(data, column, target=None, target_value=None, num_clusters=5, max_interval=10, special_attributes=None, tree_params=None, bad_rate_plot=False, **kwargs)

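Draws the binning plot using the selected binning method.

Parameters

data (DataFrame, shape (n, m)) – input data
column (str) – feature column to bin
target (str, default: None) – target (the classification label column)
target_value (str, default: None) – currently only binary classification is supported (for example: bad, good)
num_clusters (int, default: 5) – number of clusters
max_interval (int, default: 10) – maximum number of intervals
special_attributes (str, default: None) – special feature name
tree_params (dict, default: None) – decision tree parameter dictionary
bad_rate_plot (bool, default: False) – whether to plot the bad rate of each bin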

class bm.visual.features_visual.FeatureCorrelationPlot(ax, columns, picture_name='feature_correlation', save_file=False, **kwargs)

Bases: FeatureVisualizer

Categorical feature statistics plot; computes statistics for non-numeric features.

Parameters

ax (matplotlib Axes, default: None) – if hist=True, axes are added above (xhax) and to the right (yhax)
X (DataFrame, default: None) – input data
columns (list, default: None) –
sub_col (int, default: 2) –
label (str, default: None) –
kwargs (dict) – keyword arguments

Examples

>>> viz = FeatureCorrelationPlot()
>>> viz.visual(X)
>>> viz.show()

draw(corr_df, colormap, mask=False, **kwargs)

Parameters

corr_df (DataFrame) – correlation DataFrame of the features
colormap (plt) – color settings
mask (bool) – whether to apply a mask
kwargs (dict) – keyword arguments

finalize()

Applies final decorations to the axes.

Parameters: kwargs (dict) – generic keyword arguments

visual(X)

Parameters: X (DataFrame) – input categorical features

class bm.visual.features_visual.FeaturesBoxPlot(ax, columns, sub_col, picture_name='feature_box', save_file=False, **kwargs)

Bases: FeatureVisualizer

Parameters

ax (matplotlib Axes, default: None) – if hist=True, axes are added above (xhax) and to the right (yhax)
X (DataFrame, default: None) – input data
columns (list, default: None) – list of features
sub_col (int, default: 2) –
kwargs (dict) – keyword arguments

Examples

>>> viz = FeaturesBoxPlot()
>>> viz.visual(X, y)
>>> viz.show()

draw(x, idx, feat, **kwargs)

Creates the canvas, processes the input data, and computes the minimum (min), lower quartile (Q1), median, upper quartile (Q3), and maximum (max).

Parameters

x (DataFrame, default: None) – input data
idx (int) – index of the feature column
feat (any) – feature corresponding to each index
kwargs (dict) – keyword arguments
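The five statistics that a box plot is built from can be computed as in the sketch below. It is independent of the library: the helper name five_number_summary is hypothetical, and the "inclusive" quartile method is one of several conventions, so the values a particular plotting backend draws may differ slightly.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a sequence of numbers."""
    data = sorted(values)
    # Quartiles by linear interpolation over the closed range of the data.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


print(five_number_summary([7, 1, 9, 3, 5, 2, 8, 4, 6]))
```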

finalize(): 修改图片的一些参数

visual(X)

Parameters: X (DataFrame) – input data

class bm.visual.features_visual.FeaturesCategoryCount(ax, columns, label, sub_col, picture_name='feature_category', save_file=False, **kwargs)

Bases: FeatureVisualizer

Categorical feature count plot; counts the categories of non-numeric features.

Parameters

ax (matplotlib Axes, default: None) – if hist=True, axes are added above (xhax) and to the right (yhax)
X (DataFrame, default: None) – input data
columns (list, default: None) –
sub_col (int, default: 2) –
label (str, default: None) –
kwargs (dict) – keyword arguments

Examples

>>> viz = FeaturesCategoryCount()
>>> viz.visual(X, y)
>>> viz.show()

draw(x, idx, feat, **kwargs)

Creates the canvas and processes the input data.

Parameters

x (DataFrame, default: None) – input data
idx (int) – index of each categorical feature
feat (any) – feature corresponding to each index

finalize(): Adjusts some of the figure's properties.

visual(X)

Parameters: X (DataFrame) – input categorical feature data

class bm.visual.features_visual.FeaturesDistributionPlot(ax, columns, label, sub_col, picture_name='feature_distribute', save_file=False, **kwargs)

Bases: FeatureVisualizer

Feature distribution plot; draws the distribution of each feature.

Parameters

ax (Axes, default: None) –
columns (str or list, default: None) – feature column names
label (str) – label
sub_col (int, default: 5) –
kwargs (dict) – keyword arguments

Examples

>>> viz = FeaturesDistributionPlot()
>>> viz.visual(X, y)
>>> viz.show()

draw(x, idx, feat, **kwargs)

Parameters

x (DataFrame, default: None) – input data
idx (int) – index of each feature column
feat (str) – feature corresponding to the index

finalize(**kwargs): Adjusts the figure properties.

visual(X)

Parameters: X (DataFrame) – input data

class bm.visual.features_visual.FeaturesVisualPlot(ax=None, columns=None, correlation='pearson', kind='scatter', hist=True, alpha=0.65, joint_kws=None, hist_kws=None, picture_name='features_visual', save_file=False, **kwargs)

Bases: FeatureVisualizer

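Feature data visualization that lets pairs of features be compared against each other. It can also compute the correlation between features and the label with different methods and visualize the result.

The "columns" parameter can be used to specify the indices of the two desired columns in "X".

Setting the "hist" parameter to True adds histograms, which show either the frequency distribution or, with "density", the probability density function.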

Parameters

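ax (matplotlib Axes, default: None) – if hist=True, axes are added above (xhax) and to the right (yhax)
columns (int, str, [int, int], [str, str], default: None) –
correlation (str, default: 'pearson') – correlation method; one of 'pearson', 'covariance', 'spearman', 'kendalltau'
kind (str in {'scatter', 'hex'}, default: 'scatter') – type of plot to draw. Note that when kind='hex', the target cannot be plotted by color.
hist ({True, False, None, 'density', 'frequency'}, default: True) – draws histograms of the two input variables by default. If set to 'density', the probability density function is plotted. If set to True or 'frequency', the frequency is plotted.
alpha (float, default: 0.65) – transparency, where 1 is fully opaque and 0 is fully transparent. This makes densely clustered points easier to see.
kwargs (dict) – keyword arguments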

Examples

>>> viz = FeaturesVisualPlot(columns=["temp", "humidity"])
>>> viz.visual(X, y)
>>> viz.show()

draw(x, y, xlabel=None, ylabel=None)

Parameters

x (1D array-like) – values plotted on the x-axis
y (1D array-like) – values plotted on the y-axis
xlabel (str) – label of the x-axis
ylabel (str) – label of the y-axis

finalize(**kwargs): Adjusts the figure properties.

is_dataframe(data)

Converts the input data to a DataFrame.

Parameters: data (instance) – input data

visual(X, y=None)

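Runs the visualization on the input data.

Parameters

X (array-like) – a one- or two-dimensional NumPy array, usually two-dimensional
y (array-like, default: None) – one-dimensional array of labels

property xhax: The axes of the x-axis histogram.

property yhax: The axes of the y-axis histogram.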

class bm.visual.features_visual.WoeIvPlot(ax, title=None, display_iv=False, picture_name='woe_iv', save_file=False, **kwargs)

Bases: FeatureVisualizer

WOE-IV binning visualization

binx (DataFrame): binning result
title (str): figure title
display_iv (bool): whether to display the corresponding IV value
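For readers unfamiliar with the two quantities, the sketch below computes WOE per bin and the total IV from per-bin good and bad counts. It is an illustration only: the function name woe_iv and the input layout are hypothetical, the sign convention ln(good share / bad share) is one of two in common use, and how the library handles empty bins is not documented here (this sketch would raise an error for a bin with zero goods or zero bads).

```python
import math


def woe_iv(bins):
    """bins: list of (good_count, bad_count) pairs, one per bin.

    Returns (list of WOE values, total IV).
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    woes, iv = [], 0.0
    for good, bad in bins:
        good_share = good / total_good  # share of all goods that fall in this bin
        bad_share = bad / total_bad     # share of all bads that fall in this bin
        woe = math.log(good_share / bad_share)
        woes.append(woe)
        iv += (good_share - bad_share) * woe
    return woes, iv


woes, iv = woe_iv([(80, 20), (20, 80)])
print(woes, iv)
```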

Examples

>>> viz = WoeIvPlot()
>>> viz.visual(X)
>>> viz.show()

draw(binx, ind, y_left_max, y_right_max, **kwargs)

Parameters

binx (DataFrame) – binning result
ind (list) – tick values of the x-axis
y_left_max (int) – left offset
y_right_max (int) – right offset
kwargs (dict) – keyword arguments

finalize()

Applies final decorations to the axes.

Parameters: kwargs (dict) – generic keyword arguments

visual(binx, **kwargs)

Parameters

binx (DataFrame) – binning result
kwargs (dict) – keyword arguments

bm.visual.interpretability_visual module

class bm.visual.interpretability_visual.FeatureImportancePlot(ax, picture_name='features_importance_visual', save_file=False, **kwargs)

Bases: FeatureVisualizer

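Feature importance visualization

Parameters

ax (matplotlib Axes, default: None) – if hist=True, axes are added above (xhax) and to the right (yhax)
picture_name (str, default: 'features_importance_visual') – name under which the image is saved
save_file (bool, default: False) – whether to save the image
kwargs (dict) – keyword arguments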

draw(X, top_n=50, figsize=(20, 12))

Parameters

X (DataFrame) – feature importance DataFrame
top_n (int) – number of top features to show
figsize (tuple) – figure size

finalize(X)

Parameters: X (DataFrame) – feature importances

visual(X)

Parameters: X (DataFrame) – input data

class bm.visual.interpretability_visual.ShapPlot(estimator, ax=None, picture_name='feature_shap_plot', save_file=False, mode=None, **kwargs)

Bases: FeatureVisualizer

Interpretability visualization for the selected features

Parameters

ax (matplotlib Axes, default: None) – if hist=True, axes are added above (xhax) and to the right (yhax)
picture_name (str, default: 'feature_shap_plot') –
save_file (bool, default: False) –
mode (str, default: None) – either 'force' or 'summary'
kwargs (dict) – keyword arguments

draw(explainer, shap_values, X, feature_names, show)

Parameters

estimator (pipeline) – model
X (DataFrame) – the selected input features

finalize()

Applies final decorations to the axes.

Parameters: kwargs (dict) – generic keyword arguments

visual(estimator, X, feature_names, show)

Parameters

estimator (pipeline) – model
X (DataFrame) – the selected input features

bm.visual.model_visual module

Evaluation plots for model training and prediction

class bm.visual.model_visual.CNTPlot(estimator, ax=None, per_class=True, binary=False, classes=None, encoder=None, is_fitted='auto', force_model=False, picture_name='cnt', save_file=False, **kwargs)

Bases: ClassificationVisualizer

Parameters

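estimator (pipeline) – the model to use
ax (Axes, default: None) –
per_class (bool, default: True) – if True, plots the ROC curve for each class; set it to False if only the macro- or micro-average curves are needed
binary (bool, default: False) – binary classification
classes (list of str, default: None) – class labels
encoder (dict or LabelEncoder, default: None) – label encoder (scikit-learn)
is_fitted (bool or str, default: 'auto') – whether the model has already been fitted
force_model (bool, default: False) – model type check
kwargs (dict) – keyword arguments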

Examples

>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import train_test_split
>>> data = load_data("occupancy")
>>> features = ["temp", "relative humidity", "light", "C02", "humidity"]
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> oz = CNTPlot(LogisticRegression())
>>> oz.fit(X_train, y_train)
>>> oz.score(X_test, y_test)
>>> oz.show()

draw()

finalize(**kwargs)

Configures the figure.

Parameters: kwargs (dict) – keyword arguments

fit(X, y=None, **kwargs): Fits the model.

score(X, y, **kwargs)

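Generates predictions from the trained model and scores them.

Parameters

X (ndarray or DataFrame, shape (n, m)) – input matrix with m features
y (ndarray or Series, shape (n,)) – one-dimensional class labels

Returns

score_ – the evaluation score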

Return type

float

class bm.visual.model_visual.ClassificationReportPlot(estimator, ax=None, classes=None, cmap='YlOrRd', support=None, encoder=None, is_fitted='auto', force_model=False, colorbar=True, fontsize=None, picture_name='classification_report', save_file=False, **kwargs)

Bases: ClassificationVisualizer

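Classification report visualization

Parameters

estimator (pipeline) – model
ax (matplotlib Axes, default: None) –
classes (list or str, default: None) – class labels
cmap (str, default: 'YlOrRd') – colormap
support ({True, False, None, 'percent', 'count'}, default: None) – whether to display support, either as a count or as a percentage
encoder (dict or LabelEncoder, default: None) – label encoder (scikit-learn)
is_fitted (bool or str, default: 'auto') – whether the model has already been fitted
force_model (bool, default: False) – model type check
colorbar (bool, default: True) – whether to draw the colorbar
fontsize (int or None, default: None) – font size
kwargs (dict) – keyword arguments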

Examples

>>> from sklearn.linear_model import LogisticRegression
>>> viz = ClassificationReportPlot(LogisticRegression())
>>> viz.fit(X_train, y_train)
>>> viz.scores(X_test, y_test)
>>> viz.show()

draw()

finalize(**kwargs)

Sets the title and other properties of the generated figure.

Parameters: kwargs (dict) – keyword arguments

scores(X, y)

Generates the classification report.

Parameters

X (ndarray or DataFrame of shape n x m) – an (n x m) feature matrix
y (ndarray or Series of length n) – the corresponding labels

Returns

score_ – accuracy

Return type

float

class bm.visual.model_visual.ConfusionMaxtrixPlot(estimator, ax=None, sample_weight=None, percent=False, classes=None, encoder=None, cmap='YlOrRd', fontsize=None, is_fitted='auto', force_model=False, label_transfer=None, picture_name='confusion_maxtrix', save_file=False, **kwargs)

Bases: ClassificationVisualizer

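Classification confusion matrix visualization

Parameters

estimator (model) – the model to use
ax (matplotlib Axes, default: None) –
sample_weight (array-like, shape (n_samples,)) – optional sample weights
percent (bool, default: False) – whether to display percentages instead of counts
classes (list or str, default: None) – class labels
cmap (str, default: 'YlOrRd') – colormap
encoder (dict or LabelEncoder, default: None) – label encoder (scikit-learn)
is_fitted (bool or str, default: 'auto') – whether the model has already been fitted
force_model (bool, default: False) – model type check
fontsize (int or None, default: None) – font size
kwargs (dict) – keyword arguments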

Examples

>>> from sklearn.linear_model import LogisticRegression
>>> viz = ConfusionMaxtrixPlot(LogisticRegression())
>>> viz.fit(X_train, y_train)
>>> viz.score(X_test, y_test)
>>> viz.show()

draw(): 生成相应的混淆矩阵图

finalize(**kwargs)

Applies final decorations to the axes.

Parameters: kwargs (dict) – generic keyword arguments

score(X, y, **kwargs)

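Draws the confusion matrix for the provided test data by comparing the predictions on the instances X with the true values given by the target vector y.

Parameters

X (ndarray or DataFrame of shape n x m) – an (n x m) feature matrix
y (ndarray or Series of length n) – the corresponding labels

Returns

score_ – accuracy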

Return type

float
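What the plotted matrix contains can be shown without any plotting code. The sketch below is a plain-Python illustration, not the library's implementation: the function name and the row-wise normalization used for percent=True are assumptions.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, classes, percent=False):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    matrix = [[counts[(t, p)] for p in classes] for t in classes]
    if percent:
        # Assumed normalization: divide each row by the number of true samples of that class.
        matrix = [[v / sum(row) if sum(row) else 0.0 for v in row] for row in matrix]
    return matrix


y_true = ["good", "good", "bad", "bad"]
y_pred = ["good", "bad", "bad", "bad"]
print(confusion_matrix(y_true, y_pred, classes=["bad", "good"]))
```

The diagonal holds the correct predictions, so accuracy is the sum of the diagonal divided by the number of samples.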

show(outpath=None, **kwargs)

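Displays the figure.

Parameters

outpath (str, default: None) – path where the figure is saved
clear_figure (bool, default: False) – if True, clears the figure after it is saved to file or shown on screen
kwargs (dict) – generic keyword arguments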

class bm.visual.model_visual.FeatureImportancePlot(estimator, ax=None, labels=None, relative=True, absolute=False, xlabel=None, stack=False, colors=None, colormap=None, is_fitted='auto', topn=None, picture_name='feature_important', save_file=False, **kwargs)

Bases: ModelVisualizer

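Ranks features by importance.

Parameters

estimator (Estimator) – an initialized model
ax (matplotlib Axes, default: None) – axes to draw on
labels (list, default: None) – list of labels
relative (bool, default: True) – relative importance
absolute (bool, default: False) – absolute importance
xlabel (str, default: None) – label of the x-axis
stack (bool, default: False) – plot type
colors (list of strings) – if stack==False, specifies a color for each bar in the chart
colormap (string or matplotlib cmap) – if stack==True, specifies a colormap to color the classes
is_fitted (bool or str, default: 'auto') – whether the estimator has already been fitted
topn (int, default: None) – shows the top-n results; all are shown by default
kwargs (dict) – keyword arguments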

Examples

>>> from sklearn.ensemble import GradientBoostingClassifier
>>> visualizer = FeatureImportancePlot(GradientBoostingClassifier())
>>> visualizer.fit(X, y)
>>> visualizer.show()

draw(**kwargs): 绘制特征重要度图

Note

不经过特征筛选，直接利用模型生成的特征排序

finalize(**kwargs): 图形属性修改

fit(X, y=None, **kwargs)

Trains the model.

Parameters

X (numpy.ndarray or DataFrame, shape (n, m)) – input training data
y (numpy.ndarray or Series, shape (n,)) – input labels
kwargs (dict) – keyword arguments

class bm.visual.model_visual.KDEPlot(estimator, ax=None, per_class=True, binary=False, classes=None, encoder=None, is_fitted='auto', force_model=False, picture_name='kde', save_file=False, **kwargs)

Bases: ClassificationVisualizer

Kernel Density Estimator Plot

Parameters

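estimator (estimator) – the model to use
ax (matplotlib Axes, default: None) –
per_class (bool, default: True) – if True, plots the ROC curve for each class; set it to False if only the macro- or micro-average curves are needed
binary (bool, default: False) – binary classification
classes (list of str, default: None) – class labels
encoder (dict or LabelEncoder, default: None) – label encoder (scikit-learn)
is_fitted (bool or str, default: 'auto') – whether the model has already been fitted
force_model (bool, default: False) –
kwargs (dict) – keyword arguments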

Examples

>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import train_test_split
>>> data = load_data("occupancy")
>>> features = ["temp", "relative humidity", "light", "C02", "humidity"]
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> oz = KDEPlot(LogisticRegression())
>>> oz.fit(X_train, y_train)
>>> oz.score(X_test, y_test)
>>> oz.show()

draw()

Draws the plot from the data.

Returns: ax – matplotlib Axes
Return type: ax

finalize(**kwargs)

Adjusts the figure properties.

Parameters: kwargs (dict) – keyword arguments

fit(X, y=None, **kwargs): Overrides the fit process inherited from scikit-learn's base module.

score(X, y, **kwargs)

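Generates predictions from the trained model and scores them.

Parameters

X (ndarray or DataFrame, shape (n, m)) – input matrix with m features
y (ndarray or Series, shape (n,)) – one-dimensional class labels

Returns

score_ – the evaluation score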

Return type

float

class bm.visual.model_visual.KSPlot(estimator, ax=None, classes=None, encoder=None, is_fitted='auto', force_model=False, picture_name='ks', save_file=False, **kwargs)

Bases: ClassificationVisualizer

Draws the KS curve.

Parameters

estimator (pipeline) – model
ax (Axes, default: None) –
classes (list or str, default: None) – classes
encoder (dict or LabelEncoder, default: None) – label encoder
is_fitted (bool or str, default: 'auto') – whether the model has already been fitted
force_model (bool, default: False) –
kwargs (dict) – keyword arguments

draw(score_bin, good_rates, bad_rates, ks_lst, **kwargs)

Parameters

score_bin (list) – index of the score bins
good_rates (list) – good sample rates
bad_rates (list) – bad sample rates
ks_lst (list) – the corresponding KS values
kwargs (dict) – keyword arguments
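The three lists passed to draw can be related to each other with a small sketch. It assumes the usual definition of the KS curve (the gap between the cumulative bad rate and the cumulative good rate as samples are taken in descending score order), label 1 for bad and 0 for good, and no tied scores; the function name ks_table is hypothetical, and the library may bin the scores instead of stepping through every sample.

```python
def ks_table(scores, labels):
    """Return (good_rates, bad_rates, ks_lst), one entry per sample in descending score order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_bad = sum(labels)
    total_good = len(labels) - total_bad
    good_rates, bad_rates, ks_lst = [], [], []
    cum_good = cum_bad = 0
    for i in order:
        if labels[i] == 1:
            cum_bad += 1
        else:
            cum_good += 1
        good_rates.append(cum_good / total_good)
        bad_rates.append(cum_bad / total_bad)
        ks_lst.append(abs(bad_rates[-1] - good_rates[-1]))
    return good_rates, bad_rates, ks_lst


good, bad, ks = ks_table([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
print(max(ks))  # the KS statistic is the largest gap between the two curves
```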

finalize()

Applies final decorations to the axes.

Parameters: kwargs (dict) – generic keyword arguments

fit(X, y=None, **kwargs)

Overrides the model fit process inherited from scikit-learn's base class.

Parameters

X (DataFrame) – training data
y (list) – labels of the training data
kwargs (dict) – keyword arguments

score(X_train, y_train, X_test=None, y_test=None, **kwargs)

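Returns the predicted values produced by the trained model.

Parameters

X_train (DataFrame) – training data
y_train (list or ndarray) – labels of the training data
X_test (DataFrame) – test data
y_test (list or ndarray) – labels of the test data
kwargs (dict) – keyword arguments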

class bm.visual.model_visual.LearningCurve(estimator, ax=None, train_sizes=array([0.1, 0.325, 0.55, 0.775, 1.0]), cv=None, scoring=None, exploit_incremental_learning=False, n_jobs=1, pre_dispatch='all', shuffle=False, random_state=None, picture_name='learning_curve', save_file=False, **kwargs)

Bases: ModelVisualizer

Plots the learning curve of a model on the data.

Parameters

estimator (pipeline) – the learner (an initialized model)
ax (Axes, default: None) –
train_sizes (array-like, shape (n_ticks,), default: np.linspace(0.1, 1.0, 5)) –
cv (int, default: None) – number of folds the data is split into for cross-validation; one fold is used as the validation set and the remaining n-1 folds as the training set
scoring (string, callable or None, default: None) – optional; one of ['accuracy', 'adjusted_rand_score', 'average_precision', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'neg_log_loss', 'neg_mean_absolute_error', 'neg_mean_squared_error', 'neg_median_absolute_error', 'precision', 'precision_macro', 'precision_micro', 'precision_samples', 'precision_weighted', 'r2', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 'roc_auc']
exploit_incremental_learning (bool, default: False) – if the estimator supports incremental learning, this is used to speed up fitting for the different training set sizes
n_jobs (int, optional, default: 1) – number of parallel jobs
pre_dispatch (integer or string, optional, default: 'all') – number of pre-dispatched jobs for parallel execution
shuffle (bool, optional) – whether to shuffle the data
random_state (int, RandomState instance or None, optional, default: None) – random seed; used when shuffle=True
kwargs (dict) – keyword arguments
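How train_sizes and cv interact can be sketched numerically. The example assumes the scikit-learn convention, in which fractional train sizes are taken relative to the largest training set that the cross-validation split leaves available; that this class follows the same convention is an assumption, and the helper name train_size_schedule is hypothetical.

```python
def train_size_schedule(n_samples, cv=5, n_ticks=5, low=0.1, high=1.0):
    """Absolute training-set sizes for evenly spaced fractions between low and high."""
    # With k folds, one fold is held out, so at most n - n // k samples are trained on.
    max_train = n_samples - n_samples // cv
    step = (high - low) / (n_ticks - 1)
    fractions = [low + i * step for i in range(n_ticks)]  # same values as np.linspace(low, high, n_ticks)
    return [round(f * max_train) for f in fractions]


print(train_size_schedule(1000, cv=5))
```

Each returned size corresponds to one point on the x-axis of the learning curve.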

Examples

>>> from sklearn.naive_bayes import GaussianNB
>>> model = LearningCurve(GaussianNB())
>>> model.fit(X, y)
>>> model.show()

draw(**kwargs): Renders the training and test learning curves.

finalize(**kwargs): 设置title以及轴标签

fit(X, y=None, **kwargs)

Overrides the fit process inherited from scikit-learn's base class.

Parameters

X (DataFrame) – training data
y (list or ndarray) – labels of the training data
kwargs (dict) – keyword arguments

class bm.visual.model_visual.LiftPlot(estimator, ax=None, per_class=True, binary=False, classes=None, encoder=None, is_fitted='auto', force_model=False, picture_name='lit', save_file=False, **kwargs)

Bases: ClassificationVisualizer

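Lift curve

Parameters

estimator (pipeline) – the model to use
ax (matplotlib Axes, default: None) –
per_class (bool, default: True) – if True, plots the ROC curve for each class; set it to False if only the macro- or micro-average curves are needed
binary (bool, default: False) – binary classification
classes (list of str, default: None) – class labels
encoder (dict or LabelEncoder, default: None) – label encoder (scikit-learn)
is_fitted (bool or str, default: 'auto') – whether the model has already been fitted
force_model (bool, default: False) – model type check
kwargs (dict) – keyword arguments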

Examples

>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import train_test_split
>>> data = load_data("occupancy")
>>> features = ["temp", "relative humidity", "light", "C02", "humidity"]
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> oz = LiftPlot(LogisticRegression())
>>> oz.fit(X_train, y_train)
>>> oz.score(X_test, y_test)
>>> oz.show()

draw()

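Hooks into the matplotlib interface and creates a representation, as a figure or axes, of the data the visualizer was trained on.

Parameters: kwargs (dict) – generic keyword arguments

finalize(**kwargs)

Parameters: kwargs (dict) – keyword arguments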

fit(X, y=None, **kwargs)

Overrides the fit process inherited from scikit-learn's base class.

Parameters

X (DataFrame) – training data
y (list or ndarray) – labels of the training data
kwargs (dict) – keyword arguments

score(X, y, **kwargs)

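Generates predictions from the trained model and scores them.

Parameters

X (ndarray or DataFrame, shape (n, m)) – input matrix with m features
y (ndarray or Series, shape (n,)) – one-dimensional class labels

Returns

score_ – the evaluation score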

Return type

float

class bm.visual.model_visual.MarketingPlot(estimator, ax=None, per_class=True, binary=False, classes=None, encoder=None, is_fitted='auto', force_model=False, picture_name='market_form', save_file=False, **kwargs)

Bases: ClassificationVisualizer

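Generates a marketing report that combines the various plots needed for marketing.

Parameters

estimator (pipeline) – the model to use
ax (matplotlib Axes, default: None) –
per_class (bool, default: True) – if True, plots the ROC curve for each class; set it to False if only the macro- or micro-average curves are needed
binary (bool, default: False) – binary classification
classes (list of str, default: None) – class labels
encoder (dict or LabelEncoder, default: None) – label encoder (scikit-learn)
is_fitted (bool or str, default: 'auto') – whether the model has already been fitted
force_model (bool, default: False) –
kwargs (dict) – keyword arguments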

Examples

>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import train_test_split
>>> data = load_data("occupancy")
>>> features = ["temp", "relative humidity", "light", "C02", "humidity"]
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> oz = MarketingPlot(LogisticRegression())
>>> oz.fit(X_train, y_train)
>>> oz.score(X_test, y_test)
>>> oz.show()

cob_draw()

cpr_draw()

draw()

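Hooks into the matplotlib interface and creates a representation, as a figure or axes, of the data the visualizer was trained on.

Parameters: kwargs (dict) – generic keyword arguments

finalize(**kwargs)

Configures the figure properties.

Parameters: kwargs (dict) – keyword arguments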

fit(X, y=None, **kwargs)

Overrides the fit process inherited from scikit-learn's base class.

Parameters

X (DataFrame) – training data
y (list or ndarray) – labels of the training data
kwargs (dict) – keyword arguments

ker_draw()

ks_draw()

prc_draw()

roc_draw()

score(X, y, **kwargs)

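Generates predictions from the trained model and scores them.

Parameters

X (ndarray or DataFrame, shape (n, m)) – input matrix with m features
y (ndarray or Series, shape (n,)) – one-dimensional class labels

Returns

score_ – the evaluation score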

Return type

float

class bm.visual.model_visual.PrecisionRecallPlot(estimator, ax=None, classes=None, colors=None, cmap=None, encoder=None, fill_area=None, ap_score=True, micro=True, iso_f1_curves=False, iso_f1_values=(0.2, 0.4, 0.6, 0.8), per_class=False, fill_opacity=0.2, line_opacity=0.8, is_fitted='auto', force_model=False, pr_change=False, picture_name='precision_recall', save_file=False, **kwargs)

Bases: ClassificationVisualizer

Plot of precision against recall

Parameters

estimator (pipeline) – model
ax (matplotlib Axes, default: None) –
classes (list or str, default: None) – class labels
cmap (str or colormap, default: None) – colormap
encoder (dict or LabelEncoder, default: None) – label encoder
fill_area (bool, default: True) – whether to fill the area under the curve
ap_score (bool, default: True) – whether to annotate the plot with the average precision score
micro (bool, default: True) – micro average
iso_f1_curves (bool, default: False) – ISO F1 curves
iso_f1_values (tuple, default: (0.2, 0.4, 0.6, 0.8)) – F1 values of the ISO F1 curves
per_class (bool, default: False) – whether to draw a curve for each class in the multi-label case
fill_opacity (float, default: 0.2) – alpha of the filled area
line_opacity (float, default: 0.8) – alpha of the lines
is_fitted (bool or str, default: 'auto') – whether the learner has already been fitted
force_model (bool, default: False) –
kwargs (dict) – keyword arguments

Examples

>>> from sklearn.model_selection import train_test_split
>>> from sklearn.svm import LinearSVC
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> viz = PrecisionRecallPlot(LinearSVC())
>>> viz.fit(X_train, y_train)
>>> viz.score(X_test, y_test)
>>> viz.show()

draw()

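Hooks into the matplotlib interface and creates a representation, as a figure or axes, of the data the visualizer was trained on.

Parameters: kwargs (dict) – generic keyword arguments

finalize(): Adjusts the axis information.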

fit(X, y=None, **kwargs)

Overrides the model fit process inherited from scikit-learn's base class.

Parameters

X (DataFrame) – training data
y (list) – labels of the training data
kwargs (dict) – keyword arguments

score(X, y, **kwargs)

Parameters

X (DataFrame) – training data
y (list) – labels of the training data
kwargs (dict) – keyword arguments

class bm.visual.model_visual.PredictErrorPlot(estimator, ax=None, classes=None, encoder=None, is_fitted='auto', force_model=False, picture_name='predict_error', save_file=False, **kwargs)

Bases: ClassificationVisualizer

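Prediction error visualization; counts the prediction errors for each class.

Parameters

estimator (estimator) – the learner
ax (Axes, default: None) –
classes (list or str, default: None) – classes
encoder (dict or LabelEncoder, default: None) – label encoder
is_fitted (bool or str, default: 'auto') – whether the model has already been fitted
force_model (bool, default: False) –
kwargs (dict) – keyword arguments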

draw()

Renders the class prediction error across the axis.

Returns: ax – The axes on which the figure is plotted
Return type: Matplotlib Axes

finalize(**kwargs): 修改图片信息

score(X, y, **kwargs)

Makes predictions.

Parameters

X (ndarray or DataFrame, shape (n, m)) – a matrix with n rows and m columns
y (ndarray or Series, shape (n,)) – an array of labels

Returns

score_ – accuracy score

Return type

float

class bm.visual.model_visual.ROCAUCPlot(estimator, ax=None, micro=True, macro=True, per_class=True, binary=False, classes=None, encoder=None, is_fitted='auto', force_model=False, picture_name='roc_auc', save_file=False, **kwargs)

Bases: ClassificationVisualizer

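ROC and AUC curve plot

Parameters

estimator (pipeline) – the model to use
ax (matplotlib Axes, default: None) –
micro (bool, default: True) – micro average
macro (bool, default: True) – macro average
per_class (bool, default: True) – if True, plots the ROC curve for each class; set it to False if only the macro- or micro-average curves are needed
binary (bool, default: False) – binary classification
classes (list of str, default: None) – class labels
encoder (dict or LabelEncoder, default: None) – label encoder (scikit-learn)
is_fitted (bool or str, default: 'auto') – whether the model has already been fitted
force_model (bool, default: False) – model type check
kwargs (dict) – keyword arguments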

Examples

>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import train_test_split
>>> data = load_data("occupancy")
>>> features = ["temp", "relative humidity", "light", "C02", "humidity"]
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> oz = ROCAUCPlot(LogisticRegression())
>>> oz.fit(X_train, y_train)
>>> oz.score(X_test, y_test)
>>> oz.show()

draw()

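Hooks into the matplotlib interface and creates a representation, as a figure or axes, of the data the visualizer was trained on.

Parameters: kwargs (dict) – generic keyword arguments

finalize(**kwargs)

Adjusts the ROC-AUC figure.

Parameters: kwargs (dict) – keyword arguments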

fit(X, y=None, **kwargs)

Overrides the model fit process inherited from scikit-learn's base class.

Parameters

X (DataFrame) – training data
y (list) – labels of the training data
kwargs (dict) – keyword arguments

score(X, y, **kwargs)

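Generates predictions from the trained model and scores them.

Parameters

X (ndarray or DataFrame, shape (n, m)) – input matrix with m features
y (ndarray or Series, shape (n,)) – one-dimensional class labels

Returns

score_ – the evaluation score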

Return type

float

class bm.visual.model_visual.RedidualsPlot(estimator, ax=None, hist=True, qqplot=False, train_color='b', test_color='g', line_color='#111111', train_alpha=0.75, test_alpha=0.75, is_fitted='auto', picture_name='redidual', save_file=False, **kwargs)

Bases: RegressionVisualizer

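Prediction residuals visualization

Plots the residuals between the predicted and true values.

Parameters

estimator (regression model) – a trained regression model
ax (matplotlib Axes, default: None) –
hist ({True, False, None, 'density', 'frequency'}, default: True) – residual distribution plot; 'density' draws a density plot and 'frequency' a frequency plot
qqplot ({True, False}, default: False) – quantile (Q-Q) plot of the residuals
train_color (color, default: 'b') – color used for the training data
test_color (color, default: 'g') – color used for the test data
line_color (color, default: '#111111', dark grey) – line color
train_alpha (float, default: 0.75) – transparency of the training data
test_alpha (float, default: 0.75) – transparency of the test data
is_fitted (bool or str, default: 'auto') – whether the model has already been fitted
kwargs (dict) – keyword arguments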

Examples

>>> from sklearn.linear_model import Ridge
>>> model = RedidualsPlot(Ridge())
>>> model.fits(X_train, y_train)
>>> model.score(X_test, y_test)
>>> model.show()

draw(y_pred, residuals, train=False, **kwargs)

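Draws the plot from the data.

Parameters

y_pred (ndarray) – one-dimensional predicted values
residuals (ndarray) – one-dimensional residuals
train (bool, default: False) – whether the data is training data
kwargs (dict) – keyword arguments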
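The two arrays passed to draw can be produced as in the sketch below. The sign convention (predicted minus observed) is an assumption, since conventions differ between tools, and the R² helper is included only as a common way to summarize residuals; this reference does not state which metric score returns. Both function names are hypothetical.

```python
def residuals(y_true, y_pred):
    """Residual per sample; assumed convention: predicted minus observed."""
    return [p - t for t, p in zip(y_true, y_pred)]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum(r * r for r in residuals(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0]
y_pred = [2.5, 5.0, 7.5]
print(residuals(y_true, y_pred), r2_score(y_true, y_pred))
```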

finalize(**kwargs)

Adjusts the title and other properties of the figure.

Parameters: kwargs (dict) – keyword arguments

fits(X, y, **kwargs)

Parameters

X (ndarray or DataFrame, shape (n, m)) – input data
y (ndarray or Series, shape (n,)) – input labels
kwargs (dict) – keyword arguments

Returns

self – the object instance

Return type

ResidualsPlot

property hax: Returns the histogram axes, creating it only on demand.

property qqax: Returns the axes of the Q-Q plot.

score(X, y=None, train=False, **kwargs)

Generates predictions.

Parameters

X (array-like) – input data
y (array-like) – input labels
train (bool) – switches between training and prediction mode

Returns

score – the output for the corresponding mode

Return type

float

bm.visual.quick_visual module

Quick-access functions for the packaged visualizations

bm.visual.quick_visual.binning_plot(bin_method, data, column, ax=None, target=None, target_value=None, num_clusters=5, max_interval=10, special_attributes=None, tree_params=None, bad_rate_plot=False, show=False, picture_name='binning_plot', save_file=True, **kwargs)

Parameters

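bin_method (str, default: 'interpolate') – one of [interpolate, quantile, distance, mixed, decision_tree, chi_square, kmeans, best_ks]
data (DataFrame) – input data
column (str) – feature column to bin
ax (ax, default: None) – custom axes; set automatically
target (str, default: None) – target (the classification label column)
target_value (str or int, default: None) – currently only binary classification is supported (for example: bad, good)
num_clusters (int, default: 5) – number of clusters
max_interval (int, default: 10) – maximum number of intervals
special_attributes (str, default: None) – special feature name
tree_params (dict, default: None) – decision tree parameter dictionary
bad_rate_plot (bool, default: False) – whether to plot the bad rate of each bin
show (bool) – whether to display the plot
picture_name (str) – name under which the image is saved
save_file (bool) – whether to save the figure
kwargs (dict) – keyword arguments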

bm.visual.quick_visual.feature_importance_plot(X, ax=None, picture_name='特征重要性', save_file=True, show=False, **kwargs)

Parameters

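X (DataFrame) – the features kept after selection
ax (ax, default: None) – custom axes; set automatically
picture_name (str) – name under which the image is saved
save_file (bool) – whether to save the image
show (bool) – whether to display the plot
kwargs (dict) – keyword arguments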

bm.visual.quick_visual.featurebox_plot(X, ax=None, columns=None, sub_col=None, show=False, picture_name='箱形图', save_file=True, **kwargs)

Parameters

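X (DataFrame) – input data
ax (ax, default: None) – custom axes; set automatically
columns (list) – list of features
sub_col (int) – sub-columns
show (bool) – whether to display the plot
picture_name (str) – name under which the image is saved
save_file (bool) – whether to save the figure
kwargs (dict) – keyword arguments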

bm.visual.quick_visual.featurecategory_plot(X, ax=None, columns=None, label=None, sub_col=None, show=False, picture_name='类别型特征分布图', save_file=True, **kwargs)

Parameters

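X (DataFrame) – input data
ax (ax, default: None) – custom axes; set automatically
columns (list) – list of features
label (str) – name of the target label
sub_col (int) – sub-columns
show (bool) – whether to display the plot
picture_name (str) – name under which the image is saved
save_file (bool) – whether to save the figure
kwargs (dict) – keyword arguments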

bm.visual.quick_visual.featurecor_plot(X, ax=None, columns=None, show=False, picture_name='相关性热图', save_file=True, **kwargs)

Parameters

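X (DataFrame) – input data
ax (ax, default: None) – custom axes; set automatically
columns (list) – list of features
show (bool) – whether to display the plot
picture_name (str) – name under which the image is saved
save_file (bool) – whether to save the figure
kwargs (dict) – keyword arguments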

bm.visual.quick_visual.featuredis_plot(X, ax=None, columns=None, label=None, sub_col=None, show=False, picture_name='数值型特征分布图', save_file=True, **kwargs)

Parameters

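X (DataFrame) – input data
ax (ax, default: None) – custom axes; set automatically
columns (list) – list of features
label (str) – name of the target label
sub_col (int) – sub-columns
show (bool) – whether to display the plot
picture_name (str) – name under which the image is saved
save_file (bool) – whether to save the figure
kwargs (dict) – keyword arguments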

bm.visual.quick_visual.reports_plot(estimator, X_train, y_train, X_test=None, y_test=None, fit_params={}, ax=None, per_class=True, picture_name='模型报表', binary=False, classes=None, encoder=None, is_fitted='auto', force_model=False, show=False, save_file=True, **kwargs)

Parameters

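estimator (pipeline) – pipeline model
X_train (DataFrame) – training data
y_train (list or ndarray) – labels of the training data
X_test (DataFrame) – test data
y_test (list or ndarray) – labels of the test data
fit_params (dict) – model initialization parameters
ax (ax, default: None) – custom axes; set automatically
per_class (bool) – if True, plots the ROC curve for each class; set it to False if only the macro- or micro-average curves are needed
picture_name (str) – name under which the image is saved
binary (bool) – whether the problem is binary classification
classes (list) – class labels; can be set to [0, 1]
encoder (bool) – whether to encode the labels; not needed by default
is_fitted (bool, default: 'auto') – whether the model has already been fitted
force_model (bool, default: False) –
show (bool) – whether to display the plot
save_file (bool) – whether to save the figure
kwargs (dict) – keyword arguments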

bm.visual.quick_visual.shap_plot(estimator, X, feature_names, ax=None, picture_name='SHAP', save_file=True, mode=None, show=False, **kwargs)

Parameters

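estimator (pipeline) – pipeline model
X (DataFrame) – training data after feature selection
feature_names (list) – names of the selected features, including both categorical and numeric feature names
ax (ax, default: None) – custom axes; set automatically
picture_name (str) – name under which the image is saved
save_file (bool) – whether to save the image
mode (str) – type of plot to save; either force or summary
show (bool) – whether to display the plot (when save_file is True, show must be False)
kwargs (dict) – keyword arguments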

bm.visual.quick_visual.wiplot(binx, title, ax=None, display_iv=False, show=False, picture_name='WOE-IV', save_file=True, **kwargs)

Parameters

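binx (DataFrame) – binned data
title (str) – target label
ax (ax, default: None) – custom axes; set automatically
display_iv (bool) – whether to print the IV
show (bool) – whether to display the plot
picture_name (str) – name under which the image is saved
save_file (bool) – whether to save the figure
kwargs (dict) – keyword arguments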

bm.visual.target_visual module

class bm.visual.target_visual.FeatureCorrelationPlot(ax=None, method='pearson', labels=None, sort=False, feature_index=None, feature_names=None, color=None, picture_name='feature_correlation', save_file=False, **kwargs)

Bases: TargetVisualizer

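This visualizer computes the Pearson correlation coefficients and the mutual information between the features and the dependent variable. It can be used for feature selection.

Parameters

ax (ax, default: None) – axes of the canvas
method (string, default: 'pearson') – method for computing the correlation between features and the label; one of pearson, mutual_info-regression, mutual_info-classification
labels (list, default: None) – list of feature column names
sort (bool, default: False) – whether to sort the features when drawing the plot
feature_index (list) – indices of the features in the list
feature_names (list) – list of feature names
color (string) – plot color
kwargs (dict) – keyword arguments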
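The 'pearson' method and the effect of sort can be illustrated with a short standard-library sketch. The helper names pearson and rank_features and the dict-of-lists input are hypothetical, and the mutual information methods are not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def rank_features(columns, target):
    """columns: dict mapping feature name to values. Returns (name, r) pairs sorted by r."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1])


features = {"a": [1, 2, 3, 4], "b": [4, 3, 2, 1], "c": [1, 3, 2, 4]}
print(rank_features(features, target=[2, 4, 6, 8]))
```

Features whose coefficient is close to zero carry little linear information about the target, which is why this kind of plot is useful for feature selection.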

Examples

>>> viz = FeatureCorrelationPlot()
>>> viz.visual(X, y)
>>> viz.show()

draw(): 绘制特征相关度图

finalize(): 设置图形的标签和title

is_dataframe(data)

Converts the input data to a DataFrame.

Parameters: data (instance) – input data

visual(X, y, **kwargs)

Computes the correlation between the features and the label.

Parameters

X (numpy.ndarray or DataFrame, shape (n, m)) – a matrix of n samples and m features
y (numpy.ndarray or DataFrame, shape (n,)) – an array of n labels
kwargs (dict) – keyword arguments

Returns

self

Return type

visualbase

class bm.visual.target_visual.TargetBalancedReferencePlot(ax=None, target=None, bins=4, picture_name='target_balance', save_file=False, **kwargs)

Bases: TargetVisualizer

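To account for label imbalance, this visualizer bins the target values and plots them, indicating suggested reference points for binning the data.

Parameters

ax (matplotlib axes, default: None) – inherited from the visual_base class
target (string, default: "y") – the variable y in the dataset
bins (int, default: 4) – number of bins
kwargs (dict) – keyword arguments inherited from the base class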

Examples

>>> visualizer = TargetBalancedReferencePlot()
>>> visualizer.visual(y)
>>> visualizer.show()
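
To make the role of bins concrete, here is a minimal sketch of equal-width binning in plain Python. It is an assumption that the visualizer bins in this way; the function name histogram and the sample values are hypothetical.

```python
def histogram(y, bins=4):
    """Count the values of `y` in `bins` equal-width intervals over [min(y), max(y)]."""
    lo, hi = min(y), max(y)
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all values are equal
    counts = [0] * bins
    for value in y:
        # The maximum value belongs to the last bin, not to a bin of its own.
        index = min(int((value - lo) / width), bins - 1)
        counts[index] += 1
    return counts


y = [0, 1, 2, 3, 4, 5, 6, 7, 8]
counts = histogram(y, bins=4)
```

Strongly uneven counts across the bins are the kind of imbalance this plot is meant to reveal.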

draw(y, **kwargs)

Draws the binned histogram.

Parameters

y (ndarray or Series) – a one-dimensional numpy.ndarray or Series
kwargs (dict) – additional keyword arguments

finalize(**kwargs)

Adds the x-axis label and manages the tick labels to ensure they are visible.

Parameters: kwargs (dict) – generic keyword arguments

visual(y, **kwargs)

Sets y for the plot and checks the type of the input data.

Parameters

y (ndarray or Series) – a one-dimensional numpy.ndarray or Series
kwargs (dict) – additional keyword arguments

class bm.visual.target_visual.TargetStatisticsPlot(ax=None, labels=None, colors=None, colormap=None, picture_name='target_statis', save_file=False, **kwargs)

Bases: TargetVisualizer

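Computes statistics of the labels in the data and plots them.

There are two display modes:

Statistics mode: how often each label occurs in the data.
Compare mode: the number of each label in the test data versus the training data.

Parameters

ax (ax, default: None) – the axes of the figure
labels (list) – optional; list of encoded labels
colors (string) – color settings
colormap (string or matplotlib cmap) –
kwargs (dict) – optional; additional keyword arguments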

Examples

>>> from sklearn.model_selection import train_test_split
>>> viz = TargetStatisticsPlot()
>>> viz.visual(y)
>>> viz.show()

>>> _, _, y_train, y_test = train_test_split(X, y, test_size=0.2)
>>> viz = TargetStatisticsPlot()
>>> viz.visual(y_train, y_test)
>>> viz.show()
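
The difference between the two modes comes down to what is counted. The sketch below reproduces the counting step in plain Python under the assumption that the plot shows raw label counts; label_counts is a hypothetical helper and not part of the package.

```python
from collections import Counter


def label_counts(y_train, y_test=None):
    """Statistics mode with one argument, compare mode with two."""
    train = Counter(y_train)
    if y_test is None:
        return dict(train)
    test = Counter(y_test)
    labels = sorted(set(train) | set(test))
    # One (train_count, test_count) pair per label; a missing label counts as 0.
    return {label: (train[label], test[label]) for label in labels}


stats = label_counts(["good", "bad", "good"])
compare = label_counts(["good", "bad", "good"], ["bad", "bad"])
```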

draw(): 确定ax轴的值以及一些设定

finalize(**kwargs)

设置图的一些参数，如title,legend等等

Parameters: kwargs (dict) – 参数字典

visual(y_train, y_test=None)

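The mode is determined by the number of arguments passed: statistics mode if only y_train is given, compare mode if both are given.

Parameters

y_train (array-like) – one-dimensional array, shape(n,)
y_test (array-like) – optional; one-dimensional array, shape(m,)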

bm.visual.visual_base module

Visualization base classes that inherit from sklearn.

class bm.visual.visual_base.ClassificationVisualizer(estimator, ax=None, fig=None, classes=None, encoder=None, is_fitted='auto', force_model=False, **kwargs)

Bases: ScoreVisual

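Visual monitoring of classification model training and prediction.

Parameters

estimator (sklearn estimator) – an sklearn learner, i.e. a classification or regression model
ax (matplotlib axes, default: None) – the axes to draw on
fig (matplotlib figure, default: None) – the figure instance
classes (list or str, default: None) – list of class labels
is_fitted (bool or str, default: "auto") –
force_model (Boolean, default: False) – model check
kwargs (dict) – additional keyword arguments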

property class_colors_

fit(X, y=None, **kwargs)

Sets the data.

Parameters

X (ndarray or DataFrame, shape(n,m)) – feature matrix of the samples
y (ndarray or Series, shape(n,)) – label array

Returns

self – the estimator instance

Return type

instance

score(X, y, **kwargs)

Evaluation score on the test data.

Parameters

X (array-like) – the input test data
y (array-like) – the corresponding test labels

Returns

score – the resulting score

Return type

float

class bm.visual.visual_base.FeatureVisualizer(ax=None, fig=None, **kwargs)

Bases: VisualBase, TransformerMixin

Base class for feature visualizers.

Parameters

ax (matplotlib.Axes, default: None) –
fig (matplotlib Figure, default: None) –
kwargs (dict) – any other keyword arguments to pass to the base visualizer.

transform(X, y=None)

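Base-class method intended to be overridden by subclasses.

Parameters

X (array-like, shape (n_samples, n_features)) – the features to transform
y (array-like, shape (n_samples,)) – the labels corresponding to the input features

Returns

X – the original input features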

Return type

array-like, shape (n_samples, n_features)

class bm.visual.visual_base.ModelVisualizer(estimator, ax=None, fig=None, is_fitted='auto', **kwargs)

Bases: VisualBase, Wrapper

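Wraps sklearn model tools. The visualizer acts as a proxy for the model object and only needs to draw on behalf of the wrapped model.

Parameters

estimator (sklearn estimator) – an sklearn learner, i.e. a classification or regression model
ax (ax, default: None) – the axes to draw on
fig (matplotlib, default: None) – the figure instance
is_fitted (Boolean or str, default: auto) – determines whether model training and prediction are performed
kwargs (dict) – additional keyword arguments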

fit(X, y=None, **kwargs)

Parameters

X (DataFrame) – the input data
y (ndarray or list) – the corresponding labels
kwargs (dict) – additional keyword arguments

get_params(deep=True)

Parameters: deep (bool, default: True) –

set_params(**params)

Parameters: params (dict) – parameter dictionary

class bm.visual.visual_base.RegressionVisualizer(estimator, ax=None, fig=None, force_model=False, **kwargs)

Bases: ScoreVisual

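Base class for regression models.

Wraps a regression model so that a visualization is produced when the scoring method is called, which generally lets users compare performance between models effectively.

Parameters

estimator (sklearn estimator) – an sklearn learner, i.e. a classification or regression model
ax (ax, default: None) – the axes to draw on
fig (matplotlib, default: None) – the figure instance
force_model (Boolean, default: False) – model check
kwargs (dict) – additional keyword arguments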

score(X, y, **kwargs)

Evaluation score on the test data.

Parameters

X (array-like) – the input test data
y (array-like) – the corresponding test labels

Returns

score – the resulting score

Return type

float

class bm.visual.visual_base.ScoreVisual(estimator, ax=None, fig=None, is_fitted='auto', **kwargs)

Bases: ModelVisualizer

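Returns the predictive performance of the model.

Parameters

estimator (sklearn estimator) – an sklearn learner, i.e. a classification or regression model
ax (matplotlib axes, default: None) – the axes to draw on
fig (matplotlib figure, default: None) – the figure instance
is_fitted (Boolean or str, default: auto) – determines whether model training and prediction are performed
kwargs (dict) – additional keyword arguments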

score(X, y, **kwargs)

class bm.visual.visual_base.TargetVisualizer(ax=None, fig=None, **kwargs)

Bases: VisualBase

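Base class for target visualizers.

Parameters

ax (matplotlib Axes, default: None) – the axes for the target plot
fig (matplotlib Figure, default: None) – the figure for the target plot
kwargs (dict) – required keyword arguments, inherited from sklearn

label_encoder(y): Encodes the labels.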

class bm.visual.visual_base.VisualBase(ax=None, fig=None, **kwargs)

Bases: BaseEstimator

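Base class that defines how visualizations are created, stored, and displayed with matplotlib. It inherits from sklearn's BaseEstimator and mainly defines conventions such as the data input specification for visualizers.

Parameters

ax (matplotlib axes, default: None) – the axes to draw on. If not passed, the current axes are used (or created if required).
fig (matplotlib figure, default: None) – the figure to draw the visualization on. If not passed, the current figure is used (or created if required).
kwargs (dict) – keyword arguments needed for plotting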

property ax

draw(**kwargs)

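Hooks into the matplotlib interface and creates a representation, as a figure or axes, of the data the visualizer was trained on.

Parameters: kwargs (dict) – generic keyword arguments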

property fig

finalize()

Finalizes the decoration of the axes.

Parameters: kwargs (dict) – generic keyword arguments

set_title(title=None)

Sets the title of the current axes.

Parameters: title (string, default: None) – the title to add to the figure

show(outpath=None, clear_figure=False, **kwargs)

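Displays the figure.

Parameters

outpath (string, default: None) – path where the figure is saved
clear_figure (Boolean, default: False) – if True, clears the figure after it has been saved to file or shown on screen.
kwargs (dict) – generic keyword arguments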

property size

vis(X, y=None, **kwargs)

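Main entry point for visualization, designed to be overridden by subclasses.

Parameters

X (ndarray or DataFrame, shape(n,m)) – input data as a DataFrame or numpy.ndarray
y (ndarray or Series, shape(n,)) – class labels as a numpy.ndarray or Series
kwargs (dict) – required keyword arguments inherited from sklearn

Returns

self – returns the instance itself to support later use in pipelines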

Return type

VisualBase
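
The methods above suggest a lifecycle of vis, draw, finalize, and show. The sketch below only records the call order and draws nothing. It assumes the order used by yellowbrick-style visualizers (vis calls draw, show calls finalize), which this reference does not confirm. SketchVisualizer is a hypothetical stand-in, not the VisualBase class.

```python
class SketchVisualizer:
    """Hypothetical stand-in that records the order of lifecycle calls."""

    def __init__(self):
        self.calls = []

    def vis(self, X, y=None, **kwargs):
        self.calls.append("vis")
        self.draw(**kwargs)
        return self  # returning self allows chaining, e.g. inside a pipeline

    def draw(self, **kwargs):
        self.calls.append("draw")

    def finalize(self, **kwargs):
        self.calls.append("finalize")

    def show(self, outpath=None, **kwargs):
        self.finalize(**kwargs)
        # Only the decision is recorded here; no file is written.
        self.calls.append("save" if outpath else "show")


viz = SketchVisualizer().vis([[1, 2]], [0])
viz.show(outpath="plot.png")
```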

bm.visual.visual_utils module

exception bm.visual.visual_utils.BrickError

Bases: Exception

The root exception for all yellowbrick related errors.

class bm.visual.visual_utils.ColorPalette(name_or_list)

Bases: list

A wrapper for functionality surrounding a list of colors, including a context manager that allows the palette to be set with a with statement.

as_hex(): Return a color palette with hex codes instead of RGB values.

as_rgb(): Return a color palette with RGB values instead of hex codes.

plot(size=1)

Plot the values in the color palette as a horizontal array. See Seaborn’s palplot function for inspiration.

Parameters: size (int) – scaling factor for size of the plot
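
The conversion performed by as_hex() and as_rgb() can be sketched in plain Python. This assumes RGB values are floats in the range 0 to 1, as in matplotlib; rgb_to_hex and hex_to_rgb are hypothetical helpers, not methods of ColorPalette.

```python
def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of floats in [0, 1] to a '#rrggbb' string."""
    return "#" + "".join(f"{round(channel * 255):02x}" for channel in rgb)


def hex_to_rgb(code):
    """Convert a '#rrggbb' string to an (r, g, b) tuple of floats in [0, 1]."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) / 255 for i in (0, 2, 4))


palette_rgb = [(1.0, 0.0, 0.0), (0.0, 0.2, 1.0)]
palette_hex = [rgb_to_hex(color) for color in palette_rgb]
```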

class bm.visual.visual_utils.ContribEstimator(estimator, estimator_type=None)

Bases: object

A wrapper.

exception bm.visual.visual_utils.ModelError

Bases: BrickError

A problem when interacting with sklearn or the ML framework.

exception bm.visual.visual_utils.NotFitted

Bases: ModelError

An action was called that requires a fitted model.

classmethod from_estimator(estimator, method=None)

class bm.visual.visual_utils.Wrapper(obj)

Bases: object

Object wrapper class.

Provides a __getattr__ method for accessing the attributes and methods of the wrapped object.

Parameters: obj (object) – the object to wrap
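
Attribute delegation of this kind is usually written with __getattr__. The following is a minimal sketch of the pattern, not the package's source; the attribute name _wrapped and the DummyModel class are hypothetical.

```python
class Wrapper:
    """Minimal proxy: unknown attributes are looked up on the wrapped object."""

    def __init__(self, obj):
        self._wrapped = obj

    def __getattr__(self, attr):
        # __getattr__ is only called when normal lookup on the wrapper fails.
        if attr == "_wrapped":
            raise AttributeError(attr)  # guards against recursion before __init__ runs
        return getattr(self._wrapped, attr)


class DummyModel:  # hypothetical stand-in for an estimator
    def predict(self, X):
        return [0 for _ in X]


proxy = Wrapper(DummyModel())
predictions = proxy.predict([[1], [2], [3]])
```

Because the lookup is forwarded only on failure, attributes defined on the wrapper itself take precedence over those of the wrapped object.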

bm.visual.visual_utils.bar_stack(data, ax=None, labels=None, ticks=None, colors=None, colormap=None, orientation='vertical', legend=True, legend_kws=None, **kwargs)

An advanced bar chart plotting utility that can draw bar and stacked bar charts from data, wrapping calls to the specified matplotlib.Axes object.

Parameters

data (2D array-like) – The data passed to the Visualizer. Rows represent each stack in the bar chart and columns represent each bar. Therefore, a single bar chart is created by passing a 2D array containing a single row, while the data to create a bar chart with 3 stacks would have a shape of (3, b).
ax (matplotlib.Axes, default: None) – The axes object to draw the barplot on, uses plt.gca() if not specified.
labels (list of str, default: None) – The labels for each row in the bar stack, used to create a legend.
ticks (list of str, default: None) – The labels for each bar, added to the x-axis for a vertical plot, or the y-axis for a horizontal plot.
colors (array-like, default: None) – Specify the colors of each bar, each row in the stack, or every segment.
colormap (string or matplotlib cmap) – Specify a colormap for each bar, each row in the stack, or every segment.
orientation (‘vertical’ or ‘horizontal’) – Specifies a horizontal or vertical bar chart.
legend (boolean, default: True) – If True, the function adds a legend to the plot.
legend_kws (dict, default: None) – Additional keyword arguments for the legend components.
kwargs (dict) – Additional keyword arguments to pass to ax.bar.
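
The data layout described above (rows are stack layers, columns are bars) implies that each layer is drawn on top of the running total of the layers beneath it. A minimal sketch of that bookkeeping in plain Python follows; stack_bottoms is a hypothetical helper and is not claimed to be how bar_stack is implemented.

```python
def stack_bottoms(data):
    """For each row (stack layer), return the baseline on which its segments sit."""
    n_bars = len(data[0])
    running = [0] * n_bars
    bottoms = []
    for row in data:
        bottoms.append(list(running))
        running = [base + value for base, value in zip(running, row)]
    return bottoms, running


# Three stack layers for three bars, i.e. a shape of (3, 3).
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
bottoms, totals = stack_bottoms(data)
```

Each entry of bottoms is the kind of value that would be passed as the bottom argument of matplotlib's Axes.bar for a vertical chart.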

bm.visual.visual_utils.check_fitted(estimator, is_fitted_by='auto', **kwargs)

Parameters

estimator (sklearn.Estimator) – the model
is_fitted_by (bool or str, default: 'auto') –
kwargs (dict) – additional keyword arguments

Returns: is_fitted – Whether or not the model is already fitted
Return type: bool
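
One plausible reading of is_fitted_by is that 'auto' triggers an inspection of the estimator while any other value is taken as the answer. The sketch below illustrates that reading only. The looks_fitted heuristic (learned attributes end with an underscore, a scikit-learn convention) and the DummyModel class are assumptions, not the package's logic.

```python
def looks_fitted(estimator):
    """Hypothetical heuristic: learned state lives in attributes such as coef_."""
    return any(
        name.endswith("_") and not name.startswith("_")
        for name in vars(estimator)
    )


def check_fitted(estimator, is_fitted_by="auto"):
    """'auto' inspects the estimator; any other value is taken as the answer."""
    if isinstance(is_fitted_by, str) and is_fitted_by == "auto":
        return looks_fitted(estimator)
    return bool(is_fitted_by)


class DummyModel:  # hypothetical stand-in for an estimator
    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self
```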

bm.visual.visual_utils.color_palette(palette=None, n_colors=None)

Return a color palette object with color definition and handling.

Calling this function with palette=None will return the current matplotlib color cycle.

This function can also be used in a with statement to temporarily set the color cycle for a plot or set of plots.

Parameters

palette (None or str or sequence) –
Name of a palette or None to return the current palette. If a sequence the input colors are used but possibly cycled.

Available palette names from yellowbrick.colors.palettes are: accent, dark, paired, pastel, bold, muted, colorblind, sns_colorblind, sns_deep, sns_muted, sns_pastel, sns_bright, sns_dark, flatui, neural_paint
n_colors (None or int) – Number of colors in the palette. If None, the default will depend on how palette is specified. Named palettes default to 6 colors, which allows the use of the names “bgrmyck”, though others have more or fewer colors; therefore reducing the size of the list can only be done by specifying this parameter. Asking for more colors than exist in the palette will cause it to cycle.

Returns

list(tuple) – Returns a ColorPalette object, which behaves like a list, but can be used as a context manager and possesses functions to convert colors.

See also

set_palette(): Set the default color cycle for all plots.
set_color_codes(): Reassign color codes like "b", "g", etc. to colors from one of the yellowbrick palettes.
colors.resolve_colors(): Resolve a color map or listed sequence of colors.

bm.visual.visual_utils.color_sequence(palette=None, n_colors=None)

Return a ListedColormap object from a named sequence palette. Useful for continuous color scheme values and color maps.

Calling this function with palette=None will return the default color sequence: Color Brewer RdBu.

Parameters

palette (None or str or sequence) –

Name of a palette or None to return the default palette. If a sequence the input colors are used to create a ListedColormap.

The currently implemented color sequences are from Color Brewer.

Available palette names from yellowbrick.colors.palettes are: Blues, BrBG, BuGn, BuPu, GnBu, Greens, Greys, OrRd, Oranges, PRGn, PiYG, PuBu, PuBuGn, PuOr, PuRd, Purples, RdBu, RdGy, RdPu, RdYlBu, RdYlGn, Reds, Spectral, YlGn, YlGnBu, YlOrBr, YlOrRd, ddl_heat

n_colors (None or int) – Number of colors in the palette. If None, the default will depend on how palette is specified, selecting the largest sequence for that palette name. Note that sequences have a minimum length of 3; if a number of colors is specified that is not available for the sequence, a ValueError is raised.

Returns

Returns a ListedColormap object, an artist object from the matplotlib library that can be used wherever a colormap is necessary.

Return type

colormap

bm.visual.visual_utils.div_safe(numerator, denominator)

Ufunc-extension that returns 0 instead of nan when dividing numpy arrays

Parameters

numerator (array-like) –
denominator (scalar or array-like that can be validly divided by the numerator) –

Returns: a numpy array

Example: div_safe([-1, 0, 1], 0) == [0, 0, 0]
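
The behavior can be mimicked on plain Python lists. The sketch below is an analogue for illustration only: the real function operates on numpy arrays, and div_safe_list is a hypothetical name.

```python
def div_safe_list(numerator, denominator):
    """Element-wise division that yields 0 wherever the denominator is 0."""
    if not isinstance(denominator, (list, tuple)):
        denominator = [denominator] * len(numerator)  # broadcast a scalar
    return [n / d if d != 0 else 0 for n, d in zip(numerator, denominator)]


result = div_safe_list([-1, 0, 1], 0)
```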

bm.visual.visual_utils.get_color_cycle(): Returns the current color cycle from matplotlib.

bm.visual.visual_utils.get_model_name(model)

Gets the name of the model.

Parameters: model (class or instance) – the model object
Returns: name – the name of the model
Return type: string
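
Accepting either a class or an instance usually means branching on whether the argument is itself a type. The following is a minimal sketch of that idea; model_name and DummyModel are hypothetical, and the package's function may handle additional cases.

```python
def model_name(model):
    """Return the class name for either a class or an instance."""
    return model.__name__ if isinstance(model, type) else type(model).__name__


class DummyModel:  # hypothetical stand-in for an estimator
    pass


from_class = model_name(DummyModel)
from_instance = model_name(DummyModel())
```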

bm.visual.visual_utils.is_classifier(estimator)

bm.visual.visual_utils.is_dataframe(obj)

Returns True if the given object is a Pandas Data Frame.

Parameters: obj (instance) – The object to test whether or not is a Pandas DataFrame.

bm.visual.visual_utils.is_estimator(model)

Determines whether the model is an estimator.

Parameters: model (class or instance) –

bm.visual.visual_utils.is_fitted(estimator): Checks that the model has already been fitted.

bm.visual.visual_utils.is_regressor(estimator)

bm.visual.visual_utils.memoized(fget)

bm.visual.visual_utils.resolve_colors(n_colors=None, colormap=None, colors=None)

Generates a list of colors based on common color arguments, for example the name of a colormap or palette or another iterable of colors. The list is then truncated (or multiplied) to the specific number of requested colors.

Parameters

n_colors (int, default: None) – Specify the length of the list of returned colors, which will either truncate or repeat the colors available. If None the length of the colors will not be modified.
colormap (str, yellowbrick.style.palettes.ColorPalette, matplotlib.cm, default: None) – The name of the matplotlib color map with which to generate colors.
colors (iterable, default: None) – A collection of colors to use specifically with the plot. Overrides colormap if both are specified.

Returns

colors – A list of colors that can be used in matplotlib plots.

Return type

list

Notes

This function was originally based on a similar function in the pandas plotting library that has been removed in the new version of the library.
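
The truncate-or-repeat behavior of n_colors can be sketched with the standard library. This covers only the length adjustment, not the colormap lookup; fit_colors is a hypothetical helper.

```python
from itertools import cycle, islice


def fit_colors(colors, n_colors=None):
    """Truncate or repeat `colors` so that exactly `n_colors` are returned."""
    colors = list(colors)
    if n_colors is None:
        return colors  # length is left unchanged
    return list(islice(cycle(colors), n_colors))


shorter = fit_colors(["r", "g", "b"], 2)
longer = fit_colors(["r", "g"], 5)
```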

bm.visual.visual_utils.set_color_codes(palette='accent')

Change how matplotlib color shorthands are interpreted.

Calling this will change how shorthand codes like “b” or “g” are interpreted by matplotlib in subsequent plots.

Parameters: palette (str) – Named yellowbrick palette to use as the source of colors.