ShapModel

class ShapModel(explainer_class=<class 'shap.explainers._tree.TreeExplainer'>, explainer_kwargs=None, list_model_classes=None, list_model_kwargs=None, verbose=True, random_state=None)

Bases: Wrapper

SHAP Model class ([pro], requires aaanalysis[pro]): A wrapper for SHAP (SHapley Additive exPlanations) [Lundberg20] explainers to obtain Monte Carlo estimates for feature impact [Breimann25].

As a Wrapper, it implements the .fit / .eval model contract.

SHAP is an explainable Artificial Intelligence (AI) framework and game-theoretic approach to explain the output of any machine learning model using SHAP values. Each SHAP value quantifies how much a feature shifts a sample's prediction score away from the expected model output: positive values increase the score and negative values decrease it.
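
To make the additive interpretation concrete, the following minimal sketch computes exact Shapley values for a toy scoring function. It does not use the shap package or aaanalysis: toy_model, sample, and baseline are made-up illustrations, and "missing" features are simply replaced by a single baseline value, which is a simplifying assumption compared to the background data used by real SHAP explainers.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Shapley weight shared by all coalitions of this size
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal contribution of feature i when it joins the coalition
                phi[i] += weight * (value_fn(coalition | {i}) - value_fn(coalition))
    return phi


def toy_model(x):
    """Hypothetical scoring function with one interaction term."""
    return 0.5 + 0.2 * x[0] - 0.1 * x[1] + 0.05 * x[0] * x[2]


sample = (1.0, 1.0, 1.0)
baseline = (0.0, 0.0, 0.0)


def value_fn(coalition):
    # Features outside the coalition are replaced by their baseline value
    x = [sample[j] if j in coalition else baseline[j] for j in range(len(sample))]
    return toy_model(x)


phi = shapley_values(value_fn, n_features=3)
base_value = toy_model(baseline)   # plays the role of the expected value
prediction = toy_model(sample)
print(phi, base_value, prediction)
```

The per-feature values sum to the difference between the sample prediction and the base value, with a positive value for the first feature and a negative value for the second.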

Added in version 0.1.3.

shap_values

2D array with Monte Carlo estimates of SHAP values obtained by SHAP explainer models averaged across all rounds, feature selections, and trained models from list_model_classes.

Type: array-like, shape (n_samples, n_features)

exp_value

Expected value for explaining the model output obtained by SHAP explainer model averaged across all rounds, feature selections, and trained models from list_model_classes. Typically 0.5 for binary classification on a balanced dataset.

Type: float
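
Both attributes are described as averages over repeated runs. The sketch below illustrates only that averaging idea with plain Python lists; average_estimates and the numbers in runs are hypothetical and are not taken from the aaanalysis implementation.

```python
from statistics import fmean


def average_estimates(runs):
    """Average SHAP matrices and expected values over independent runs.

    runs: list of (shap_matrix, expected_value) pairs, one per round/model,
    where shap_matrix is a nested list of shape (n_samples, n_features).
    """
    n_samples = len(runs[0][0])
    n_features = len(runs[0][0][0])
    # Element-wise mean over all runs
    mean_shap = [
        [fmean(shap[i][j] for shap, _ in runs) for j in range(n_features)]
        for i in range(n_samples)
    ]
    mean_exp = fmean(exp for _, exp in runs)
    return mean_shap, mean_exp


# Made-up estimates from two runs (2 samples x 2 features)
runs = [
    ([[0.2, -0.1], [0.0, 0.3]], 0.48),
    ([[0.4, -0.3], [0.2, 0.1]], 0.52),
]
shap_values, exp_value = average_estimates(runs)
print(shap_values, exp_value)
```

The result keeps the (n_samples, n_features) shape of each individual estimate, and the averaged expected value is a single float.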

Parameters:

explainer_class (Callable)
explainer_kwargs (Optional[dict])
list_model_classes (Optional[List[Type[BaseEstimator]]])
list_model_kwargs (Optional[List[dict]])
verbose (bool)
random_state (Optional[int])

Methods

`add_feat_impact`(df_feat[, drop, samples, ...])	Compute SHapley Additive exPlanations (SHAP) feature impact (or importance) from SHAP values and add to the feature DataFrame.
`add_sample_mean_dif`(X, labels[, label_ref, ...])	Compute the feature value difference between selected samples and a reference group average.
`eval`([shap_values, is_selected])	Evaluate convergence of the Monte Carlo SHAP-value estimates over rounds.
`fit`(X, labels[, label_target_class, ...])	Obtain SHapley Additive exPlanations (SHAP) values aggregated across prediction models and training rounds.

__init__(explainer_class=<class 'shap.explainers._tree.TreeExplainer'>, explainer_kwargs=None, list_model_classes=None, list_model_kwargs=None, verbose=True, random_state=None)

Parameters:

explainer_class (model, default=TreeExplainer) – The SHAP Explainer model. Must be one of the following: shap.TreeExplainer, shap.LinearExplainer, shap.KernelExplainer, shap.DeepExplainer, shap.GradientExplainer.
explainer_kwargs (dict, optional) – Keyword arguments for the explainer class. Defaults to None (no extra arguments); passing explainer_class=None selects shap.TreeExplainer with {'model_output': 'probability'}.
list_model_classes (list of Type[BaseEstimator], default=[RandomForestClassifier, ExtraTreesClassifier]) – A list of prediction model classes used to obtain SHapley Additive exPlanations (SHAP) values.
list_model_kwargs (list of dict, optional) – A list of dictionaries containing keyword arguments for each model in list_model_classes.
verbose (bool, default=True) – If True, verbose outputs are enabled.
random_state (int, optional) – The seed used by the random number generator. If a positive integer, results of stochastic processes are consistent, enabling reproducibility. If None, stochastic processes will be truly random. For fuzzy_aggregation='interpolate' it is the initial seed and each round re-seeds with random_state + round (see ShapModel.fit() Notes).
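
The per-round re-seeding described for random_state can be pictured as follows. This is a sketch of the general pattern only: round_seeds and draw_per_round are hypothetical helpers, and starting the round index at 0 is an assumption, not something the documentation states.

```python
import random


def round_seeds(random_state, n_rounds):
    """Return one seed per round: random_state + round index (assumed 0-based)."""
    if random_state is None:
        return [None] * n_rounds  # no fixed seed: runs are not reproducible
    return [random_state + i for i in range(n_rounds)]


def draw_per_round(random_state, n_rounds):
    # One generator per round, so rounds differ from each other but are repeatable
    return [random.Random(seed).random() for seed in round_seeds(random_state, n_rounds)]


first = draw_per_round(42, n_rounds=3)
second = draw_per_round(42, n_rounds=3)
print(round_seeds(42, 3), first == second)
```

With a fixed initial seed, each round sees different random numbers, yet repeating the whole procedure reproduces the same sequence.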

Notes

All attributes are set during fitting via the ShapModel.fit() method and can be directly accessed.
The explainer models should be provided from the SHAP package.
SHAP model fitting messages appear in red and are not controlled by verbose, unlike AAanalysis progress messages in blue.
The selection of the SHAP explainer must align with the machine learning models used. The following explainer model types are allowed:
- shap.TreeExplainer: Ideal for tree-based models (by default, random forests and extra trees; further recommended are XGBoost and CatBoost). Efficient in computing SHAP values by leveraging the tree structure.
- shap.LinearExplainer: Suited for linear models (e.g., logistic regression, linear regression). Computes SHAP values directly from model coefficients.
- shap.KernelExplainer: Model-agnostic, works with any model type. Uses weighted linear regression to approximate SHAP values. Versatile but computationally expensive; runtime can be reduced by providing a small (e.g., summarized) background dataset.
- shap.DeepExplainer: Designed for deep learning models (e.g., models from TensorFlow, Keras). Approximates SHAP values by analyzing neuron groups, suitable for complex networks.
- shap.GradientExplainer: Also for deep learning, but uses expected gradients. Effective for models with differentiable components.
Proper explainer choice is key for accurate model explanations.
By default, shap.TreeExplainer is used with random forest and extra trees models.
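
The pairing rules above can be summarized as a simple lookup. The helper below is purely illustrative and is not part of aaanalysis or shap: suggest_explainer, TREE_MODELS, and LINEAR_MODELS are hypothetical names, the class-name sets are incomplete examples, and deep learning models (DeepExplainer, GradientExplainer) are left out for brevity.

```python
# Hypothetical lookup: model class name -> name of a matching SHAP explainer
TREE_MODELS = {"RandomForestClassifier", "ExtraTreesClassifier",
               "XGBClassifier", "CatBoostClassifier"}
LINEAR_MODELS = {"LogisticRegression", "LinearRegression"}


def suggest_explainer(model_class_name):
    """Return the explainer name that fits the given model class name."""
    if model_class_name in TREE_MODELS:
        return "TreeExplainer"      # exploits the tree structure
    if model_class_name in LINEAR_MODELS:
        return "LinearExplainer"    # works from model coefficients
    return "KernelExplainer"        # model-agnostic fallback, slower


suggestions = {name: suggest_explainer(name)
               for name in ["RandomForestClassifier", "LogisticRegression", "SVC"]}
print(suggestions)
```

All models passed in one list_model_classes should map to the same explainer type, since only a single explainer_class is selected per ShapModel.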

See also

sklearn.ensemble.RandomForestClassifier for random forest model.
sklearn.ensemble.ExtraTreesClassifier for extra trees model.
ShapModel.add_feat_impact() for details on feature impact and SHAP value-based feature importance.

Warning

This class requires SHAP, which is automatically installed via pip install aaanalysis[pro].

Examples

The ShapModel object can be instantiated without providing any parameter:

import aaanalysis as aa
sm = aa.ShapModel()

Two types of models can be provided:

SHAP explainer model: Using the explainer_class parameter, you can select one SHAP explainer model.
Prediction models: One or more prediction models (machine learning or deep learning models) can be provided via the list_model_classes parameter. The models must align with the chosen SHAP explainer. For example, the TreeExplainer is used by default with two tree-based machine learning models, random forest and extra trees:

import shap
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

# Create explainer and list of models (default)
explainer_class = shap.TreeExplainer
list_model_classes = [RandomForestClassifier, ExtraTreesClassifier]

sm = aa.ShapModel(explainer_class=explainer_class,
                  list_model_classes=list_model_classes)

Parameters can be provided to the explainer model using the explainer_kwargs parameter:

# Use probability output for SHAP values (default)
explainer_kwargs = dict(model_output="probability")

sm = aa.ShapModel(explainer_class=explainer_class, explainer_kwargs=explainer_kwargs)

To provide arguments to the prediction models, you should create a size-matching list of kwargs dictionaries called list_model_kwargs:

# Create non-default kwargs for tree-based models
list_model_classes = [RandomForestClassifier, RandomForestClassifier]
list_model_kwargs = [{"n_estimators": 64, "max_depth": 4}, {"n_estimators": 32, "max_depth": 3}]

# Explainer does not have to change since TreeExplainer is default
sm = aa.ShapModel(list_model_classes=list_model_classes, list_model_kwargs=list_model_kwargs)
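
The "size-matching" requirement means that the i-th kwargs dictionary belongs to the i-th model class. A minimal sketch of that pairing is shown below; build_models and DummyForest are hypothetical stand-ins (DummyForest replaces a real estimator class so the snippet runs without scikit-learn), and the actual validation inside aaanalysis may differ.

```python
def build_models(list_model_classes, list_model_kwargs=None):
    """Instantiate each model class with its own kwargs dictionary."""
    if list_model_kwargs is None:
        # No kwargs given: use each class's defaults
        list_model_kwargs = [{} for _ in list_model_classes]
    if len(list_model_classes) != len(list_model_kwargs):
        raise ValueError("list_model_kwargs must have one dict per model class")
    return [cls(**kwargs) for cls, kwargs in zip(list_model_classes, list_model_kwargs)]


class DummyForest:
    """Stand-in for an estimator class such as RandomForestClassifier."""

    def __init__(self, n_estimators=100, max_depth=None):
        self.n_estimators = n_estimators
        self.max_depth = max_depth


models = build_models(
    [DummyForest, DummyForest],
    [{"n_estimators": 64, "max_depth": 4}, {"n_estimators": 32, "max_depth": 3}],
)
print([(m.n_estimators, m.max_depth) for m in models])
```

Passing the same class twice with different kwargs, as in the example above, yields two differently configured models.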

If a non-tree-based model type is provided, explainer_class must be adjusted:

from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.svm import SVC, SVR

# Use LinearExplainer for linear models
sm = aa.ShapModel(explainer_class=shap.LinearExplainer, list_model_classes=[LogisticRegression, LinearRegression])

# Use KernelExplainer for any model type
sm = aa.ShapModel(explainer_class=shap.KernelExplainer, list_model_classes=[SVC])

The KernelExplainer is a model-agnostic method (i.e., it works with any prediction model), but it is computationally expensive. In contrast, the TreeExplainer and the LinearExplainer are optimized for tree-based and linear model types, respectively, and are therefore more efficient.

You can also adjust the verbosity mode via verbose (default=True) and set a random seed for the prediction models using random_state (default=None):

sm = aa.ShapModel(verbose=False, random_state=42)