aaanalysis.ShapModel

class aaanalysis.ShapModel(explainer_class=<class 'shap.explainers._tree.TreeExplainer'>, explainer_kwargs=None, list_model_classes=None, list_model_kwargs=None, verbose=True, random_state=None)

Bases: object

SHAP Model class: A wrapper for SHAP (SHapley Additive exPlanations) explainers to obtain Monte Carlo estimates for feature impact [Breimann25a].

SHAP is an explainable Artificial Intelligence (AI) framework and game-theoretic approach to explain the output of any machine learning model using SHAP values. SHAP values quantify each feature's responsibility for a change in the model output: a positive SHAP value indicates that the feature increases the prediction score of a sample, whereas a negative SHAP value indicates that it decreases the score.
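The game-theoretic definition behind SHAP values can be made concrete with a brute-force sketch. It computes exact Shapley values for one sample by enumerating every feature coalition, treating features outside a coalition as set to a single baseline point. This is only an illustration of the definition, not how ShapModel or the shap explainers compute SHAP values; the function name, toy model, and baseline are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at sample x by enumerating all feature coalitions.

    Features outside a coalition take their baseline value (a hypothetical choice).
    """
    n = len(x)

    def value(coalition):
        # Coalition features keep the sample's values; all others use the baseline
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S not containing i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


# Hypothetical toy model with an interaction term
def model(z):
    return 2 * z[0] + 3 * z[1] + z[0] * z[1]


x, baseline = [1, 2], [0, 0]
phi = shapley_values(model, x, baseline)
print(phi)  # [3.0, 7.0]
```

Both features push the prediction up (positive SHAP values), and the values sum to model(x) - model(baseline) = 10, the additivity property SHAP relies on. Because the enumeration grows as 2**n, practical explainers use model-specific shortcuts or sampling instead.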

Added in version 0.1.3.

shap_values

2D array of Monte Carlo estimates of SHAP values, obtained by the SHAP explainer and averaged across all rounds, feature selections, and models trained from list_model_classes.

Type:

array-like, shape (n_samples, n_features)

exp_value

Expected value of the model output (the baseline prediction from which SHAP values explain deviations), obtained by the SHAP explainer and averaged across all rounds, feature selections, and models trained from list_model_classes. For binary classification on a balanced dataset, it is typically about 0.5.

Type:

float
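The relation between these two attributes can be illustrated with synthetic numbers. The sketch below assumes that each (round, model) fit yields an (n_samples, n_features) SHAP array plus a scalar expected value; this is an assumption about the internals, not ShapModel's actual code, and it ignores feature selections. It averages the fits into shap_values and exp_value, checks that additivity survives the averaging, and computes one simple convergence diagnostic (the largest change of the running mean as rounds are added), which is not necessarily what ShapModel.eval() reports.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rounds, n_models, n_samples, n_features = 5, 2, 4, 3

# Hypothetical SHAP arrays and expected values, one per (round, model) fit
shap_per_fit = rng.normal(scale=0.05, size=(n_rounds, n_models, n_samples, n_features))
exp_per_fit = rng.uniform(0.45, 0.55, size=(n_rounds, n_models))
# Additivity within each fit: prediction = expected value + sum of the sample's SHAP values
pred_per_fit = exp_per_fit[..., None] + shap_per_fit.sum(axis=-1)

# Monte Carlo estimates: average over rounds and models
shap_values = shap_per_fit.mean(axis=(0, 1))       # shape (n_samples, n_features)
exp_value = float(exp_per_fit.mean())              # scalar, hence a float
mean_pred = pred_per_fit.mean(axis=(0, 1))         # shape (n_samples,)

# Averaging is linear, so additivity also holds for the averaged estimates
assert np.allclose(exp_value + shap_values.sum(axis=1), mean_pred)

# Simple convergence diagnostic: largest change of the running mean per added round
per_round = shap_per_fit.mean(axis=1)              # (n_rounds, n_samples, n_features)
running_mean = np.cumsum(per_round, axis=0) / np.arange(1, n_rounds + 1)[:, None, None]
max_change = np.abs(np.diff(running_mean, axis=0)).max(axis=(1, 2))  # (n_rounds - 1,)
print(max_change)
```

With real SHAP estimates, max_change would be expected to shrink as more rounds are added.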


Methods

add_feat_impact([df_feat, drop, ...])

Compute SHAP feature impact (or importance) from SHAP values and add to the feature DataFrame.

add_sample_mean_dif(X[, labels, label_ref, ...])

Compute the feature value difference between selected samples and a reference group average.

eval([shap_values, is_selected])

Evaluate convergence of the Monte Carlo SHAP-value estimates over rounds.

fit(X[, labels, label_target_class, ...])

Obtain SHAP values aggregated across prediction models and training rounds.
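The exact formulas behind add_feat_impact() and add_sample_mean_dif() are not given here. As a rough sketch of the underlying ideas, the hypothetical helpers below use one common convention for SHAP-based importance (mean absolute SHAP value per feature, scaled to sum to 100) and a plain difference between selected samples and a reference-group mean. Neither helper is part of the aaanalysis API, and the library's own computations may differ.

```python
import numpy as np


def shap_importance_percent(shap_values):
    """Hypothetical helper: mean absolute SHAP value per feature, scaled to sum to 100."""
    mean_abs = np.abs(np.asarray(shap_values, dtype=float)).mean(axis=0)
    return 100 * mean_abs / mean_abs.sum()


def mean_dif_to_reference(X, labels, label_ref, sample_mask):
    """Hypothetical helper: feature values of selected samples minus the reference-group mean."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    ref_mean = X[labels == label_ref].mean(axis=0)
    return X[np.asarray(sample_mask)] - ref_mean


shap_values = np.array([[0.2, -0.1, 0.0], [-0.2, 0.3, 0.1]])
importance = shap_importance_percent(shap_values)  # approx. [44.4, 44.4, 11.1]

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 8.0]])
labels = np.array([1, 0, 0])
dif = mean_dif_to_reference(X, labels, label_ref=0, sample_mask=labels == 1)  # [[-3.0, -4.0]]
```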

__init__(explainer_class=<class 'shap.explainers._tree.TreeExplainer'>, explainer_kwargs=None, list_model_classes=None, list_model_kwargs=None, verbose=True, random_state=None)
Parameters:
  • explainer_class (class, default=TreeExplainer) – The SHAP explainer class. Must be one of the following: shap.TreeExplainer, shap.LinearExplainer, shap.KernelExplainer, shap.DeepExplainer, shap.GradientExplainer.

  • explainer_kwargs (dict, default=dict(model_output='probability')) – Keyword arguments passed to the explainer class.

  • list_model_classes (list of Type[BaseEstimator], default=[RandomForestClassifier, ExtraTreesClassifier]) – A list of prediction model classes used to obtain SHAP values.

  • list_model_kwargs (list of dict, optional) – A list of dictionaries containing keyword arguments for each model in list_model_classes. Must have the same length as list_model_classes.

  • verbose (bool, default=True) – If True, verbose outputs are enabled.

  • random_state (int, optional) – The seed used by the random number generator. If a positive integer, results of stochastic processes are consistent, enabling reproducibility. If None, no fixed seed is used, and results may differ between runs.

Notes

  • All attributes are set during fitting via the ShapModel.fit() method and can be directly accessed.

  • The explainer classes should be provided by the SHAP package.

  • SHAP model fitting messages appear in red and are not controlled by verbose, unlike AAanalysis progress messages in blue.

  • The selection of the SHAP explainer must align with the machine learning models used. The following explainer types are allowed:

    • shap.TreeExplainer: Ideal for tree-based models (by default, random forests and extra trees; further recommended are XGBoost and CatBoost). Efficient in computing SHAP values by leveraging the tree structure.

    • shap.LinearExplainer: Suited for linear models (e.g., logistic regression, linear regression). Computes SHAP values directly from model coefficients.

    • shap.KernelExplainer: Model-agnostic, works with any model type. Uses weighted linear regression to approximate SHAP values. Versatile but computationally expensive; runtime can be reduced by using a small or summarized background dataset.

    • shap.DeepExplainer: Designed for deep learning models (e.g., TensorFlow or Keras models). Approximates SHAP values using an extension of the DeepLIFT algorithm, suitable for complex networks.

    • shap.GradientExplainer: Also for deep learning, but uses expected gradients. Effective for models with differentiable components.

    Proper explainer choice is key for accurate model explanations.

  • By default, shap.TreeExplainer is used with random forest and extra trees models.
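To make the weighted linear regression used by shap.KernelExplainer more concrete, here is a minimal sketch of the Kernel SHAP idea on a hypothetical two-feature toy model. Unlike shap.KernelExplainer, which samples coalitions when there are many features and uses a background dataset, this sketch enumerates all 2**n coalitions, uses a single baseline point, and approximates the exact constraints with a large finite weight. The function and model names are hypothetical.

```python
import itertools
from math import comb

import numpy as np


def kernel_shap(f, x, baseline, constraint_weight=1e6):
    """Kernel SHAP by weighted linear regression over all 2**n coalitions (toy sizes only)."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = len(x)
    rows, targets, weights = [], [], []
    for mask in itertools.product([0, 1], repeat=n):
        z = np.array(mask)
        size = int(z.sum())
        if size in (0, n):
            # The Shapley kernel is infinite here; a large finite weight approximates
            # the constraints phi_0 = f(baseline) and phi_0 + sum(phi) = f(x)
            w = constraint_weight
        else:
            # Shapley kernel weight: (n - 1) / (C(n, |z|) * |z| * (n - |z|))
            w = (n - 1) / (comb(n, size) * size * (n - size))
        rows.append(np.concatenate(([1.0], z)))  # intercept + coalition indicators
        targets.append(f(np.where(z == 1, x, baseline)))
        weights.append(w)
    A, y = np.array(rows), np.array(targets, dtype=float)
    sw = np.sqrt(np.array(weights))
    # Weighted least squares: scale rows by sqrt(weight), then solve ordinary least squares
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1:]  # (phi_0, SHAP values)


def model(z):
    # Hypothetical toy model with an interaction term
    return 2 * z[0] + 3 * z[1] + z[0] * z[1]


phi_0, phi = kernel_shap(model, x=[1, 2], baseline=[0, 0])
print(phi_0, phi)  # phi_0 close to 0, phi close to [3, 7]
```

For this toy model, the regression recovers the exact Shapley values (3 and 7), and phi_0 is essentially the model output at the baseline (0). The number of model evaluations grows as 2**n here, which illustrates why model-specific explainers such as TreeExplainer and LinearExplainer are preferred whenever they apply.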


Warning

  • This class requires SHAP, which is automatically installed via pip install aaanalysis[pro].

Examples

The ShapModel object can be instantiated without providing any parameter:

import aaanalysis as aa
sm = aa.ShapModel()

Two types of models can be provided:

  • SHAP explainer model: Using the explainer_class parameter, you can select one SHAP explainer model.

  • Prediction models: One or more prediction models (machine learning or deep learning models) can be provided via the list_model_classes parameter. The models must align with the chosen SHAP explainer. For example, the TreeExplainer is used by default with two tree-based machine learning models, random forest and extra trees:

import shap
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

# Create explainer and list of models (default)
explainer_class = shap.TreeExplainer
list_model_classes = [RandomForestClassifier, ExtraTreesClassifier]

sm = aa.ShapModel(explainer_class=explainer_class,
                  list_model_classes=list_model_classes)

Parameters can be provided to the explainer model using the explainer_kwargs parameter:

# Use probability output for SHAP values (default)
explainer_kwargs = dict(model_output="probability")

sm = aa.ShapModel(explainer_class=explainer_class, explainer_kwargs=explainer_kwargs)

To pass arguments to the prediction models, provide a list of keyword-argument dictionaries via list_model_kwargs, with one dictionary per model in list_model_classes:

# Create non-default kwargs for tree-based models
list_model_classes = [RandomForestClassifier, RandomForestClassifier]
list_model_kwargs = [{"n_estimators": 64, "max_depth": 4}, {"n_estimators": 32, "max_depth": 3}]

# Explainer does not have to change since TreeExplainer is default
sm = aa.ShapModel(list_model_classes=list_model_classes, list_model_kwargs=list_model_kwargs)
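How ShapModel pairs list_model_classes with list_model_kwargs internally is not described here. As a rough sketch under that caveat, a hypothetical helper could check that both lists match in length and instantiate each scikit-learn class with its own kwargs, seeding only estimators that expose a random_state parameter (via the standard scikit-learn get_params/set_params API). The helper name and seeding policy are assumptions, not aaanalysis code.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression


def build_models(list_model_classes, list_model_kwargs=None, random_state=None):
    """Hypothetical helper: instantiate each model class with its matching kwargs."""
    if list_model_kwargs is None:
        list_model_kwargs = [{} for _ in list_model_classes]
    if len(list_model_kwargs) != len(list_model_classes):
        raise ValueError("'list_model_kwargs' must match 'list_model_classes' in length")
    models = []
    for model_class, kwargs in zip(list_model_classes, list_model_kwargs):
        model = model_class(**kwargs)
        # Seed only estimators that have a 'random_state' parameter and were not seeded explicitly
        if (random_state is not None and "random_state" in model.get_params()
                and "random_state" not in kwargs):
            model.set_params(random_state=random_state)
        models.append(model)
    return models


models = build_models(
    [RandomForestClassifier, RandomForestClassifier],
    [{"n_estimators": 64, "max_depth": 4}, {"n_estimators": 32, "max_depth": 3}],
    random_state=42,
)
# LinearRegression has no random_state parameter, so it is left unseeded
linear_models = build_models([LinearRegression], random_state=42)
```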

If non-tree-based models are provided, explainer_class must be adjusted accordingly:

from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.svm import SVC

# Use LinearExplainer for linear models
sm = aa.ShapModel(explainer_class=shap.LinearExplainer, list_model_classes=[LogisticRegression, LinearRegression])

# Use KernelExplainer for any model type
sm = aa.ShapModel(explainer_class=shap.KernelExplainer, list_model_classes=[SVC])

The KernelExplainer is a model-agnostic method (i.e., it works with any prediction model), but it is computationally expensive. In contrast, the TreeExplainer and the LinearExplainer are optimized for tree-based and linear model types, respectively, and are, therefore, more efficient.

You can also adjust the verbosity via verbose (default=True) and set a random seed for the prediction models using random_state (default=None):

sm = aa.ShapModel(verbose=False, random_state=42)