aaanalysis.ShapModel
- class aaanalysis.ShapModel(explainer_class=<class 'shap.explainers._tree.TreeExplainer'>, explainer_kwargs=None, list_model_classes=None, list_model_kwargs=None, verbose=True, random_state=None)
Bases: object
SHAP Model class: A wrapper for SHAP (SHapley Additive exPlanations) explainers to obtain Monte Carlo estimates for feature impact [Breimann25a].
SHAP is an explainable Artificial Intelligence (AI) framework and game-theoretic approach for explaining the output of any machine learning model using SHAP values. A SHAP value quantifies how much a feature shifts the model output for a given sample: a positive value increases the sample's prediction score, and a negative value decreases it.
Added in version 0.1.3.
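The sign convention described above can be illustrated with exact Shapley values on a toy model. This is a minimal sketch, not how SHAP explainers or AAanalysis compute their estimates: the model `toy_model`, the helper `exact_shapley_values`, and the all-zero baseline are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
from itertools import combinations
from math import factorial


def toy_model(features):
    """Hypothetical scoring model: linear terms plus one interaction."""
    x1, x2, x3 = features
    return 0.5 + 0.2 * x1 - 0.1 * x2 + 0.05 * x1 * x3


def exact_shapley_values(model, sample, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(sample)

    def value(coalition):
        # Features in the coalition use the sample value, all others the baseline
        mixed = [sample[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                contribution += weight * (value(coalition | {i}) - value(coalition))
        phi.append(contribution)
    return phi


sample = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
shap_like = exact_shapley_values(toy_model, sample, baseline)
base_score = toy_model(baseline)

# Positive values push the score above the baseline, negative values below it
for name, phi in zip(["x1", "x2", "x3"], shap_like):
    direction = "increases" if phi > 0 else "decreases"
    print(f"{name}: {phi:+.3f} ({direction} the prediction score)")
```

In this sketch the per-feature values sum to the difference between the sample's score and the baseline score, which is the additivity property that SHAP values are built on.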
- shap_values
2D array with Monte Carlo estimates of SHAP values obtained by SHAP explainer models averaged across all rounds, feature selections, and trained models from list_model_classes.
- Type:
array-like, shape(n_samples, n_features)
- exp_value
Expected value for explaining the model output obtained by SHAP explainer model averaged across all rounds, feature selections, and trained models from list_model_classes. Typically 0.5 for binary classification with a balanced dataset.
- Type:
- Parameters:
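The averaging behind the shap_values and exp_value attributes can be pictured as an element-wise mean over individual runs. The following is a minimal sketch under stated assumptions: the helper `average_matrices` and all numbers are hypothetical, and the actual aggregation inside ShapModel.fit() may differ in detail.

```python
def average_matrices(matrices):
    """Element-wise mean of equally shaped 2D lists (n_samples x n_features)."""
    n_runs = len(matrices)
    n_rows, n_cols = len(matrices[0]), len(matrices[0][0])
    return [
        [sum(m[r][c] for m in matrices) / n_runs for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Hypothetical SHAP values from 4 runs (e.g., 2 rounds x 2 models),
# each with shape (2 samples, 3 features)
runs = [
    [[0.1, -0.2, 0.0], [0.3, 0.1, -0.1]],
    [[0.3, -0.4, 0.2], [0.1, 0.1, -0.3]],
    [[0.2, -0.3, 0.1], [0.2, 0.1, -0.2]],
    [[0.2, -0.3, 0.1], [0.2, 0.1, -0.2]],
]
# Hypothetical expected values reported by the explainer in each run
exp_values_per_run = [0.48, 0.52, 0.5, 0.5]

shap_values = average_matrices(runs)
exp_value = sum(exp_values_per_run) / len(exp_values_per_run)
```

Averaging over several runs reduces the run-to-run variance of the individual estimates, which is the motivation for treating the result as a Monte Carlo estimate.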
Methods
add_feat_impact([df_feat, drop, ...]) – Compute SHAP feature impact (or importance) from SHAP values and add to the feature DataFrame.
add_sample_mean_dif(X[, labels, label_ref, ...]) – Compute the feature value difference between selected samples and a reference group average.
eval([shap_values, is_selected]) – Evaluate convergence of the Monte Carlo SHAP-value estimates over rounds.
fit(X[, labels, label_target_class, ...]) – Obtain SHAP values aggregated across prediction models and training rounds.
- __init__(explainer_class=<class 'shap.explainers._tree.TreeExplainer'>, explainer_kwargs=None, list_model_classes=None, list_model_kwargs=None, verbose=True, random_state=None)
- Parameters:
explainer_class (model, default=TreeExplainer) – The SHAP Explainer model. Must be one of the following: shap.TreeExplainer, shap.LinearExplainer, shap.KernelExplainer, shap.DeepExplainer, shap.GradientExplainer.
explainer_kwargs (dict, default={model_output='probability'}) – Keyword arguments for the explainer class model.
list_model_classes (list of Type[BaseEstimator], default=[RandomForestClassifier, ExtraTreesClassifier]) – A list of prediction model classes used to obtain SHAP values.
list_model_kwargs (list of dict, optional) – A list of dictionaries containing keyword arguments for each model in list_model_classes.
verbose (bool, default=True) – If True, verbose outputs are enabled.
random_state (int, optional) – The seed used by the random number generator. If a positive integer, results of stochastic processes are consistent, enabling reproducibility. If None, stochastic processes will be truly random.
Notes
All attributes are set during fitting via the ShapModel.fit() method and can be directly accessed.
The Explainer models should be provided from the SHAP package.
SHAP model fitting messages appear in red and are not controlled by verbose, unlike AAanalysis progress messages in blue.
The selection of the SHAP explainer must align with the machine learning models used. The following explainer model types are allowed:
- shap.TreeExplainer: Ideal for tree-based models (by default, random forests and extra trees; further recommended are XGBoost and CatBoost). Efficient in computing SHAP values by leveraging the tree structure.
- shap.LinearExplainer: Suited for linear models (e.g., logistic regression, linear regression). Computes SHAP values directly from model coefficients.
- shap.KernelExplainer: Model-agnostic, works with any model type. Uses weighted linear regression to approximate SHAP values. Versatile but less computationally efficient; its efficiency can be improved by providing a background dataset.
- shap.DeepExplainer: Designed for deep learning models (e.g., models from TensorFlow, Keras). Approximates SHAP values by analyzing neuron groups, suitable for complex networks.
- shap.GradientExplainer: Also for deep learning, but uses expected gradients. Effective for models with differentiable components.
Proper explainer choice is key for accurate model explanations.
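The pairing of model types and explainers listed above can be sketched as a simple lookup. The tables and the helper `suggest_explainer` are hypothetical and are not part of AAanalysis or SHAP; they only restate the guidance from these notes, with the model-agnostic KernelExplainer as the fallback for model types not listed.

```python
# Hypothetical lookup tables restating the notes above (not a library API)
EXPLAINER_BY_MODEL_FAMILY = {
    "tree": "shap.TreeExplainer",
    "linear": "shap.LinearExplainer",
}

MODEL_FAMILY_BY_CLASS_NAME = {
    "RandomForestClassifier": "tree",
    "ExtraTreesClassifier": "tree",
    "LogisticRegression": "linear",
    "LinearRegression": "linear",
}


def suggest_explainer(model_class_name):
    """Return the name of a matching explainer, falling back to KernelExplainer."""
    family = MODEL_FAMILY_BY_CLASS_NAME.get(model_class_name)
    # KernelExplainer is model-agnostic, so it serves as the fallback
    return EXPLAINER_BY_MODEL_FAMILY.get(family, "shap.KernelExplainer")


for name in ["RandomForestClassifier", "LogisticRegression", "SVC"]:
    print(f"{name} -> {suggest_explainer(name)}")
```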
By default, shap.TreeExplainer is used with random forest and extra trees models.
See also
sklearn.ensemble.RandomForestClassifier for random forest model.
sklearn.ensemble.ExtraTreesClassifier for extra trees model.
ShapModel.add_feat_impact() for details on feature impact and SHAP value-based feature importance.
Warning
This class requires SHAP, which is automatically installed via pip install aaanalysis[pro].
Examples
The ShapModel object can be instantiated without providing any parameter:

    import aaanalysis as aa
    sm = aa.ShapModel()

Two types of models can be provided:
- SHAP explainer model: Using the explainer_class parameter, you can select one SHAP explainer model.
- Prediction models: One or more prediction models (machine learning or deep learning models) can be provided via the list_model_classes parameter. The models must align with the chosen SHAP explainer. For example, the TreeExplainer is used by default with two tree-based machine learning models, random forest and extra trees:

    import shap
    from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

    # Create explainer and list of models (default)
    explainer_class = shap.TreeExplainer
    list_model_classes = [RandomForestClassifier, ExtraTreesClassifier]
    sm = aa.ShapModel(explainer_class=explainer_class, list_model_classes=list_model_classes)

Parameters can be provided to the explainer model using the explainer_kwargs parameter:

    # Use probability output for SHAP values (default)
    explainer_kwargs = dict(model_output="probability")
    sm = aa.ShapModel(explainer_class=explainer_class, explainer_kwargs=explainer_kwargs)

To provide arguments to the prediction models, create a list of kwargs dictionaries called list_model_kwargs whose length matches list_model_classes:

    # Create non-default kwargs for tree-based models
    list_model_classes = [RandomForestClassifier, RandomForestClassifier]
    list_model_kwargs = [{"n_estimators": 64, "max_depth": 4}, {"n_estimators": 32, "max_depth": 3}]

    # Explainer does not have to change since TreeExplainer is default
    sm = aa.ShapModel(list_model_classes=list_model_classes, list_model_kwargs=list_model_kwargs)

If a non-tree-based model type is provided, explainer_class must be adjusted:

    from sklearn.linear_model import LogisticRegression, LinearRegression
    from sklearn.svm import SVC, SVR

    # Use LinearExplainer for linear models
    sm = aa.ShapModel(explainer_class=shap.LinearExplainer, list_model_classes=[LogisticRegression, LinearRegression])

    # Use KernelExplainer for any model type
    sm = aa.ShapModel(explainer_class=shap.KernelExplainer, list_model_classes=[SVC])

The KernelExplainer is a model-agnostic method (i.e., it works with any prediction model), but it is computationally expensive. In contrast, the TreeExplainer and the LinearExplainer are optimized for tree-based and linear model types, respectively, and are therefore more efficient.
You can also adjust the verbosity mode via verbose (default=True) and set a random state seed for the prediction models using random_state (default=None):

    sm = aa.ShapModel(verbose=False, random_state=42)
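Returning to the size-matching requirement between list_model_classes and list_model_kwargs, the pairing can be sketched as follows. The helper `build_models` and the stand-in class `ToyForest` are hypothetical and only illustrate the idea of one kwargs dictionary per model class; how ShapModel validates and instantiates its models internally is an assumption here.

```python
class ToyForest:
    """Hypothetical stand-in for a scikit-learn style estimator class."""

    def __init__(self, n_estimators=100, max_depth=None):
        self.n_estimators = n_estimators
        self.max_depth = max_depth


def build_models(list_model_classes, list_model_kwargs=None):
    """Instantiate each model class with the kwargs dictionary at the same index."""
    if list_model_kwargs is None:
        # No kwargs given: use each class with its own defaults
        list_model_kwargs = [{} for _ in list_model_classes]
    if len(list_model_classes) != len(list_model_kwargs):
        raise ValueError(
            f"Got {len(list_model_classes)} model classes but "
            f"{len(list_model_kwargs)} kwargs dictionaries; lengths must match."
        )
    return [cls(**kwargs) for cls, kwargs in zip(list_model_classes, list_model_kwargs)]


models = build_models(
    [ToyForest, ToyForest],
    [{"n_estimators": 64, "max_depth": 4}, {"n_estimators": 32, "max_depth": 3}],
)
```

Pairing by position keeps the two lists easy to read side by side, at the cost of requiring an explicit (possibly empty) dictionary for every model once any kwargs are given.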