aaanalysis.TreeModel
- class aaanalysis.TreeModel(list_model_classes=None, list_model_kwargs=None, is_preselected=None, verbose=True, random_state=None)
Bases: object
Tree Model class: A wrapper for tree-based models to obtain Monte Carlo estimates of feature importance and predictions [Breimann25a].
Monte Carlo estimates are derived by averaging feature importance or prediction probabilities across various tree-based models and training rounds, enhancing the robustness and reproducibility of these estimates. Additionally, the class supports feature selection through recursive feature elimination (RFE) and offers comprehensive evaluation of feature selections.
Added in version 0.1.3.
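The averaging idea described above can be sketched with plain scikit-learn. This is a minimal illustration under stated assumptions, not the TreeModel implementation: the synthetic data, the number of rounds, the per-round seeding, and the local variable names are all chosen for the example, and TreeModel may seed or scale its estimates differently.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

# Synthetic data stands in for a real feature matrix (assumption for illustration)
X, y = make_classification(n_samples=200, n_features=10, n_informative=4, random_state=0)

model_classes = [RandomForestClassifier, ExtraTreesClassifier]
n_rounds = 3

# Collect one importance vector per (round, model) pair
importances = []
for i_round in range(n_rounds):
    for model_class in model_classes:
        model = model_class(n_estimators=50, random_state=i_round)
        model.fit(X, y)
        importances.append(model.feature_importances_)
importances = np.asarray(importances)  # shape (n_rounds * n_models, n_features)

# Monte Carlo estimate: mean and standard deviation across all fits
feat_importance = importances.mean(axis=0)
feat_importance_std = importances.std(axis=0)
print(feat_importance.round(3))
print(feat_importance_std.round(3))
```

Averaging over several models and rounds reduces the influence of any single fit, which is the robustness argument made in the description.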
- list_models_
List with fitted tree-based models for every round after calling the fit method.
- Type:
Nested list with objects, shape (n_rounds, n_models)
- feat_importance
An array containing the importance of each feature, averaged across all rounds and trained models from list_model_classes.
- Type:
array-like, shape (n_features)
- feat_importance_std
An array containing the standard deviation of feature importance across all rounds and trained models from list_model_classes. Same order as feat_importance.
- Type:
array-like, shape (n_features)
- is_selected_
2D array indicating which features were selected by recursive feature elimination (True) or not (False) in each round. Same order as feat_importance.
- Type:
array-like, shape (n_rounds, n_features)
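To make the shape of a per-round selection mask concrete, here is a hedged sketch that uses scikit-learn's RFE class directly. It is not TreeModel's internal procedure: the synthetic data, the estimator settings, and the target of five selected features are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic data stands in for a real feature matrix (assumption for illustration)
X, y = make_classification(n_samples=150, n_features=12, n_informative=4, random_state=0)

n_rounds = 2
masks = []
for i_round in range(n_rounds):
    estimator = RandomForestClassifier(n_estimators=30, random_state=i_round)
    # Drop one feature per step until 5 remain (both values chosen for illustration)
    rfe = RFE(estimator, n_features_to_select=5, step=1)
    rfe.fit(X, y)
    masks.append(rfe.support_)  # boolean mask, True = feature kept

is_selected = np.asarray(masks)  # shape (n_rounds, n_features)
print(is_selected.shape)        # (2, 12)
print(is_selected.sum(axis=1))  # [5 5]
```

Each row is one round's boolean mask over the features, matching the (n_rounds, n_features) layout documented for is_selected_.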
Methods
add_feat_importance([df_feat, drop]): Add feature importance and its standard deviation to the feature DataFrame.
eval(X[, labels, list_is_selected, ...]): Evaluate the prediction performance for different feature selections.
fit(X[, labels, n_rounds, use_rfe, n_cv, ...]): Fit tree-based models and compute average feature importance [Breimann25a].
Obtain Monte Carlo estimate of class prediction probabilities for the positive class in X.
- __init__(list_model_classes=None, list_model_kwargs=None, is_preselected=None, verbose=True, random_state=None)
- Parameters:
list_model_classes (list of Type[ClassifierMixin or BaseEstimator], default=[RandomForestClassifier, ExtraTreesClassifier]) – A list of tree-based model classes to be used for feature importance analysis.
list_model_kwargs (list of dict, optional) – A list of dictionaries containing keyword arguments for each model in list_model_classes.
is_preselected (array-like, shape (n_features)) – Boolean array indicating which features are preselected before applying recursive feature elimination. True indicates that a feature is preselected and False that it is not.
verbose (bool, default=True) – If True, verbose outputs are enabled.
random_state (int, optional) – The seed used by the random number generator. If a positive integer, results of stochastic processes are consistent, enabling reproducibility. If None, no seed is fixed and results of stochastic processes vary between runs.
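The effect of a fixed seed can be checked with a small scikit-learn experiment. This sketch only demonstrates the general behavior of random_state in tree ensembles; the helper fit_importance and the synthetic data are hypothetical and not part of aaanalysis.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data stands in for a real feature matrix (assumption for illustration)
X, y = make_classification(n_samples=100, n_features=8, random_state=0)


def fit_importance(seed):
    """Fit a small random forest and return its feature importances."""
    model = RandomForestClassifier(n_estimators=20, random_state=seed)
    return model.fit(X, y).feature_importances_


# Same integer seed: repeated fits yield identical importances
same_seed = np.allclose(fit_importance(42), fit_importance(42))
print(same_seed)

# Different seeds: importances generally differ between fits
print(np.allclose(fit_importance(1), fit_importance(2)))
```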
Notes
All attributes are set during fitting via the TreeModel.fit() method and can be directly accessed.
See also
sklearn.ensemble.RandomForestClassifier for random forest model.
sklearn.ensemble.ExtraTreesClassifier for extra trees model.
Warning
This class belongs to the explainable AI module requiring SHAP, which is automatically installed via pip install aaanalysis[pro].
Examples
The TreeModel object can be instantiated without providing any parameter:

    import aaanalysis as aa
    tm = aa.TreeModel()
You can provide a list of tree-based models and their respective arguments using the list_model_classes and list_model_kwargs parameters:

    from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, GradientBoostingClassifier

    # Default classes plus GradientBoostingClassifier
    list_model_classes = [RandomForestClassifier, ExtraTreesClassifier, GradientBoostingClassifier]
    print("Default model arguments: ", tm._list_model_kwargs)

    # Adjust default parameters
    list_model_kwargs = [dict(n_estimators=64)] * 3
    tm = aa.TreeModel(list_model_classes=list_model_classes, list_model_kwargs=list_model_kwargs)
    print("New model arguments: ", tm._list_model_kwargs)

Output:

    Default model arguments: [{'random_state': None}, {'random_state': None}]
    New model arguments: [{'n_estimators': 64, 'random_state': None}, {'n_estimators': 64, 'random_state': None}, {'n_estimators': 64, 'random_state': None}]
You can set the random_state and verbose parameters:

    # Set random seed and disable verbosity
    tm = aa.TreeModel(random_state=42, verbose=False)
    print("New model arguments: ", tm._list_model_kwargs)

Output:

    New model arguments: [{'random_state': 42}, {'random_state': 42}]
You can compare different feature pre-filtering strategies by utilizing the is_preselected parameter, which we will demonstrate using the DOM_GSEC example dataset and its respective feature set (see [Breimann25a]):

    import numpy as np
    aa.options["verbose"] = False  # Disable verbosity
    df_seq = aa.load_dataset(name="DOM_GSEC")
    labels = df_seq["label"].to_list()
    df_feat = aa.load_features(name="DOM_GSEC").head(100)

    # Create feature matrix
    sf = aa.SequenceFeature()
    df_parts = sf.get_df_parts(df_seq=df_seq)
    X = sf.feature_matrix(features=df_feat["feature"], df_parts=df_parts)

    # Pre-select top 10 and top 50 features
    mask_top10 = np.asarray(df_feat.index < 10)
    mask_top50 = np.asarray(df_feat.index < 50)
We can now compare the prediction performance for these preselected feature sets using the TreeModel().eval() method:

    df_eval = tm.eval(X, labels=labels, list_is_selected=[np.array([mask_top10]), np.array([mask_top50])])
    aa.display_df(df_eval)
        name  accuracy  precision    recall        f1
    1  Set 1  0.762200   0.769900  0.769200  0.762600
    2  Set 2  0.842200   0.838600  0.875000  0.849000
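For readers who want to see what such a comparison involves, the following sketch evaluates two boolean feature masks with scikit-learn's cross_validate. It is an approximation under assumptions, not the eval() implementation: the synthetic data, the single random forest, the five-fold cross-validation, and the mask definitions are chosen for illustration, so the scores will not match the table above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Synthetic data stands in for the DOM_GSEC feature matrix (assumption for illustration)
X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=0)

# Two hypothetical feature sets: the first 5 and the first 10 columns
masks = {"Set 1": np.arange(X.shape[1]) < 5, "Set 2": np.arange(X.shape[1]) < 10}
scoring = ["accuracy", "precision", "recall", "f1"]

results = {}
for name, mask in masks.items():
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    # Keep only the masked columns, then score with 5-fold cross-validation
    cv = cross_validate(model, X[:, mask], y, cv=5, scoring=scoring)
    results[name] = {s: float(cv[f"test_{s}"].mean()) for s in scoring}

for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

Using the same model settings and folds for every mask keeps the comparison between feature sets fair.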