aaanalysis.CPPPlot
- class aaanalysis.CPPPlot(df_scales=None, df_cat=None, jmd_n_len=10, jmd_c_len=10, accept_gaps=False, verbose=True)
Bases: object
Plotting class for CPP (Comparative Physicochemical Profiling) results [Breimann25a].
This class supports multiple plot types for group- or sample-level analysis, including ranking plots, profiles, heatmaps, and feature maps.
Added in version 0.1.2.
Methods
eval([df_eval, figsize, dict_xlims, legend, ...]) – Plot the CPP evaluation output, comparing multiple identified feature sets.
feature([feature, df_seq, labels, ...]) – Plot distributions of CPP feature values for test and reference datasets, highlighting their mean difference.
feature_map([df_feat, col_cat, col_val, ...]) – Plot a CPP feature map showing feature value mean difference and feature importance per scale subcategory (y-axis) and residue position (x-axis).
heatmap([df_feat, shap_plot, col_cat, ...]) – Plot a CPP/-SHAP heatmap showing the feature value mean difference/feature impact per scale subcategory (y-axis) and residue position (x-axis).
profile([df_feat, shap_plot, col_imp, ...]) – Plot a CPP/-SHAP profile showing feature importance/impact per residue position.
ranking([df_feat, shap_plot, col_dif, ...]) – Plot a CPP/-SHAP feature ranking based on feature importance or sample-specific feature impact.
update_seq_size([ax, fig, max_x_dist, ...]) – Update the font size of the sequence characters to prevent overlap.
- __init__(df_scales=None, df_cat=None, jmd_n_len=10, jmd_c_len=10, accept_gaps=False, verbose=True)
- Parameters:
df_scales (pd.DataFrame, shape (n_letters, n_scales), optional) – DataFrame of scales with letters typically representing amino acids. Default from load_scales() unless specified in options['df_scales'].
df_cat (pd.DataFrame, shape (n_scales, n_scales_info), optional) – DataFrame of categories for physicochemical scales. Must contain all scales from df_scales. Default from load_scales() with name='scales_cat', unless specified in options['df_cat'].
jmd_n_len (int, default=10) – Length of JMD-N (>=0).
jmd_c_len (int, default=10) – Length of JMD-C (>=0).
accept_gaps (bool, default=False) – Whether to accept missing values by omitting them from computations (if True).
verbose (bool, default=True) – If True, verbose outputs are enabled.
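The requirement that df_cat covers every scale in df_scales can be checked before instantiating the class. The helper below is a minimal sketch and not part of AAanalysis: the name missing_scales is hypothetical, and it assumes that scale identifiers are the column names of df_scales and the values of a scale_id column in df_cat.

```python
def missing_scales(scale_ids, cat_scale_ids):
    """Return the scale ids (in original order) that lack a category entry."""
    known = set(cat_scale_ids)
    return [s for s in scale_ids if s not in known]


# Illustrative values only. With pandas objects, one could pass
# df_scales.columns and df_cat["scale_id"] instead (assumed layout).
missing = missing_scales(["Scale 1", "Scale 2"], ["Scale 1"])
print(missing)
```

An empty result means every scale has a category entry; any returned id would need a matching row in df_cat.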
Notes
Several methods provide the shap_plot parameter, which specifies whether a plot visualizes the results of the group-level CPP analysis or the sample-level CPP-SHAP analysis (if shap_plot=True).
- CPP Analysis: Group-level analysis of the most discriminant features between a test and a reference group. The overall results are visualized by CPPPlot.feature_map(), revealing the characteristic physicochemical signature of the test group.
- CPP-SHAP Analysis: Sample-level analysis of the CPP feature impact with single-residue resolution.
The methods showing the CPP and CPP-SHAP analysis results are as follows:
- CPPPlot.ranking(): the ‘CPP/-SHAP ranking plot’ shows the top n ranked features, their feature value differences, and feature importance/impact.
- CPPPlot.profile(): the ‘CPP/-SHAP profile’ shows the cumulative feature importance/impact per residue position.
- CPPPlot.heatmap(): the ‘CPP/-SHAP heatmap’ shows the feature value difference or feature impact per residue position (x-axis) and scale subcategory (y-axis).
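The idea of a cumulative importance per residue position can be illustrated without the library. The following is a rough sketch, not the AAanalysis implementation: the function name cumulative_importance is hypothetical, and it assumes that each feature covers a list of residue positions and that its importance is split evenly across them. The library may aggregate differently.

```python
from collections import defaultdict


def cumulative_importance(features):
    """Sum feature importance per residue position.

    `features` is an iterable of (positions, importance) pairs, where
    `positions` lists the residue positions a feature covers.
    Assumption: importance is split evenly across a feature's positions.
    """
    profile = defaultdict(float)
    for positions, importance in features:
        if not positions:
            continue  # skip features that cover no positions
        share = importance / len(positions)
        for pos in positions:
            profile[pos] += share
    # Return a plain dict ordered by residue position
    return dict(sorted(profile.items()))


# Hypothetical values, for illustration only
features = [([11, 12], 4.0), ([12, 13, 14], 3.0)]
profile = cumulative_importance(features)
```

Position 12 is covered by both features, so it accumulates a share from each, which is the kind of overlap a profile plot makes visible.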
See also
- CPP: the respective computation class for the CPP Analysis.
- ShapModel: the class combining CPP with the SHAP explainable Artificial Intelligence (AI) framework.
- Anatomy of a figure: matplotlib guide on figure elements.
Examples
The CPPPlot object offers various visualizations of the features obtained by CPP and can be instantiated without setting any parameter:

    import aaanalysis as aa
    cpp_plot = aa.CPPPlot()

If the scales used deviate from the AAanalysis defaults, you should provide matching df_scales and df_cat DataFrames:

    import pandas as pd
    df_scales = pd.DataFrame({"Scale 1": [1, 1, 2]}, index=["A", "B", "C"])
    df_cat = pd.DataFrame({"scale_id": ["Scale 1"],
                           "category": ["Category 1"],
                           "subcategory": ["subcategory 1"],
                           "scale_name": ["Scale_name 1"]})
    cpp_plot = aa.CPPPlot(df_scales=df_scales, df_cat=df_cat)

Adjust the length of the N- and C-terminal JMDs using the jmd_n_len and jmd_c_len parameters:

    cpp_plot = aa.CPPPlot(jmd_c_len=0, jmd_n_len=20)
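
To make the effect of jmd_n_len and jmd_c_len on residue positions concrete, the sketch below computes which positions would fall into each sequence part. It is an illustration under stated assumptions, not library code: the helper part_positions is hypothetical, and the middle region (called tmd here), its length tmd_len, and the start position are assumptions that do not appear in the signature above.

```python
def part_positions(jmd_n_len=10, tmd_len=20, jmd_c_len=10, start=1):
    """Return position lists for JMD-N, the middle region, and JMD-C.

    Assumption: the parts are laid out contiguously in the order
    JMD-N, middle region, JMD-C, beginning at position `start`.
    """
    if min(jmd_n_len, tmd_len, jmd_c_len) < 0:
        raise ValueError("lengths must be >= 0")
    tmd_start = start + jmd_n_len
    jmd_c_start = tmd_start + tmd_len
    return {
        "jmd_n": list(range(start, tmd_start)),
        "tmd": list(range(tmd_start, jmd_c_start)),
        "jmd_c": list(range(jmd_c_start, jmd_c_start + jmd_c_len)),
    }


# Same lengths as in the example above: long JMD-N, no JMD-C
parts = part_positions(jmd_n_len=20, jmd_c_len=0)
```

With jmd_c_len=0 the JMD-C part is empty, and a longer JMD-N shifts the middle region to later positions.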