CPPPlot

class CPPPlot(df_scales=None, df_cat=None, jmd_n_len=10, jmd_c_len=10, accept_gaps=False, verbose=True)[source]

Bases: object

Plotting class for CPP (Comparative Physicochemical Profiling) results [Breimann25].

This class supports multiple plot types for group or sample-level analysis, including ranking plots, profiles, heatmaps, and feature maps.

Every plotting method returns a (fig, ax) pair as a thin tuple subclass, so the result can be unpacked as fig, ax = .... For backward compatibility, the returned object also forwards attribute access to ax, so legacy code such as ax = ...; ax.set_title(...) keeps working.
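The return convention described above can be illustrated with a minimal sketch. The names FigAx and StubAxes are hypothetical and are not part of AAanalysis; the library's actual class may differ. The stub stands in for a matplotlib Axes so the example runs without extra dependencies.

```python
class FigAx(tuple):
    """Hypothetical (fig, ax) pair that forwards unknown attributes to ax."""

    def __new__(cls, fig, ax):
        # Store both objects in the underlying 2-tuple
        return super().__new__(cls, (fig, ax))

    def __getattr__(self, name):
        # Called only when normal tuple attribute lookup fails,
        # so tuple behavior (unpacking, indexing, len) is unaffected
        return getattr(self[1], name)


class StubAxes:
    """Stand-in for a matplotlib Axes (assumption, for illustration only)."""

    def __init__(self):
        self.title = ""

    def set_title(self, title):
        self.title = title


result = FigAx("figure-placeholder", StubAxes())

# New style: unpack the pair
fig, ax = result

# Legacy style: treat the returned object as if it were the Axes
result.set_title("CPP profile")
```

Because forwarding happens only on failed lookups, names that tuple already defines (such as count and index) are not forwarded to ax in this sketch.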

Added in version 0.1.2.

Notes

The jmd_n_len and jmd_c_len values supplied at construction are stored as _jmd_n_len and _jmd_c_len and are reused by all plot methods (ranking, profile, heatmap, feature_map, update_seq_size) so that juxta middle domain (JMD) lengths are consistent across a single CPPPlot instance.
Parameters ending in _kws (e.g. cbar_kws, legend_kws) bundle related keyword arguments into one dict; see the keyword-dict parameters overview.
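As a rough illustration of the _kws pattern mentioned above, the following sketch shows how a keyword dict can be merged with defaults and unpacked into a helper. The functions draw_colorbar and plot_sketch, and the default values, are hypothetical; how CPPPlot handles these dicts internally is not stated here.

```python
def draw_colorbar(label="", ticks=None, shrink=1.0):
    """Hypothetical helper that would draw a colorbar; here it returns its settings."""
    return {"label": label, "ticks": ticks, "shrink": shrink}


def plot_sketch(data, cbar_kws=None):
    """Hypothetical plot function accepting a bundled keyword dict."""
    # Copy to avoid mutating the caller's dict; None means "use defaults"
    cbar_kws = {} if cbar_kws is None else dict(cbar_kws)
    defaults = {"label": "Feature value", "shrink": 0.8}  # assumed defaults
    # User-supplied keys override the defaults
    settings = {**defaults, **cbar_kws}
    return draw_colorbar(**settings)


settings = plot_sketch([1, 2, 3], cbar_kws={"shrink": 0.5})
```

Bundling related options this way keeps the method signature short while still exposing every option of the underlying call.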

Parameters:

df_scales (Optional[DataFrame])
df_cat (Optional[DataFrame])
jmd_n_len (int)
jmd_c_len (int)
accept_gaps (bool)
verbose (bool)

Methods

`eval`(df_eval[, figsize, dict_xlims, legend, ...])	Plot evaluation output of Comparative Physicochemical Profiling (CPP) comparing multiple sets of identified feature sets.
`feature`(feature[, feat_rank, label_test, ...])	Plot distributions of Comparative Physicochemical Profiling (CPP) feature values for test and reference datasets highlighting their mean difference.
`feature_map`(df_feat[, shap_plot, col_cat, ...])	Plot Comparative Physicochemical Profiling (CPP) feature map showing feature value mean difference and feature importance per scale subcategory (y-axis) and residue position (x-axis).
`heatmap`(df_feat[, shap_plot, col_cat, ...])	Plot a CPP/-SHAP heatmap showing the feature value mean difference/feature impact per scale subcategory (y-axis) and residue position (x-axis).
`profile`(df_feat[, shap_plot, col_imp, ...])	Plot CPP/-SHAP profile showing feature importance/impact per residue position.
`ranking`(df_feat[, shap_plot, col_dif, ...])	Plot CPP/-SHAP feature ranking based on feature importance or sample-specific feature impact.
`update_seq_size`(ax[, fig, max_x_dist, ...])	Update the font size of the sequence characters to prevent overlap.

__init__(df_scales=None, df_cat=None, jmd_n_len=10, jmd_c_len=10, accept_gaps=False, verbose=True)[source]

Parameters:

df_scales (pd.DataFrame, shape (n_letters, n_scales), optional) – DataFrame of scales with letters typically representing amino acids. Default from load_scales() unless specified in options['df_scales'].
df_cat (pd.DataFrame, shape (n_scales, n_scales_info), optional) – DataFrame of categories for physicochemical scales. Must contain all scales from df_scales. Default from load_scales() with name='scales_cat', unless specified in options['df_cat'].
jmd_n_len (int, default=10) – Length of JMD-N (>=0).
jmd_c_len (int, default=10) – Length of JMD-C (>=0).
accept_gaps (bool, default=False) – If True, missing values (gaps) are accepted and omitted from computations.
verbose (bool, default=True) – If True, verbose outputs are enabled.

Notes

Several methods provide the shap_plot parameter, which specifies whether a plot visualizes the results of the group-level Comparative Physicochemical Profiling (CPP) analysis (default) or of the sample-level CPP-SHAP analysis (if shap_plot=True).

CPP Analysis: Group-level analysis of the most discriminant features between a test and a reference group. The overall results are visualized by CPPPlot.feature_map(), revealing the characteristic physicochemical signature of the test group.
CPP-SHAP Analysis: Sample-level analysis of the CPP feature impact with single-residue resolution.

The methods showing the CPP and CPP-SHAP analysis results are as follows:

CPPPlot.ranking(): the ‘CPP/-SHAP ranking plot’ shows the top n ranked features, their feature value differences, and feature importance/impact.
CPPPlot.profile(): the ‘CPP/-SHAP profile’ shows the cumulative feature importance/impact per residue position.
CPPPlot.heatmap(): the ‘CPP/-SHAP heatmap’ shows the feature value difference or feature impact per residue position (x-axis) and scale subcategory (y-axis).

See also

CPP: the respective computation class for the CPP Analysis.
ShapModel: the class combining CPP with the SHAP explainable Artificial Intelligence (AI) framework.
Anatomy of a figure: matplotlib guide on figure elements.

Examples

The CPPPlot object offers various visualizations of the features obtained by CPP and can be instantiated without setting any parameters:

import aaanalysis as aa
cpp_plot = aa.CPPPlot()

If the scales used deviate from the AAanalysis defaults, you should provide matching df_scales and df_cat DataFrames:

import pandas as pd
df_scales = pd.DataFrame({"Scale 1": [1, 1, 2]}, index=["A", "B", "C"])
df_cat = pd.DataFrame({"scale_id": ["Scale 1"], "category": ["Category 1"],
                       "subcategory": ["subcategory 1"], "scale_name": ["Scale_name 1"]})
cpp_plot = aa.CPPPlot(df_scales=df_scales, df_cat=df_cat)
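Since df_cat must contain all scales from df_scales, it can help to check the two DataFrames before passing them in. The following is a minimal sketch of such a check, assuming (as in the example above) that scale identifiers are the columns of df_scales and the scale_id column of df_cat. The helper find_missing_scales is hypothetical and not part of AAanalysis.

```python
import pandas as pd


def find_missing_scales(df_scales, df_cat, col_scale_id="scale_id"):
    """Return scale ids that are columns of df_scales but absent from df_cat."""
    known = set(df_cat[col_scale_id])
    return [scale for scale in df_scales.columns if scale not in known]


# Two scales, but only one is described in df_cat
df_scales = pd.DataFrame({"Scale 1": [1, 1, 2], "Scale 2": [0, 1, 0]},
                         index=["A", "B", "C"])
df_cat = pd.DataFrame({"scale_id": ["Scale 1"], "category": ["Category 1"],
                       "subcategory": ["subcategory 1"],
                       "scale_name": ["Scale_name 1"]})

missing = find_missing_scales(df_scales, df_cat)
print(missing)  # ['Scale 2']
```

An empty list indicates that every scale in df_scales has a matching entry in df_cat.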

Adjust the lengths of the N- and C-terminal JMDs using the jmd_n_len and jmd_c_len parameters:

cpp_plot = aa.CPPPlot(jmd_c_len=0, jmd_n_len=20)

Further parameters: CPPPlot.__init__ also accepts accept_gaps and verbose.

# Further parameters: accept_gaps enables gap-tolerant scale lookups; verbose toggles logging
cpp_plot = aa.CPPPlot(accept_gaps=True, verbose=False)